What is bot traffic and how to filter it out

This article was last updated on March 26, 2024

In the realm of web analytics, accurate data is paramount for informed decision-making. However, the presence of bots and irrelevant traffic can skew the insights derived from analytics platforms. To combat this, Google Analytics offers robust filtering mechanisms, including Bot Filtering and IP Filtering. With the recent transition to Google Analytics 4 (GA4), users might wonder how these filtering features differ from Universal Analytics (UA) and how they can leverage them effectively. In this article, we’ll delve into the nuances of bot and IP filtering in GA4, explore any disparities with UA, and provide insights on optimizing data integrity.

 

In early February 2021, a number of our clients saw unnatural spikes of traffic in their Google Analytics accounts. After a quick evaluation, we noticed that all of them had been hit with around 500 sessions from a “traffic-bot” within the same week. For a lot of people it’s not clear why this traffic shows up or what its purpose is, so we decided to write a quick blog post explaining what it is all about and how to set up your Analytics accounts so that your metrics are not distorted by these traffic bots.

What is Bot traffic and why is it showing up in my Google Analytics?

Bot traffic is traffic generated by so-called “bots”: automated programs that perform website activities such as page visits, clicks or posting comments. In recent months, a handful of providers have been sending bot sessions into Google Analytics accounts as a way of marketing their services. This is the latest trend in analytics spam (although nothing new), and these services are sold as a way to influence your website metrics. Because this “unnatural and faked traffic” can increase website visits, bounce rates and/or time on site, some companies might think it’s a good idea to use them.

Bot Filtering serves as a shield against spam bots and irrelevant traffic, ensuring that analytics data accurately reflects genuine user interactions. In GA4, Bot Filtering is enabled by default, leveraging Google’s advanced algorithms to identify and exclude known bots and spiders from data collection. This automatic filtering helps maintain the integrity of analytics data without requiring manual intervention.

We can, however, make a very clear statement: you should always refrain from using these services to “manipulate” your website metrics.
These “bot sessions” do not reflect actual user behaviour and will completely mess up your real website data. You want to optimize your website for actual users, and actual users only. It is therefore important to keep your data clean and to exclude “unnatural and unwanted traffic” from your reporting.

What is Spider traffic?

Besides bot traffic, there is another type of traffic that accesses your pages but should not show up in your analytics reports: traffic generated by so-called spiders or web crawlers. You should not see this traffic in your analytics data, but you will be able to identify it in your server logs.

Crawlers visit your website in order to read and process its content. One of the most common spiders is the “Google Spider” (Googlebot), which crawls the web in order to find new content. Other crawlers might belong to tools that you actively use for on-page optimization, website tracking or even something as simple as uptime monitoring.
In the next steps we will focus on bot traffic only; however, most of these rules can also be applied to spider/crawler traffic.
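
Since crawlers mostly show up in server logs rather than in analytics reports, a small script can help you see which ones visit your site. The following is a minimal sketch in Python, assuming your web server writes the common “combined” log format with the user agent as the last quoted field. The marker list and the sample log lines are made up for illustration and are not a complete list of crawlers.

```python
import re
from collections import Counter

# Hypothetical list of user-agent substrings; extend it for the crawlers you care about.
CRAWLER_MARKERS = ("googlebot", "bingbot", "crawler", "spider", "uptime")

# Captures the last quoted field of a log line (the user agent in combined log format).
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(log_lines):
    """Return a Counter mapping each crawler marker to the number of matching lines."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for marker in CRAWLER_MARKERS:
            if marker in user_agent:
                hits[marker] += 1
                break  # count each log line only once
    return hits


# Made-up sample lines for illustration only.
sample_log = [
    '66.249.66.1 - - [10/Feb/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Feb/2021:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

print(count_crawler_hits(sample_log))
```

Keep in mind that user agents can be faked, so a match is an indication rather than proof.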

Important!
Before you continue, always make sure you have a “test view” within your Google Analytics account, and test all of your changes in this test view before deploying anything to your production or reporting views.
In addition to a test view, we recommend keeping an “all website data” view which contains all your raw data without any filtering.

How to identify uncommon bot traffic

Every now and then we see similar traffic spikes across client accounts that are not covered by the standard bot and spider filtering.
We suggest excluding these with a filter.

However, before you start excluding “bot traffic” you will need to identify which traffic should be excluded.
There are a few indicators of unnatural / bot traffic which you can look for within your Google Analytics property.

Some of the most common indicators we have seen are:

  • Sudden spikes of traffic within a very short timespan (hours up to a day)
  • Spikes of traffic where the bounce rate is 100% or time on site is 0 seconds
  • Traffic spikes from traffic sources that contain the word “bot”
  • Hostnames that do not reflect your own website
  • Traffic spikes from specific and uncommon locations
  • A 100% new session rate or very close to 100%
  • Exactly 1 page per session
  • Browser dimensions “not set”
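
If you export a report to a spreadsheet or script, several of these indicators can be checked automatically. Below is a minimal sketch in Python. The row structure, field names, thresholds and the hostname www.example.com are assumptions for illustration and do not correspond to a specific Google Analytics export format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceRow:
    """One row of a hypothetical traffic report, grouped by source."""
    source: str
    hostname: str
    sessions: int
    bounce_rate: float        # 0.0 to 1.0
    pages_per_session: float
    new_session_rate: float   # 0.0 to 1.0


# Assumption: replace with the hostnames your site is really served from.
OWN_HOSTNAMES = {"www.example.com"}


def bot_indicators(row: SourceRow) -> List[str]:
    """Return the list of indicators from the checklist that this row triggers."""
    reasons = []
    if "bot" in row.source.lower():
        reasons.append("source contains 'bot'")
    if row.hostname not in OWN_HOSTNAMES:
        reasons.append("unexpected hostname")
    if row.bounce_rate >= 0.99:
        reasons.append("bounce rate near 100%")
    if row.new_session_rate >= 0.99:
        reasons.append("new session rate near 100%")
    if row.pages_per_session == 1.0:
        reasons.append("exactly 1 page per session")
    return reasons


# Made-up rows for illustration only.
rows = [
    SourceRow("trafficbot.live", "trafficbot.live", 500, 1.0, 1.0, 1.0),
    SourceRow("google", "www.example.com", 1200, 0.42, 2.7, 0.61),
]

for row in rows:
    print(row.source, bot_indicators(row))
```

A single indicator is rarely conclusive on its own; a source that triggers several at once is a much stronger candidate for exclusion.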

Here is a recent example that received a lot of attention in the GA4 world: certain bot traffic started filling some properties, bypassing GA4’s automatic bot filtering. The numbers make it clear that this is not normal traffic: thousands of users and sessions with almost no engagement at all.
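
The pattern described here, a sudden spike combined with almost no engagement, can be sketched as a simple check on daily numbers. The following Python example is an illustration only: the figures are made up, and the thresholds (three times the median, at most 5% engaged sessions) are assumptions that you would need to tune for your own property.

```python
from statistics import median


def find_suspicious_days(daily, spike_factor=3.0, max_engagement_rate=0.05):
    """Flag days whose sessions spike above the baseline with almost no engagement.

    daily: list of (date, sessions, engaged_sessions) tuples.
    """
    baseline = median(sessions for _, sessions, _ in daily)
    flagged = []
    for date, sessions, engaged in daily:
        engagement_rate = engaged / sessions if sessions else 0.0
        is_spike = sessions > spike_factor * baseline
        if is_spike and engagement_rate <= max_engagement_rate:
            flagged.append(date)
    return flagged


# Made-up daily figures for illustration only.
daily_report = [
    ("2024-03-01", 180, 110),
    ("2024-03-02", 175, 102),
    ("2024-03-03", 4200, 12),
    ("2024-03-04", 190, 120),
    ("2024-03-05", 185, 115),
]

print(find_suspicious_days(daily_report))
```

The median is used as the baseline because, unlike the average, it is barely affected by the spike itself.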

How can you filter for known Spiders and Bots using Google Analytics?

Google is fully aware that many bots and spiders visit your website on a day-to-day basis, and therefore built a standard feature called “Bot Filtering” into Universal Analytics. This feature could be found under “View Settings” in the “old” Universal Analytics admin panel. There you could tick the checkbox “Exclude all hits from known bots and spiders” in the bot filtering section.

This automatically excluded all known bots and spiders from your Google Analytics property from that point onwards.
Keep in mind that this filtering did not work retroactively.

Changes from UA to GA4 

The above described mechanism is one significant disparity between GA4 and UA in how bot traffic is handled nowadays. While in Universal Analytics, users needed to manually enable Bot Filtering through the admin settings, GA4 streamlines this process by automatically filtering out known bots without user intervention. This enhancement in GA4 simplifies the setup process and reduces the likelihood of inadvertently including bot traffic in reports.

IP Filtering allows users to exclude specific IP addresses or ranges from analytics tracking, effectively eliminating internal traffic, such as employees accessing the website, or unwanted traffic sources. While both GA4 and UA offer IP Filtering capabilities, the process of implementing filters varies slightly between the two versions.

In UA, users navigated to the Admin panel, selected the desired view, and opened the Filters section to create an IP filter. In GA4, which has no views, IP filtering is configured in the Admin area instead: you define the IP addresses or ranges as an internal traffic rule in the tag settings of your data stream, and the matching traffic is then excluded through a data filter at property level.

Bot Filtering in 2024

Bot and IP filtering play pivotal roles in maintaining the accuracy and reliability of analytics data in both Google Analytics 4 and Universal Analytics.

While GA4 streamlines the process with automatic bot filtering and integrated IP filtering, the fundamental objective remains consistent across both versions: to provide users with actionable insights based on authentic user interactions.

By understanding the nuances between GA4 and UA filtering mechanisms and implementing best practices, businesses can leverage analytics data effectively to drive informed decisions and optimize online performance.

How it’s done in GA4

One of the biggest changes between UA and GA4 is GA4’s native ability to filter bot traffic automatically in comparison to manually setting this up in the previous version. However as we showed in our previous example, bot or unwanted traffic can still fill your GA4 property. 

What can be additionally done is to set up IP filtering in GA4. You can filter unwanted, internal or bot traffic if you are able to locate the IP in question.

First, head to your admin settings that you can find on the lower left hand side of your GA4 property, and go to Data Streams.

Second, head to Configure Tag settings:

Third, go to Define internal traffic. (When you click on Configure tag settings in the step above, you will not see all of these settings by default; you have to click Show more.)

You will then be prompted to create an internal traffic rule, which leads you to this window:

Here you specify what type of traffic it is. By default the type is internal, but you can add any IP address that you don’t want in your property, whether it is internal or not, and give the rule a custom name. Note that the rule itself only labels the matching traffic; it is excluded from your reports once the corresponding data filter (under Data Filters in the admin settings) is set to active.
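
If you enter a range instead of a single address, it is worth checking beforehand which addresses the range actually covers. The sketch below uses Python’s standard ipaddress module and assumes the range is written in CIDR notation. The addresses are reserved documentation ranges used purely as examples, and the script only mimics the matching logic locally; it does not talk to Google Analytics.

```python
import ipaddress


def matches_rule(ip: str, cidr: str) -> bool:
    """Return True if the IP address falls inside the given CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# Hypothetical rule: exclude everything in 203.0.113.0/24.
rule_range = "203.0.113.0/24"

# Example addresses (documentation ranges, not real visitors).
for candidate in ["203.0.113.42", "198.51.100.7"]:
    print(candidate, matches_rule(candidate, rule_range))
```

A check like this helps you avoid a range that is too broad and accidentally excludes genuine visitors.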

Your Bot Traffic should now be filtered out

Your analytics data should now be free of cluttering bot sessions. Keep in mind that these filters only start working from the point you deploy them, meaning your historic data will not be “updated”.

If you need help with setting up and managing your Google Analytics account, please contact us to see if we can quickly help you on your way again.

How You Used to Exclude unknown Traffic Bots in Universal Analytics

In order to exclude unknown bot traffic make sure you work in a test view. Create one if you have not done so yet.

So you have selected the “Exclude all hits from known bots and spiders” and you are still seeing bot traffic in your GA account? Then we need to exclude them individually using view filters. In the next 7 steps we’ll show you exactly how to exclude bot traffic, based on the example used above.

1. Identify the Bot Traffic you want to exclude
Identify the bot you want to exclude based on the indicators mentioned above.
In our example we will use traffic from the source “trafficbot.live”

2. Navigate to Filters
Open the “admin panel” in your Google Analytics account, navigate to your “test view” and click on “filters” within the view column.

3. Create a Filter to exclude the Traffic Bot
Click on “Add Filter”

4. Setting your Filter Criteria
Name your filter “Bot Traffic” and select the “custom filter” type and define the field you would like to filter for.
In our case we use “hostname” and we define the filter pattern using regex.

As we want to exclude the following hostnames:

Trafficbot.life
Bot-traffic.icu
Trafficbot.live
bottraffic.xyz (this bot tends to work with multiple variations such as bottraffic459.xyz, so we exclude all domains that contain “bottraffic”)

We will use the following regex in order to also exclude any variations of the domains stated above.
Just copy the pattern below and add any bot you want to exclude by adding .*NAME OF BOT.*, separating the entries with “|”, which means OR in regex.

Filter Pattern
.*trafficbot.*|.*bot-traffic.*|.*bottraffic.*
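
Before pasting a pattern of this shape into the filter, you can sanity-check it locally against the hostnames you want to catch and at least one you want to keep. The following Python sketch assumes that the filter matches case-insensitively and anywhere in the field, which is approximated here with re.IGNORECASE and re.search. The hostname www.example.com stands in for your own site.

```python
import re

# Assumption: case-insensitive partial matching, approximated with IGNORECASE + search.
BOT_PATTERN = re.compile(
    r".*trafficbot.*|.*bot-traffic.*|.*bottraffic.*",
    re.IGNORECASE,
)


def is_bot_hostname(hostname: str) -> bool:
    """Return True if the hostname matches one of the bot alternatives."""
    return BOT_PATTERN.search(hostname) is not None


samples = [
    "Trafficbot.life",
    "bot-traffic.icu",
    "trafficbot.live",
    "bottraffic459.xyz",
    "www.example.com",  # placeholder for your own hostname; should NOT match
]

for hostname in samples:
    print(hostname, is_bot_hostname(hostname))
```

If your own hostname ever matches, the pattern is too broad and would filter out genuine traffic.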

5. Verify your Filter
Click on “verify this filter” in order to test your filter function.
Keep in mind that the verification only runs on a subset of your data, so older bot sessions might not show up.

6. Save and monitor your filter
Click “Save” and check within the next few days whether or not your bot-traffic is correctly being filtered out within your test view.

7. Deploy to Production
If your filter works correctly within your test view, it’s time to deploy it to your reporting or production view. Click on “Add Filter to View” in your reporting view, choose “Apply Existing filter”, select the filter from the list of available filters, click on “Add” and save your settings.

As a final tip we do recommend adding an annotation to your analytics view, in order to make sure that all analytics users are aware of your changes.
