What is bot traffic and how to filter it out

In early February 2021, a number of our clients saw unnatural traffic spikes in their Google Analytics accounts. After a quick evaluation, we noticed that all of them had been hit with around 500 sessions from a “traffic-bot” within the same week. Because it is not clear to many people why this traffic shows up and what its purpose is, we decided to write a quick blog post explaining what this is all about and how to set up your Analytics accounts so that your metrics are not skewed by these traffic bots.

What is bot traffic and why is it showing up in my Google Analytics?

Bot traffic is traffic generated by so-called “bots”: automated programs that perform website activities such as visits, clicks or posting comments. In recent months, a handful of providers have been sending bot sessions to Google Analytics accounts as a way of marketing their services. This is the latest trend in analytics spam (although nothing new), and these services are offered as a way to influence your website metrics. Because this “unnatural and faked traffic” can increase website visits, bounce rates and/or time on site, some companies might think it’s a good idea to make use of these services.

We can, however, make a very clear statement: you should always refrain from using these services to “manipulate” your website metrics.
These “bot sessions” do not reflect actual user behaviour and will completely mess up your actual website data. You want to optimize your website for actual users, and actual users only. It is therefore important to keep your data clean and to exclude “unnatural and unwanted traffic” from your reporting.

What is Spider traffic?

Besides bot traffic, there is another type of traffic that accesses your pages but should not show up in your analytics reports either. This traffic is generated by so-called spiders or web crawlers. You should not see any of it in your analytics data, but you will be able to identify it in your server logs.

Crawlers visit your website in order to read and process its data. One of the most common spiders is the “Google Spider”, which crawls the web in order to find “new content”. Other crawlers might be tools that you actively use yourself for on-page optimization, website tracking or even something as simple as uptime monitoring.
In the next steps we will focus on bot traffic only, but most of these rules can also be applied to spider/crawler traffic.
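
To illustrate what identifying crawlers in your server logs could look like, here is a minimal Python sketch. It assumes your server writes the common “combined” log format, where the user agent is the last quoted field. The keyword list and the two sample log lines are made up for illustration, so adapt them to your own logs.

```python
import re

# Hypothetical user-agent keywords; extend the list for the crawlers you see.
CRAWLER_HINTS = re.compile(r"bot|crawler|spider|uptime", re.IGNORECASE)

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def crawler_user_agent(log_line):
    """Return the user agent if it looks like a crawler, otherwise None."""
    match = USER_AGENT.search(log_line)
    if match is None:
        return None
    user_agent = match.group(1)
    return user_agent if CRAWLER_HINTS.search(user_agent) else None


# Made-up sample lines: one crawler visit and one regular browser visit.
sample_lines = [
    '66.249.66.1 - - [03/Feb/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [03/Feb/2021:10:00:05 +0000] "GET /blog HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/85.0"',
]

crawler_hits = [ua for ua in map(crawler_user_agent, sample_lines) if ua]
for user_agent in crawler_hits:
    print(user_agent)
```

A keyword match like this only catches crawlers that identify themselves honestly in the user agent, so treat it as a first rough pass.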

Before you continue: always make sure you have a “test view” within your Google Analytics account, and test all of your changes in this test view before deploying anything to your production or reporting views.
In addition to a test view, we recommend that you have an “all website data” view which contains all your raw data and does not have any filters applied.

How can you filter for known Spiders and Bots using Google Analytics?

Google is fully aware that many bots and spiders visit your website on a day-to-day basis and has therefore built a standard feature into Google Analytics called “Bot Filtering”. This feature can be found under “view settings” in your Google Analytics admin panel. In the bot filtering section, you can tick the field “Exclude all hits from known bots and spiders”.
This will automatically exclude all known bots and spiders from that Google Analytics view from that point on.
Keep in mind that this filtering does not work retroactively.

How to identify uncommon bot traffic

Every now and then we see similar traffic spikes across client accounts which are not covered by the standard bot and spider filtering.
We suggest excluding this traffic with a filter.

However, before you start excluding “bot traffic”, you will need to identify which traffic should be excluded.
There are a few indicators for unnatural / bot traffic which you can search for within your Google Analytics property.

Some of the most common indicators we have seen are:

  • Sudden spikes of traffic within a very short timespan (hours up to a day)
  • Spikes of traffic where the bounce rate is 100% or time on site is 0 seconds
  • Traffic spikes from traffic sources that contain the word “bot”
  • Hostnames that do not reflect your own website
  • Traffic spikes from specific and uncommon locations
  • A 100% new session rate or very close to 100%
  • Exactly 1 page per session
  • Browser dimensions “not set”
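
If you export your traffic sources to a spreadsheet or script, the checklist above can be sketched as a small Python check. This is only an illustration: the field names, the 99% threshold for “very close to 100%” and the two sample rows are assumptions, not values from Google Analytics itself. The spike, location and browser indicators are left out because they need more context than a single row.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    # Field names are hypothetical; map them to the columns of your own export.
    source: str
    hostname: str
    sessions: int
    new_sessions: int
    bounce_rate: float           # percent, 0 to 100
    pages_per_session: float
    avg_session_duration: float  # seconds


def bot_indicators(row, own_hostname):
    """Return the indicators from the checklist that this row matches."""
    hits = []
    if "bot" in row.source.lower():
        hits.append("source contains 'bot'")
    if row.hostname != own_hostname:
        hits.append("foreign hostname")
    if row.bounce_rate >= 100.0 or row.avg_session_duration == 0:
        hits.append("100% bounce rate or 0 seconds on site")
    # Assumed threshold: 99% or more counts as "very close to 100%".
    if row.sessions and row.new_sessions / row.sessions >= 0.99:
        hits.append("new session rate close to 100%")
    if row.pages_per_session == 1.0:
        hits.append("exactly 1 page per session")
    return hits


# Made-up example rows: one suspicious source and one ordinary source.
rows = [
    SourceRow("trafficbot.live", "bottraffic459.xyz", 500, 500, 100.0, 1.0, 0.0),
    SourceRow("google", "www.example.com", 1200, 640, 48.5, 2.7, 95.0),
]

for row in rows:
    print(row.source, bot_indicators(row, own_hostname="www.example.com"))
```

A single matching indicator is rarely proof on its own. The more indicators a source matches, the more likely it is bot traffic.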

As an example, searching your traffic sources for the word “bot” is an easy way to identify which traffic (in this case, which traffic sources) should be excluded. As mentioned above, this traffic can very clearly be identified as bot traffic: the bounce rate is 100%, pages per session are exactly 1 and the average time per session is 0 seconds.

How to exclude unknown Traffic Bots in 7 easy steps

In order to exclude unknown bot traffic make sure you work in a test view. Create one if you have not done so yet.

So you have selected the “Exclude all hits from known bots and spiders” option and you are still seeing bot traffic in your GA account? Then you need to exclude these bots individually using view filters. In the next 7 steps we’ll show you exactly how to exclude bot traffic, based on the example used above.

1. Identify the Bot Traffic you want to exclude
Identify the bot you want to exclude based on the indicators mentioned above.
In our example we will use traffic from the source “trafficbot.live”

2. Navigate to Filters
Open the “admin panel” in your Google Analytics account, navigate to your “test view” and click on “filters” within the view column.

3. Create a Filter to exclude the Traffic Bot
Click on “Add Filter”

4. Setting your Filter Criteria
Name your filter “Bot Traffic”, select the “custom filter” type and define the field you would like to filter on.
In our case we use “hostname” and we define the filter pattern using regex.

As we want to exclude the following hostnames:

bottraffic.xyz (this bot tends to work with multiple variations such as bottraffic459.xyz, so we exclude all domains that contain “bottraffic”)

We will use the following regex code in order to also exclude any variations of the domains stated above.
Just copy the code below and add any bot you want to exclude by adding .*NAME OF BOT.*, separating the entries with “|”, which means OR in regex.

Filter Pattern
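
If you want to try such a pattern outside of Google Analytics first, here is a minimal Python sketch. It builds a pattern in the .*NAME OF BOT.* form described above from a list of name fragments and checks it against a few sample values. The fragment “trafficbot” and the sample values are assumptions for illustration. Python’s regex dialect is not necessarily identical to the one Google Analytics uses, so always confirm the result with the verification step in GA.

```python
import re

# Hypothetical name fragments; replace them with the bots you identified.
bot_names = ["bottraffic", "trafficbot"]

# Build ".*NAME.*|.*NAME.*"; re.escape keeps special characters such as dots literal.
filter_pattern = "|".join(f".*{re.escape(name)}.*" for name in bot_names)
print(filter_pattern)

# Sample values to check; the last one stands in for your own website.
candidates = ["bottraffic.xyz", "bottraffic459.xyz", "trafficbot.live", "www.example.com"]
excluded = [value for value in candidates if re.search(filter_pattern, value)]
print(excluded)
```

Keeping the name fragments in a list makes it easy to add a new bot later without editing the pattern by hand.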

5. Verify your Filter
Click on “verify this filter” in order to test your filter.
Keep in mind that the verification only runs on a subset of your data, so older bot sessions might not show up.

6. Save and monitor your filter
Click “Save” and check within the next few days whether or not your bot-traffic is correctly being filtered out within your test view.

7. Deploy to Production
If your filter works correctly within your test view, it’s time to deploy it to your reporting or production view. Click on “Add Filter to View” in your reporting view and select “Apply Existing filter”. Then choose the filter from the list of available filters, click on “Add” and save your settings.

As a final tip we do recommend adding an annotation to your analytics view, in order to make sure that all analytics users are aware of your changes.

Your bot traffic should now be filtered out

Your analytics data should now be free of cluttering bot sessions. Keep in mind that these filters only start working from the point you hit deploy, meaning your historic data will not be “updated”.

If you need help with setting up and managing your Google Analytics account, please contact us to see if we can quickly help you on your way again.
