What is bot traffic and how to filter it out
In early February 2021, a number of our clients saw unnatural spikes of traffic in their Google Analytics accounts. After a quick evaluation, we noticed that all of them had been hit with around 500 sessions from a “traffic-bot” within the same week. Because it’s not clear to a lot of people why this traffic shows up and what its purpose is, we decided to write a quick blog post explaining what this is all about and how to set up your Analytics accounts so that your metrics do not get skewed by these traffic bots.
What is Bot traffic and why is it showing up in my Google Analytics?
We can, however, make one very clear statement: you should always refrain from using these services to “manipulate” your website metrics.
These “bot-sessions” do not reflect actual user behaviour and will completely mess up your actual website data. Again, we do not recommend that anyone use these services, as you want to optimize your website for actual users, and actual users only. It’s therefore important to keep your data clean and to exclude “unnatural and unwanted traffic” from your reporting.
What is Spider traffic?
Crawlers (also called spiders) visit your website in order to read and process its data. One of the most common spiders is the “Google Spider”, which crawls the web in order to find “new content”. However, other crawlers might be tools that you actively use for on-page optimization, website tracking or even something as simple as uptime monitoring.
In the next steps we will focus on bot traffic only. However, most of these rules can also be applied to spider/crawler traffic.
Before you continue – always make sure you have a “test-view” within your Google Analytics account and test all of your changes in this test view before deploying anything in your production or reporting views.
In addition to a test view, we recommend that you have an “all website data view” which contains all your raw data and does not contain any filtering.
How can you filter for known Spiders and Bots using Google Analytics?
Keep in mind that this filtering does not work retroactively.
How to identify uncommon bot traffic
To exclude these, we suggest using a view filter.
However, before you start excluding “bot traffic” you will need to identify which traffic should be excluded.
There are a few indicators of unnatural / bot traffic which you can search for within your Google Analytics property.
Some of the most common indicators we have seen are:
- Sudden spikes of traffic within a very short timespan (hours up to a day)
- Spikes of traffic where the bounce rate is 100% or time on site is 0 seconds
- Traffic spikes from traffic sources that contain the word “bot”
- Hostnames that do not reflect your own website
- Traffic spikes from specific and uncommon locations
- A 100% new session rate or very close to 100%
- Exactly 1 page per session
- Browser dimensions “not set”
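Several of the indicators above can also be checked programmatically if you export a traffic-source report. The following is only a minimal sketch: the `SourceRow` fields, the 99% threshold for "very close to 100%" and the sample numbers are assumptions made for illustration, not something Google Analytics provides in this form.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    """One row of an exported traffic-source report (hypothetical layout)."""
    source: str
    sessions: int
    bounce_rate: float          # percent, 0-100
    avg_session_seconds: float
    pages_per_session: float
    new_session_rate: float     # percent, 0-100


def bot_indicators(row):
    """Return the list of bot indicators that this row matches."""
    hits = []
    if "bot" in row.source.lower():
        hits.append("source contains 'bot'")
    if row.bounce_rate >= 100.0:
        hits.append("100% bounce rate")
    if row.avg_session_seconds == 0:
        hits.append("0 seconds on site")
    if row.new_session_rate >= 99.0:  # assumed cut-off for "very close to 100%"
        hits.append("new session rate at or near 100%")
    if row.pages_per_session == 1:
        hits.append("exactly 1 page per session")
    return hits


# Illustrative rows only; the numbers are made up.
suspicious = SourceRow("trafficbot.live", 500, 100.0, 0.0, 1.0, 100.0)
normal = SourceRow("google", 1200, 45.2, 95.0, 2.8, 60.0)

print(bot_indicators(suspicious))
print(bot_indicators(normal))
```

A row that matches several indicators at once is a strong candidate for exclusion. A single match, such as a high bounce rate on its own, is usually not enough to call traffic a bot.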
Here is an example of how we can use the word “bot” to search our traffic sources and identify which of them should be excluded. As mentioned above, this traffic can very clearly be identified as bot traffic, as we see that the bounce rate is 100%, sessions are exactly 1 and the average time per session is 0 seconds.
How to exclude unknown Traffic Bots in 7 easy steps
So you have selected the “Exclude all hits from known bots and spiders” option and you are still seeing bot traffic in your GA account? Then we need to exclude the bots individually using view filters. In the next 7 steps we’ll show you exactly how to exclude bot traffic, based on the example used above.
1. Identify the Bot Traffic you want to exclude
Identify the bot you want to exclude based on the indicators mentioned above.
In our example we will use traffic from the source “trafficbot.live”.
2. Navigate to Filters
Open the “admin panel” in your Google Analytics account, navigate to your “test view” and click on “filters” within the view column.
3. Create a Filter to exclude the Traffic Bot
Click on “Add Filter”
4. Setting your Filter Criteria
Name your filter “Bot Traffic”, select the “custom filter” type and define the field you would like to filter on.
In our case we use “hostname” and we define the filter pattern using regex.
As we want to exclude the following hostnames:
bottraffic.xyz (this bot tends to work with multiple variations such as bottraffic459.xyz, so we exclude all domains that contain “bottraffic”)
We will use the following regex code in order to also exclude any variations of the domains stated above.
Just copy the code below, and add any bot you want to exclude by adding .*NAME OF BOT.* and separating the entries with “|”, which means OR in regex.
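To illustrate how such a pattern is put together, here is a small Python sketch that joins bot names into one expression and checks it against a few hostnames. The helper `build_exclude_pattern`, the hostname `www.example.com` and the case-insensitive matching are assumptions for this example. It is not the exact filter pattern from our account, so test your own pattern in your test view first.

```python
import re


def build_exclude_pattern(bot_names):
    """Join bot names into one alternation such as '.*name1.*|.*name2.*'."""
    # re.escape keeps the dot in names like "trafficbot.live" literal,
    # so it does not match any arbitrary character.
    parts = [".*" + re.escape(name) + ".*" for name in bot_names]
    # "|" is the regex OR operator: a hostname matching any part is excluded.
    return "|".join(parts)


pattern = build_exclude_pattern(["bottraffic", "trafficbot.live"])
print(pattern)

# Assumption: matching is done case-insensitively.
matcher = re.compile(pattern, re.IGNORECASE)
for hostname in ["bottraffic459.xyz", "trafficbot.live", "www.example.com"]:
    print(hostname, bool(matcher.search(hostname)))
```

Escaping the dot is a deliberate choice here: an unescaped “.” matches any character, which makes the pattern slightly broader than intended.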
5. Verify your Filter
Click on “verify this filter” in order to test your filter.
Do keep in mind that the verification only runs on a subset of your data, so older bot traffic might not show up.
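Because the built-in verification only looks at a subset of your data, it can help to do a rough dry run on an exported hostname report as well. The sketch below assumes a hypothetical export of (hostname, sessions) pairs and an example exclude pattern. The hostnames and session counts are made up for illustration.

```python
import re

# Example exclude pattern (an assumption; replace it with your own).
EXCLUDE_PATTERN = re.compile(r".*bottraffic.*|.*trafficbot\.live.*", re.IGNORECASE)


def split_by_filter(rows):
    """Split (hostname, sessions) rows into kept and excluded lists."""
    kept, excluded = [], []
    for hostname, sessions in rows:
        if EXCLUDE_PATTERN.search(hostname):
            excluded.append((hostname, sessions))
        else:
            kept.append((hostname, sessions))
    return kept, excluded


# Hypothetical export rows, for illustration only.
rows = [
    ("www.example.com", 1200),
    ("bottraffic459.xyz", 310),
    ("trafficbot.live", 190),
]

kept, excluded = split_by_filter(rows)
excluded_sessions = sum(sessions for _, sessions in excluded)
print("Sessions that would be excluded:", excluded_sessions)
print("Hostnames that would be kept:", [hostname for hostname, _ in kept])
```

If a hostname you expect to keep ends up in the excluded list, the pattern is too broad and should be tightened before you save the filter.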
6. Save and monitor your filter
Click “Save” and check within the next few days whether or not your bot-traffic is correctly being filtered out within your test view.
7. Deploy to Production
If your filter works correctly within your test view, it’s time to deploy it to your reporting or production view. Click on “Add Filter to View” in your reporting view, select “Apply Existing filter”, choose the filter you want from the list of available filters, click on “Add” and save your settings.
As a final tip, we recommend adding an annotation to your analytics view to make sure that all analytics users are aware of your changes.
Your bot traffic should now be filtered out.
If you need help with setting up and managing your Google Analytics account, please contact us to see if we can quickly help you on your way again.