The first step is to admit you have a bot problem and to understand that no one online is immune. The growing multi-trillion-dollar online economy will inevitably remain a target for fraudsters, career criminals, organized crime, and other nefarious organizations seeking cash. Then there is your competition: everyone is already "monitoring" competitors' online marketing, content, and pricing, and that monitoring is the work of bots.
Second, you have to understand the bot traffic on your own website. There are two ways to assess this yourself: (1) manually review the dynamic page requests in your weblogs, or (2) sign up for a trial with a bot detection service.
When reviewing your weblogs, here are some analyses to get you started:
- Identify patterns in requests (bots may increment a single variable on each request to build a database).
- Correlate referring links with poor conversion data (bots don't buy).
- Find repeated identical requests from a single user agent (bot user agents often change IP address without changing their requests).
- Count the IP addresses whose user agents identify themselves as Googlebot, Bingbot, or other search engines, and look up the WHOIS records for those IPs (search-bot user agents are frequently spoofed, so you need to verify those addresses).
- Look for a large number of IP addresses each making a single request for a page (bots evade rate limits by changing IP addresses often).
- If you use rate limits on your WAF or firewall, look for IP addresses making one or two requests fewer than your current limit (bots can frequently detect rate limits and stay just under them to evade detection).
- Identify any user agents and IP addresses whose requests violate your robots.txt file.
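The weblog checks above can be sketched in a few lines of Python. The log format, sample addresses, and user-agent tests below are illustrative assumptions, not a definitive parser; the `verify_searchbot` helper uses forward-confirmed reverse DNS, which is the method Google and Bing document for validating their crawlers' IP addresses (it requires network access).

```python
# Sketch: a first-pass triage of an access log for the bot signals above.
# Assumes the Apache/Nginx "combined" log format; all sample data is illustrative.
import re
import socket
from collections import Counter, defaultdict

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<req>[^"]*)" \S+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def verify_searchbot(ip):
    """Forward-confirmed reverse DNS: the documented way to validate a
    claimed Googlebot/Bingbot IP (needs network access, so not run here)."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        return (host.endswith((".googlebot.com", ".google.com", ".search.msn.com"))
                and ip in socket.gethostbyname_ex(host)[2])
    except OSError:
        return False

def triage(lines):
    per_ip = Counter()                     # requests per IP (single-hit IPs are suspect)
    claimed_searchbots = defaultdict(set)  # search-engine UAs -> IPs to verify
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, ua = m.group("ip"), m.group("ua")
        per_ip[ip] += 1
        if "Googlebot" in ua or "bingbot" in ua:
            claimed_searchbots[ua].add(ip)
    single_hit_ips = [ip for ip, n in per_ip.items() if n == 1]
    return per_ip, claimed_searchbots, single_hit_ips

# Illustrative log lines (RFC 5737 documentation addresses)
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET /item?id=1 HTTP/1.1" 200 512 "-" "python-requests/2.31"',
    '203.0.113.8 - - [01/Jan/2024:00:00:01 +0000] "GET /item?id=2 HTTP/1.1" 200 512 "-" "python-requests/2.31"',
    '198.51.100.9 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
per_ip, claimed, singles = triage(SAMPLE)
```

Even this toy sample surfaces two of the patterns above: a scripted user agent incrementing a query variable across IPs, and a "Googlebot" claim that should be verified before it is whitelisted.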
Third, you have to quantify and model how bots affect your online business:
- How much time does your technical team spend dealing with bots, and what does that equate to as a monthly expense? For example, 5 hours per week is roughly 20 hours per month, which at $150/hour is $3,000 in technical labor per month.
- How much do bots throw off your marketing analytics and spend? For instance, if 10% of your CPC spend is click fraud and you are spending $50,000 per month, that is $5,000 per month wasted. Moreover, if 10% of your website traffic is bots, how much does that skew your agile analytics? Your UI navigation analyses?
- If your SEO content is used without attribution, can you determine how many unique human visitors you lose each month when your posts are distributed across 50 other sites without your links?
- Given that your data is almost certainly being scraped, can you estimate how much revenue you lose to the 10% of your traffic that is bots taking that data?
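As a minimal sketch, the quantification above can be rolled into one throwaway calculation. The function and every input below are placeholders for your own figures, not a real costing methodology.

```python
# Sketch: a rough monthly cost model using the illustrative figures above.
# Every input is an assumption to replace with your own numbers.
def monthly_bot_cost(tech_hours_per_week, hourly_rate,
                     cpc_spend, click_fraud_rate,
                     lost_revenue_estimate=0.0):
    """Sum the monthly costs a site can attribute to bots."""
    tech_labor = tech_hours_per_week * 4 * hourly_rate  # 5 h/wk -> 20 h/mo
    wasted_cpc = cpc_spend * click_fraud_rate
    return tech_labor + wasted_cpc + lost_revenue_estimate

# The example from the text: 5 h/week at $150/h, plus 10% click fraud on $50,000 CPC
total = monthly_bot_cost(5, 150, 50_000, 0.10)
print(total)  # $3,000 labor + $5,000 wasted CPC = $8,000/month
```

Even before estimating scraped-data revenue loss, the two figures the article cites already sum to $8,000 per month — the baseline for the ROI case below.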
Now that you have quantified the bot traffic on your site and can estimate its business cost, you can make a clear ROI case for addressing the bots. You are now ready to evaluate solutions.
- When compiling the list of services to evaluate, be sure to eliminate pure WAF and DDoS-protection services; they will not address the threat that scraper bots pose to your business.
- Don't build it in-house. For the same reason you do not write your own word processor, firewall, or antivirus, stay focused on your business.
Consider some key criteria when selecting a bot detection and mitigation service:
- Accuracy: You cannot rely on a simplistic "IP = bot" rule or a basic rate limit. Stay away from IP lists; bots operate independently of IP addresses, and you do not want to block legitimate human users.
- Real-time: Require a real-time solution with less than 20 ms of latency.
- Easy to implement: No coding; you should not have to modify your site or infrastructure.
- Proven: The service needs results from reputable references; large enterprises, highly trafficked sites, or Fortune 500 companies will suffice, and you deserve nothing less.
Here are ten questions to begin with when building your selection process:
- Does the service have the ability to identify bot devices and software independent of the IP address?
- Does the service use rate limits based on IP Addresses?
- What is the measured accuracy of the solution?
- What types of bot technologies can the service detect?
- cURL, Python, Ruby, etc.
- Headless browsers, etc
- Impostor user agents
- Watir, Selenium, or other browser automation scripts
- What technologies and techniques are used to detect bots? How many?
- What is the latency of the service?
- How is the service deployed? What is the architecture? Flexible/customizable?
- Are there professional services required? Setup fees? Additional hardware?
- What is the core offering and expertise of the company?
- Can references be provided?
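On the accuracy question in particular, you do not have to take a vendor's number at face value: during a trial, you can hand-label a sample of sessions yourself and score the service's verdicts against your labels. The helper below is a hypothetical illustration of that scoring, assuming you have such a labeled sample.

```python
# Sketch: score a trial service's verdicts against a hand-labeled sample of
# sessions. `labeled` pairs (service said bot, you determined bot); the data
# shown is purely illustrative.
def accuracy_report(labeled):
    tp = fp = tn = fn = 0
    for predicted_bot, actual_bot in labeled:
        if predicted_bot and actual_bot:
            tp += 1      # bot correctly flagged
        elif predicted_bot:
            fp += 1      # human wrongly blocked
        elif actual_bot:
            fn += 1      # bot missed
        else:
            tn += 1      # human correctly passed
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # humans blocked
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,  # bots missed
        "accuracy": (tp + tn) / len(labeled) if labeled else 0.0,
    }

# Example: 4 labeled sessions -> one human blocked, one bot missed
report = accuracy_report([(True, True), (True, False), (False, False), (False, True)])
```

Tracking the false-positive rate separately matters: a service that blocks even a small fraction of real customers can cost more than the bots it stops.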
Robots have been commonplace in manufacturing operations for years, and now bots are commonplace online too. Their incorporation into our online ecosystem is irreversible; the task now is detecting and managing how the inbound bots on our websites are allowed to interact with and affect our online businesses.
About the Author: Charlie Minesinger