Five Lessons in Bot Prevention
We recently had the opportunity to work with SC Magazine and the Online Trust Alliance to present the top-level findings from our 2015 Bad Bot Report. Distil Networks’ CEO and Co-Founder, Rami Essaid, and OTA President Craig Spiezle highlighted six areas where businesses need to focus their efforts to protect site integrity against bad bots. If you have the time, we encourage you to watch the full presentation, which runs a little under an hour. If you don’t, we’ve summarized the content in this post.
Lesson #1 – Know Who’s Visiting Your Site
For the first time, the 2015 Bad Bot Report noted that 60% of all web traffic is non-human. That means roughly 60% of your web infrastructure costs go to supporting bots. Fortunately, more than half of those bots are good bots like Googlebot and Bingbot, but even those bots are expected to abide by the site’s robots.txt file, which tells bots where they can legitimately go on a site. Unfortunately, robots.txt is purely advisory and there’s no enforcement attached to it, so we take the view that, if a bot is not explicitly good (i.e., bringing value), it should be blocked.
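To make the robots.txt point concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the paths, and the bot name SomeOtherBot are made up for illustration. It shows only what a well-behaved bot would conclude from the file; nothing here stops a bot that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and policy are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /checkout/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant Googlebot may crawl product pages but not the checkout flow.
googlebot_products = parser.can_fetch("Googlebot", "https://example.com/products/")
googlebot_checkout = parser.can_fetch("Googlebot", "https://example.com/checkout/cart")

# Any other bot is asked to stay out entirely (the "*" rule).
other_bot_products = parser.can_fetch("SomeOtherBot", "https://example.com/products/")
```

Because compliance is voluntary, a file like this documents your policy for good bots; blocking the rest has to happen elsewhere.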
Good bots tend to focus their activities more on larger sites, simply because those sites have more content that’s valuable in user searches. Smaller sites attract a greater percentage of bad bot traffic and consequently tend to suffer more - they’re far less likely to be able to stand up to the random traffic spikes of bad bot attacks.
Lesson #2 – Understand Your Vulnerabilities
The business you’re in also makes a big difference as to the type of bots your site attracts. Online travel websites are of little interest to search engine bots because their pages are so dynamic - but this is exactly what makes them attractive to bad bots. Those time-limited offers are just what web scrapers deployed by competitors are looking for. There’s much more on threats to the travel industry in our How to Defend Travel Websites in the Era of Site Scraping white paper.
Web scraping is also a big threat to digital publishing sites - their web content is, after all, their IP. They’re also vulnerable to digital ad fraud (redirecting ad traffic), a multi-billion dollar business that’s graphically described in our From Bots to Bottom Lines infographic. Banks and other online services, on the other hand, are more concerned about brute force and man-in-the-middle attacks. Pretty much every type of site wants to stop form spam and fake registrations.
Lesson #3 – Identify and Block the Worst Offenders
The channels bad bots are using to reach your site have also changed significantly. This year’s report showed a 300% increase in bad bot activity on consumer ISP channels like Comcast and Time Warner. By contrast, business-focused ISPs like Verizon Business and Level3 saw a significant drop in bad bot traffic.
Amazon’s traffic is still 80% bad bots, likely due to the ease with which anyone can get a web service up and running there - unfortunately, this is just as helpful to the bad guys as to legitimate web developers. The company also does a lot of web indexing to support its price comparison services, not always with the permission of the site owners - putting it firmly in the “if it isn’t explicitly good, it’s bad” category.
Most hosting providers are not able to inspect traffic passing through their servers, so if your company doesn’t need to do business with high bad-bot activity providers, blocking them is an easy way to protect your web infrastructure.
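As a rough illustration of provider-level blocking, the sketch below checks a client address against a list of network ranges using Python's standard ipaddress module. The ranges are reserved documentation addresses standing in for a provider's published ranges, and the name is_blocked is hypothetical; a real deployment would load the provider's actual ranges and usually enforce the block at the firewall or load balancer.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT any real provider's
# address space. Substitute the ranges published by the provider you block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

For example, is_blocked("192.0.2.17") is true with the list above, while is_blocked("203.0.113.5") is false.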
Lesson #4 – Mobile Sites Are Firmly in the Bad Guys’ Sights
Most website traffic now travels over mobile networks; where traffic goes, bad bots soon follow, and so we saw a tenfold increase in mobile bots last year. The Android Webkit Browser was one of the top five user agents leveraged by bad bots to mask their identity. Mobile sites are also easier to scrape – the same characteristics that make a mobile-optimized site easy for humans to get around also make it a prime target for bad bots.
Along with mobile sites, mobile networks are also being targeted, thanks to the widespread availability of cheap, good wireless connections and the tendency for connections to be proxied together. This is particularly evident in China, with its 1.5 billion mobile users – the top three worst-offending mobile carriers are all based there, and more than 30% of the country’s mobile traffic is bad bots.
Lesson #5 – Watch Out for Countries with “Bad Bot GDP”
It seems obvious that companies should focus on blocking traffic from those countries originating the largest number of bad bots, but that would mean blocking US traffic - clearly not a feasible strategy. The US has more cheap hosting providers than any other country, so it presents the most attractive distribution network, but it’s not where the bad bots are originating.
Measuring the number of bad bots per online user - what we’ve dubbed the Bad Bot GDP - tells a very different story, and a credible one when you take a closer look:
Singapore is a popular data center hub for China and Southeast Asia, and has a small population relative to its well-developed infrastructure.
Israel has a similarly small population and the most comprehensive data center and Internet infrastructure in the Middle East.
Slovenia was where Matjaz Skorjanc, the developer of the Mariposa botnet, was arrested after his malware hijacked more than 12 million computers.
Criminal events also impacted the Maldives; Russian hacker Roman Seleznev was arrested there for allegedly stealing millions of dollars’ worth of credit card data.
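The per-user measure is simple arithmetic: bad bots divided by online users. The sketch below uses entirely made-up figures for two unnamed countries (they are not taken from the report) to show how a ranking by raw volume and a ranking by Bad Bot GDP can disagree.

```python
def bad_bot_gdp(bad_bots: int, online_users: int) -> float:
    """Bad bots per online user, the ratio described above."""
    if online_users <= 0:
        raise ValueError("online_users must be positive")
    return bad_bots / online_users


# Invented numbers for illustration only: (bad bots, online users).
countries = {
    "Country A": (9_000_000, 300_000_000),  # large volume, large population
    "Country B": (400_000, 4_000_000),      # small volume, small population
}

# Ranking by raw bad bot volume puts Country A first...
by_volume = sorted(countries, key=lambda c: countries[c][0], reverse=True)

# ...but ranking by bots per online user puts Country B first.
by_gdp = sorted(countries, key=lambda c: bad_bot_gdp(*countries[c]), reverse=True)
```

With these inputs, Country A works out to 0.03 bad bots per user and Country B to 0.1, which is why small countries with strong infrastructure can top the per-user list.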
Bad bots are now a round-the-clock and round-the-world operation. Fourteen countries, almost double the number in 2013, originated at least 1% of bad bot traffic in 2014, and those attacks are spread more evenly throughout the day compared with 2013, when attackers appeared to be waiting for IT personnel to leave work before launching their attacks.
If your company operates in well-defined geographical markets, you can use Geo-IP Fencing to block traffic from irrelevant sources. But remember that IP address blocks are not static; the blocked IP space needs to be regularly redefined to avoid excluding legitimate prospects.
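A minimal sketch of that idea follows, assuming a hypothetical in-memory Geo-IP table, an allow list of two markets, and a 30-day refresh interval. All of these are illustrative choices, and the country code ZZ is a placeholder. A production setup would query a maintained Geo-IP database instead of a hard-coded list.

```python
import ipaddress
from datetime import date, timedelta
from typing import Optional

ALLOWED_COUNTRIES = {"US", "CA"}    # assumed target markets
MAX_TABLE_AGE = timedelta(days=30)  # assumed refresh interval

# Hypothetical Geo-IP table built from reserved documentation ranges.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "CA"),
    (ipaddress.ip_network("203.0.113.0/24"), "ZZ"),
]


def country_for(client_ip: str) -> Optional[str]:
    """Look up the country code for an address, or None if it is unmapped."""
    address = ipaddress.ip_address(client_ip)
    for network, country in GEO_TABLE:
        if address in network:
            return country
    return None


def is_allowed(client_ip: str, allow_unknown: bool = True) -> bool:
    """Apply the geo-fence; unmapped addresses are let through by default."""
    country = country_for(client_ip)
    if country is None:
        return allow_unknown
    return country in ALLOWED_COUNTRIES


def table_is_stale(last_refreshed: date, today: date) -> bool:
    """Flag a Geo-IP table that has not been refreshed recently enough."""
    return today - last_refreshed > MAX_TABLE_AGE
```

Letting unmapped addresses through by default is a deliberate choice here: it errs on the side of not excluding legitimate prospects, at the cost of a looser fence.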
Super Trend – Sophisticated Bots Need Sophisticated Defenses
Bots are getting more sophisticated by the day. The 2015 Bad Bot Report found a whopping 41% of bad bots mimicking human behavior, and 7% went a step further, disguising themselves as good bots. When a bad bot masquerading as Googlebot enters a site, it can cause major problems without arousing any suspicion.
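One common way to catch a fake Googlebot is a forward-confirmed reverse DNS check: resolve the client IP to a hostname, confirm the hostname is in a Google-owned domain, then resolve that hostname back and confirm it maps to the same IP. The sketch below is one possible implementation, not a description of any vendor's product. The domain suffixes reflect Google's published guidance as we understand it and should be checked against current documentation, and the function names are hypothetical. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Assumed Google crawler domains; verify against Google's current guidance.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client claiming to be Googlebot."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the address we started from.
        return ip in forward(hostname)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

The user agent string alone proves nothing, since any client can send it. Note that the default forward lookup here covers IPv4 only.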
Simple bad bots are relatively easy to spot, because they will leverage bad user agents or fail basic browser integrity checks. Average bad bots can be trapped by forcing them to prove they’re using a real web browser. Sophisticated bad bots are the most challenging, because they closely mimic human behavior.
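The first two tiers can be sketched as a simple triage function. This is a toy illustration under stated assumptions, not how any particular product works: the user agent tokens, the list of expected headers, and the passed_challenge flag (standing in for the result of some browser challenge, such as a JavaScript check) are all hypothetical.

```python
from typing import Mapping, Optional

# Illustrative assumptions: tokens typical of scripted clients, and headers
# that mainstream browsers normally send.
BAD_UA_TOKENS = ("curl", "python-requests", "scrapy")
EXPECTED_HEADERS = ("accept", "accept-language", "accept-encoding")


def triage(headers: Mapping[str, str], passed_challenge: Optional[bool] = None) -> str:
    """Return "block", "challenge", or "allow" for a single request."""
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()

    # Tier 1: missing or known-bad user agent.
    if not user_agent or any(token in user_agent for token in BAD_UA_TOKENS):
        return "block"

    # Tier 1: basic integrity check on headers a real browser would send.
    if any(name not in normalized for name in EXPECTED_HEADERS):
        return "block"

    # Tier 2: ask the client to prove it is a real browser.
    if passed_challenge is None:
        return "challenge"
    return "allow" if passed_challenge else "block"
```

A sophisticated bot that drives a real browser would pass both tiers, which is why the third tier needs behavioral analysis that a sketch like this does not attempt.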
A well-tuned web application firewall does a decent job of catching the simple bots, but on its own it can’t keep up with the volume, variety and sophistication of today’s bots. Specialized tools are needed, and that’s where Distil Networks comes in, with our unique ability to help businesses visualize the bot activity on their websites.
It’s All a Matter of Understanding
Wondering who or what is coming to your website? Distil Networks is offering a free threat analysis of your website security. Visit Distil Networks and enter the promo code SCMAGBLOG for your personalized report.
About the Author: Courtney Brady