Meanwhile, web applications are increasingly vulnerable to automated threats and are subjected to click fraud, scraping, account takeover, comment spam, and more. These and other malicious or illicit activities are described in detail in the OWASP Automated Threat Handbook for Web Applications.
For years, simple bots scripted in Python or driven by cURL have been able to download the entire contents of a website. They connect from an ISP and issue web requests from a command prompt. Standard bot mitigation practice has been to combine rate limiting with log analysis: identify the IP addresses of the bad bots, then rewrite firewall rules to block any future requests from those addresses.
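The log-analysis step described above can be sketched in a few lines of Python. This is only an illustration, assuming an access log in the common format where the client IP is the first space-separated field; the function name, the sample log lines, and the threshold are hypothetical.

```python
from collections import Counter


def find_heavy_ips(log_lines, max_requests):
    """Return the IPs that made more than max_requests requests."""
    # Assumption: the client IP is the first space-separated field of each line.
    counts = Counter(
        line.split(" ", 1)[0] for line in log_lines if line.strip()
    )
    return sorted(ip for ip, n in counts.items() if n > max_requests)


# Hypothetical log excerpt using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /page/1 HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2016:13:55:37 +0000] "GET /page/2 HTTP/1.1" 200 498',
    '198.51.100.4 - - [10/Oct/2016:13:55:40 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.7 - - [10/Oct/2016:13:55:38 +0000] "GET /page/3 HTTP/1.1" 200 530',
]

# Candidates for a firewall block rule under an arbitrary threshold of 2.
blocklist = find_heavy_ips(sample_log, max_requests=2)
```

The resulting list would then feed a firewall rule update. The approach depends entirely on a bad bot keeping the same address long enough to stand out in the counts.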
Marty Boos, Sr. Director of Technology Operations at StubHub, explains in his video testimonial, “Until recently, bots were fairly unsophisticated. People used cURL or some other non-browser based tools to mine a lot of data off of our site,” he said. “That’s morphed into browser-based plugins—Selenium, DejaClick—things like that.”
Evolving bot sophistication poses a challenging problem for defenders. The bot builders are all too aware that yesterday’s web defense strategies primarily hinged on IP recognition and blocking—a strategy that works for simple bots. But the game has changed.
The Rise of Advanced Persistent Bots (APBs)
Shown above, an APB is controlled from a central computer. Using orchestration software (e.g., SaltStack), the bot herder can spin up multiple instances through cloud providers such as Amazon Web Services (AWS) and DigitalOcean to extract the desired data from a target site. Each instance can have numerous IPs. And all instances can either attack simultaneously or be dispatched in waves.
APBs are the perfect weapon to circumvent log analysis and basic IP blocking. They’ll attack from as many addresses as it takes to bypass the weak security controls found at the target. And APBs stay under rate limiting thresholds by reducing the number of requests made per IP.
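A little arithmetic shows why a per-IP threshold alone is insufficient against distributed traffic. The sketch below assumes a rate limit expressed as a maximum number of requests per IP per time window; the function names and the figures are hypothetical.

```python
import math


def requests_per_ip(total_requests, ip_count):
    """Largest per-IP load when requests are spread as evenly as possible."""
    return math.ceil(total_requests / ip_count)


def min_ips_needed(total_requests, per_ip_limit):
    """Smallest number of source IPs that keeps every IP at or under the limit."""
    return math.ceil(total_requests / per_ip_limit)


# Hypothetical figures: 10,000 requests in one window, limit of 100 per IP.
single_source = requests_per_ip(10_000, 1)     # far above the limit
spread_out = requests_per_ip(10_000, 500)      # well below the limit
break_even = min_ips_needed(10_000, 100)       # IP count at which the limit stops triggering
```

With these figures, one address would send 10,000 requests and trip the limit immediately, while 500 addresses would send 20 each and never approach it.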
This is the new adversary. Organizations looking to protect page content, block cyber thieves from taking over customer accounts, or prevent hackers from performing vulnerability scans must have a far more robust solution to defend against such distributed attacks.
How Do You Stop APBs?
The key to preventing bad bot infiltration is to positively ID every visitor—human or bot. One highly effective method for achieving this is to determine if the requester is actually what they say they are. A requester’s header information can offer clues.
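As a rough illustration of what header clues can look like, the sketch below compares what a requester claims to be (its User-Agent) with the other headers it sends. The heuristics, token lists, and function name are assumptions made for this example and are not a description of any product's rules.

```python
# Hypothetical heuristics: real browsers normally send these headers.
EXPECTED_BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")
# Default User-Agent prefixes of common command-line and scripting tools.
TOOL_TOKENS = ("curl/", "python-requests/", "wget/")


def header_clues(headers):
    """Return a list of reasons the request headers look inconsistent."""
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "")
    clues = []
    if not user_agent:
        clues.append("missing user-agent")
    elif any(token in user_agent.lower() for token in TOOL_TOKENS):
        clues.append("self-identified tool")
    elif user_agent.startswith("Mozilla/"):
        # Claims to be a browser, so check for headers a browser would send.
        missing = [name for name in EXPECTED_BROWSER_HEADERS if name not in h]
        if missing:
            clues.append("claims browser but lacks: " + ", ".join(missing))
    return clues


suspicious = header_clues({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html",
})
```

Headers are easy to forge, so a check like this only offers clues and would be combined with other signals.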
Machine Learning-Based Blocking
Some bad bots have become so advanced that they automate an actual browser. In such cases it’s necessary to evaluate requester behavior using machine learning. Here, a bot’s browsing pattern looks different from that of legitimate traffic.
A metrics profile can reveal anomalies that distinguish bots. For example, how did the requester enter the site? What time of day did it come in? Is it connecting from an ISP or a datacenter? What pages did it go through? How did it navigate through the site?
Bot behavior tends to be either very random or highly systematic. These patterns help distinguish legitimate traffic from illegitimate traffic.
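Before any model can judge a session, its behavior has to be turned into numbers. The sketch below computes a few hand-picked features from one session's request timestamps and paths. It is not a machine learning model, and the feature names are hypothetical; it only shows the kind of input such a model might receive.

```python
from statistics import mean, pstdev


def session_features(timestamps, paths):
    """Summarize one session as simple behavioral metrics."""
    # Seconds between consecutive requests.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    avg_gap = mean(gaps) if gaps else 0.0
    # Coefficient of variation: near 0 means metronome-like, systematic timing.
    gap_cv = pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0
    return {
        "requests": len(paths),
        "mean_gap_seconds": avg_gap,
        "gap_cv": gap_cv,
        "distinct_path_ratio": len(set(paths)) / len(paths) if paths else 0.0,
    }


# Hypothetical sessions: one perfectly regular, one with irregular pauses.
systematic = session_features(
    [0.0, 2.0, 4.0, 6.0], ["/p/1", "/p/2", "/p/3", "/p/4"]
)
irregular = session_features(
    [0.0, 5.0, 47.0, 60.0], ["/", "/search", "/p/3", "/search"]
)
```

A session that requests a new page every two seconds produces a timing variation of zero, which is the "quite systematic" extreme; the second session's uneven pauses and repeated page visits yield noticeably different values.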
Advanced Digital Fingerprinting
Most security solutions today use access control lists (ACLs) or other blocking mechanisms based on the positive ID of a given IP address. By contrast, Distil creates a digital fingerprint by peering deeply into the requesting device’s browser, using 200 unique markers to identify it. It can then block bad actors based on their fingerprint. If the bad actor shifts its IP, the requester profile retains the same fingerprint and can still be identified.
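The general idea of blocking by fingerprint rather than by address can be sketched as follows. This is a simplified illustration, not Distil's method: the marker names are hypothetical, a real fingerprint would draw on far more signals, and hashing a canonical serialization is just one plausible way to derive a stable identifier.

```python
import hashlib
import json


def fingerprint(markers):
    """Derive a stable identifier from device and browser markers (no IP)."""
    # Sorting keys makes the result independent of the order markers arrive in.
    canonical = json.dumps(markers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical markers collected from a requesting browser.
device = {
    "screen": "1920x1080",
    "timezone_offset": -300,
    "fonts": ["Arial", "Courier New"],
    "plugins": [],
}

blocked_fingerprints = {fingerprint(device)}


def is_blocked(ip, markers):
    """The IP is deliberately ignored: only the fingerprint decides."""
    return fingerprint(markers) in blocked_fingerprints


# The same device returning from a different (documentation-range) address.
still_blocked = is_blocked("198.51.100.23", dict(reversed(list(device.items()))))
```

Because the address plays no part in the lookup, moving to a new IP does not change the outcome for the same device.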
Modern bad bots are advanced and persistent. Attacks are distributed across vast networks so that while security teams are playing whack-a-mole, threat actors are able to steal data. Performing log analysis and using ACLs to isolate and block malicious or illicit IPs is no longer sufficient to solve the bot problem. The key to keeping bad bots off your site is positive identification using a variety of methods: header evaluation, machine learning, and fingerprinting. The rise of the advanced persistent bot calls for an advanced persistent defense.
Providing bad bot armies to steal content or perform other illicit activities has become a blossoming industry, and now anyone can cheaply and easily target your site. Learn more about the players, technologies, and services in our 2016 Economics of Web Scraping Report.