So, what exactly is a web bot?
We spend so much time talking about bots here at Distil Networks that it’s easy to forget that the definition of “bot,” like that of many tech terms, blurs, mutates, and evolves over time. So, let’s revisit what we mean when we talk about bots, what they are, and what they can do.
Where did the word “bot” come from and what is the modern definition of a bot?
According to the Online Etymology Dictionary:
bot (n.) in Internet sense, c.2000, short for robot. Its modern use has curious affinities with earlier uses, such as “parasitical worm or maggot” (1520s), of unknown origin; and Australian-New Zealand slang “worthless, troublesome person” (World War I-era).
Wikipedia has a number of individual entries for the word “bot” in the Computing category, all of which essentially feed back into the “robot” root – a piece of software that has been programmed to undertake a specific task or tasks automatically and without human intervention.
Here’s how our Chief Scientist Andrew Stein defines it:
“A bot is any automated tool or script that’s designed to perform a specific task. Some are good, some are bad.”
In other words, while something like Googlebot is essential to the operation of the web as we know it today, a brute force password cracker is the polar opposite of useful to anyone except the person using it.
Bots can be used for fraud, brute force attacks, web scraping, spam, and more
Bots can be programmed to do pretty much anything a human can do on a computer, as long as that task has specific, logical steps. Certain tasks that would take humans many hours to accomplish can be completed in just minutes or seconds by bots, making them more economically efficient than humans. At Distil Networks, our focus is on defending websites against bots that have been programmed to undertake all kinds of activities including:
- Data theft – web scraping, or lifting information and content directly from web pages
- Brute force attacks – locating username and password forms on websites and repeatedly attempting to guess the proper credentials through the systematic use of password dictionaries, stolen usernames and passwords, or algorithms
- Unauthorized vulnerability scans – bots snooping around a site to find entry points for an attack
- Click fraud – artificially inflating click-through rates to exhaust competitors’ ad impressions, or hijacking search terms
- Form spam – fake form postings, registrations or comments that degrade the user experience and create unnecessary work for site admins
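To make the brute force item above a little more concrete, here is a minimal sketch of one common defensive idea: counting failed logins per client inside a sliding time window. The class name `FailedLoginTracker`, the client key, and the default thresholds are all hypothetical choices for illustration. This is not a description of how Distil Networks' product works.

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """Flags a client whose failed logins exceed a limit within a time window.

    Illustrative only: the limits below are assumed values, not recommendations.
    """

    def __init__(self, max_failures=5, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # Maps a client key (e.g. an IP or fingerprint) to failure timestamps.
        self._failures = defaultdict(deque)

    def record_failure(self, client_key, now):
        """Record one failed login at time `now` (seconds).

        Returns True if the client has more than `max_failures`
        failures inside the current window.
        """
        attempts = self._failures[client_key]
        attempts.append(now)
        # Drop failures that have aged out of the window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) > self.max_failures


tracker = FailedLoginTracker(max_failures=3, window_seconds=10)
flags = [tracker.record_failure("client-a", t) for t in (0, 1, 2, 3)]
print(flags)  # the fourth failure inside the window trips the limit
```

A dictionary attack from a single client trips a counter like this quickly. A bot that rotates through many IP addresses will not, which is one reason the client key matters as much as the threshold.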
And because these bots are NOT human, they are not distracted by anything else on the page and can find what they’re programmed to look for very, very quickly. We see this widely in the online travel aggregator business – new market entrants will simply scrape everyone else’s sites for content and pricing information, reconfigure it in their own database, and re-present it through their own UI, all in a matter of seconds (or the time it takes for a shopper to run a web search for “low fares to xyz”). Learn more about how online travel sites can defend against site scraping.
At Distil Networks, we also create and use bots ourselves to test the effectiveness of our technology, from basic source code scraping bots to complex, automated (aka “headless”) browsers, and everything in between.
So, then, what is bot detection?
It might seem like an impossible task, but it definitely isn’t. Although bots alter their appearances and evolve their strategies, they still have their weak points. Here are a few ways you can detect bots attempting to abuse your site, mobile app, or API:
- Set up browser validation to detect scrapers and automation tools. This way, you can stop requests from fishy visitors who claim they’re one type of browser, but are missing key features or abilities associated with the browser. In these cases, the visitor is faking it, giving you plenty of reason to deny access.
- Use a high-definition fingerprint. Bots can cycle through huge numbers of IP addresses, which makes blocking by IP address alone easy to evade. But with a high-def fingerprint that combines a complex series of identification points, it’s nearly impossible for bots to disguise themselves and get around your ID checks.
- Leave it to the machines. Machine learning is more adaptive and self-adjusting than ever, and it’s getting more powerful every day. This approach uses evolving models that keep a very low false positive rate, so that bots are blocked while good, normal human users are not.
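The first two ideas in the list above can be sketched in a few lines. The example below is a rough illustration under stated assumptions, not a production rule set: the `EXPECTED_HEADERS` table, the header list used for the fingerprint, and the `signals` dictionary (standing in for client-side values such as screen size) are all hypothetical.

```python
import hashlib

# Hypothetical rule table: headers we ASSUME a genuine browser of each
# family sends. A real rule set would be far larger and kept up to date.
EXPECTED_HEADERS = {
    "Chrome": {"accept", "accept-language", "accept-encoding", "sec-ch-ua"},
    "Firefox": {"accept", "accept-language", "accept-encoding"},
}

# Hypothetical choice of request headers that feed the fingerprint.
FINGERPRINT_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def _lowered(headers):
    """Return a copy of the headers with lower-cased names."""
    return {name.lower(): value for name, value in headers.items()}


def passes_browser_validation(headers):
    """Check that a request sends the headers its claimed browser should.

    Requests that do not claim a known browser pass, because there is
    no claim to contradict.
    """
    lowered = _lowered(headers)
    user_agent = lowered.get("user-agent", "")
    for family, expected in EXPECTED_HEADERS.items():
        if family in user_agent:
            return expected <= set(lowered)
    return True


def fingerprint(headers, signals):
    """Hash several identification points into one stable identifier.

    The IP address is deliberately left out, so the value stays the
    same when a bot rotates addresses.
    """
    lowered = _lowered(headers)
    parts = [lowered.get(name, "") for name in FINGERPRINT_HEADERS]
    parts += [f"{key}={signals[key]}" for key in sorted(signals)]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


suspect = {"User-Agent": "Mozilla/5.0 Chrome/120.0", "Accept": "*/*"}
print(passes_browser_validation(suspect))  # claims Chrome, lacks expected headers
print(fingerprint(suspect, {"screen": "1920x1080"})[:16])
```

The design point is that both checks look at what the client actually sends rather than where it connects from. A visitor who changes IP address on every request still produces the same fingerprint as long as the other identification points stay the same.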
Of the bot types mentioned above, from basic source code scrapers to complex, automated (aka “headless”) browsers, the latter is probably the most widely used approach today. These bots can almost (but not quite) perfectly mimic a human user. Many of them are still detectable, however, because they lack essential browser functionality and certain human behavioral patterns.
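As a toy illustration of what a missing human behavioral pattern can look like, consider request timing. People browse in irregular bursts, while a simple script often fires requests at near-constant intervals. The sketch below measures that regularity. The function names and the `min_variation` threshold are hypothetical, and a sophisticated bot can randomize its delays, so treat this as one weak signal among many rather than a detector in its own right.

```python
from statistics import mean, pstdev


def interval_variation(timestamps):
    """Coefficient of variation of the gaps between consecutive requests.

    Returns None when there are too few requests to judge.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average <= 0:
        # Simultaneous or out-of-order timestamps: report no variation.
        return 0.0
    return pstdev(gaps) / average


def looks_automated(timestamps, min_variation=0.1):
    """True when request spacing is more regular than the assumed threshold."""
    variation = interval_variation(timestamps)
    return variation is not None and variation < min_variation


print(looks_automated([0, 1, 2, 3, 4]))             # evenly spaced requests
print(looks_automated([0, 2.5, 3.1, 9.0, 10.2]))    # irregular, bursty requests
```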
The next generation of bots
This behavior-mimicking is where bots are heading, and it’s a key reason home-grown bot detection tools fail; unlike Distil Networks’ solutions, home-grown tools do not have access to the kind of resources that can differentiate between human users and headless browsers. Our machine learning technology and team of data scientists enable us to understand, model, and predict the behaviors of bots running in headless browsers, making it extremely difficult for even the most sophisticated of bots to avoid detection. This unique bot detection capability, coupled with our cloud-based repository of fingerprints and behavior patterns, enables us to deliver real-time, intelligent bot detection and mitigation that is 99.9% accurate.
So if you’re using a home-grown solution (or a broader-based solution with a checkbox for bot detection) and wondering what you’re missing, contact Distil Networks today.
We welcome your thoughts
We’re doing a lot more work on categorization to make it as easy as possible for our customers to focus on the threats that are of specific concern to them, as well as to generate reports that are meaningful to a broad range of recipients. If you have thoughts on this or any other aspect of bad bot management, please leave us a comment below.
About the Author: Brian Feldman