What is a Web Bot?

April 22, 2015 Brian Feldman

So, what exactly is a web bot?

We spend so much time talking about bots here at Distil Networks that it’s easy to forget that, as with many tech terms, the definition blurs, mutates, and evolves over time. So, let’s revisit what we mean when we talk about bots, what they are, and what they can do.

Where did the word “bot” come from and what is the modern definition of a bot?

According to the Online Etymology Dictionary:

bot (n.) in Internet sense, c.2000, short for robot. Its modern use has curious affinities with earlier uses, such as “parasitical worm or maggot” (1520s), of unknown origin; and Australian-New Zealand slang “worthless, troublesome person” (World War I-era). 

Wikipedia has a number of individual entries for the word “bot” in the Computing category, all of which essentially feed back into the “robot” root – a piece of software that has been programmed to undertake a specific task or tasks automatically and without human intervention.

Here’s how our Chief Scientist Andrew Stein defines it:

“A bot is any automated tool or script that’s designed to perform a specific task. Some are good, some are bad.”

In other words, while something like Googlebot is essential to the operation of the web as we know it today, a brute force password cracker is the polar opposite of useful to anyone except the person using it.

Bots can be used for fraud, brute force attacks, web scraping, spam, and more

Bots can be programmed to do pretty much anything a human can do on a computer, as long as that task has specific, logical steps. Certain tasks that would take humans many hours to accomplish can be completed in just minutes or seconds by bots, making them more economically efficient than humans. At Distil Networks, our focus is on defending websites against bots that have been programmed to undertake all kinds of activities including:

  • Data theft – web scraping, or lifting information and content directly from web pages
  • Brute force attacks – locating username and password forms on websites and repeatedly attempting to guess the proper credentials through the systematic use of password dictionaries, stolen usernames and passwords, or algorithms
  • Unauthorized vulnerability scans – bots snooping around a site to find entry points for an attack
  • Click fraud – artificially inflating click-through rates to exhaust competitors’ ad impressions or hijacking search terms
  • Form spam – fake form postings, registrations or comments that degrade the user experience and create unnecessary work for site admins

And because these bots are NOT human, they are not distracted by anything else on the page and can find what they’re programmed to look for very, very quickly. We see this widely in the online travel aggregator business – new market entrants will simply scrape everyone else’s sites for content and pricing information, reconfigure it in their own database, and re-present it through their own UI, all in a matter of seconds (or the time it takes for a shopper to run a web search for “low fares to xyz”). Learn more about how online travel sites can defend against site scraping.

At Distil Networks, we also create and use bots ourselves to test the effectiveness of our technology, from basic source code scraping bots to complex, automated (aka “headless”) browsers, and everything in between.

So, then, what is bot detection?

It might seem like an impossible task, but it definitely isn’t. Although bots alter their appearances and evolve their strategies, they still have their weak points. Here are a few ways you can detect bots attempting to abuse your site, mobile app, or API:

  • Set up browser validation to detect scrapers and automation tools. This way, you can stop requests from fishy visitors who claim they’re one type of browser, but are missing key features or abilities associated with the browser. In these cases, the visitor is faking it, giving you plenty of reason to deny access.
  • Use a high-definition fingerprint. Bots can cycle through huge numbers of IP addresses, so an IP address alone is a weak identifier. But with a high-def fingerprint that involves a complex series of identification points, it’s nearly impossible for bots to disguise themselves and get around your ID checks.
  • Leave it to the machines. Machine learning is more adaptive and self-adjusting than ever, and it’s getting more powerful every day. This approach uses evolving models that keep a very low false positive rate, so that bots are blocked while good, normal human users are rarely affected.
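To make the fingerprint idea a little more concrete, here is a minimal sketch in Python. It hashes several client attributes together and deliberately leaves the IP address out, so a visitor that only rotates IPs still maps to the same identifier. The attribute names and the `fingerprint` helper are hypothetical and chosen for illustration; this is not Distil's fingerprinting method, and a real fingerprint would draw on many more signals.

```python
import hashlib

# Hypothetical signal names; a real fingerprint would combine many more.
FINGERPRINT_FIELDS = ("user_agent", "accept_language", "screen", "timezone")


def fingerprint(attrs: dict) -> str:
    """Return a stable hash over selected client attributes (IP excluded)."""
    parts = [f"{name}={attrs.get(name, '')}" for name in FINGERPRINT_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two visits from the same client configuration but different IP addresses.
visit_a = {
    "ip": "203.0.113.5",
    "user_agent": "ExampleBrowser/1.0",
    "accept_language": "en-US",
    "screen": "1920x1080",
    "timezone": "UTC-5",
}
visit_b = dict(visit_a, ip="198.51.100.7")

# The IP changed, but the fingerprint did not.
same_client = fingerprint(visit_a) == fingerprint(visit_b)
```

Because the hash ignores the IP field, blocking by fingerprint keeps working when a bot moves to a new address, as long as the other signals stay the same.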

From basic source code scraping bots to complex, automated (aka “headless”) browsers, and everything in between

Basic bots are very effective at grabbing the entire source code of a web page by using technologies like cURL, Wget, Python, or other common scripting languages. These bots tend to be pretty easy to detect – they don’t run JavaScript so they’re not sophisticated enough to avoid detection by Distil’s JavaScript check. And at the end of the day, simply downloading the entire source code of a web page is not terribly useful on its own – another layer of processing is still needed to extract any useful information and put that information to productive use.
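For readers wondering how a JavaScript check can separate these basic bots from real browsers, the Python sketch below shows one generic way to do it. It is an assumption-laden illustration, not Distil's JavaScript check: the server embeds a signed nonce in the page, a script in the page is assumed to compute an answer from that nonce and send it back, and a client that never runs the script has no answer to send. The function names and the `"js:"` transformation are hypothetical.

```python
import hashlib
import hmac
import secrets

# Per-process secret used to sign nonces (hypothetical setup).
SERVER_KEY = secrets.token_bytes(32)


def issue_challenge() -> tuple:
    """Create a (nonce, tag) pair to embed in the page served to the client."""
    nonce = secrets.token_hex(16)
    tag = hmac.new(SERVER_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return nonce, tag


def expected_answer(nonce: str) -> str:
    """The value the in-page script is assumed to compute in the browser."""
    return hashlib.sha256(("js:" + nonce).encode()).hexdigest()


def passes_js_check(nonce: str, tag: str, answer) -> bool:
    """True only if the nonce is ours and the client returned the right answer."""
    if answer is None:  # client never ran the script (e.g. cURL or Wget)
        return False
    good_tag = hmac.new(SERVER_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, good_tag) and hmac.compare_digest(
        answer, expected_answer(nonce)
    )
```

A check like this only filters out clients that do not execute JavaScript at all; a bot wrapped in a browser engine would pass it, which is why it is a first layer rather than a complete defense.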

Then there are the basic bot scripts that get wrapped in components of web browsers, allowing them to parse and interpret the contents of web pages using JavaScript or another control language. These bots can be programmed to interact directly with web pages, for example, to spam forms or throw password dictionaries at user login fields. This capability makes them significantly more threatening and frustrating to online businesses than basic bots alone.
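One common, generic countermeasure against password dictionaries thrown at login fields is to count recent failed logins per client. The sketch below is a minimal sliding-window version in Python; the class name, the thresholds, and the idea of keying on a `client_id` (which could be a fingerprint rather than an IP address) are assumptions for illustration, not a description of Distil's product.

```python
from collections import defaultdict, deque


class FailedLoginLimiter:
    """Flag clients with too many failed logins inside a sliding time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # client_id -> failure timestamps

    def record_failure(self, client_id: str, now: float) -> None:
        """Remember that this client failed a login at time `now` (seconds)."""
        self._failures[client_id].append(now)

    def is_blocked(self, client_id: str, now: float) -> bool:
        """Drop failures older than the window, then compare to the limit."""
        recent = self._failures[client_id]
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures


limiter = FailedLoginLimiter(max_failures=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0):
    limiter.record_failure("client-1", now=t)
blocked_now = limiter.is_blocked("client-1", now=2.5)
blocked_later = limiter.is_blocked("client-1", now=20.0)
```

In this example `blocked_now` is true because three failures fall inside the ten-second window, while `blocked_later` is false because they have all aged out by then.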

This latter type is probably the most widely used approach today. Many of these more intelligent bots are still detectable, however, as they lack essential browser functionality and certain human behavioral patterns. These bots can almost (but not quite) perfectly mimic a human user.
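As one simple example of a behavioral signal, automated clients often issue requests at suspiciously regular intervals, while humans pause, read, and click unevenly. The Python sketch below flags a session whose gaps between requests barely vary. The function name and both thresholds are hypothetical, and real behavioral models use far richer features than timing alone.

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_requests=5, cv_threshold=0.1):
    """Return True if the gaps between requests are nearly constant.

    `timestamps` are request times in seconds, in ascending order.
    The thresholds are illustrative assumptions, not tuned values.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return True  # every request arrived at the same instant
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / average_gap < cv_threshold


bot_like = looks_automated([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
human_like = looks_automated([0.0, 2.1, 2.9, 9.4, 10.0, 31.7])
```

Here `bot_like` is true because every gap is exactly one second, and `human_like` is false because the gaps vary widely. A bot that adds random delays would slip past this particular check, which is part of why a single signal is rarely enough.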

The next generation of bots

This behavior-mimicking is where bots are heading, and it’s a key reason home-grown bot detection tools fail; unlike Distil Networks’ solutions, home-grown tools do not have access to the kind of resources that can differentiate between human users and headless browsers. Our machine learning technology and team of data scientists enable us to understand, model, and predict the behaviors of bots running in headless browsers, making it extremely difficult for even the most sophisticated of bots to avoid detection. This unique bot detection capability, coupled with our cloud-based repository of fingerprints and behavior patterns, enables us to deliver real-time, intelligent bot detection and mitigation that is 99.9% accurate.

So if you’re using a home-grown solution (or a broader-based solution with a checkbox for bot detection) and wondering what you’re missing, contact Distil Networks today.

We welcome your thoughts

We’re doing a lot more work on categorization to make it as easy as possible for our customers to focus on the threats that are of specific concern to them, as well as to generate reports that are meaningful to a broad range of recipients. If you have thoughts on this or any other aspect of bad bot management, please leave us a comment below.




About the Author

Brian Feldman

Brian Feldman, Solutions Engineer at Distil, graduated from University of Maryland with a degree in Finance after studying international business in Australia and Dubai. As an undergrad, Feldman founded a music management and event production company where he worked with some of the fastest rising stars in hip hop and EDM. Feldman is also the CEO & Founder of Gratii, a mobile marketing and gaming company that has been recognized as one of the hottest startups in the Mid-Atlantic region.
