The term “bot” is more common now than ever. We hear it used in politics, on social media, and when discussing website traffic. But how well do you really understand the damage bots can inflict on your business?
Most businesses require a strong web presence to generate revenue. Creating an attractive, efficient, and popular website is no easy task. All the while, website owners must be fully aware that bad bots lurk around every corner—looking to steal your content, abuse functionality, alter site metrics, and commit fraud—all to the gain of bot herders.
Bad bots are an all-too-common and growing problem. But what specific damage are they doing on your website? And how does it impact your business?
1. Web Scraping—Content theft, competitive data mining, and price scraping
Bots run rampant through websites around the clock. They “scrape” content, then illicitly repost it on other sites. Backlinks and credit are never given. In fact, you’ll often never know that your content has been scraped unless you invest time looking for it externally using a search tool like Google or Bing.
Duplicated content reflects poorly on the originating website and lowers its Google PageRank. This equates to fewer visitors, which in turn can lead to lower sales. In short, content scraping can severely impact a site’s ability to survive and thrive on the web.
Unscrupulous competitors use bots to scrape competitor information. Online retailers are particularly susceptible to the effects of price scraping, product matching, variation tracking, and availability targeting. E-commerce businesses are an obvious target for bad bots due to flash sales and limited-time promotions—treasure to web scrapers.
When bots scrape price and product information, aggregated data is fed to an analytics engine, thereby enabling competitors to match prices and products in close to real time. Seconds can make the difference between the original retailer keeping a sale and a scraper stealing it.
Travel industry sites like OTAs, airlines, hotels, and metasearch sites are all incessantly scraped by bots. Competitors and aggregators also use web scraping bots to circumvent approved APIs, steal away cross-sell and upsell opportunities, and hijack content.
If you serve proprietary content or pricing information on your website, you can be certain that a bad bot is scraping it.
How to spot it
You won’t know unless you look. Once you start investigating, tell-tale signs include:
- Seeing your content elsewhere on the web—oftentimes on competitors’ sites
- Noticing that your competition rapidly changes their pricing in response to your price changes
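Beyond those manual checks, suspected scrapers often stand out in raw access logs by sheer request volume. The following is a minimal sketch, not a production detector; the log format, IP addresses, and the 100-request threshold are illustrative assumptions that would need tuning against real traffic:

```python
from collections import Counter

# Each log entry is assumed to be a (client_ip, path) tuple parsed
# from your web server access log. The threshold is an arbitrary
# illustration; tune it to your site's normal per-visitor volume.
def flag_possible_scrapers(entries, max_requests=100):
    """Return client IPs whose request count exceeds max_requests."""
    counts = Counter(ip for ip, _path in entries)
    return {ip for ip, n in counts.items() if n > max_requests}

# Example: one client hammering product pages far harder than any human.
log = [("203.0.113.9", f"/product/{i}") for i in range(500)]
log += [("198.51.100.2", "/home"), ("198.51.100.2", "/product/1")]
print(flag_possible_scrapers(log))  # {'203.0.113.9'}
```

Volume alone will not catch slow, distributed scrapers, but it is a cheap first pass before investing in dedicated bot detection.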
Amazon price-scraping bots crush Diapers.com
Diapers.com was once a thriving e-commerce site providing a comprehensive line of baby products. Amazon.com was a successful online bookstore considering entering other target markets—including products for infants. It ran a sniffer bot against Diapers.com, learned its business strategy, and tracked pricing and margins. Eventually it was able to mimic Diapers.com’s product line while also getting real-time price updates.
Within a year, Diapers.com’s investors became alarmed. Jeff Bezos and company launched Amazon Mom and, shortly thereafter, acquired a devalued Diapers.com. The campaign was far quicker than building an infant-products business from scratch would have been.
2. Skewing—All your web metrics are wrong
Skewing is typically a downstream effect of bots running rampant on your site. Every bot interaction, no matter what it does, skews the metrics associated with that business. Because the number of bots operating on the internet is so significant, business decisions based on such metrics are obviously flawed.
With marketers looking at how people find their website and examining how customers progress to making a purchase, conversion rate data is a key factor in funnel optimization. Businesses regularly perform A/B page testing and look at analytical data to judge specific page performance. Marketing departments also consider lead attribution data so as to understand the effectiveness of their marketing spend and determine future plans. But what happens when all the data is skewed by bots that have no intention of ever making a purchase? Bots inflate the cost of doing business and lead to wasted marketing spend.
Looking at another angle, bot operators can easily alter someone’s reputation, influence others, or gain online notoriety—particularly through social media. The public discourse about real or fake Twitter followers is an example of a visible metric being skewed by bogus accounts created and run by bots. And online polls are another example where results are seen in real time, but where the data is easily manipulated by bots seeking to achieve a preferred result.
How to spot it
Metrics are skewed by every bot on your site. Some ways to spot them on your network include:
- Examine your analytics platform – Your analytics could reveal unexplained spikes that signal a bot attack, such as a flood of attempted logins against your authentication pages, or a vulnerability scan run against your site.
- Examine sudden changes in traffic origin – An unusually high number of requests from a specific country, for example. As you examine your traffic it appears to originate there, but it could easily be a bot targeting your business. Do you spend more money on marketing and sales in that country based on the apparent demand? Without knowing whether the traffic is human, your decision-making is at risk.
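One low-cost sanity check before trusting analytics numbers is to exclude sessions carrying obvious automation signatures. The sketch below assumes each session record includes a user-agent string; the marker list is illustrative and far from complete, since sophisticated bots spoof real browser strings:

```python
# Naive filter: drop sessions whose user agent contains a known
# automation marker. This catches only honest bots, so treat the
# "human" figure as an upper bound on data quality, not a guarantee.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def human_conversion_rate(sessions):
    """sessions: list of dicts with 'user_agent' and 'converted' keys."""
    humans = [s for s in sessions
              if not any(m in s["user_agent"].lower() for m in BOT_MARKERS)]
    if not humans:
        return 0.0
    return sum(s["converted"] for s in humans) / len(humans)

sessions = [
    {"user_agent": "Mozilla/5.0", "converted": True},
    {"user_agent": "Mozilla/5.0", "converted": False},
    {"user_agent": "python-requests/2.31", "converted": False},
]
print(human_conversion_rate(sessions))  # 0.5, not the skewed 1/3
```

Even this crude filter illustrates the point: with the scripted session removed, the conversion rate changes materially, and so would any decision based on it.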
Skewed analytics affects conversion rates at Hayneedle
Online retailer Hayneedle faced a daily struggle against competitors using bots to scrape its pricing. Fraudsters were also using stolen credit card numbers, substituting CVV numbers until they found valid matches. Such activity skewed the company’s marketing metrics: bots were loading up items in carts but never purchasing, distorting funnel conversion data.
Source: Distil Case Study – Hayneedle
Source: Distil & Hayneedle Webinar – Are Bot Operators Eating Your Lunch?
3. Denial of Service
You’re probably familiar with volumetric distributed denial of service (DDoS) attacks, intended to overwhelm servers with their massive request volume so as to bring down a targeted website. But bot attacks, flying under the radar, aren't limited to volumetric attacks.
Application-level DoS is very different; it requires only a small number of requests to degrade performance or cause downtime. Occurring at OSI layer 7, an application attack takes down your web application and your backend keels over, while your firewall and load balancer continue to function as if nothing is amiss.
For example, if your homepage traffic triples, your site can probably handle it. But that same amount of traffic directed at your shopping cart page causes problems as your web application sends multiple requests to all components involved in each transaction. This includes contacting the inventory database, connecting with payment processing and fraud tools, and using analysis tools for cross-sell opportunities. It doesn't take much traffic like that to cause a layer 7 denial of service.
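Because a layer 7 attack needs only modest volume against an expensive endpoint, per-endpoint rate limits are a common first line of defense. Below is a toy sliding-window limiter keyed on client and path; the limits, the in-memory store, and the `/cart` path are illustrative assumptions (production systems typically back this with a shared store such as Redis):

```python
import time
from collections import defaultdict, deque

class EndpointRateLimiter:
    """Sliding-window request limit per (client_ip, path) pair.

    Expensive pages (e.g. a shopping cart endpoint that fans out to
    inventory, payment, and fraud systems) can be given a much lower
    limit than cheap static pages.
    """
    def __init__(self, limits, window=60.0):
        self.limits = limits            # {path: max requests per window}
        self.window = window            # window length in seconds
        self.hits = defaultdict(deque)  # (ip, path) -> request timestamps

    def allow(self, ip, path, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[(ip, path)]
        while q and now - q[0] > self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limits.get(path, 1000):
            return False                # over the limit: reject request
        q.append(now)
        return True

# Cart endpoint limited to 5 requests per minute per client.
limiter = EndpointRateLimiter({"/cart": 5})
results = [limiter.allow("203.0.113.9", "/cart", now=float(i)) for i in range(7)]
print(results)  # first 5 allowed, requests 6 and 7 blocked
```

The point of keying limits by path is exactly the asymmetry described above: the same request count that is harmless on the homepage can be crippling on a transactional endpoint.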
How to spot it
Some of the indicators below may seem obvious, but deeper investigation into the applications and services running on your website can reveal a bot-driven denial of service attack targeting your applications:
- An increase in dissatisfied customer calls because the site is not working, or is too slow to use
- Site and applications/services slowdowns
- Excessive spikes on certain pages, coupled with increased service requests to access the shopping cart database, payment processing system, and fraud detection tools
Datalex stops excessive scraper bots
Datalex is an airline e-commerce platform that uses a global distribution system (GDS) to provide flight information to fliers. Authorized web scrapers are common in this industry and many are whitelisted (allowed) to access flight information. But unapproved scraping bots have also taken hold, scraping at levels that, in effect, cause slowdowns and downtime.
Source: Distil Case Study - Datalex
4. Credential Cracking
Credential cracking is a technique practiced by bot operators when they possess a known username, but need to guess the accompanying password. They use brute force dictionary lookups and multiple guessing attempts against application authentication processes to arrive at the missing credential piece.
Email addresses are also commonly used as account usernames. All it takes to gain access, then, is to acquire a list of known email addresses, then use a brute force bot to match them up with common passwords.
Another source of credentials is when usernames or email addresses are made publicly available within comment or message boards on various sites; these can easily be scraped and paired with common passwords to provide authentication.
The average person uses only five different passwords for all online accounts—many of them easily guessed by brute force bots. By quickly validating login credentials, the bots can then commit account takeovers on other sites where those same credential sets are used.
Note: This differs from credential stuffing, which is number five on our list.
How to spot it
Attacks on login pages are relatively easy to detect, though often only after the fact when performing forensics. Look for the following:
- An abnormal rate of failed login attempts could indicate stolen credentials are being tested on your site
- Check your user directory and authentication logs
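Pulled from authentication logs, a simple per-username failure count surfaces cracking attempts against known accounts. This is a sketch only; the event format and the failure threshold are assumptions, not a reference to any particular logging system:

```python
from collections import Counter

# events: (username, success) tuples parsed from your auth log,
# where success is True for a successful login attempt.
def usernames_under_attack(events, max_failures=10):
    """Return usernames with more failed attempts than max_failures."""
    fails = Counter(u for u, ok in events if not ok)
    return {u for u, n in fails.items() if n > max_failures}

# Example: 'alice' is being brute-forced; 'bob' just mistyped once.
events = [("alice", False)] * 50 + [("bob", False), ("bob", True)]
print(usernames_under_attack(events))  # {'alice'}
```

A high failure count against one username with many different passwords is the signature of cracking, which distinguishes it from the stuffing pattern described in the next section.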
5. Credential Stuffing
Credential stuffing is when known, matched username/password pairs are tested en masse to authenticate on other websites. Bots run this process faster than any human could, testing millions of credential pairs at a time.
Why does this work? It’s not unusual for users to reuse login credential sets on more than one application or website. This makes the bots’ job that much easier.
An attacker could have directly sourced the stolen credentials from another application, purchased them in a criminal marketplace, or obtained them from publicly available data breach dumps.
How to spot it
Similar to credential cracking, examine the following:
- Check your logs, looking for an abnormal number of attempted logins. This time look for any successful login that follows a series of failed ones
- Then look for fraud associated with those accounts, as well as complaints of customer lockouts (as logged by help centers or shared through social media outlets)
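The success-after-failures pattern described above can be checked mechanically against chronological login events. The sketch below assumes events arrive in order as (client IP, success) pairs; the threshold is an arbitrary illustration:

```python
def suspicious_logins(events, fail_threshold=20):
    """events: chronological (ip, success) tuples.

    Flag IPs whose successful login follows a long run of failures:
    the classic credential-stuffing signature of many stolen pairs
    tried before one finally matches an account on your site.
    """
    fails = {}      # ip -> current consecutive failure count
    flagged = set()
    for ip, ok in events:
        if ok:
            if fails.get(ip, 0) >= fail_threshold:
                flagged.add(ip)
            fails[ip] = 0  # reset the streak after any success
        else:
            fails[ip] = fails.get(ip, 0) + 1
    return flagged

# One IP fails 30 times then succeeds; a normal user mistypes once.
events = [("203.0.113.9", False)] * 30 + [
    ("203.0.113.9", True),
    ("198.51.100.2", False), ("198.51.100.2", True),
]
print(suspicious_logins(events))  # {'203.0.113.9'}
```

Real stuffing attacks rotate IPs, so in practice this check would be combined with device or behavioral signals rather than relied on alone.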
Ashley Madison: The most promiscuous account credentials on the web
When online dating service Ashley Madison was breached in 2015, 32 million login credentials were released and offered for sale. Perpetrators subsequently used them to log in to other websites containing personal user information.
When companies compared their failed logins against the Ashley Madison list, they noticed matches.
Any time there is a public announcement about a massive breach involving stolen credentials, website owners should expect an increase in attempted logins from bots using the revealed credentials.
6. Ad Fraud
Ad fraud is incredibly crafty and, left unchecked, costs businesses a significant amount. It occurs when bots visit your website and click on the ads it serves, rendering your ad metrics useless.
Advertisers expect that their ad buy will attract human eyes. But their ads are reaching fewer humans every day—ads shown to bots will never generate business for an advertiser.
Meanwhile, publishers want to increase their revenues by increasing the number of ads they serve, or the CPM they charge. But when bots come along instead of humans, ad serving resources are drained. Traffic and ad revenue are diverted to the bad guys’ sites. That means not only loss of ad revenue but also damaged reputations.
The end result is a breach of the implicit trust relationship between advertisers and publishers.
Examples of digital ad fraud include:
- Impression fraud: Bogus ad websites are used by bots to repeatedly load the pages, generating false ad impressions
- Click fraud: Bots use fake ad search websites to get paid on expensive search terms
- Retargeting: Sends bots to legitimate ad websites to create a valuable cookie profile, used to earn premium ad revenue
How to spot it
When a site is being defrauded, conversion rates typically go down. Your Google ad buys could hit their daily spending cap, but earlier in the day than you would usually expect. This is likely because a bot, rather than real humans, has been clicking through your ads. Other symptoms indicating that you might be a click fraud victim:
- Spikes in click volume with little or no change in conversions
- Steep declines in conversions despite no change in keyword bids
- Spikes in clicks from a keyword in one search engine, but not others
- Lower user engagement (e.g., higher bounce rates) during click volume spikes
- Repetitive clicks from the same IP address
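The last indicator, repetitive clicks from one address, is straightforward to check against click logs. A minimal sketch, assuming each click is logged as an (IP, ad identifier) pair; the threshold is illustrative:

```python
from collections import Counter

# clicks: (ip, ad_id) tuples from your ad click log. A human rarely
# clicks the same ad more than a handful of times; bots often do.
def repetitive_click_ips(clicks, max_clicks_per_ad=3):
    """Return IPs that clicked any single ad more than the limit."""
    per_pair = Counter(clicks)
    return {ip for (ip, _ad), n in per_pair.items() if n > max_clicks_per_ad}

clicks = [("203.0.113.9", "ad-42")] * 20 + [("198.51.100.2", "ad-42")]
print(repetitive_click_ips(clicks))  # {'203.0.113.9'}
```

Flagged IPs can then be cross-checked against the engagement signals listed above, such as bounce rate and conversion, before being reported or blocked.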
Digital Publishing: An Industry Under Attack
Digital publishers work in an industry that is under a focused attack from bad bot operators. Recent studies estimate that for every $3 spent on digital advertising, $1 is siphoned out of the ecosystem by fraudsters running automated bots.
Source: Digital Publisher’s Guide to Measuring & Mitigating Non-Human Traffic
7. Vulnerability Scanning
Vulnerability scanners are automated tools (a.k.a., bots) that run tests against a website or web application, looking to identify weaknesses and possible vulnerabilities. Many scanners are available; popular ones include Metasploit, Burp Suite, Grendel Scan, and Nmap.
Two groups run vulnerability scans against a company’s web properties: friendly penetration testers and malicious attackers.
Security professionals hire pentesters to examine their web infrastructure. They expect them to run a number of vulnerability scanners to fuzz the site for weaknesses (a testing method that inputs massive amounts of random or malformed data). Because a scanner is an automated tool, it is effectively a bot, so its IP addresses are whitelisted on the network security devices. The resulting pentest report is then used to prioritize weaknesses requiring fortification. Such vulnerability scanning can help improve the security posture of any company.
But the dark side of vulnerability scanning is revealed when it’s used by a malicious attacker. Their scanning tool performs the same systematic examination of the web property, but this time its operation is unknown to security professionals charged with site protection. The tool is not whitelisted; it freely operates as an anonymous bot roaming about within normal traffic. And worst of all, the pentesting vulnerabilities report is likely now in the hands of the hacker, for later evaluation and exploitation.
The untargeted scan also fits in this category. It occurs when a known vulnerability in common software (e.g., WordPress) is exploited. An automated bot can scan the web for a list of sites running the vulnerable version. If your site ends up on the list, expect an unwanted visit.
Vulnerability scanning bots typically perform reconnaissance first, which inevitably leads to escalating malicious behavior. Blocking such recon helps prevent subsequent attacks.
How to spot it
Vulnerability scanners typically scan indiscriminately. If you see an IP address systematically accessing every page on your site, that is an indicator of a scanner at work. In addition, a higher-than-normal number of “page-not-found” responses might indicate a scanner on the loose.
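Both indicators can be checked with a per-client 404 ratio computed from access logs. A sketch under assumed inputs: each request is an (IP, HTTP status) pair, and both thresholds are illustrative defaults to tune:

```python
from collections import defaultdict

def high_404_clients(requests, min_requests=20, max_404_ratio=0.3):
    """requests: (ip, status_code) tuples from your access log.

    Scanners guessing at files and directories that don't exist
    generate far more 404s than any browser-driven session does.
    """
    stats = defaultdict(lambda: [0, 0])  # ip -> [total, not_found]
    for ip, status in requests:
        stats[ip][0] += 1
        if status == 404:
            stats[ip][1] += 1
    return {ip for ip, (total, nf) in stats.items()
            if total >= min_requests and nf / total > max_404_ratio}

# Example: a scanner probing paths blindly vs. a normal visitor.
reqs = [("203.0.113.9", 404)] * 40 + [("203.0.113.9", 200)] * 10
reqs += [("198.51.100.2", 200)] * 30
print(high_404_clients(reqs))  # {'203.0.113.9'}
```

The minimum-request floor avoids flagging a human who simply followed one dead link.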
8. Footprinting
Footprinting is a process by which bots probe applications to identify their properties. It includes testing to learn as much as possible about an application’s underlying logic, structures, algorithms, functions, methods, configuration, and other secrets. It can also determine entry points, collectively referred to as the attack surface.
Footprinting can also include brute force, dictionary attacks, and the guessing of file and directory names. Fuzz testing might be employed to discover holes used to identify further application resources and capabilities.
Footprinting is likely to have occurred when “users” (bots) exercise functionality of the entire application in a manner unlike the behavior of a typical user.
How to spot it
Similar to vulnerability scans, bot-driven footprinting can be identified by spotting vulnerability assessment and network inventory-scanning tools running on your network. It’s difficult to differentiate footprinting from vulnerability scanning (as well as fingerprinting) except by looking specifically at what is being scanned.
- More traffic on services that don't normally see that much could signal a bot performing a scan
- A spike in requests for pages or services that don't exist could be an indicator of non-human interaction with your network and applications
- Significant (abnormal) amounts of “page not found” deliveries could also signal that a bot is preparing a footprint of your environment
- Look at your logs to identify what isn’t normal human traffic so you can make an informed decision as to how to block it
9. Fingerprinting
Fingerprinting occurs when specific requests attempt to elicit information in an effort to profile an application. Such probing typically examines HTTP header names and values, session identifier names and formats, contents of error page messages, URL path case sensitivity, URL path patterns, file extensions, and whether software-specific files and directories exist.
This method is often reliant on data leakage; its profiling may also reveal something of interest regarding network architecture and topology. By querying a store of exposed application properties, such as those held in a search engine index, fingerprinting can be undertaken without any direct use of the application.
How to spot it
Often there are no symptoms of fingerprinting. Such bots typically access resources that aren’t visible on the site and that a normal user shouldn’t access. Seeing this sort of activity indicates something has gone awry.
- If the bot requests pages that don't exist, it could generate a spike in “page not found” deliveries
- Differentiating fingerprinting from footprinting and vulnerability scanning can be difficult when looking within the traffic. For example, some scanners use automation to scan for particular things. If you can identify that it’s not a browser performing such scans, you could deny it access to anything through the website
10. Spam
Without you realizing it, bots might be stuffing your forms with garbage data. Spamming can add questionable or even malicious data to public and private content, databases, and user messages.
Form spam, comment spam, fake registrations, and rogue reviews and product listings pollute your site and backend systems. Traditional fixes like Google reCAPTCHA are problematic because they create friction for legitimate users and hurt conversion rates. Worse yet, CAPTCHA farms and sophisticated bots can solve Google reCAPTCHAs.
If your blog is well known, right now someone may be offering to sell links from it to anyone willing to pay a few dollars (or a few cents). Your blog may even be listed by name, with backlinks for sale at a set price. Spammers do this to game the search engines and to trick your readers into visiting dubious websites. Sometimes their clients are ostensibly harmless, but are often peddling fake pills, porn, scams, and malware. Sometimes they’ll use “buffer sites”—innocent-looking web pages intended to disguise the fact that they’re really advertising something more sinister.
The problem is rampant; large sites with form fields are especially susceptible. For example, National Public Radio (NPR) announced it would no longer feature reader comments on its site due to comment spam distracting readers from the real stories. USA Today also noted that other news organizations are moving away from posting comments as well.
How to spot it
The most obvious signs that a spam bot is running on your website include:
- Spam begins to appear in the reviews or comments section of your site
- Links appearing that take visitors from your site to other, unrelated sites
- Higher bounce rates when sending out email to your house list (spammers often use fake email accounts when filling out forms)
- Complaints from advertisers or partners that they are receiving bogus or spammy leads from your site
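One friction-free alternative to CAPTCHAs is a honeypot field: a form input hidden from humans with CSS that naive spam bots, which fill every field they find, complete anyway. A server-side sketch; the field name `website_url` is an arbitrary choice for illustration, and this stops only unsophisticated bots:

```python
# The form includes a field, hidden via CSS, that real users never
# see or fill in. Any submission carrying a value there is almost
# certainly automated and can be dropped without bothering humans.
HONEYPOT_FIELD = "website_url"  # assumed field name, not a standard

def is_probable_spam(form_data):
    """Return True if the hidden honeypot field was filled in."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())

print(is_probable_spam({"comment": "Nice post!", "website_url": ""}))        # False
print(is_probable_spam({"comment": "Buy pills", "website_url": "http://x"}))  # True
```

Unlike a CAPTCHA, the honeypot adds zero friction for legitimate users, which is why it pairs well with the conversion-rate concerns raised above.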
Spam prevention is better than the cure at Drupal
Drupal is open source content management software that is used to create many of the websites and applications people use every day. The Drupal development community is one of the largest in the world, consisting of more than a million passionate designers, developers, trainers, strategists, coordinators, editors, and sponsors all working together. Volunteer moderators complained that manually removing spam was taking too much time. Upon examination, Drupal noticed that spammers, whether using bots or working manually, typically created more than one account. By identifying and blocking the devices that opened those accounts, the spam problem was minimized.
Source: Distil Case Study - Drupal.org
This white paper highlights ten ways bots can target your site, but covers only half of the existing automated threats. The Open Web Application Security Project (OWASP) is an important standards body in the application security community. Its automated threats to web applications list also includes: account aggregation, account creation, carding, card cracking, cashing out, CAPTCHA bypass, expediting, scalping, sniping, and token cracking. For more information about these, download the OWASP Automated Threat Handbook.
About the Author
Edward Roberts leads Product Marketing and has over twenty years’ experience in technology marketing. Previously he worked for Juniper Networks, heading up Product Marketing for the Counter Security team. Before that he ran marketing for Mykonos Software, a web security company.