Overview

One of Europe’s largest independent online beauty retailers offers thousands of products across a broad range of brands. This e-tailer knew who its closest competitors were, and suspected several of them of web scraping. The problem became serious when the scraping became more advanced.

Online retailers are particularly susceptible to the effects of advanced bot threats, including price scraping, product matching, variation tracking and availability targeting. When bots scrape pricing and product information, aggregated data is fed to an analytics engine, allowing competitors to match prices and products quickly.

The online beauty retailer realized it was a victim of advanced site scraping; competitors were following its every move and undercutting it in the market in real time. With the intent to thwart competitors’ efforts to scrape important pricing data and other information from its website, the retailer sought the help of Imperva (formerly Distil Networks).

Challenges

According to the Principal Solutions Developer, competitors were scraping pricing and inventory data on an ongoing basis, and responding very quickly to any changes made on their website. “We noticed that a lot of our traffic wasn’t made of real users coming to buy, but competitors spying on us,” he said. “Our Managing Director realized that whenever we made a change to pricing, inventory or other data, competitors were making similar changes. He had a gut feeling they were monitoring us. The reaction time was too quick for it to be a human; we knew it had to be some kind of bot.”

Bad bots are traced to competitors

The solutions developer began to examine the logs and found a lot of bot traffic on his site. “Some were quite careless — the IP address of those bots matched up with the static IP of our competitors’ offices,” he said. “As soon as we started blocking them, they caught on and moved their bots into the Amazon Cloud. We had lots of traffic coming out of Amazon. So we started blocking parts of Amazon. Then, it started getting messy.”
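The kind of log check described above can be sketched in a few lines of Python. This is only an illustration, not the retailer's actual tooling: the network ranges are reserved documentation addresses standing in for competitor office and cloud ranges, the names `BLOCKED_NETWORKS`, `is_blocked` and `count_blocked_hits` are hypothetical, and the log format (client IP as the first whitespace-separated field) is an assumption.

```python
import ipaddress
from collections import Counter

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# used here as stand-ins for a competitor's static office IPs or cloud ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.16/28")
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def count_blocked_hits(log_lines):
    """Count requests per blocked client IP.

    Assumes the client IP is the first whitespace-separated field of each
    log line; lines that do not start with a valid IP are skipped.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            if is_blocked(fields[0]):
                hits[fields[0]] += 1
        except ValueError:
            # First field was not an IP address; ignore the line.
            continue
    return hits
```

A list like this has to be maintained by hand, and it only helps for as long as the scraper keeps its address, which is why the approach got messy once the bots moved to shared cloud ranges.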

Manually blocking bots turns into an endless game of whack-a-mole

Initially, the task of blocking the bad bots manually took approximately an hour per week. He recalled, “The time and effort to control the bots got bigger and bigger and bigger as we needed to do more and more analysis of what was going on in the logs.”

The Solutions Development team created a long list of blocked IP addresses, but the bots were spoofing the headers as well, and it was difficult to determine whether an address was genuine or not. “There’s only so much you can do manually by looking at logs,” he said. “The task became more and more time-consuming, and pretty soon we were spending a day and a half every week checking for bots. We realized we needed a more efficient and reliable way to stop malicious bots from scraping the site’s data.”

Requirements

As the Dev team began searching for a solution, they defined their requirements. The solution had to filter all traffic, and do so quickly, by routing it through DNS. It needed to be plug-and-play, so the team wouldn’t have to devote a lot of time or IT resources to deployment and ongoing management.

Retailer’s web infrastructure requires a cloud solution

Because their primary eCommerce website is hosted in the Microsoft Azure Cloud, an appliance-based solution wasn’t a viable option. “Many of the solutions were hardware-based,” he said. “You’d simply subscribe, install a piece of hardware, set up a firewall between the solution and the web and you’re done. But because we use Microsoft’s network and cloud, we couldn’t do that. We needed a solution that would route traffic through using a DNS and let someone else host it. But we still needed to be able to filter all the traffic coming in on the domain.” The retailer’s cloud infrastructure also required that the solution work with a CNAME rather than an IP address.

Of particular concern was that any solution block only malicious bots, not desirable bots from legitimate search engines such as Google and Bing. Being able to whitelist legitimate IP addresses was equally important to the online retailer.
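The requirement amounts to checking an allowlist before any blocklist, so that a trusted crawler is never caught by a broad block. The Python sketch below illustrates that ordering under stated assumptions: the ranges are reserved documentation addresses rather than real search-engine or competitor ranges, and `ALLOWED_NETWORKS`, `BLOCKED_NETWORKS` and `decide` are hypothetical names, not part of any Imperva API.

```python
import ipaddress

# Hypothetical ranges (reserved documentation addresses, RFC 5737).
# ALLOWED_NETWORKS stands in for trusted crawler addresses; BLOCKED_NETWORKS
# deliberately contains a wider range that overlaps it.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/28")]
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def decide(ip: str) -> str:
    """Classify a client IP as 'allow', 'block' or 'inspect'.

    The allowlist is checked first, so an allowed address wins even when
    it also sits inside a blocked range.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return "allow"
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "block"
    # Unknown traffic is neither trusted nor blocked outright.
    return "inspect"
```

In this sketch, `decide("192.0.2.5")` is allowed even though the surrounding /24 is blocked, while other addresses in that /24 are rejected.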

Why Imperva?

The Dev team searched, but was unable to find other viable solutions. “Most of the companies we looked at were trying to hack ways of solving the problem. The vast majority of them didn’t get past step one,” he noted. His team selected Imperva Bot Management because it met all of their requirements and could be implemented quickly.

Imperva’s SaaS solution is implemented in an afternoon

The Dev team engaged with Imperva in a trial, and the number of bots that came across the site was dramatically reduced. They moved forward with a full implementation and were up and running quickly. “We had to do a little bit of work on the website to handle our IP detection for location services, but all in all, we were up and running within a single afternoon,” he said.

Results

Imperva’s service stopped competitors from scraping pricing and product data

Since the online beauty retailer implemented Imperva Bot Management, malicious bots are blocked before they enter the site, preventing competitors from lifting pricing, inventory and availability data.

“The time it takes for our competitors to react to a price change has lengthened, so that’s given us more lead time,” said the solutions developer. “Customers notice there’s a price difference before the competitor is able to change their prices. Most consumers won’t wait to see if the competitor’s price drops; they’ll buy from us first.”

Bot defense reduced site errors and latency, lowering bounce rates

Another benefit is a reduction in site errors and the related bounce rate. Additionally, Imperva’s service has enabled the Dev team to make site changes without causing latency or disrupting the user experience.

Imperva solution saves retailer during Black Friday

“We had some troubles on Black Friday with the amount of load we were shifting through, which increased that day by 100X,” he said. “We suspected we had a faulty server, or a faulty network, within the solution. We fired up a new instance of the website on a different bank of servers. Normally, you’d have to wait for the DNS to resolve over a 24-hour period, but I managed to log into the Imperva portal and redirect the traffic from one bank of servers to another in under 10 minutes. It was helpful to be able to move our solution anywhere without changing the DNS.”

Eliminated bad bot traffic, cutting server resource needs by 22%

With fewer bots bombarding the site, the online retailer saves on server resources and provides users better response times. “With Imperva, we don’t need as many servers running all the time, because we’re not wasting resources serving up content for bad bots. In fact, we’ve reduced the load by 22%, and as a result, customers experience faster response times.” According to the solutions developer, the savings in bandwidth and server resources are more than enough to pay for the Imperva service.

Automated and offloaded bot detection, saving a day and a half per week in IT resources

Most importantly, Imperva helps the Dev team thwart competitors who employ malicious bots to scrape data, while removing the time and effort spent tracking them down. “Some people have realized Imperva is filtering for us, and they’re trying to find ways of getting around it. But it’s not our headache anymore. We’ve offloaded that to Imperva. I know competitors will still respond to any pricing changes we make, but the harder we can make it for them, the more advantage we’ve got.”

The retailer’s Dev team is now back to its core responsibility: delivering features for the website. They are confident they have a solid solution for bot detection and mitigation, and after implementation, they were able to move on. The solutions developer concluded, “The Managing Director walked in the office a few days after we implemented Imperva, and he was smiling. That was a happy day for me.”