Protecting Property Portals from Web Scraping Bots

October 22, 2015 Orion Cassetto

Earlier this month, Distil Networks’ CEO Rami Essaid was invited to speak at the 2015 Property Portal Watch Conference in Amsterdam. The theme of the conference was “What Does 2016 and Beyond Hold for the Property Portal Industry?” Our answer: “Bots. Lots and lots of very sophisticated bots.”

Bots and property portals

Securing property portal listing data is harder than ever. Why? Because web scraping is cheap and easy, and bots are growing ever more capable of evading detection. Bots can steal whatever data they’re programmed to scrape: anything from listings and photos to exclusive data that should only be available to legitimate partners and customers. Unlike the good bots that help prospects find your data on the web, bad bots are often deployed by less-than-ethical parties, and can:

  • Damage brand reputation by presenting inaccurate, outdated, or otherwise uncontrolled listings

  • Negatively impact SEO page rankings by duplicating data

  • Skew analytics, which can easily lead to misinformed business decisions

  • Slow site performance due to “bot storms” that keep out legitimate traffic

The end result is lost revenue, lost reputation, and lost customers.

Bots are a very real problem

Earlier this year, we undertook a survey among 100 executives and 14 property portal operators representing a total of over 600,000 realtors and 400,000 real estate websites. The results are available in the report "2015 Study: State of Web Scraping Data Theft Across Real Estate Websites & MLS Data".

From our real-estate clients around the world, we know that web scraping is pretty much a fact of life. So we started out by asking participants just how much scraping traffic a property portal could sustain before its business model and budget began to suffer. The answer? Not a lot. 43% felt that 1% scraping traffic was too much, and another 28% capped their tolerance at 15%.

When we compared this response with the actual amount of bad bot traffic on property portal sites as recorded in our 2015 Bad Bot Landscape Report, we found that 71% of those surveyed were experiencing levels of bot traffic beyond their tolerance levels.

The data shown above was captured in September 2015 from a popular real estate service that uses the Distil Networks service. It shows that the vast majority of site traffic (87.5%) is non-human. Good bots and whitelisted traffic sources (bots from trusted partners) account for 75% of all traffic, and the remaining 12.5% comes from web scrapers.

Why aren’t property portals doing more to protect their IP?

If web scraping is so rampant, why aren’t more property portals taking action against it? The answer, sadly, is that many think they are doing as much as they can. Unfortunately, they’re relying on the wrong tools to deal with the problem.

When we asked our survey participants what technologies they were using to detect bots, almost 80% were relying on log analysis. While this is good practice as part of any cyber security strategy, it’s nowhere near sophisticated or scalable enough to deal with today’s bots. The picture was equally bleak when it came to blocking bad bots. The three major tools in use were IP blocking, rate limiting, and web application firewalls, none of which was designed to deal with bots and all of which fall woefully short. Here’s why:

IP Blocking is too reactive

IP blocking has been the number one choice for years. It’s relatively cheap and easy for businesses to set up in-house, but it’s definitely a case of getting what you pay for. It’s always one step behind the bad guys, who rotate IP addresses from huge pools, spoof genuine addresses, and mask their origins with anonymous proxies. It’s an endless game of whack-a-mole that the good guys can never win but spend enormous amounts of resources to play.
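
To make the whack-a-mole dynamic concrete, here is a minimal sketch of a reactive blocklist, assuming a simple "block an address after N offending requests" policy. The function name, the strike threshold, and the address pool are all hypothetical and chosen for illustration; this is not a description of any particular product.

```python
import itertools


def simulate_blocklist(pool_size: int, requests: int, strikes_before_block: int) -> tuple[int, int]:
    """Return (requests_served, addresses_blocked) for a bot that
    rotates through a pool of addresses (hypothetical policy)."""
    blocked: set[str] = set()
    strikes: dict[str, int] = {}
    served = 0
    # Hypothetical address pool; real scrapers draw on proxies and botnets.
    pool = [f"10.0.{i // 256}.{i % 256}" for i in range(pool_size)]
    for ip in itertools.islice(itertools.cycle(pool), requests):
        if ip in blocked:
            continue  # request rejected by the blocklist
        served += 1  # request reaches the site before any block applies
        strikes[ip] = strikes.get(ip, 0) + 1
        if strikes[ip] >= strikes_before_block:
            blocked.add(ip)  # the block only lands after the damage is done
    return served, len(blocked)


# One address, 1,000 requests: blocked after 5 requests.
single = simulate_blocklist(pool_size=1, requests=1000, strikes_before_block=5)

# 100 rotating addresses: each one gets 5 requests through before it is blocked.
small_pool = simulate_blocklist(pool_size=100, requests=1000, strikes_before_block=5)

# 1,000 rotating addresses: no address ever reaches the threshold.
large_pool = simulate_blocklist(pool_size=1000, requests=1000, strikes_before_block=5)
```

Under these assumptions, the larger the attacker's pool, the more requests are served before the blocklist catches up, and with a big enough pool it never catches up at all.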

Low and slow attacks bypass rate limiting

Rate limiting is all well and good if all the requests come from a single trackable source. But once an attacker slowly trickles the requests out across a vast pool of rotating IP addresses, there’s no easy target at which to aim the rate limiting and the attacker flies under the radar, straight to the heart of your site.
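
The effect can be sketched with a toy per-IP sliding-window limiter. The class name, the limit of 10 requests per minute, and the example addresses are assumptions made for illustration, not a reference implementation of any specific rate limiter.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter keyed by client IP (illustrative only)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def count_allowed(limiter: PerIpRateLimiter, requests) -> int:
    """Count how many (ip, timestamp) requests the limiter lets through."""
    return sum(limiter.allow(ip, t) for ip, t in requests)


# 600 requests in one minute from a single address.
single_source = [("203.0.113.7", i * 0.1) for i in range(600)]

# The same 600 requests in the same minute, spread across 200 addresses.
rotating_pool = [(f"198.51.100.{i % 200}", i * 0.1) for i in range(600)]

noisy_allowed = count_allowed(PerIpRateLimiter(10, 60.0), single_source)
stealth_allowed = count_allowed(PerIpRateLimiter(10, 60.0), rotating_pool)
```

In this sketch the single-source scraper is cut off after 10 requests, while the distributed scraper sends only 3 requests per address and gets all 600 through, even though the total load on the site is identical.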

WAFs can’t deal with fast-changing bot technologies

WAFs are a great way to block OWASP Top Ten attacks, but today’s sophisticated bots are frequently designed to behave like humans, complete with time delays and other unpredictable actions. If it looks and behaves like a human, a WAF will likely let it in.

And, in common with many other areas of online criminal behavior, the law is lagging behind.

Growth in mobile traffic exacerbates the threat

As in many other industries, mobile is becoming an increasingly important component of the real estate marketing mix. Unfortunately, the same characteristics that make a mobile-optimized site easy for humans to navigate also make it a prime target for bad bots: its more structured presentation of website data is easy to scrape. Wherever humans go online, bots are not far behind.

So what can you do?

Don’t wait for the law to catch up – find out how you can create a secure listing “supply chain” with your upstream and downstream partners. Get a free no-strings trial of Distil Networks’ solution at www.distilnetworks.com/trial

 

 

About the Author

Orion Cassetto

Orion Cassetto joined Distil Networks as Director of Product Marketing in 2015, bringing with him nearly a decade of experience in the Cyber Security industry. His strengths include competitive strategy, positioning, and messaging for web application security and SaaS-based security solutions.
