Toward the end of the summer, I collaborated with Distil Networks on two surveys: one of MLS executives, and another of IDX/VOW vendors that provide real estate websites. The goal was to identify challenges to solving the screen-scraping problem that has plagued our industry for so long.
The survey found that almost all IDX and VOW vendors rely solely on obsolete methods that don’t address the sophistication of modern scraping. For example, about a third of IDX/VOW vendors just use “rate limiting” – preventing a single IP address from quickly scooping up all the pages on a site. These days, many scraping programs are designed to conceal their intentions; they move slowly through a small portion of a site and change their IP address before detection, getting around rate limit and page limit protections. The hard fact is that IP address blocking doesn’t really work, especially now that most of the scraping traffic comes from dynamic IP addresses assigned to consumers by Comcast, Time Warner, Verizon, etc.
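To make the weakness of per-IP rate limiting concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's actual implementation: the `PerIpRateLimiter` class, the threshold of 10 page views per 60 seconds, the request timings, and the IP addresses (taken from reserved documentation ranges) are all assumptions made up for the example.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Naive sliding-window limiter keyed on client IP (hypothetical example)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # this IP is over its limit: block the request
        hits.append(now)
        return True


def count_blocked(limiter, requests):
    """Return how many (ip, timestamp) requests the limiter rejects."""
    return sum(not limiter.allow(ip, t) for ip, t in requests)


# Assumed scenario 1: a crude scraper pulls 100 pages from one IP, one per second.
naive = [("203.0.113.5", float(t)) for t in range(100)]

# Assumed scenario 2: a stealthy scraper pulls the same 100 pages,
# one every 30 seconds, switching to a new IP after every 5 pages.
stealthy = [(f"198.51.100.{i // 5}", i * 30.0) for i in range(100)]

naive_blocked = count_blocked(PerIpRateLimiter(10, 60.0), naive)
stealthy_blocked = count_blocked(PerIpRateLimiter(10, 60.0), stealthy)
print("naive scraper blocked:", naive_blocked)
print("stealthy scraper blocked:", stealthy_blocked)
```

Under these assumed numbers, most of the crude scraper's requests are rejected, while the slow, IP-rotating scraper is never blocked because no single address ever comes close to the limit. Tightening the threshold enough to catch it would start blocking ordinary consumers first.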
Through the survey, vendors were educated about the impact of scraping on their business and that of their customers. Still, the most important factor in getting website providers to implement appropriate security around the listing content is “compliance with MLS rules” – 62% of IDX vendor respondents rated this as highly important. Therefore, one of the most important things the industry needs to do to solve this scraping problem is to improve current MLS IDX policy to address scraping specifically. IDX and VOW policy – and/or the contracts that reflect those policies – must be robust enough that obsolete anti-scraping methods are considered insufficient for compliance.
Supporting this change in policy, 95% of the MLS executives polled agreed that IDX sites should be subject to rules specifically mandating scraping protections. However, while 99% say compliance with rules protecting against misuse of MLS data is important, the majority of MLSs do not perform compliance testing – even for existing VOW rules regarding anti-scraping. A test suite or certification program to facilitate anti-scraping compliance reviews absolutely has to be developed.
The scraping problem is not just one for IDX and VOW providers: 96% of MLS executives rate it important that their MLS vendor implement anti-scraping solutions, as scrapers attack not only the password-protected MLS but also client collaboration and framed IDX solutions provided by those vendors. MLS vendors should note that 94% of MLS executive respondents indicated that a vendor’s information security practices, including sophistication of anti-scraping technology, are important to them when selecting an MLS vendor. Once industry leaders have taken care of the areas where they have direct control, they will then have a high ground from which to request that portal websites take comparable steps to protect MLS data displayed on their sites.
Based on the near unanimous support for rule changes and a standard for testing compliance, combined with advances in affordable anti-scraper services available, there is a path forward for the industry to solve the scraping problem once and for all.
If you wish to review more of the study results and read the entire white paper, you can download it here.
About the Author
Matt Cohen, Clareity’s Chief Technologist, joined Clareity in 1996 and has over seventeen years of extensive real estate technology and business experience. Matt has consulted for many of the top Associations, regional MLSs, MLS software vendors, and large brokerages, as well as a wide variety of information and technology companies that service the real estate industry.