Scraping Just Got a Lot More Dangerous

March 26, 2013 Rami Essaid

News organizations have been fighting to stay alive for years; now they have a paved road to profitability. A federal court just severely restricted fair use by upholding the NY Times and The AP’s copyright claim against Meltwater, a web scraping service that finds mentions of its clients in the news. This ruling means that news organizations, and any other content producer, can charge third parties that have previously been scraping their content for free. This isn’t just about syndication of your content anymore; there are a thousand other ways that companies profit from the content a publisher creates, and publishers now have the right to share in those profits.

Why is this important? For years, everyone on the internet has been under the assumption that anything posted online is free and fair to use. That meant that despite all the hard work and effort that went into writing an online article, nobody respected the value of that article, until now. The court ruled that a web scraper that is monetizing someone else’s content is not entitled to fair use and is in essence “stealing.”

Wait. Isn’t Google a web scraper? Well, yes. But the difference is that for a search of “The New York Times”, 56% of people who see an excerpt on Google click through to the article, as opposed to 0.08% for Meltwater, because Google has established a reputation for correctly crediting articles and web content, unlike a lesser-known site. That is the distinction that separates search engines from theft. It is a slightly blurry line, but I believe it will become clearer as more organizations start enforcing their rights.

So moving forward, any online publisher can and should:

  1. Monitor their site for content scrapers, either by examining their log files manually or by using Distil Networks in monitor-only mode.
  2. Go after any infringing scrapers to protect their copyright.
  3. Set up a monetization policy and perhaps build an API to sell access to their content to scrapers that need to have continued access to this data.
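
For publishers who take the manual route in step 1, a first pass over the log files can be sketched as below. This is only an illustration under stated assumptions: it assumes the web server writes Apache/Nginx "combined" format access logs, the function names are hypothetical, and the 1000-request threshold is an arbitrary starting point rather than a recommended value. A high request count on its own does not prove scraping; it only tells you which clients deserve a closer look.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache/Nginx "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_clients(lines):
    """Count requests per (IP, user agent) pair; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


def flag_heavy_clients(counts, min_requests=1000):
    """Return clients at or above the (hypothetical) threshold, busiest first."""
    heavy = [(client, n) for client, n in counts.items() if n >= min_requests]
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

To use it, open an access log, pass the file object to `summarize_clients`, and hand the result to `flag_heavy_clients`. Grouping by both IP address and user agent is a design choice made here for simplicity; scrapers that rotate addresses or imitate browser user agents will not stand out in this view.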


Reference Article

About the Author

Rami Essaid

Rami Essaid is the Chief Product and Strategy Officer and Co-founder of Distil Networks, the first easy and accurate way to identify and police malicious website traffic, blocking 99.9% of bad bots without impacting legitimate users. With over 12 years in telecommunications, network security, and cloud infrastructure management, Rami continues to advise enterprise companies around the world, helping them embrace the cloud to improve their scalability and reliability while maintaining a high level of security.
