Scraping Just Got a Lot More Dangerous

March 26, 2013 Rami Essaid

News organizations have been fighting to stay alive for years; now they have a paved road to profitability. A federal court just severely restricted fair use by upholding the NY Times' and The AP's copyright claims against Meltwater, a web scraping service that finds mentions of its clients in the news. This ruling means that news organizations, and any other content producer, can charge third parties that have previously been scraping their content for free. This isn't just about syndication of your content anymore. There are a thousand other ways that companies profit from the content a publisher creates, and publishers now have the right to share in those profits.

Why is this important? For years, everyone on the internet has assumed that anything posted online is free and fair to use. That meant that despite all the hard work and effort that went into writing an online article, nobody respected the value of that article, until now. The court ruled that a web scraper profiting from someone else's content is not entitled to fair use and is, in essence, "stealing."

Wait. Isn't Google a web scraper? Well, yes. But the difference is that for a search of "The New York Times," 56% of people who see an excerpt on Google click through to the article, compared with 0.08% for Meltwater. That is because Google has established a solid reputation for correctly crediting articles and web content, unlike a lesser-known service. That is the distinction that separates search engines from theft. It is a slightly blurry line, but I believe it will become clearer as more organizations start enforcing their rights.

So moving forward, any online publisher can and should:

  1. Monitor their site for content scrapers, either by examining their log files manually or by using Distil Networks in monitor-only mode.
  2. Go after any infringing scrapers to protect their copyright.
  3. Set up a monetization policy, and perhaps build an API that sells access to their content to scrapers that need continued access to this data.
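
Examining log files "manually" does not have to mean reading them line by line. As a rough sketch, assuming your web server writes access logs in the common Apache/Nginx "combined" format, a short Python script can tally requests per client IP and flag clients with unusually high volume or bot-like user agents. The function name, the max_requests threshold, and the list of user-agent hints are hypothetical illustrations, not tuned values, and a determined scraper can rotate IPs and spoof its user agent, so treat the output as a starting point for investigation rather than proof.

```python
import re
from collections import Counter

# Matches an Apache/Nginx "combined" log line:
# IP, identity, user, [timestamp], "request", status, size, "referer", "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent fragments that often indicate automated clients.
BOT_HINTS = ("python-requests", "curl", "wget", "scrapy", "httpclient")


def find_suspected_scrapers(lines, max_requests=1000):
    """Return {ip: reason} for clients that look like scrapers.

    max_requests is an illustrative threshold, not a recommended value.
    """
    counts = Counter()
    agents = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        ip = match.group("ip")
        counts[ip] += 1
        # Remember the first user agent seen for each IP (lowercased).
        agents.setdefault(ip, match.group("agent").lower())

    suspects = {}
    for ip, total in counts.items():
        if total > max_requests:
            suspects[ip] = f"{total} requests"
        elif any(hint in agents[ip] for hint in BOT_HINTS):
            suspects[ip] = f"bot-like user agent: {agents[ip]}"
    return suspects


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [26/Mar/2013:10:00:00 +0000] "GET /news/1 HTTP/1.1" '
        '200 5120 "-" "python-requests/1.1.0"',
        '198.51.100.7 - - [26/Mar/2013:10:00:01 +0000] "GET /news/2 HTTP/1.1" '
        '200 4096 "-" "Mozilla/5.0"',
    ]
    print(find_suspected_scrapers(sample))
```

In practice you would pass it the lines of a real log file, for example `find_suspected_scrapers(open("access.log"))`, and review the flagged IPs before deciding whether they are infringing scrapers worth pursuing.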

Ruling
http://www.scribd.com/doc/131847330/Meltwater-AP-Ruling

Reference Article
http://venturebeat.com/2013/03/24/why-scraping-online-news-stories-could-land-you-in-hot-water/

About the Author

Rami Essaid

Rami Essaid, Distil's Co-Founder and CEO, began his career as the founder and CEO of Chit Chat Communications. After a successful exit, he consulted in mobile development. With over 11 years in communications, network security, and infrastructure management, Rami advised enterprise companies to help improve scalability and reliability while maintaining a high level of security. Rami attended North Carolina State University where he majored in computer engineering.
