One Thing Google and Content Thieves Have in Common

March 12, 2012 Courtney Brady

Both Google and content thieves know how valuable unique content is.

Unique text is one of the most valuable types of content that a website has. It is the only type of content that search engines can index effectively, and it is what a potential visitor sees before clicking your website’s link on search results pages. A single in-depth article may represent hours of original research, and the theft of that article — either through stripping the text and republishing it elsewhere or by writing a new article with identical content and different words — can tarnish the reputation of the original publisher and cause a traffic reduction due to search engine penalties. Many webmasters are taking content theft extremely seriously and have begun using content protection services to guard their content and prevent it from being scraped or plagiarized.

How Search Engines Work

Google and other search engines use the text of a Web page to determine what the page is about. Keywords that appear more often or are emphasized with bold or header tags are assumed to be most important. Media content — images, audio and Flash, for example — cannot be indexed in the same way. While it is possible to tag media content with a few relevant keywords, media does not give a website the same opportunity to be ranked for long-tail keyword variations as a full-length article. While search engines are constantly devising new ways to index and rank content, the value of text content for search engine rankings will not decrease in the foreseeable future.
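To make the idea of "more frequent or emphasized keywords count more" concrete, here is a toy sketch in Python. It is not how Google or any real search engine ranks pages; the `EMPHASIS_WEIGHTS` values, the `KeywordScorer` class and the `score_keywords` function are all made up for illustration. Each word scores 1 point per appearance, and more when it sits inside a header or bold tag.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: these numbers are assumptions for illustration only.
EMPHASIS_WEIGHTS = {"h1": 3.0, "h2": 2.0, "h3": 2.0, "b": 1.5, "strong": 1.5}


class KeywordScorer(HTMLParser):
    """Toy scorer: counts words, weighting those inside emphasis tags."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._open_tags = []  # emphasis tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_WEIGHTS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open_tags:
            # Close the most recently opened tag of this name.
            idx = len(self._open_tags) - 1 - self._open_tags[::-1].index(tag)
            del self._open_tags[idx]

    def handle_data(self, data):
        # Use the strongest emphasis currently in effect, or 1.0 for plain text.
        weight = max((EMPHASIS_WEIGHTS[t] for t in self._open_tags), default=1.0)
        for word in re.findall(r"[a-z0-9']+", data.lower()):
            self.scores[word] += weight


def score_keywords(html: str) -> Counter:
    """Return a word -> weighted score mapping for an HTML fragment."""
    scorer = KeywordScorer()
    scorer.feed(html)
    scorer.close()
    return scorer.scores


if __name__ == "__main__":
    page = "<h1>Content theft</h1><p>Scrapers copy content. <b>Theft</b> hurts.</p>"
    for word, score in score_keywords(page).most_common(3):
        print(word, score)
```

Notice that an image or audio file contributes nothing to these scores, which is the point the paragraph above makes about media content.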

Google’s Duplicate Content Policy

Many webmasters are aware that content must be completely original for the best possible chance of achieving a high search engine ranking, but few may have read Google’s official policy on the matter. When Google finds two pages containing exactly the same content, its algorithm automatically determines which version of the content is best and displays it on search results pages. Duplicate pages are not displayed. Instead, they are hidden behind the notice that begins “we have omitted some entries very similar…” at the bottom of the final results page.

When content is plagiarized, the plagiarized copy may in some cases be listed in Google’s index while the original source is filtered out by the duplicate content algorithm. The worst-case scenario occurs if Google’s algorithm wrongly concludes that the original source of the content used deceptive practices to copy content and steal search engine traffic; in this case, it is possible for the entire website to be removed from Google’s index. The owner of the original website would then have no recourse except to file a reconsideration request with Google and attempt to have the plagiarized content taken offline.

Why this matters: The better Google’s results are, the more people will use Google, and the more advertising revenue it can generate. We’re talking $1,000,000,000+ (One Billion!!). Now for the content thieves…

Content Scrapers

Content scrapers, also called “autoblogs,” are websites that use automated software to steal content in part or in whole from other websites, with the goal of luring visitors from search engines. Content scrapers often frame the text with as many advertisements as possible, hoping that people who stumble on their websites will click an advertisement rather than clicking “Back.” Although Google’s 2011 “Panda” algorithm made it more difficult for content scrapers to earn money, new scraper sites continue to be created every day. Content scrapers provide no value for visitors and often display so many intrusive advertisements that visitors will do almost anything to navigate away from them as quickly as possible. For a webmaster, scraping can do far more than hurt your standing with search engines and reduce your income; it can also harm your reputation. A visitor who sees an article on your website after seeing the same article on a scraper site may automatically dismiss your content as low-quality, and once you have lost a visitor’s trust, it is virtually impossible to get it back.

Protecting Unique Content

Many webmasters worry about the damage that plagiarism can cause. However, they lack the time needed to monitor the Web for duplicate content and do not have the financial resources to hire programmers and develop in-house solutions for content protection. Increasingly, they are relying on third-party services to keep their content safe.

Content Protection Networks

Content protection networks use a proactive approach to safeguarding content. When a visitor navigates to a website protected by a content protection network, the pages requested are delivered through the network’s servers rather than directly by the website itself. Content protection networks are able to recognize the digital signature of an automated content scraper and deny access while keeping the website available for legitimate visitors. Because content protection networks can compress code for faster delivery and typically have servers located around the world, they may also increase the reliability and speed of a website.
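As a rough illustration of what "recognizing an automated scraper" can mean, the Python sketch below blocks clients that send no User-Agent header or request pages faster than a person plausibly would. This is a deliberately simplified assumption, not a description of how any particular content protection network works; real services combine many more signals. The `ScraperFilter` class, its thresholds and the `client_id` notion are all hypothetical.

```python
from collections import defaultdict, deque


class ScraperFilter:
    """Toy filter: denies empty user agents and overly fast clients."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 10.0):
        # Hypothetical thresholds; a real service would tune these carefully.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id: str, user_agent: str, now: float) -> bool:
        """Return True if the request should be served, False if denied."""
        if not user_agent.strip():
            return False  # browsers used by real visitors send a User-Agent

        times = self._history[client_id]
        # Forget requests that have fallen out of the sliding window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()

        if len(times) >= self.max_requests:
            return False  # too many requests in the window: likely automated

        times.append(now)
        return True


if __name__ == "__main__":
    f = ScraperFilter(max_requests=3, window_seconds=10.0)
    for t in range(5):
        print(t, f.allow("203.0.113.7", "ExampleBrowser/1.0", now=float(t)))
```

Passing `now` in explicitly keeps the example deterministic; in a live system it would come from a clock such as `time.monotonic()`.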

Search Alerts

Although content protection networks can help prevent automated content theft, the possibility of an individual stealing content manually remains a concern. To combat this, many webmasters set up automatic alerts with search engines such as Google. This involves creating a search alert for an exact phrase that appears nowhere online except in the article the webmaster wants to protect. If someone republishes the article verbatim, the search alert causes a message to be sent to the author, who can then file a request in accordance with the Digital Millennium Copyright Act to have the article removed from the offending website as well as search engine indices. While search alerts are free to create, they are inelegant as a means of content protection because you may receive a “false positive” if someone publishes a legitimate article that repeats a phrase used in your article by chance. In addition, you will not receive an alert if someone plagiarizes an article but modifies the phrase you are monitoring.
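The exact-phrase idea, and its blind spot, can be sketched in a few lines of Python. This is not how Google's alerts are implemented; it only mimics the matching step, assuming you already have the text of a suspect page. The function names `normalize` and `contains_exact_phrase` and the sample phrase are invented for the example.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_exact_phrase(page_text: str, phrase: str) -> bool:
    """Return True if the monitored phrase appears verbatim in the page."""
    return normalize(phrase) in normalize(page_text)


if __name__ == "__main__":
    # Hypothetical monitored phrase, chosen to be unlikely to occur by chance.
    phrase = "harder to keep content unique than to create it"

    verbatim_copy = "As they say, it is harder to keep content\nunique than to create it."
    reworded_copy = "As they say, keeping content unique is harder than creating it."

    print(contains_exact_phrase(verbatim_copy, phrase))  # verbatim copy is caught
    print(contains_exact_phrase(reworded_copy, phrase))  # reworded copy slips through
```

The second check shows why a single monitored phrase misses plagiarism that has been lightly reworded.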

Plagiarism Checkers

A plagiarism checking service is a more reliable solution than search alerts for catching plagiarism after it has occurred. Instead of checking for a single identical phrase, a plagiarism checking service regularly monitors the Web for articles with a significant amount of text similar to the ones being monitored. Plagiarism checking services typically are not free, and it can be quite costly to monitor more than a few pages of content. However, a plagiarism checking service can detect stolen content more reliably than a search alert for a single phrase.
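One common way to measure "a significant amount of similar text" is to compare overlapping word sequences (often called shingles) and compute the share the two documents have in common. The Python sketch below uses that technique as an assumption; the article does not say which method any particular plagiarism checker uses, and the function names and the default shingle size `k=5` are chosen purely for illustration.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word sequences found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by all shingles: 0.0 (unrelated) to 1.0 (identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = ("Unique text is one of the most valuable types of content "
                "that a website has")
    padded_copy = original + " and it should be protected"
    unrelated = "completely different words appear in this other sentence here today"

    print(jaccard_similarity(original, original))
    print(jaccard_similarity(original, padded_copy))
    print(jaccard_similarity(original, unrelated))
```

Unlike a single-phrase alert, a score like this still rises when a copy has been padded, trimmed or partly reworded, because many of the word sequences survive intact.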

So What Does This Mean?

It is difficult to create unique content, but it is even harder to keep content unique. Businesses now have the tools to ensure their content is not devalued by web scrapers. Stop website scrapers from stealing the value of your content and ensure your visitors, your brand equity, your revenue, and your business remain with you.

Have thoughts? We want to hear from you. Drop us a line if you agree, disagree, have questions, etc.

Thanks for reading,
Team Distil

The post One Thing Google and Content Thieves Have in Common appeared first on Distil Networks.

About the Author

Courtney Brady

Courtney Brady is the Director of Marketing at Distil Networks. She comes to Distil Networks from a variety of start-up companies rooted in SaaS and DaaS solutions. Formerly the global communications manager at multiple companies, Courtney is responsible for developing the company’s marketing strategy and branding campaign.
