Are forums and social media websites safe from data harvesting?
Online data harvesters regularly “scrape” websites by the millions. Fast computers, high-speed Internet connections, and automated software now expedite this task. Scrapers sort the gathered information into enormous databases. Some exploit the data for marketing purposes; others sell it, use it for private investigations, or post it on other websites. Most social media and forum websites lack the security measures to protect users from data harvesters.
Many small-time spam and virus distributors rapidly scour the Internet for users’ personal data, especially email addresses and other direct contact information. Private forums and social media offer little protection from determined harvesters. A human may need to fill out the registration forms and confirm an email address, but once the account exists, automated scraping software works just as well as it does on a public site. With a little extra effort, data harvesters can gain valuable information that marketers and investigators will pay to obtain.
Major companies also harvest data. The Wall Street Journal reported on an incident involving The Nielsen Company in October 2010. The website PatientsLikeMe.com discovered that Nielsen used scraping software to retrieve all of the messages from its private discussion forums. Drug manufacturers paid Nielsen for the data. Nielsen did not obtain permission from the website; it merely registered for an account and put its data harvesting software to work.
Like many scraping activities, Nielsen’s actions were not prohibited by U.S. law. Federal regulations do outlaw some activities related to data harvesting. For instance, the government enacted major laws against unsolicited email in 2003. However, it often proves difficult to enforce laws on the Internet. Forum users and owners shouldn’t wait for authorities to crack down on scraping in general; after all, the federal government itself harvests data for security purposes.
The International Business Times reported in January 2012 that security personnel had arrested two British tourists at the Los Angeles International Airport. One man had posted poorly worded jokes on Twitter that officials interpreted as terrorist threats. The tourists were interrogated and sent home. The FBI recently expressed an interest in hiring contractors to develop new scraping software, according to PC Magazine. The software would monitor news and social media.
Data harvesting harms the owners and users of interactive websites. Users may receive unwanted phone calls or email messages. Private information might be taken out of context or used to steal a person’s identity. Website owners lose traffic and face higher operating expenses. For example, some members of PatientsLikeMe.com stopped using its forums after they learned about the Nielsen incident. At the same time, data harvesting systems consume bandwidth and increase the cost to run a website.
We believe that forums and social media websites have a responsibility to prevent data harvesting. Although no action can keep users entirely safe, it’s possible to curtail access by most automated harvesters. For example, Distil Inc. offers a service that helps websites withhold data from “robots” that seek to collect information. It does not obstruct well-intentioned robots like search engine “spiders.” Website operators can supplement this type of security by screening new members and discouraging the disclosure of private information.
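To make the idea of limiting automated access more concrete, here is a minimal sketch of one common technique: capping how many pages a single client may request within a time window while letting recognized search engine spiders through. It is only an illustration and does not describe how any particular commercial service works. The class name `RequestThrottle`, the `ALLOWED_CRAWLERS` list, and the limits are all assumptions chosen for the example. A real deployment would also need to verify a crawler's identity, since the user-agent header is easily forged.

```python
import time
from collections import defaultdict, deque

# Hypothetical allowlist of search engine user-agent substrings (an assumption
# for this example; user-agent strings alone can be spoofed).
ALLOWED_CRAWLERS = ("Googlebot", "Bingbot")


class RequestThrottle:
    """Sliding-window limit on requests per client (illustrative only)."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier (for example, an IP address) to the
        # timestamps of its recent requests.
        self._history = defaultdict(deque)

    def allow(self, client_id, user_agent, now=None):
        """Return True if the request should be served, False if withheld."""
        if now is None:
            now = time.monotonic()

        # Let allowlisted spiders through without counting their requests.
        agent = user_agent.lower()
        if any(name.lower() in agent for name in ALLOWED_CRAWLERS):
            return True

        # Discard timestamps that have aged out of the window.
        history = self._history[client_id]
        while history and now - history[0] >= self.window_seconds:
            history.popleft()

        # Withhold the page once the client has used up its allowance.
        if len(history) >= self.max_requests:
            return False

        history.append(now)
        return True
```

A site would call `allow()` once per incoming request and return an error page or a challenge whenever it yields `False`. A person browsing a forum rarely loads dozens of pages a minute, so a limit like this mostly affects software that tries to download every message in bulk.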
- Team Distil