In an announcement that professionals should take note of, LinkedIn has disclosed that nefarious actors used malicious bots to scrape user profile data from its site for almost a year.
“During periods... since December 2015, and to this day, unknown persons and/or entities employing various automated software programs (often referred to as ‘bots’) have extracted and copied data from many LinkedIn pages,” said the complaint, filed in the Northern California U.S. District Court.
The implications are huge, as detailed in the filing. “LinkedIn will suffer ongoing and irreparable harm to its consumer goodwill and trust, which [it] has worked hard for years to earn and maintain, if the … conduct continues.”
LinkedIn contends that fraud was committed, that its terms of service were violated, and that it made reasonable efforts (using security measures to prevent automation) to protect itself. All of that being said, the illegality of web scraping is still up for debate.
Because the stolen data has already been made public by the nature of LinkedIn’s business, how can the legal system intervene? Where is the line between data protection and preferential treatment? What legal claim does LinkedIn have to data that belongs to its members? Society is still grappling with these questions. Few, if any, answers are forthcoming.
Why Scrape LinkedIn?
Since the company didn’t disclose in its filing what the web scrapers were after, one can only speculate. A Quora post suggests that scrapers have been active on LinkedIn for a while. The practice is unsavory, but it falls within a gray area that unscrupulous organizations are all too willing to exploit.
Scrapers create value by stealing website or API content that would otherwise require resources (e.g., time and money) to acquire. Third-party web scraping businesses can build platforms on top of the stolen data with far lower overhead costs, then offer products that compete with LinkedIn’s suite at lower rates.
Bad Bots are Ubiquitous
This incident is a potent reminder that content scraping has increasingly become a major problem. Everyone should realize that 20% of all internet traffic today is likely malicious automation (read: bots). Anything of value, whether shared publicly or hidden behind a login page, is a potential target.
The problem is widespread, affecting online publishers, e-commerce marketplaces, online retail, travel, real estate, and now social networking.
In the travel sector, Distil sees bots being used to scrape flight information from licensed partner sites such as Kayak.com, Travelocity, and others in order to avoid affiliate contracts and associated fees. This lets competing sites launch quickly with lower overhead costs for their owners.
It’s a similar problem in the online real estate market. Legitimate third-party listing sites pull data from centralized sources on a fee basis. Scrapers steal content from them and then repost it using the same monetization model: they sell leads or post their own ads.
The Bot Problem is Getting Worse
Distil Networks has observed that the “bot herders” behind these attacks readily adapt to anti-bot technologies, refining their methods in response.
Businesses are increasingly making the requisite investment in web security, yet many still place bot prevention low on their priority list. This is a mistake. Others have become aware of the problem and are adopting bot detection and mitigation solutions. This strategy makes financial sense once one digs a little deeper into the many benefits:
What You Gain
- Protecting that which belongs to you and your customers – The large-scale, persistent content scraping attack that LinkedIn suffered is by no means unique. The majority of Distil customers suffer similar persistent attacks; it’s a constant battle. It’s estimated that over 50% of all internet traffic is automated, and that nearly 20% of all traffic is considered illicit or malicious.
If you derive income from your website content, so can anybody else. And they’ll use bots to make it happen. If parts of your app require secure access (e.g., login pages), a nefarious entity is likely using bots to pound on those pages, probing for pathways into your customer accounts.
- Lowering fraud – Businesses that leverage e-commerce have dealt with online fraud for years. Web-based incidents originated by bad bots have increased steadily, to the point that anti-fraud systems are often overloaded. Adding bot prevention lowers the amount of fraudulent traffic reaching e-commerce sites, thereby reducing such incidents.
- Lowering infrastructure costs – On average, Distil customers report a reduction in server load of 20% or higher once bot prevention is in place.
- Improving the value of ad spend – Online advertising is rife with fake traffic. Even the most reputable and widely used ad platforms are exploited by various scams, such as click fraud. Online advertisers often rely on inadequate or inaccurate metrics to measure the legitimacy and quality of traffic generated by their networks. Having visibility into IP sources and into which requests are automated could keep you from wasting ad spend on fake traffic.
- Improved customer experience via improved site performance – Our satisfied customers report lower infrastructure costs and faster page load times. In addition, they are discovering that, before implementing the Distil solution, their stack of analytics tools (traffic analysis, A/B testing, customer experience testing, and ultimately inbound pipeline measurements) was being skewed by bad bots.
- Improved security – Not addressing the bot problem limits the effectiveness of any web security program. Bots can discover vulnerable or shadow IT assets and then probe them using methods such as remote file inclusion, SQL injection, and cross-site scripting. Even with security measures such as firewalling and vulnerability management programs in place, web apps are still vulnerable.
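To make the login-page scenario from the first item above more concrete, here is a minimal Python sketch of one common signal: counting how often a single client hits a login endpoint within a sliding time window. This is an illustration only, not a description of how Distil or LinkedIn detect bots. The class name `LoginRateMonitor`, the thresholds, and the use of an IP address as the client identifier are all assumptions made for the example; real bot mitigation combines many more signals, since bot herders rotate IP addresses and throttle their request rates.

```python
from collections import defaultdict, deque


class LoginRateMonitor:
    """Hypothetical helper: flag clients that hit a login page too often."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id, timestamp):
        """Record one attempt; return True if the client exceeds the limit."""
        attempts = self._attempts[client_id]
        attempts.append(timestamp)
        # Discard attempts that have fallen out of the sliding window.
        while attempts and timestamp - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) > self.max_attempts


if __name__ == "__main__":
    monitor = LoginRateMonitor(max_attempts=3, window_seconds=10.0)
    # Four attempts within a few seconds from one (documentation-range) IP.
    for t in (0.0, 1.0, 2.0, 3.0):
        flagged = monitor.record("203.0.113.7", t)
        print(t, flagged)
```

In this usage example the fourth attempt is the first one flagged, because it is the first to exceed the three-attempt limit inside the ten-second window. A simple counter like this catches only the crudest automation, which is one reason the article argues for a dedicated bot prevention layer.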
While LinkedIn’s content scraping problem may be news to some, the bad bot problem has been internet-wide for many years. Organizations are increasingly turning to existing security technologies to combat it, but evidently even massive web properties, with LinkedIn as today’s poster child, remain vulnerable. Bots don’t rest, and they are growing ever more sophisticated.
While firms like LinkedIn pursue legal recourse, other organizations don’t have the time, resources, or inclination. LinkedIn is about to encounter the same protracted battle many law enforcement agencies have experienced when fighting cyber threat actors across jurisdictions and international borders. The most efficient and effective response is to put a bot prevention solution in place before facing consequences similar to LinkedIn’s.
Further reading: In Web Scraping: Everything You Wanted to Know (but were afraid to ask), we describe the origins of the bot problem, as well as the advances made by both cyber criminals and bot detection technology.