Blocking bots is an annoyingly important part of running a public-facing website. CrunchBase leverages Distil to separate the wheat from the chaff.
Founded in 2007, CrunchBase is a website offering vast amounts of data about startup activity. Want to know who founded a startup, who invested in it, or who it competes with? CrunchBase has the answers. And in a marketplace that is somewhat frothy, CrunchBase is an increasingly heavily trafficked web property. The site contains over 650,000 profiles of individuals and companies. As such, CrunchBase has a significant opportunity to monetize that data, and is accordingly concerned about people who seek to use that data for their own commercial aims.
I spent time talking with Kurt Freytag, head of product at CrunchBase, to have a look at the engineering work that goes into the site. As the site grew in size and traffic, Freytag noticed oddly shaped traffic and random spikes that were putting significant strain on its infrastructure. Of course, CrunchBase could have simply thrown more horsepower at the site, but Freytag was keen to identify the root causes of the issues. He quickly concluded that bot traffic was hitting the site hard and crawling through its data. While this is primarily a performance concern, it also introduces real commercial risk as third parties use the site's data elsewhere. People were literally stealing CrunchBase's data and monetizing it. Something had to be done.
Freytag was keen to find a fix. He came across security vendor Distil and agreed to take a look at how applicable Distil was to CrunchBase's problem set. Initially, Distil was seen as a tool to deal primarily with the performance issues, with protection of IP as a secondary benefit. Freytag was adamant that he didn’t want to have to make changes to his underlying web infrastructure to implement a solution. Because Distil runs traffic through a proxy, implementing it requires little change to the existing infrastructure.