Here comes the cavalry

May 16, 2014 Mike Freenor

“Vive L’Empereur” by Édouard Detaille

The engineering team at Distil, like those at many startups, has been busy since the beginning. Startups are always striving for maximum efficiency, and in the early days this means a lean workforce. They have to distinguish themselves both from the crowd and from larger companies with far greater resources. Since this is my first rodeo, the constraints are the main side of the equation I have seen; in the early stages, engineers find themselves in a battle on all fronts.

As with any fight, the tide often suddenly turns. This week at Distil we picked up an additional data science hire (welcome, William!). Today’s blog post will reflect upon the experience of gaining another new teammate.

More is different

One of us is bound to hit the target…

One of the first things I had to learn on the job at Distil was parallel programming for big data, mainly Hadoop MapReduce and Hive. Having access to many machines is a game changer; much of what we do at Distil wouldn’t be possible without big data parallelism (across multiple machines).

Likewise, having more people on board is a game changer. Much like the step from a single computer to many, the step to worker parallelism is fundamental. When it comes to planning our project (how and when we will spend our effort), it’s less of a Robinson Crusoe style economic problem. With more people in the data science mix, along come the problems of timing and coordination.

Similar to eliminating computational bottlenecks in a parallel program, planning for a team of many people requires effectively striping time; William, for instance, can work on training models while I work on identifying and extracting the next feature for analysis. A little bit of time and consideration spent on how to keep everybody moving forward at all times can help boost the momentum in a project greatly.

The whole is greater

But I form like Voltron and blast you with my shoulder missiles.

When adding more workers to the mix, there are three distinct possibilities: (1) you increase the amount of work done linearly, as in having people stack boxes or dig a ditch; (2) you decrease productivity, as in the case of adding too many people to the kitchen; or (3) you increase the amount of work done more than proportionally. At this stage of growth for Distil, we are experiencing (3): the whole is greater than the sum of its parts.
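For readers who think better in code, the three cases can be sketched roughly as follows. This is only an illustration: the function name `scaling_case`, the choice to compare output per worker against a solo baseline, and the sample numbers are all assumptions made for the example, not measurements from Distil.

```python
import math


def scaling_case(solo_output: float, team_output: float, team_size: int) -> int:
    """Return 1, 2, or 3 to match the three possibilities described above.

    Assumption for this sketch: "productivity" means output per worker,
    compared against what one person produced alone.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")

    per_worker = team_output / team_size

    # Case 1: each worker contributes about what a solo worker did (linear).
    if math.isclose(per_worker, solo_output, rel_tol=1e-9):
        return 1
    # Case 2: each worker contributes less than before (crowded kitchen).
    if per_worker < solo_output:
        return 2
    # Case 3: each worker contributes more than before (whole > sum of parts).
    return 3


# Hypothetical numbers: one person finishes 10 tasks a week.
print(scaling_case(10, 20, 2))  # two people finish 20 tasks
print(scaling_case(10, 15, 2))  # two people finish 15 tasks
print(scaling_case(10, 30, 2))  # two people finish 30 tasks
```

Under that assumption, the three calls land in cases (1), (2), and (3) respectively.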

I’m just learning the ways in which (3) can be true. For one, it’s much easier to get out of a jam when even a second person is added to the mix. When I’m confused about some technical or mathematical point (or stuck on a bug), it can be incredibly handy to have an outside (and invested) perspective. If one person’s stuck in quicksand, having a second person to help will do more than merely halve the amount of time required. Likewise, having someone available to tell me I’m going down the wrong path (or to suggest a path of lesser resistance) will save a significant chunk of time.

Secondly, time aside, there are psychological benefits to increased division of labor. When people have your back, they do more than save you time — they save you worry. Moving from being one of a few people chiefly concerned with a project to being one of a larger crowd can be a huge relief. Not only are more brains producing ideas, but these ideas are passed under more sets of eyes. Knowing that something has passed the scrutiny of many experts, rather than only a few, is a rather large boost of confidence.

Meditating explicitly on how to work together is key. Few working relationships work optimally right out of the box.

Summing up

Working with multiple people requires a new set of behaviors, attitudes, conventions, and so on. However, it’s a necessary fact of life; few large and complicated things are built by individuals anymore. Luckily, it’s not just a set of obstacles to step around, but a set of advantages to maximize. As with a well-designed parallel algorithm, effective teamwork can produce gains that are out of proportion to the number of workers added.

About the Author

Mike Freenor

Mike Freenor is all about the numbers. As Senior Data Scientist at Distil Networks, he is responsible for designing and implementing a suite of statistical and behavioral analyses, built in Hadoop MapReduce. Mike is currently wrapping up his doctoral dissertation at Carnegie Mellon University, which critiques modern macroeconomic methodology.
