Data scientists and machine learning engineers in small and medium businesses often end up over-engineering their machine learning workflow and stack. In this series of posts, I will share a few tricks learnt over the years about choosing the right components of the ML pipeline. In this first post, let us go through the mistakes small teams can make. Later posts will explain possible solutions.
Preferring Generalized Solutions
Many big companies perfect very general solutions to machine learning problems by investing an obscene amount of resources.
I have been teaching machine learning to programmers for some time. I started this activity in 2013, and to date I have conducted 10+ hands-on workshops. These workshops were typically 3-3.5 hours long and covered some theory, coding examples in Python (occasionally R), and interactive discussions. They were attended by 10-50 programmers. Here are some learnings from this teaching activity, in no particular order. Many of them were in fact goof-ups that I made at some point, so these are indeed lessons from the trenches.
Well, the title is intentionally exaggerated. It may not be the biggest, but it is certainly one of the most important areas of friction in taking machine learning to production.
Scenario
Imagine that you, or your data scientist, has written a functional machine learning pipeline in Python today. By pipeline I mean the data transformation as well as the prediction code. For example, you could have data that is a mix of text and numeric features. You might do some text processing to generate n-gram features with some custom filters on the text.
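To make the scenario concrete, here is a minimal sketch of such a pipeline, assuming scikit-learn and pandas; the column names, the custom filter, and the model choice are illustrative assumptions, not details from the original post.

```python
# Minimal sketch: a pipeline mixing text (n-gram) and numeric features.
# Column names ("description", "price", "quantity", "label") and the
# custom filter below are hypothetical placeholders.
import re

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def custom_text_filter(text: str) -> str:
    """Hypothetical custom filter: lowercase and drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower())


features = ColumnTransformer(
    transformers=[
        # unigram and bigram features from the text column,
        # run through the custom filter first
        ("text",
         TfidfVectorizer(preprocessor=custom_text_filter, ngram_range=(1, 2)),
         "description"),
        # scale the numeric columns
        ("numeric", StandardScaler(), ["price", "quantity"]),
    ]
)

pipeline = Pipeline(steps=[
    ("features", features),
    ("model", LogisticRegression(max_iter=1000)),
])

# Training and prediction share the same transformation code, which is
# exactly what has to be reproduced faithfully when this moves to production.
# df = pd.read_csv("training_data.csv")
# pipeline.fit(df[["description", "price", "quantity"]], df["label"])
# predictions = pipeline.predict(df[["description", "price", "quantity"]])
```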
When you are working on a task where you are interactively building a solution, you need a lot of focus. Most of the work that I do fits into this category. Interactively building solutions to larger problems by solving smaller ones was pioneered by data-first tools like Clojure and R. The first step to achieving the focus required in such interactive work is to remove distractions and complexities from your environment as much as possible.