Why make all that noise with a detailed blog post then? If it's a custom-fit internal tool, then good for you, the rest of the world doesn't care. Each company has internal tools and stuff.
There's value in sharing the ideas. Maybe they couldn't open-source it, but were given permission to publish about it. Google never open-sourced some of their greatest contributions, just published the ideas behind them.
I think blog posts like these are an interesting way to show off what goes on in a large company like Uber.
If you're a tiny startup, then Spark + MLlib is more than enough. Even that would be overkill if your data fits on a single machine.
But if you're at a young but quickly growing company with:
- terabytes of data
- tens of thousands of features extracted from the data
- dozens or hundreds of unique machine learning models being tweaked over time
then hopefully a blog post like this is helpful. It shows off various effective patterns for solving machine learning problems at scale. Presumably, you'll want to build your own internal system with its own set of hooks, but the best practices and lessons learned should be roughly the same.