
The misuse of "big data" tools like Hadoop to do with entire giant clusters of machines what can be done more quickly on a MacBook Air encapsulates quite a bit of what is wrong with modern software development: overengineering, dick-size-based infrastructure decisions ("look at how big my cluster is!"), stacking up tools to pad one's resume, mindlessly copying what bigger companies do, needless layers of abstraction, and so on.
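
A huge fraction of "MapReduce" workloads are really just one streaming pass of count-and-aggregate. A minimal sketch in Python of what that looks like on a single machine (the tab-separated layout and the key-in-first-column assumption are mine, not anything specific):

    import sys
    from collections import Counter

    # One streaming pass over stdin: count occurrences of the first
    # tab-separated field. Memory scales with the number of distinct
    # keys, not the number of lines. No cluster required.
    counts = Counter()
    for line in sys.stdin:
        key = line.split("\t", 1)[0]
        counts[key] += 1

    for key, n in counts.most_common():
        print(f"{key}\t{n}")

Run it as "python count.py < events.tsv" and a job like this will often saturate the disk before it saturates a single core, which is the whole point.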

Look up the largest hard drive of any type that you can find for sale. Now spec out a consumer or small-business-grade NAS with 2-4 of those drives. If your data will fit there, you do not have "big data." If the cost bothers you, consider that the cloud footprint (or on-prem mini data center) required for your big sexy "big data" approach will cost far more than one of those NAS systems, possibly every month.
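
As a back-of-envelope check (drive size and prices below are illustrative assumptions, not quotes; plug in whatever is actually on the market):

    # Rough capacity and cost sanity check, all numbers hypothetical.
    drive_tb = 20                          # one large commodity drive
    drives = 4
    usable_tb = drive_tb * (drives - 1)    # e.g. RAID 5: one drive's worth of parity
    nas_cost = drives * 400 + 600          # drives plus enclosure, one-time

    dataset_tb = 5
    print(f"usable: {usable_tb} TB, dataset fits: {dataset_tb <= usable_tb}")
    print(f"one-time NAS cost: ${nas_cost}")

If your dataset clears that bar, a single box holds it, and the NAS price is a one-time cost rather than a monthly bill.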

The only real exception is if you need performance and the computations you are doing are CPU-bound or highly parallelizable. If you need rapid turnaround, you may want some kind of distributed, replicated cluster approach that can do things in parallel. The majority of these jobs, though, are periodic or internal-facing analytics, and getting the results faster is not worth 10X-100X the hardware cost or cloud bill and 10X the developer time.
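
Even in the CPU-bound case, a single machine has a lot of parallelism left before a cluster is warranted. A sketch using Python's standard multiprocessing module (the per-chunk work and chunking scheme here are hypothetical stand-ins):

    from multiprocessing import Pool

    def process_chunk(chunk):
        # Hypothetical CPU-bound work on one slice of the data.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        n, step = 10_000_000, 1_000_000
        chunks = [range(i, i + step) for i in range(0, n, step)]
        # Fan the chunks out across every local core before even
        # considering a distributed cluster.
        with Pool() as pool:
            total = sum(pool.map(process_chunk, chunks))
        print(total)

Pool() defaults to one worker per core, so on a modern laptop that is already 8-16x parallelism with zero ops overhead.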


