We performed our own crash testing on Riak and about a million other databases; Riak and MySQL with InnoDB were the only ones that scored more than 5 out of 10.
I think the problem with NoSQL is that they're targeting the wrong people: the lazy people. One of the DBs we reviewed (can't remember which) did not offer even single-datacentre durability, lost something like 80% of its data when it crashed during a table update, and was boasting about a built-in geo-coordinate datatype on its website. It's the priorities plague. Is built-in geo more important than your data?
Why do NoSQL databases all have to be distributed beyond sharding? I think it's because that's what people wrongly expect of them. Google, LinkedIn et al. use NoSQL that is distributed, so if a NoSQL DB doesn't do it, that's seen as a shortcoming. That's the root of the misjudgment, I believe. Every DB, NoSQL or not, needs to have its place. One size fits all is what will kill NoSQL, whether enforced by engineering or marketing. That's why I say marketing "lies".
CouchDB, Riak and Redis are the only few exceptions I know of that seemed to have a vision and stuck to it.
You are confusing the real world with a comment on Hacker News. Try doing a full recovery on InnoDB with real data and get back so uncle tuna can hold you tight and promise it will never happen again. Hint: corrupted tables crashing MySQL processes, and you having to start it with innodb_force_recovery=4 for read-only access, dumping the table, and reapplying it.
To some commenters: the C in CAP and the C in ACID are not the same thing. If that is not clear to you, it is unlikely the database you develop will include correct implementations of core concepts. Knowledge is power.
Just to expand on this: the "C" in CAP corresponds (roughly) to the "A" and "I" in ACID. Atomicity across multiple nodes requires consensus. According to the FLP impossibility result (CAP is an elegant and intuitive restatement of FLP), consensus is impossible in a network that may drop or arbitrarily delay messages. The serializable isolation level requires that operations be totally ordered; total ordering across multiple nodes requires solving the "atomic broadcast" problem, which is a special case of the general consensus problem.
In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That's a loss of availability that's not acceptable to many applications.
This means that you can't build a low-latency multi-master system that achieves the "A" and "I" guarantees. Thus, distributed systems that wish to achieve a stronger form of consistency typically choose master-slave designs (with "floating masters" for fault tolerance), Megastore from Google being a notable exception at the cost of ~140ms latency. In these systems availability is lost for a short period of time when the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when the tablet server (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the "master-less" token range.
These are not binary "on/off" switches: see Yahoo's PNUTS for a great "middle of the road" system. The paper (http://research.yahoo.com/node/2304) has an intuitive example explaining the various consistency models.
Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees, eventual or not) is typically limited to, at best, a single partition of a "table group"/"entity group" (in Microsoft's Cloud SQL Server and Google's Megastore, respectively), a single partition of a table (the usual sharded MySQL setups), a single row in a table (BigTable), or a single document in a document-oriented store. Atomic and isolated cross-row transactions are impractical on commodity hardware (and are limited even in systems that mandate InfiniBand interconnects and high-performance SSDs).
[Disclaimer: I am a committer on Project Voldemort, a Dynamo implementation; in addition to Dynamo, I also find Yahoo's PNUTS and Google's BigTable to be very interesting architectures.]
Oh, great! Here we go again with the CAP flame war. In spite of strlen's well-written post, things always degenerate into a pointless discussion every time CAP is cited (Godwin, right?).
The truth is simple: some applications can give up milliseconds (even hours) of "A" for strong "C" (but not the other way around), and some apps can give up strong "C" for high "A" (but not the other way around). How difficult is it to accept this?
The tricky part is deciding when to give up "C" or "A", and where to draw the lines. There's no ready recipe for this, sorry. Basho's post seems to point in exactly that direction when it states that Riak will provide various options across the CAP spectrum. Smart companies deliver what their customers want.
"But the world is eventual consistent then you always should choose A". Classic non sequitur. Yes, real world is weakly consistent, fractal and uncertain, but we, as computer professionals, aim to build models (i.e., simplifications) of real world processes. Now, try to model and automate all that uncertainty and inconsistency of the world when the deadlines are just around the corner!
Thanks for bringing some much needed science to these proceedings, my man. Too many people mistaking computer science for a therapy session where their feelings matter. Computer science has much in common with the honey badger. Think upon this and be enlightened.
> To some commenters: the C in CAP and the C in ACID are not the same thing.
This is an interesting point, but I wonder if they are really that different. Even NoSQL systems support atomic updates and sequential consistency at some granularity (like a single key, document, etc.)
I wonder if it's really so inaccurate to think of NoSQL data stores as a set of tiny ACID databases, one for each key/document/etc.
BigTable (which I use daily and have extensive experience with) offers atomic and transactional updates at the row level. It uses Paxos to guarantee that at most one process at a time owns each row and can mutate it. That process keeps a sequential log that imposes an absolute order over updates to that row.
So it appears to me that despite being a NoSQL database, BigTable rows offer both ACID and the "C" of CAP. In fact, I bet it would be possible to implement a MySQL backend that uses a BigTable row as its storage.
I get strlen's point that C != C (it's more analogous to A/I), but my real point is that NoSQL (at least in the case of BigTable) doesn't appear to be fundamentally different than SQL in the offered guarantees, but rather in the granularity at which those guarantees are offered. NoSQL just takes the traditional one-single-ACID-entity model (a SQL database) and breaks it apart into lots of little ACID entities called rows/keys/documents/etc.
CAP of course applies to both SQL and NoSQL equally.
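The "lots of little ACID entities" idea above can be sketched in a few lines. This is a toy illustration of per-key serialization, not BigTable's actual mechanism: each key gets its own lock and its own append-only log, so updates to a single key are totally ordered and never lost under concurrency, while nothing is promised across keys.

```python
import threading

class TinyRowStore:
    """Toy store where each row is its own tiny ACID domain.

    Updates to a single row are serialized by a per-row lock and
    recorded in a per-row append-only log, giving a total order per
    row -- but no ordering guarantees across different rows.
    """
    def __init__(self):
        self._rows = {}    # key -> current value
        self._logs = {}    # key -> ordered list of committed values
        self._locks = {}   # key -> lock serializing that row
        self._registry = threading.Lock()

    def _lock_for(self, key):
        with self._registry:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key, fn):
        """Atomically apply fn(old_value) -> new_value to one row."""
        with self._lock_for(key):
            new = fn(self._rows.get(key))
            self._rows[key] = new
            self._logs.setdefault(key, []).append(new)
            return new

store = TinyRowStore()
threads = [
    threading.Thread(
        target=lambda: store.update("counter", lambda v: (v or 0) + 1))
    for _ in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store._rows["counter"])  # 100: no increment was lost
```

Because every mutation to "counter" runs under that row's lock, the per-row log comes out as 1, 2, ..., 100 in order, which is exactly the single-row sequential-log behaviour described above.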
I was not aware that BigTable worked that way, but it doesn't change my original point about "C != C". That a given NoSQL system is architected for both does not imply that such a design is always desirable, or that it is representative of the NoSQL database space. It is neither. Thank you for your kind and compassionate suggestions about signing comments.
BigTable enforces integrity constraints defined by a schema on a per-row basis? This is interesting news. Why hasn't Google included this information in any of the published materials describing the system?
What about this story is inspiring people to write smarmy comments and sign them with silly pseudonyms?
Bigtable itself doesn't provide a schema or constraints, but it provides primitives that would allow you to implement them AFAICS. That's why I mentioned the idea of implementing a MySQL backend that uses BigTable as its storage -- MySQL would contain the schema and constraint logic, BigTable would provide the sequentially-consistent data storage.
Credit where credit's due, the 10gen and MongoDB guys have done a great job convincing developers to adopt their product despite the existence of technically superior alternatives. I guess that's what happens when a bunch of ex-DoubleClick execs start a database company.
It's kind of sad too. There are lots of use cases where I want fast datastores, and you know what, if the database goes down, who cares?
For example, I do a lot of experiment logging to MongoDB. If the power goes out and the data is lost, who cares? The data was no longer valid or useful anyway; but if I slow down my writes for 'safety', I cause problems by introducing delay in ways that could lead to conflicts.
Basho has always had an issue with the way Mongo was architected and marketed, and they have no problem letting people know (several blog posts, killdashnine parties).
I actually like both Mongo and Riak. I think they both solve a different problem set, and can actually complement each other in a polyglot persistence setup. It's a shame that there has to be so much negativity between them, because at the end of the day, this type of whiny blog post doesn't really help anyone.
To be honest, Mongo's execs have done pretty much the same thing. As I said in another comment, the Changelog episode on Mongo was very illuminating with regards to the marketing tactics of 10gen.
If they're doing the same thing, that's just as shitty. But I've been meaning to listen to that episode of the Changelog for awhile now, so thanks for the reminder!
I hate veiled attacks like these... if you have a problem with someone, come out and say it; don't make me figure out what or who you're getting at.
And with regards to Mongo ... they've since added a feature that allows 'safe' writes to your database (confirms the data is written before returning a response) ... so what's the rant about?
The 'safe' feature isn't on by default, yet. Also, the benchmarks 10gen publishes are based on default setup, so basically, Mongo writes to RAM, therefore it's fast.
I love Mongo and am using it in a few apps, but their marketing does blow, I admit.
Also, Eliot Horowitz came out and bashed on Riak's eventual consistency promise by basically misleading devs into thinking that writing to MongoDB will always result in 'full consistency'. Listen to the ChangeLog episode on Mongo to hear that.
Riak and all the Dynamo-style databases are really distributed key/value stores, and, you know, I've never used Riak in production, but I have no reason to doubt that it's a very good, highly scalable distributed key/value store.
The difference between something like Riak and Mongo is that Mongo tries to solve a more generic problem. A couple of key points: one is consistency. Mongo is fully consistent, and all dynamo implementations are eventually consistent and for a lot of developers and a lot of applications, eventual consistency just is not an option. So I think for the default data store for a web site, you need something that's fully consistent.
The other major difference is just data model and query-ability and being able to manipulate data. So for example with Mongo you can index on any fields you want, you can have compound indexes, you can sort, you know, all the same types of queries you do with a relational database work with Mongo. In addition, you can update individual fields, you can increment counters, you can do a lot of the same kinds of update operations you would do with a relational database. It maps much closer to a relational database than to a key/value store. Key/value stores are great if you've got billions of keys and you need to store them, they'll work very well, but if you need to replace a relational database with something that is pretty feature-comparable, they're not designed to do that.
Can you please explain this for a case where there are multiple replica sets, the database is sharded and nodes are across data centers? What's sacrificed? Something must be.
When we talk about consistency, we're talking about taking the database from one consistent state to another.
With replica sets, we're still only dealing with one master. We can get inconsistent reads from the replicas, but we're always writing to a single master, which allows that master to determine the integrity of a write.
With sharding, we're still only dealing with one canonical home for a specific key (defined by the shard key). (Besides latency, I'm not sure how spanning datacenters would affect this.)
What we're giving up in this case is availability. If an entire replica set goes down, we can't read or write any data for the key ranges contained on those machines. This is where Riak shines.
With Riak, any node can accept writes, and nodes contain copies of several other nodes' data. That means that as long as we have one node up, we can write to the database. Because of this, there is the possibility of nodes having different views of the data, which is handled in a number of ways (read repair, vector clocks, etc.). Check out the Amazon Dynamo paper for more info; great read.
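To illustrate the vector-clock part (a toy sketch of the general technique, not Riak's actual code): each clock holds one counter per node; one clock "descends from" another if it is at least as large in every entry, and two clocks where each has an entry the other lacks are concurrent, signalling a conflict that read repair must resolve.

```python
def descends(a, b):
    """True if clock a incorporates every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two vector clocks (dicts mapping node -> counter)."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a-newer"
    if descends(b, a):
        return "b-newer"
    return "concurrent"  # siblings: a conflict to be resolved

def merge(a, b):
    """Pointwise max: the clock of a value that resolves both siblings."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

# Node X and node Y each accept a write while partitioned from each other:
v1 = {"X": 2, "Y": 1}
v2 = {"X": 1, "Y": 2}
print(compare(v1, v2))  # concurrent -> both versions are kept as siblings
print(merge(v1, v2))    # X and Y both at 2 after resolution
```

The key point is the last case: neither write descends from the other, so the store cannot silently pick a winner; it hands both versions back (or merges them) instead of losing one.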
I'm sure I'm missing some stuff, but I think that covers the gist of it.
EDIT: One thing that I want to make clear, I don't think that one architecture is better than the other. They each have their own pros and cons, and are really suited to solve different problems.
None of this is guaranteed by default. By default, writes are flushed every 60 seconds. By default, there's no journaling. How can one claim full consistency if the former two points are true?
Don't get me wrong, I love Mongo. I'm building a web app backed by it. But the marketing talk is grating, which is what this post nails.
I think those two issues are orthogonal to consistency. In ACID, consistency and durability are two different letters and CAP doesn't even mention durability. Are you referring to another definition of consistency?
How is flushing a write every 60 seconds orthogonal to consistency? If there's a server crash between the write to RAM and the subsequent flush, the data is lost, is it not? How do you guarantee the data is there in that case?
That would mean the data set was not durable, it doesn't speak to consistency at all. DB consistency is about transaction ordering. Transaction 1 always comes before transaction 2, but 2 may exist or not as it pleases. Transaction 1 must be present if 2 is present.
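A toy illustration of that ordering-versus-durability distinction (my sketch, not any particular engine's recovery code): if a crash can only truncate a suffix of an append-only transaction log, then transaction 2 surviving implies transaction 1 survived, even though the tail may be lost entirely. Consistency (ordering) holds at every crash point; durability (the tail) does not.

```python
def crash_and_recover(log, survived):
    """Simulate a crash that durably kept only the first `survived`
    entries of an append-only transaction log. The tail is lost, but
    what remains is always a prefix, so ordering is preserved."""
    return log[:survived]

log = ["txn1", "txn2", "txn3"]

for kept in range(len(log) + 1):
    recovered = crash_and_recover(log, kept)
    # Ordering invariant: if txn2 survived, txn1 must have too.
    if "txn2" in recovered:
        assert "txn1" in recovered
    # Durability is a separate property: txn3 may simply be gone.

print("ordering preserved at every crash point")
```

This is why the 60-second flush window is a durability complaint rather than a consistency one: recovery may come back missing recent transactions, but never with transaction 2 present and transaction 1 absent.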
If a slave can continue serving reads whilst partitioned from a master that continues to accept writes then you cannot guarantee consistency. If a slave cannot serve reads when partitioned then you aren't available. If a master cannot accept writes when partitioned then you aren't available. See this excellent post from Coda Hale on why it is meaningless to claim a system is partition tolerant http://codahale.com/you-cant-sacrifice-partition-tolerance/.
I interpreted "what is sacrificed?" as asking which letter of CAP MongoDB was giving up. Coda's article actually explains exactly the tradeoffs MongoDB makes for CP:
-------------------
Choosing Consistency Over Availability
If a system chooses to provide Consistency over Availability in the presence of partitions (again, read: failures), it will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose "master" node is inside the partition component (like Membase).
This is perfectly reasonable. There are plenty of things (atomic counters, for one) which are made much easier (or even possible) by strongly consistent systems. They are a perfectly valid type of tool for satisfying a particular set of business requirements.
In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed. (You can optionally mark reads as "slaveOk", but then you admit inconsistency.)
This does sacrifice availability (in the CAP sense), but I haven't heard anyone claim otherwise.
"In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed."
One would hope that reading and writing a single node database was consistent. This is table stakes for something calling itself a persistent store. Claiming partition tolerance in the above is the same as claiming availability. The former claim has been made. Rest left as exercise for the reader.
If a slave is partitioned from its master, it won't be able to serve requests. (Unless the request is a read query marked as "slaveOk", in which case you admit inconsistency.) I highly doubt anyone would claim otherwise.
The implication is that the people for whom eventual consistency is not an option will never reach a data set size or availability requirement that'll require them to use replication and experience the lag (and eventual consistency) involved.
Among the major features touted are auto-sharding and replica sets. I don't know if the implication is that Mongo is only for web apps/websites that won't need those.
In the sharded case, at any given moment each object will still live on exactly one replica set, which will have at most one master. You can do operations (such as findAndModify: http://bit.ly/ilomQo) that require a "current" version of an object, because all writes are always sent to the master for that object. You can also choose to accept a weaker form of consistency for some reads by directing them to slaves for performance. This decision can be made per-operation from most languages.
As for trade-offs: Relative to a relational db, there is no way to guarantee a consistent view of multiple objects because they could live on different servers which disagree about when "now" is. Relative to an eventually consistent system, you are unable to do writes if you can't contact the master or a majority of nodes are down.
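The cross-object caveat can be shown with a toy (my sketch, not MongoDB code): two objects on different shards are each internally consistent, yet a reader touching both mid-update can observe a combination that never existed at any single instant.

```python
# Two shards, each internally consistent, updated independently.
shard_a = {"balance_alice": 100}
shard_b = {"balance_bob": 0}

def transfer(amount):
    """Move money between objects on different shards. Each shard's
    update is atomic locally, but there is no global moment at which
    both updates take effect together."""
    shard_a["balance_alice"] -= amount
    # A reader arriving here sees alice already debited but bob not
    # yet credited: a total of 50, which "never happened".
    snapshot = (shard_a["balance_alice"], shard_b["balance_bob"])
    shard_b["balance_bob"] += amount
    return snapshot

print(transfer(50))  # (50, 0): an inconsistent cross-shard view
```

Within a single relational database a transaction would hide that intermediate state; across shards that disagree about when "now" is, no such guarantee exists without a cross-node commit protocol.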
This post is fantastic! I couldn't agree more with its contents. Sorry that this comment is vapid, but I wanted to do something more than just click the upvote button.
You can't escape the laws of physics, but if you know how to get physics on your side you can do things that at first glance appear impossible. See: human flight.
Practical, heavier-than-air flight was made possible by internal combustion engines. It is also still less practical and more expensive than other options for some applications (e.g., cargo). Practical, low-latency distributed databases based on "invoke a consensus protocol on every commit to the log" would become possible (on limited-size local networks) when networking gear with performance exceeding 10GigE/InfiniBand becomes "commodity". Even then, it will still be impractical and too expensive for some scenarios.
At the present time, the fact that I said "invoke consensus on every commit to the log" and "low latency" in the same sentence is making distributed systems engineers cringe (I would _not_ advocate building such a system). The fact that I said "Infiniband" and commodity in the same sentence is also making systems administrators and DBAs cringe.
“There’ll be a time when all people are alike.”
“Which is precisely the ideal society. No mysteries, no romantics, no discussions, no persecution because there’s no one to persecute. When all have received the same conditioning, it will be like…”
“Insects.”
“Who have existed longer than ourselves and will outlast our race by many millennia.”
“Is existence everything?”
“There’s nothing else.”
Just a piece of self-glorification: I brushed my teeth, it was hard (all 32 teeth, a complicated distributed problem, can't do them all at once consistently), yet I was still able to do it.
I disagree. It reads like an engineer who is frustrated that his/her tech is getting out-businessed (perhaps in a slimy way). Engineers generally want to live in a world where the best tech wins, but in the end, we go home to our Betamax players and wonder why we can't rent any good movies for them.