We performed our own crash testing on Riak and about a million other databases; Riak and MySQL with InnoDB were the only ones that scored more than 5 out of 10.
I think the problem with NoSQL is that they're targeting the wrong people: the lazy people. One of the DBs we reviewed (can't remember which) did not offer even single-datacentre durability, lost something like 80% of its data when it crashed during a table update, and was boasting about a built-in geo-coordinate datatype on its website. It's the priorities plague. Is built-in geo more important than your data?
Why do NoSQL databases all have to be distributed beyond sharding? I think it's because that's what people wrongly expect of them. Google, LinkedIn et al. use NoSQL that is distributed, so if a NoSQL DB doesn't do it, that's seen as a shortcoming. That's the root of the misjudgment, I believe. Every DB, NoSQL or not, needs to have its place. One size fits all is what will kill NoSQL, whether enforced by engineering or marketing. That's why I say marketing "lies".
CouchDB, Riak and Redis are the only few exceptions I know of that seemed to have a vision and stuck to it.
You are confusing the real world with a comment on Hacker News. Try doing a full recovery on InnoDB with real data and get back so uncle tuna can hold you tight and promise it will never happen again. Hint: corrupted tables crashing MySQL processes, and you having to start it with innodb_force_recovery=4 for read-only access, dumping the table, and reapplying it.
To some commenters: the C in CAP and the C in ACID are not the same thing. If that is not clear to you, it is unlikely the database you develop will include correct implementations of core concepts. Knowledge is power.
Just to expand on this: the "C" in CAP corresponds (roughly) to the "A" and "I" in ACID. Atomicity across multiple nodes requires consensus. According to the FLP impossibility result (CAP is an elegant and intuitive restatement of FLP), consensus is impossible in a network that may drop or arbitrarily delay messages. The serializable isolation level requires that operations be totally ordered; total ordering across multiple nodes requires solving the "atomic broadcast" problem, which is a special case of the general consensus problem.
In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That's a loss of availability that's not acceptable to many applications.
This means that you can't build a low-latency multi-master system that achieves the "A" and "I" guarantees. Thus, distributed systems that wish to achieve a stronger form of consistency typically choose master-slave designs (with "floating masters" for fault tolerance), Megastore from Google being a notable exception at the cost of ~140ms latency. In these systems availability is lost for a short period of time when the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when the tablet server (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the "master-less" token range.
These are not binary "on/off" switches: see Yahoo's PNUTS for a great "middle of the road" system. The paper (http://research.yahoo.com/node/2304) has an intuitive example explaining the various consistency models.
Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees, eventual or not) is typically limited to, at best, a single partition of a "table group"/"entity group" (in Microsoft's Cloud SQL Server and Google's Megastore, respectively), a single partition of a table (the usual sharded MySQL setups), a single row in a table (BigTable), or a single document in a document-oriented store. Atomic and isolated cross-row transactions are impractical on commodity hardware (and are limited even in systems that mandate InfiniBand interconnects and high-performance SSDs).
[Disclaimer: I am a committer on Project Voldemort, a Dynamo implementation; in addition to Dynamo, I also find Yahoo's PNUTS and Google's BigTable to be very interesting architectures.]
Oh, great! Here we go again with the CAP flame war. In spite of strlen's well-written post, things always degenerate into a pointless discussion every time CAP is cited (Godwin, right?).
The truth is simple: some applications can give up milliseconds (even hours) of "A" for strong "C" (but not the other way around), and some apps can give up strong "C" for high "A" (but not the other way around). How difficult is it to accept this?
The tricky part is deciding when to give up "C" or "A", and where to draw the lines. There's no ready recipe for this, sorry. Basho's post seems to point in exactly that direction when it states that Riak will provide various options across the CAP spectrum. Smart companies deliver what their customers want.
"But the world is eventual consistent then you always should choose A". Classic non sequitur. Yes, real world is weakly consistent, fractal and uncertain, but we, as computer professionals, aim to build models (i.e., simplifications) of real world processes. Now, try to model and automate all that uncertainty and inconsistency of the world when the deadlines are just around the corner!
Thanks for bringing some much needed science to these proceedings, my man. Too many people mistaking computer science for a therapy session where their feelings matter. Computer science has much in common with the honey badger. Think upon this and be enlightened.
> To some commenters: the C in CAP and the C in ACID are not the same thing.
This is an interesting point, but I wonder if they are really that different. Even NoSQL systems support atomic updates and sequential consistency at some granularity (like a single key, document, etc.)
I wonder if it's really so inaccurate to think of NoSQL data stores as a set of tiny ACID databases, one for each key/document/etc.
BigTable (which I use daily and have extensive experience with) offers atomic and transactional updates at the row level. It uses Paxos to guarantee that at most one process at a time owns each row and can mutate it. That process keeps a sequential log that imposes an absolute order over updates to that row.
So it appears to me that despite being a NoSQL database, BigTable rows offer both ACID and the "C" of CAP. In fact, I bet it would be possible to implement a MySQL backend that uses a BigTable row as its storage.
I get strlen's point that C != C (it's more analogous to A/I), but my real point is that NoSQL (at least in the case of BigTable) doesn't appear to be fundamentally different than SQL in the offered guarantees, but rather in the granularity at which those guarantees are offered. NoSQL just takes the traditional one-single-ACID-entity model (a SQL database) and breaks it apart into lots of little ACID entities called rows/keys/documents/etc.
CAP of course applies to both SQL and NoSQL equally.
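The "lots of little ACID entities" idea above can be sketched in a few lines. This is a toy illustration of per-key serialization, not BigTable's actual mechanism: each key gets its own lock and its own append-only log, so updates to a single key are totally ordered and never lost under concurrency, while nothing is promised across keys.

```python
import threading

class TinyRowStore:
    """Toy store where each row is its own tiny ACID domain.

    Updates to a single row are serialized by a per-row lock and
    recorded in a per-row append-only log, giving a total order per
    row -- but no ordering guarantees across different rows.
    """
    def __init__(self):
        self._rows = {}    # key -> current value
        self._logs = {}    # key -> ordered list of committed values
        self._locks = {}   # key -> lock serializing that row
        self._registry = threading.Lock()

    def _lock_for(self, key):
        with self._registry:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key, fn):
        """Atomically apply fn(old_value) -> new_value to one row."""
        with self._lock_for(key):
            new = fn(self._rows.get(key))
            self._rows[key] = new
            self._logs.setdefault(key, []).append(new)
            return new

store = TinyRowStore()
threads = [
    threading.Thread(
        target=lambda: store.update("counter", lambda v: (v or 0) + 1))
    for _ in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store._rows["counter"])  # 100: no increment was lost
```

Because every mutation to "counter" runs under that row's lock, the per-row log comes out as 1, 2, ..., 100 in order, which is exactly the single-row sequential-log behaviour described above.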
I was not aware that BigTable worked that way, but it doesn't change my original point about "C != C". That a given NoSQL system is architected for both does not imply that such a design is always desirable, or that it is representative of the NoSQL database space. It is neither. Thank you for your kind and compassionate suggestions about signing comments.
BigTable enforces integrity constraints defined by a schema on a per-row basis? This is interesting news. Why hasn't Google included this information in any of the published materials describing the system?
What about this story is inspiring people to write smarmy comments and sign them with silly pseudonyms?
Bigtable itself doesn't provide a schema or constraints, but it provides primitives that would allow you to implement them AFAICS. That's why I mentioned the idea of implementing a MySQL backend that uses BigTable as its storage -- MySQL would contain the schema and constraint logic, BigTable would provide the sequentially-consistent data storage.
Credit where credit's due, the 10gen and MongoDB guys have done a great job convincing developers to adopt their product despite the existence of technically superior alternatives. I guess that's what happens when a bunch of ex-DoubleClick execs start a database company.
It's kind of sad too. There are lots of use cases where I want fast datastores, and you know what, if the database goes down, who cares?
For example, I do a lot of experiment logging to MongoDB. If the power goes out and the data is lost, who cares? The data was no longer valid or useful anyway; but if I slow down my writes for 'safety', I cause problems by introducing delay in ways that could lead to conflicts.
Basho has always had an issue with the way Mongo was architected and marketed, and they have no problem letting people know (several blog posts, killdashnine parties).
I actually like both Mongo and Riak. I think they both solve a different problem set, and can actually complement each other in a polyglot persistence setup. It's a shame that there has to be so much negativity between them, because at the end of the day, this type of whiny blog post doesn't really help anyone.
To be honest, Mongo's execs have done pretty much the same thing. As I said in another comment, the Changelog episode on Mongo was very illuminating with regards to the marketing tactics of 10gen.
If they're doing the same thing, that's just as shitty. But I've been meaning to listen to that episode of the Changelog for awhile now, so thanks for the reminder!
I hate veiled attacks like these... if you have a problem with someone, come out and say it; don't make me figure out what or who you're getting at.
And with regards to Mongo ... they've since added a feature that allows 'safe' writes to your database (confirms the data is written before returning a response) ... so what's the rant about?
The 'safe' feature isn't on by default, yet. Also, the benchmarks 10gen publishes are based on default setup, so basically, Mongo writes to RAM, therefore it's fast.
I love Mongo and am using it in a few apps, but their marketing does blow, I admit.
Also, Eliot Horowitz came out and bashed on Riak's eventual consistency promise by basically misleading devs into thinking that writing to MongoDB will always result in 'full consistency'. Listen to the ChangeLog episode on Mongo to hear that.
Riak and all the Dynamo-style databases are really distributed key/value stores, and, you know, I've never used Riak in production, but I have no reason to doubt that it's a very good, highly scalable distributed key/value store.
The difference between something like Riak and Mongo is that Mongo tries to solve a more generic problem. A couple of key points: one is consistency. Mongo is fully consistent, and all dynamo implementations are eventually consistent and for a lot of developers and a lot of applications, eventual consistency just is not an option. So I think for the default data store for a web site, you need something that's fully consistent.
The other major difference is just data model and query-ability and being able to manipulate data. So for example with Mongo you can index on any fields you want, you can have compound indexes, you can sort, you know, all the same types of queries you do with a relational database work with Mongo. In addition, you can update individual fields, you can increment counters, you can do a lot of the same kinds of update operations you would do with a relational database. It maps much closer to a relational database than to a key/value store. Key/value stores are great if you've got billions of keys and you need to store them, they'll work very well, but if you need to replace a relational database with something that is pretty feature-comparable, they're not designed to do that.
Can you please explain this for a case where there are multiple replica sets, the database is sharded and nodes are across data centers? What's sacrificed? Something must be.
When we talk about consistency, we're talking about taking the database from one consistent state to another.
With replica sets, we're still only dealing with one master. We can get inconsistent reads from the replicas, but we're always writing to a single master, which allows that master to determine the integrity of a write.
With sharding, we're still only dealing with one canonical home for a specific key (defined by the shard key). (Besides latency, I'm not sure how spanning datacenters would affect this.)
What we're giving up in this case is availability. If an entire replica set goes down, we can't read or write any data for the key ranges contained on those machines. This is where Riak shines.
With Riak, any node can accept writes, and nodes contain copies of several other nodes' data. That means that as long as we have one node up, we can write to the database. Because of this, there is the possibility of nodes having different views of the data, which is handled in a number of ways (read repair, vector clocks, etc.). Check out the Amazon Dynamo paper for more info; great read.
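To illustrate the vector-clock part (a toy sketch of the general technique, not Riak's actual code): each clock holds one counter per node; one clock "descends from" another if it is at least as large in every entry, and two clocks where each has an entry the other lacks are concurrent, signalling a conflict that read repair must resolve.

```python
def descends(a, b):
    """True if clock a incorporates every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two vector clocks (dicts mapping node -> counter)."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a-newer"
    if descends(b, a):
        return "b-newer"
    return "concurrent"  # siblings: a conflict to be resolved

def merge(a, b):
    """Pointwise max: the clock of a value that resolves both siblings."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

# Node X and node Y each accept a write while partitioned from each other:
v1 = {"X": 2, "Y": 1}
v2 = {"X": 1, "Y": 2}
print(compare(v1, v2))  # concurrent -> both versions are kept as siblings
print(merge(v1, v2))    # X and Y both at 2 after resolution
```

The key point is the last case: neither write descends from the other, so the store cannot silently pick a winner; it hands both versions back (or merges them) instead of losing one.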
I'm sure I'm missing some stuff, but I think that covers the gist of it.
EDIT: One thing that I want to make clear, I don't think that one architecture is better than the other. They each have their own pros and cons, and are really suited to solve different problems.
None of this is guaranteed by default. By default, writes are flushed every 60 seconds. By default, there's no journaling. How can one claim full consistency if the former two points are true?
Don't get me wrong, I love Mongo. I'm building a web app backed by it. But the marketing talk is grating, which is what this post nails.
I think those two issues are orthogonal to consistency. In ACID, consistency and durability are two different letters and CAP doesn't even mention durability. Are you referring to another definition of consistency?
How is flushing a write every 60 seconds orthogonal to consistency? If there's a server crash between the write to RAM and the subsequent flush, the data is lost, is it not? How do you guarantee the data is there in that case?
That would mean the data set was not durable, it doesn't speak to consistency at all. DB consistency is about transaction ordering. Transaction 1 always comes before transaction 2, but 2 may exist or not as it pleases. Transaction 1 must be present if 2 is present.
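A toy illustration of that ordering-versus-durability distinction (my sketch, not any particular engine's recovery code): if a crash can only truncate a suffix of an append-only transaction log, then transaction 2 surviving implies transaction 1 survived, even though the tail may be lost entirely. Consistency (ordering) holds at every crash point; durability (the tail) does not.

```python
def crash_and_recover(log, survived):
    """Simulate a crash that durably kept only the first `survived`
    entries of an append-only transaction log. The tail is lost, but
    what remains is always a prefix, so ordering is preserved."""
    return log[:survived]

log = ["txn1", "txn2", "txn3"]

for kept in range(len(log) + 1):
    recovered = crash_and_recover(log, kept)
    # Ordering invariant: if txn2 survived, txn1 must have too.
    if "txn2" in recovered:
        assert "txn1" in recovered
    # Durability is a separate property: txn3 may simply be gone.

print("ordering preserved at every crash point")
```

This is why the 60-second flush window is a durability complaint rather than a consistency one: recovery may come back missing recent transactions, but never with transaction 2 present and transaction 1 absent.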
If a slave can continue serving reads whilst partitioned from a master that continues to accept writes then you cannot guarantee consistency. If a slave cannot serve reads when partitioned then you aren't available. If a master cannot accept writes when partitioned then you aren't available. See this excellent post from Coda Hale on why it is meaningless to claim a system is partition tolerant http://codahale.com/you-cant-sacrifice-partition-tolerance/.
I interpreted "what is sacrificed?" as asking which letter of CAP MongoDB was giving up. Coda's article actually explains exactly the tradeoffs MongoDB makes for CP:
-------------------
Choosing Consistency Over Availability
If a system chooses to provide Consistency over Availability in the presence of partitions (again, read: failures), it will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose "master" node is inside the partition component (like Membase).
This is perfectly reasonable. There are plenty of things (atomic counters, for one) which are made much easier (or even possible) by strongly consistent systems. They are a perfectly valid type of tool for satisfying a particular set of business requirements.
In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed. (You can optionally mark reads as "slaveOk", but then you admit inconsistency.)
This does sacrifice availability (in the CAP sense), but I haven't heard anyone claim otherwise.
"In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed."
One would hope that reading and writing a single node database was consistent. This is table stakes for something calling itself a persistent store. Claiming partition tolerance in the above is the same as claiming availability. The former claim has been made. Rest left as exercise for the reader.
If a slave is partitioned from its master, it won't be able to serve requests. (Unless the request is a read query marked as "slaveOk", in which case you admit inconsistency.) I highly doubt anyone would claim otherwise.
The implication is that the people for whom eventual consistency is not an option will never reach a data set size or availability requirement that'll require them to use replication and experience the lag (and eventual consistency) involved.
Among the major features touted are auto-sharding and replica sets. I don't know if the implication is that Mongo is only for web apps/websites that won't need those.
In the sharded case, at any given moment each object will still live on exactly one replica set, which will have at most one master. You can do operations (such as findAndModify: http://bit.ly/ilomQo) that require a "current" version of an object, because all writes are always sent to the master for that object. You can also choose to accept a weaker form of consistency for some reads by directing them to slaves for performance. This decision can be made per-operation from most languages.
As for trade-offs: Relative to a relational db, there is no way to guarantee a consistent view of multiple objects because they could live on different servers which disagree about when "now" is. Relative to an eventually consistent system, you are unable to do writes if you can't contact the master or a majority of nodes are down.
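The cross-object caveat can be shown with a toy (my sketch, not MongoDB code): two objects on different shards are each internally consistent, yet a reader touching both mid-update can observe a combination that never existed at any single instant.

```python
# Two shards, each internally consistent, updated independently.
shard_a = {"balance_alice": 100}
shard_b = {"balance_bob": 0}

def transfer(amount):
    """Move money between objects on different shards. Each shard's
    update is atomic locally, but there is no global moment at which
    both updates take effect together."""
    shard_a["balance_alice"] -= amount
    # A reader arriving here sees alice already debited but bob not
    # yet credited: a total of 50, which "never happened".
    snapshot = (shard_a["balance_alice"], shard_b["balance_bob"])
    shard_b["balance_bob"] += amount
    return snapshot

print(transfer(50))  # (50, 0): an inconsistent cross-shard view
```

Within a single relational database a transaction would hide that intermediate state; across shards that disagree about when "now" is, no such guarantee exists without a cross-node commit protocol.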
This post is fantastic! I couldn't agree more with its contents. Sorry that this comment is vapid, but I wanted to do something more than just click the upvote button.
You can't escape the laws of physics, but if you know how to get physics on your side you can do things that at first glance appear impossible. See: human flight.
Practical, heavier-than-air flight was made possible by internal combustion engines. It is also still less practical and more expensive than other options for some applications (e.g., cargo). Practical, low-latency distributed databases based on "invoke a consensus protocol on every commit to the log" would become possible (on limited-size local networks) when networking gear with performance exceeding 10GigE/InfiniBand becomes "commodity". Even then, it will still be impractical and too expensive for some scenarios.
At the present time, the fact that I said "invoke consensus on every commit to the log" and "low latency" in the same sentence is making distributed systems engineers cringe (I would _not_ advocate building such a system). The fact that I said "Infiniband" and commodity in the same sentence is also making systems administrators and DBAs cringe.
“There’ll be a time when all people are alike.”
“Which is precisely the ideal society. No mysteries, no romantics, no discussions, no persecution because there’s no one to persecute. When all have received the same conditioning, it will be like…”
“Insects.”
“Who have existed longer than ourselves and will outlast our race by many millennia.”
“Is existence everything?”
“There’s nothing else.”
Just a piece of self-glorification: I brushed my teeth, it was hard (all 32 teeth, a complicated distributed problem, can't do them all at once consistently), yet I was still able to do it.
I disagree. It reads like an engineer who is frustrated that his/her tech is getting out-businessed (perhaps in a slimy way). Engineers generally want to live in a world where the best tech wins, but in the end, we go home to our Betamax players and wonder why we can't rent any good movies for them.