First time seeing someone call Spanner, CockroachDB, and YugabyteDB a "distributed key-value store with SQL" :)
The cons of the mentioned distributed shared-nothing SQL databases are questionable:
- "Key-value store" is in fact an LSM-tree-based document store that supports column-level versioning (Postgres supports row-level versioning only).
- "Many internal operations incur high latency." - I guess this conclusion is based on the referenced Gigaom benchmark that was paid for by Microsoft to compare apples to oranges.
- "No local joins in current implementations." (YugabyteDB certainly has colocated tables that store a whole table on a single node. CockroachDB and Spanner might do this as well.)
- "Not actually PostgreSQL..." - There is only one 100% compatible database with Postgres...It's Postgres itself. Citus, CockroachDB, Aurora, Alloy, YugabyteDB, and others can be classified as "not actually Postgres."
- "And less mature and optimized." - Well, both CockroachDB and YugabyteDB are on Gartner's Magic Quadrant of the top 20 cloud databases. Toys don't get there.
It feels like the author joined Crunchy to work on their own distributed version of Postgres. Good move for Crunchy, good luck!
> Related tables and indexes are not necessarily stored together, meaning typical operations such as joins and evaluating foreign keys or even simple index lookups might incur an excessive number of internal network hops. The relatively strong transactional guarantees that involve additional locks and coordination can also become a drag on performance.
You handwaved this away saying you can just store an entire table on a single node, but that defeats many of the benefits of these sharded SQL databases.
Edit: Also, before attacking the author's biases, it seems fair to disclose you appear to work at Yugabyte
In the case of YugabyteDB, here is how we avoid "excessive number" of networks hops
- true Index Only Scan. PostgreSQL doesn't store the MVCC visibility in indexes and have to look at the table even in case of Index Only Scan. YugabyteDB has a different implementation of MVCC with no bloat, no vacuum and true Index Only Scan. Here is an example: https://dev.to/yugabyte/boosts-secondary-index-queries-with-... This is also used for reference table (duplicate covering indexes in each regions)
- Batching reads and writes. It is not a problem to add 10ms because you join two tables or check a foreign key. What would be problematic is doing that for each rows. YugabyteDB batches the read/write operations as much as possible. Here are two examples: https://dev.to/franckpachot/series/25365
- Locality of transaction table. If a transaction touches to only one node, or zone, or region, a local transaction table is used, and is promoted to the right level depending on what the transaction reads and writes.
Most of the times when I've seen people asking to store tables together, it was premature optimization, based on opinions rather than facts. When they try (with the right indexes of course) they appreciate that the distribution is an implementation detail that the application doesn't have to know. Of course, there are more and more optimizations in each release. If you have a PostgreSQL application and see low performance, please open a git issue.
I'm also working for Yugabyte as Developer Advocate. I don't always feel the need to precise it as I'm writing about facts, not marketing opinions, and who pays my salary has no influence on the response time I see in execution plans ;)
Hey Franck, just wanted to say I appreciate your database writings. I read a whole bunch of them over the years, and always found them interesting and educational.
> You handwaved this away, saying you can just store an entire table on a single node, but that defeats many of the benefits of these sharded SQL databases.
I just clarified one-liners listed under the closing "Cons" section. My intention was not to say that the author is utterly wrong. Marco is a recognized expert in the Postgres community. It only feels like he was too opinionated about distributed SQL DBs while wearing his Citus hat.
> Also, before attacking the author's biases, it seems fair to disclose that you appear to work at Yugabyte.
I'm sorry if I sounded biased in my response. I'm with the YugabyteDB team right now, but that's not my first and I bet not the last database company. Thus, when I respond on my personal accounts, I try to be as objective as possible and don't bother mentioning my current employment.
Anyway, I'm very positive to see that this article got traction on HN. As a Postgres community member, I truly love what's happening with the database and its ecosystem. The healthy competition within the Postgres ecosystem is a big driver for the database growth that's becoming the Linux of databases.
>> First time seeing someone call Spanner, CockroachDB, and YugabyteDB a "distributed key-value store with SQL"
That was the first thing come to my mind when I read the paper on Spanner and CockroachDB (haven't read the paper on YugabyteDB yet) though, and surely I'm not the only one.
Referring to Distributed SQL as a key-value store is like defining monolithic databases as a block storage. It reduces it to an internal structure.
What makes it a database is what is on top: ACID transaction, full SQL features, relational tables, JSON document, foreign keys,... which are not available when sharding is done on top of SQL
The cons of the mentioned distributed shared-nothing SQL databases are questionable:
- "Key-value store" is in fact an LSM-tree-based document store that supports column-level versioning (Postgres supports row-level versioning only).
- "Many internal operations incur high latency." - I guess this conclusion is based on the referenced Gigaom benchmark that was paid for by Microsoft to compare apples to oranges.
- "No local joins in current implementations." (YugabyteDB certainly has colocated tables that store a whole table on a single node. CockroachDB and Spanner might do this as well.)
- "Not actually PostgreSQL..." - There is only one 100% compatible database with Postgres...It's Postgres itself. Citus, CockroachDB, Aurora, Alloy, YugabyteDB, and others can be classified as "not actually Postgres."
- "And less mature and optimized." - Well, both CockroachDB and YugabyteDB are on Gartner's Magic Quadrant of the top 20 cloud databases. Toys don't get there.
It feels like the author joined Crunchy to work on their own distributed version of Postgres. Good move for Crunchy, good luck!