
This is great.

One thing I'd add is a sense of scale: are these architectures for 100 queries per second, 100,000, or 100,000,000?



Marco (the author) is probably asleep at this point and could give a deeper perspective. He touches on this when talking about disk latency. It depends on your setup, but from personal experience I know it's not crazy for Postgres queries to take about 1 ms each. From there you can start to do some math on how many cores, how many queries per second, etc.
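
To make that math concrete, here is a rough back-of-envelope sketch. It assumes each core runs one query at a time and ignores I/O waits, lock contention, and connection overhead, so treat the results as an upper bound rather than a prediction. The function names are made up for illustration.

```python
import math


def max_queries_per_second(cores: int, latency_ms: float) -> float:
    """Upper bound on throughput if every core runs one query at a time."""
    return cores * (1000.0 / latency_ms)


def cores_needed(target_qps: float, latency_ms: float) -> int:
    """Minimum cores for a target throughput under the same assumption."""
    return math.ceil(target_qps * latency_ms / 1000.0)


# A 64-core box at 1 ms per query tops out around 64,000 queries per second.
print(max_queries_per_second(64, 1.0))
# Reaching 100,000 queries per second at 1 ms each needs about 100 busy cores.
print(cores_needed(100_000, 1.0))
```

Halving the per-query latency halves the core count, which is why warm caches matter so much for these estimates.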

Single-node Postgres (on a beefy machine) can definitely manage on the order of 100k transactions per second. When you're pushing from the high 100ks into the millions, read replicas are a common approach.

When we're talking transactions, the question is whether they're simple queries or bigger aggregations, and whether they're writes or reads. For writes, if you can manage any form of multi-row insert or batching with COPY, you can push plain Postgres really far. From some benchmarks, Citus, as mentioned, can safely hit millions of records per second with those approaches, and even without Citus you can get pretty high write throughput.
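
As a minimal sketch of the multi-row insert idea: the helper below groups rows into batches and builds one parameterized INSERT per batch, so a driver makes one round trip per batch instead of one per row. The table name `events`, its columns, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration. The table and column names are interpolated directly, so they must come from trusted code, never from user input. COPY is usually faster still, but is not shown here.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence, Tuple


def batched(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def multi_row_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT statement covering all rows, plus its flat parameter list."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_placeholder] * len(rows))
    )
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical data: five rows sent in batches of two.
rows = [(i, f"n{i}") for i in range(5)]
for batch in batched(rows, 2):
    sql, params = multi_row_insert("events", ["id", "name"], batch)
    # With a real driver this would be something like cursor.execute(sql, params).
    print(sql, params)
```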


The "disappointing" benchmark mentioned in the article is a shame for GigaOm, who published it, and for Microsoft, who paid for it. They compare Citus with no HA to CockroachDB and YugabyteDB with replication factor 3 across multiple AZs, resilient to data center failure. And they run Citus on 16 cores (= 32 vCPUs) and the others on 16 vCPUs. But your point about a "beefy machine" shows the real advantage of Distributed SQL. PostgreSQL and Citus need downtime to save cost if you don't need that beefy machine all day, all year. Scaling up and down means downtime, as do upgrades. Distributed SQL offers elasticity (no downtime to resize the cluster) and high availability (no downtime on failure or maintenance).


RE: "Distributed SQL offers elasticity (no downtime to resize the cluster)". I'm not sure this is as much of an advantage of distributed databases over single-host databases anymore. Some of the tech for moving virtual machines between hosts quickly (without dropping TCP connections) is pretty neat. Neon has a blog post about it here[1]. Aurora Serverless V2 does the same thing (but I can't find a detailed technical blog post on how it works). You're still limited by "one big host", but it's no longer as big of a deal to scale your compute up/down within that limit.

[1] https://neon.tech/blog/scaling-serverless-postgres


Second yes to that: a warm PostgreSQL with plenty of RAM can do some fancy things and return an answer in under a millisecond too.

Cache is king.


But a large cache is expensive in the cloud, and you cannot scale up/down without downtime.


4 TB of RAM is only $71 per hour on AWS RDS. If you're at planetary scale that's not bad.
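
For a sense of what that hourly rate adds up to, here is the arithmetic as a quick sketch. The $71 figure is the one quoted above and is not checked against current AWS pricing. The 30-day month and 365-day year are simplifying assumptions.

```python
HOURLY_USD = 71  # figure quoted in this thread; an assumption, not a verified price


def running_cost(hours: int, hourly_usd: int = HOURLY_USD) -> int:
    """Cost in USD of keeping the instance up for the given number of hours."""
    return hours * hourly_usd


per_day = running_cost(24)
per_month = running_cost(24 * 30)   # assumes a 30-day month
per_year = running_cost(24 * 365)   # assumes a 365-day year

print(per_day, per_month, per_year)  # 1704 51120 621960
```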


Scalability is not the only reason for jumping on a distributed Postgres version.

Some apps might do just 1,000 ops/second but still run on a distributed database for high availability or data locality reasons. For instance, shared-nothing databases usually guarantee RPO=0 (recovery point objective, i.e. no data loss) with an RTO (recovery time objective) measured in seconds for zone- and region-level outages. As for data locality, think automatic data placement/pinning to regions or data centers for regulatory and low-latency reasons (serving read/write requests equally fast for folks living in NYC, London, and Tokyo).


Any reason you can't achieve those RPO/RTO with straightforward replication?


You can achieve RPO=0 with Postgres using synchronous replication. You would need to replicate to 2+ standbys, because if there is only one standby and it goes down, the primary will get stuck. For failover you would need Patroni or a comparable tool, but I don't know what the RTO is.
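
The "2+ standbys" point can be illustrated with a toy model of quorum commit. This is a simplified sketch, not how PostgreSQL implements it: a commit is acknowledged only if enough synchronous standbys are reachable. The standby names `s1` and `s2` are hypothetical. In PostgreSQL the corresponding setting would look like `synchronous_standby_names = 'ANY 1 (s1, s2)'`.

```python
from typing import Mapping


def can_commit(required_acks: int, standbys_up: Mapping[str, bool]) -> bool:
    """True if enough synchronous standbys are up to acknowledge a commit."""
    healthy = sum(1 for is_up in standbys_up.values() if is_up)
    return healthy >= required_acks


# One standby, and it is down: the primary blocks on every commit.
print(can_commit(1, {"s1": False}))
# Two standbys with a quorum of one: losing s1 does not stall the primary.
print(can_commit(1, {"s1": False, "s2": True}))
```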

But once you outgrow the storage or compute capacity of the primary/standby servers, you would need to scale to larger machines, which can incur downtime. With distributed Postgres such as YugabyteDB this is not going to happen, because you can scale horizontally.



