
I sort of had to chuckle at the 20k IOPS AWS instance, given even a consumer $100-200 NVMe gives ~1 million+ IOPS these days. I suspect now that we have PCIe 5.0 NVMes this will only go up.

I always do wonder how many issues "arbitrary" cloud limits on things like this cause. I'm sure that async IO is very helpful anyway, but I bet on a 1 million IOPS NVMe it is nowhere near as important.

We're effectively optimising critical infrastructure tech for ~2010 hardware, because that's when big cloud got going and there have been so few price reductions since then relative to the underlying hardware costs.

Obviously a consumer NVMe is not "enterprise", but my point is that cheap consumer hardware is 3+ orders of magnitude ahead of very expensive 'enterprise' AWS/big cloud storage on price/performance.



> had to chuckle at the 20k IOPS AWS instance, given even a consumer $100-200 NVMe gives ~1 million+ IOPS these days

The IOPS figure usually hides the fact that it is not a single IO that is really fast, but a collection of them in flight at once.

More IOPS is generally best achieved by reducing the latency of a single operation, and that average latency is what actually contributes to the "fast query" experience, because the next IO often branches off the result of the last one (like an index or filter lookup).

As more and more disk-to-CPU connectivity goes over the network, we can still deliver very high aggregate IOPS even when latencies are very high (by spreading the data across hundreds of SSDs and routing requests fast). With network storage we pay a huge latency cost for durability of the data, simply because of location diversification.

Every foot is a nanosecond, approximately.

The tradeoff is worth it, though, because you don't need clusters to deal with a bad CPU or two; just stop & start to fix memory/CPU errors.

The AWS model pushes the latency problem to the customer, and we see it in the IOPS measurements, but what we're really seeing is latency × queue depth, not the hardware capacity.
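
To put rough numbers on that (my own back-of-the-envelope figures, not measurements from the article): by Little's law, achievable IOPS is roughly queue depth divided by per-operation latency, so an advertised IOPS figure mostly tells you how many IOs have to be in flight to reach it, while a dependent, pointer-chasing workload only ever sees the latency term.

    # Rough back-of-the-envelope sketch (illustrative latencies, assumed rather than measured)
    # Little's law: achievable IOPS ~= queue_depth / per-op latency

    def iops(queue_depth: int, latency_s: float) -> float:
        return queue_depth / latency_s

    local_nvme = 80e-6    # ~80 us per read on a local NVMe (assumption)
    network_ebs = 1e-3    # ~1 ms per read on network-attached storage (assumption)

    # An index/filter lookup chain is effectively queue depth 1:
    print(iops(1, local_nvme))     # ~12,500 dependent IOs/s
    print(iops(1, network_ebs))    # ~1,000 dependent IOs/s

    # To even reach the advertised 20k IOPS on ~1 ms storage, ~20 IOs must be in flight:
    print(20_000 * network_ebs)    # queue depth ~20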


Yep, I find cloud storage performance to be quite frustrating, but it's the reality for many production database deployments I've seen.

It's worth noting that even on really fast local NVMe drives the new asynchronous I/O work delivers performance benefits, since it's so much more efficient at issuing I/Os and, with io_uring, at reducing syscall overhead.

Andres Freund (one of the principal authors of the new functionality) did a lot of benchmarking on local NVMe drives during development. Here is one mailing list thread I could find that shows a 2x-or-better benefit with the patch set at the time: https://www.postgresql.org/message-id/flat/uvrtrknj4kdytuboi...
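
For anyone who wants to try the new AIO path on their own hardware: if I'm remembering the PostgreSQL 18 settings correctly, it's controlled by the io_method GUC, roughly like the sketch below (from memory, so double-check the docs before relying on it):

    # postgresql.conf (PostgreSQL 18) -- settings from memory, verify against the docs
    io_method = 'io_uring'          # default is 'worker'; 'io_uring' is Linux-only
    effective_io_concurrency = 64   # how aggressively to issue IOs ahead of time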


You probably already know this, but I will say it anyway. Cloud services like AWS are not succeeding in the enterprise because they have outdated hardware. They succeed because, in the enterprise, CIOs and CTOs want something that is known, has a brand, and that everyone else uses. It's like the old adage "No one ever got fired for buying IBM". Now it is "No one gets fired for hosting with AWS, no matter how ridiculous the cost is for the corresponding features".


> No one gets fired for hosting with AWS

But consider the counterfactual: the customers you never won because AWS Certified Solutions Architect(tm) software couldn't deliver the price/perf they would have needed.

At $work this is a very real problem, because a software system was built on API Gateway, Lambdas, SQS and a whole bunch of other moving pieces (serverless! scalable! easy compliance!) that, combined, resulted in way too much latency to meet a client's goal.


Reminds me of https://idlewords.com/talks/website_obesity.htm

> Let me give you a concrete example. I recently heard from a competitor, let’s call them ACME Bookmarking Co., who are looking to leave the bookmarking game and sell their website.

> While ACME has much more traffic than I do, I learned they only have half the daily active users. This was reassuring, because the hard part of scaling a bookmarking site is dealing with people saving stuff.

> We both had the same number of employees. They have an intern working on the project part time, while I dither around and travel the world giving talks. Say half a full-time employee for each of us.

> We have similar revenue per active user. I gross $12,000 a month, they gross $5,000.

> But where the projects differ radically is cost. ACME hosts their service on AWS, and at one point they were paying $23,000 (!!) in monthly fees. Through titanic effort, they have been able to reduce that to $9,000 a month.

> I pay just over a thousand dollars a month for hosting, using my own equipment. That figure includes the amortized cost of my hardware, and sodas from the vending machine at the colo.

> So while I consider bookmarking a profitable business, to them it's a $4,000/month money pit. I'm living large off the same income stream that is driving them to sell their user data to marketers and get the hell out of the game.

> The point is that assumptions about complexity will anchor your expectations, and limit what you're willing to try. If you think a 'real' website has to live in the cloud and run across a dozen machines, a whole range of otherwise viable projects will seem unprofitable.

> Similarly, if you think you need a many-layered CMS and extensive custom javascript for an online publishing venture, the range of things you will try becomes very constricted.

> Rather than trying to make your overbuilt projects look simple, ask yourself if they can't just be simple.


> No one gets fired for hosting with AWS, no matter how ridiculous the cost is for the corresponding features

Actually, AWS is so expensive that hosting everything we ran on Hetzner there instead would have simply depleted our funding, and the company would not exist anymore.


Everything in the cloud is throttled: network, IOPS, CPU. And probably implemented incorrectly. AWS makes billions whether the customer's infrastructure is great or terrible. I found that anything smaller than an AWS EC2 m5.8xlarge had noticeably bad performance on loaded servers (Windows). The list price for that would be about $13k per year, though most organizations pay less than list price.

This also applies to services, not only compute. For anything associated with Microsoft Office 365 Exchange, scripts may run 10x slower against the cloud using the MSOnline cmdlets. It's absolute insanity: a dump of all mailbox statistics that used to take about one hour could take almost 24 hours against Office 365. You also have to be careful not to use the same app or service account in multiple places, because the throttle limits are per-account.


I noticed this with bandwidth. AWS price for bandwidth: $90.00/TB after 0.1TB/month. Price everywhere else (low cost VPSes): $1.50/TB after 1-5TB/month. Price some places (dedicated servers): $0.00/TB up to ~100TB/month, $1.50/TB after.

You pay 60 times the price for the privilege of being on AWS.

Bandwidth is just their most egregious price difference. The servers are more expensive too. The storage is more expensive (except for Glacier). The serverless platforms are mostly more expensive than using a cheap server.

There are only two AWS products that I understand to have good prices: S3 Glacier (and only if you never restore!), and serverless apps (Lambda / API Gateway) if your traffic is low enough to fit in the Always Free tier. For everything else, it appears you get ripped off by using AWS.


Which is why most folks use CloudFront, like AWS recommends; it's typically free for AWS services to route traffic through it. Prices for CloudFront egress are competitive with what you described for other vendors. I don't know anyone paying $90/TB out to the internet on AWS.

https://aws.amazon.com/cloudfront/pricing/?nc=sn&loc=3

Were you trying to route S3 directly out to the internet?


The CloudFront pricing page you cited is just as bad, except the free tier for CloudFront traffic is 1 TB/month rather than 0.1 TB/month; above that, the pricing is no better. It says "Free for origin fetches", but on further research it seems that just means you won't pay twice by also paying for the traffic to get from its origin to CloudFront.


You still haven't cited where you got $90/TB from.


For example, EC2 traffic out to the internet (from US and EU regions; others get more expensive).


Okay, I stand corrected. Now my question to you is how are you generating all these terabytes over the internet? And would data transfer costs really be a significant portion of your bill at that scale regardless of cloud provider? Not video transcoding resources, AI capacity, GPU capacity, etc.?

You're talking about the equivalent of completely saturating a gigabit Internet connection for over two hours per terabyte. A high def video stream from Netflix is only about 5 megabits/sec. 4K is 15 megabits.

67+ simultaneous 4K streams for two hours. Or 200+ simultaneous high def streams. Or a metric fuck-ton of web resources.
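
A quick sanity check on that arithmetic (rough decimal units):

    # 1 TB ~= 8,000 gigabits; saturating a 1 Gb/s link:
    print(8_000 / 1.0 / 3600)   # ~2.2 hours per terabyte
    print(1000 / 15)            # ~66 simultaneous 4K streams at 15 Mb/s
    print(1000 / 5)             # 200 simultaneous HD streams at 5 Mb/s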

And you think you're going to find a provider that will allow you to send those kinds of volumes for $10/TB or less AND have relatively few outages AND stick around because their business model is sound?

By all means, point out who these unicorns are. Sign me up!


On Hetzner I have no problem saturating my gigabit connection with static files if I want to.


Not everybody is serving public http traffic, though.


It's even worse on Azure, where we had to ask customers to scale up vCPUs to increase IOPS:

https://azure.microsoft.com/en-us/pricing/details/managed-di...

Increasing vCPUs also opened up more disk slots, letting us try to improve the situation with disk striping.


Instance store on AWS can give up to 3.3 million IOPS: https://aws.amazon.com/blogs/aws/now-available-i3-instances-... The main problem is just using networked storage.


The NVMe on other instance types is quite throttled. E.g. on a g5.4xlarge instance, EBS is limited to 593 MB/s and 20,000 IOPS, while instance-attached NVMe is limited to 512 MB/s (read) at 125,000 IOPS: a fraction of the IO that a workstation or gaming PC with a similar GPU and RAM would have. And stopping the instance wipes it, which means you can't do instance warmup with those; everything must be populated at boot.


That quoted IOPS number is only achievable with an 8-disk stripe (requiring the full instance, I believe), even if you don't need 488 GB of RAM or a $3,600/mo machine.

The per-disk performance is still nothing to write home about, and 8 actually fast disks would blow this instance type out of the water.
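
Rough per-drive math from the numbers in that AWS post (illustrative only):

    # The quoted 3.3M read IOPS is the full instance with all 8 NVMe devices striped
    print(3_300_000 / 8)   # ~412k 4k read IOPS per drive, well short of a
                           # modern ~1M IOPS consumer NVMe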


Instance store is also immediately wiped when the instance is halted/restarted, which can theoretically happen at any time, for example due to a mystery instance failure, or a patching tool that's helpfully restarting your boxes during off-hours.


My understanding is this is not true; it's only wiped when the instance permanently fails and is moved.



Yeah, so a restart does not wipe it.

Which means you can count on it about as much as a server of your own that you couldn't repair.

I know a database company that uses instance storage as the primary storage. It’s common.


Wasn't aware, interesting. I did consider it in the past as well, but the reliability aspect made me consider this as a moonshot rather than anything practical. Kind of weirdly validating to know there are (supposedly) database providers using it.

That said, a good bit of our environments are scheduled, so it still wouldn't be an option there without hacks (e.g. doing a compressed blockwise dump before shutting down and then a blockwise flash on startup).


I would rather take a logical dump on shutdown and then restore that. You'd need a backup and restore process anyway, so that path has to exist either way. It's kind of like a full vacuum, so it has some benefits too.

The NDA prevents me from saying which database company, but it’s a major provider of cloud managed databases across clouds.


> given even a consumer $100-200 NVMe gives ~1 million+ IOPS these days

In the face of sustained writes? For how long?


Sustained reads would not even give 1 million IOPS in that case. Maybe when you only read the same file that fits into the NVMe cache, which probably never happens in a production database.


I think you'd be surprised. Sustained write performance has gotten pretty good. Decent but not fancy consumer drives will often do 1 GB/s sustained for bulkier writes. That's much better than we used to expect: flash has gotten much better with so many layers! This mid-range PCIe 5 drive sustains a nice 1.5 GB/s: https://www.techpowerup.com/review/team-group-ge-pro-2-tb/6....

I don't think sustained reads are a problem? Benchmarks like CrystalDiskMark do a full-disk random read test; they're designed to bust through cache, afaik. 7.2 GB/s of 4k reads would translate to ~1.8M IOPS. Even if this is massively optimistic, you need to slash a lot of zeroes/orders of magnitude to get down to 20k IOPS, which you will also pay >$100/mo for.
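
For reference, the conversion is just block-size arithmetic (assuming 4 KiB reads):

    bytes_per_io = 4096
    print(7.2e9 / bytes_per_io)          # ~1.76M IOPS at 7.2 GB/s of 4k reads
    print(20_000 * bytes_per_io / 1e6)   # ...and 20k IOPS is only ~82 MB/s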


The Samsung 9100 Pro has roughly 1 GB of LPDDR4X cache per 1 TB of flash. I won't pretend to understand the magic NVMe drives possess, but if you got a 4TB or 8TB one, could you not in theory pull all of the data you require into that cache?

I would assume, and it might be a poor assumption, that NVMe controllers don't pull in files but rather blocks, so even if you had a database that exceeded the cache size, in theory, as long as the active blocks of that database did not exceed the cache size, they could be "indefinitely" cached for a read-only pattern.


The DRAM on an SSD like that isn't for caching user data; it's for caching the drive's metadata about which logical blocks (as seen by the OS) correspond to which physical locations in the flash memory.


It definitely uses it for caching data


FWIW, using the same approach as in the article, i.e. io_uring, is one of the few ways to actually get anywhere close to that 1 million, so it is not as if they are competing concerns.


Meanwhile, people are running things on Raspberry Pi home clusters thinking they're winning.


Maybe they are. With an NVMe HAT, you get decent IO performance.


It's still moderately bad. The Raspberry Pi is limited to 2 Gen3 PCIe lanes, which is ~4-8x slower than the drive (and you will likely be further limited by CPU speed).
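
Rough numbers, assuming ~985 MB/s of usable bandwidth per Gen3 lane:

    pi_link = 2 * 0.985e9    # the 2 Gen3 lanes mentioned above: ~2 GB/s
    drive = 7.4e9            # a typical PCIe 4.0 consumer NVMe (assumed figure)
    print(drive / pi_link)   # ~3.8x; a Gen5 drive pushes it toward ~7x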


If the engineering demand is lower than the engineering supply, you're still winning. If your transactions per second only amount to 60% of the Pi's capacity for a given use case, why complain?



