But now consider how much extra data Cloudflare at its size would have to have just for staging, doubling or more their costs to have stage exactly as production. They would have to simulate similar amount of requests on top of themselves constantly since presumably they have 100s or 1000s of deployments per day.
In this case it seems the database table in question seemed modest in size (the features for ML) so naively thinking they could have kept stage features always in sync with prod at the very least, but could be they didn't consider that 55 rows vs 60 rows or similar could be a breaking point given a certain specific bug.
It is much easier to test with 20x data if you don't have the amount of data cloudflare probably handles.
That just means it takes longer to test. It may not be possible to do it in a reasonable timeframe with the volumes involved, but if you already have 100k servers running to serve 25M requests per second, maybe briefly booting up another 100k isn’t going to be the end of the world?
Either way, you don’t need to do it on every commit, just often enough that you catch these kinds of issues before they go to prod.
> maybe briefly booting up another 100k isn’t going to be the end of the world
Cloudflare doesn’t run in AWS. They are a cloud provider themselves and mostly run on bare metal. Where would these extra 100k physical servers come from?
Cloudflare infra costs are probably 300 mil+ usd. Their gaap profit is negative, their non gaap income is less than their infra expenses. Can you imagine how much they would have to charge more or spend more if they had to duplicate or simulate their production environment in staging and for each of the 100s deployments they probably do a day?
We’re like a millionth the size of cloudflare and we have automated tests for all (sort of) queries to see what would happen with 20x more data.
Mostly to catch performance regressions, but it would work to catch these issues too.
I guess that doesn’t say anything about how rare it is, because this is also the first company at which I get the time to go to such lengths.