Hacker News
Wal-E: Continuous Archiving for Postgres (github.com/wal-e)
110 points by craigkerstiens on Feb 5, 2017 | 25 comments


The way I understand it, WAL-E is just something that you can put into pg's `archive_command` and `restore_command`, but what does it add other than that?
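For context, the basic hookup looks roughly like this; the `envdir` layout and bucket paths below are assumptions from common WAL-E setups, not from this thread:

```shell
# postgresql.conf -- ship each completed WAL segment to object storage
# via WAL-E's wal-push subcommand:
#   wal_level = replica
#   archive_mode = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# recovery.conf on a restore target -- pull segments back down:
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
```

What WAL-E adds over raw curl is mostly the plumbing around those two hooks: compression, optional encryption, retrying, and base-backup management (`backup-push` / `backup-fetch`).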

At my last job, we investigated WAL-E and determined it doesn't do anything that raw curl commands wouldn't be able to do for us.

What we decided to do was to utilize HDFS for continuous archival, and we patched a version of `pg_receivexlog` to actually stream the WAL files out to HDFS, with flushing support that acknowledged transactions over the wire when the flushes were complete.

With this, you could treat this patched pg_receivexlog_hdfs command as a standby Postgres database, and even add it to `synchronous_standby_names`, so Postgres would effectively have its transaction logs synchronously written out to durable storage. Combined with a lot of plumbing, we were able to get Postgres running in Mesos with Docker without any actual persistent storage volumes (just local disk).
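Stock Postgres (9.5+) can approximate this idea without the HDFS patch: `pg_receivexlog --synchronous` flushes each WAL write and reports the flush position back, so it can satisfy synchronous commit if named in `synchronous_standby_names`. A rough sketch, with host, user, and slot names made up:

```shell
# On the archiver host: stream WAL from the primary, fsyncing and
# reporting the flush LSN on every write so commits can wait on it.
pg_receivexlog -h pg-primary -U replicator \
    -S wal_archiver \
    --synchronous \
    -D /archive/wal

# On the primary (postgresql.conf): make COMMIT wait for that receiver.
# pg_receivexlog identifies itself as "pg_receivexlog" by default unless
# application_name is overridden in the connection string.
#   synchronous_standby_names = 'pg_receivexlog'
```

The HDFS variant described above was a custom patch; the sketch only shows the stock local-disk equivalent of the same synchronous-acknowledgement trick.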

The best part is, you'd do a COMMIT and it would essentially block until the data was in HDFS. No periodic snapshots where you'd lose transactions that happened seconds after the snapshot... if the client sees a transaction as committed, it's on durable storage.

Worked pretty well; I'm surprised WAL-E doesn't support something similar (it only ships WAL at predefined intervals, not at transaction commit time).


> The best part is, you'd do a COMMIT and it would essentially block until the data was in HDFS.

Did you have any problems with freezes or timeouts under high load?


We had freezes/timeouts only when the disk filled up; other than that it was pretty good. Simply writing the data to HDFS is a much cheaper operation than committing it to a local database, so in practice latency was lower than with a "real" standby server that committed logs to disk. (We saw about a 3x TPS gain over a dedicated standby in synchronous mode.)

Also, the way I see it, turning on `synchronous_standby_names` was a nice added safety guarantee; in practice, leaving the HDFS receiver asynchronous would be a reasonable alternative if you're confident things will work as you expect.


How does this compare to, e.g., Barman, mentioned in the 2ndQuadrant GitLab data-loss post?

http://blog.2ndquadrant.com/dataloss-at-gitlab/


At a high level they're very similar in the problem they solve: both are focused on giving you reliable disaster recovery. At the time WAL-E was written, Barman didn't exist and S3 was one of the few reliable options to back up to (this was over 5 years ago). Since then WAL-E has expanded to include just about every object store you could want, and in the meantime 2ndQuadrant introduced Barman as their take on it.

WAL-E has been used for a number of years to provide disaster recovery for Heroku Postgres, for over a million databases. It enables their follow and fork functionality, and we're using it at Citus as well for Citus Cloud, since we have the person who authored it.

As for exact differences, I'm less familiar, as I've not seriously run Barman in production; perhaps someone who's run both can chime in.


> It enables their follow and fork functionality

Doesn't WAL replication only let you restore a full cluster, not a single DB? How do they get around that?


Correct, fork and follow isn't enabled at the multi-tenant level. Well, sort of: for some of the production-level plans that are multi-tenant, there is still a single Postgres cluster running, just multiple of those per node, with WAL-E running for each one.


It's a shame, though. Having to do WAL replication for DR plus logical backups to enable per-db restores is such a waste :|


The real waste: WAL-E is constantly writing the transaction logs, whether or not there are any writes.

(Read: a new 10MB file to S3 every 10sec, even when the DB is 100% idle).


That's weird, what's your checkpoint_timeout? The default is 5 minutes, so you certainly shouldn't be pushing to S3 every 10s if the DB is idle.

Apparently PG 10 will improve your use case, though: http://paquier.xyz/postgresql-2/postgres-10-checkpoint-skip/

EDIT: Also, wal-e compresses by default, so even those regular WAL files should be much smaller than 10MB. Are you sure the DB is really idle?


There is a setting for the time interval, but that's not the point. I'm not gonna delay replication and backup by minutes just because the replication system sucks; I'd rather pay for the storage for my use case.

Compression is enabled, indeed. Surprisingly, the compression ratio for "nothing going on" is terrible (still multiple megabytes).

The next version of Postgres will redo the transaction log to have dynamic adaptive sizing, with new settings to control it. Not there yet.


AFAIK the dynamic WAL sizing has already landed in 9.5: http://www.databasesoup.com/2016/01/configuration-changes-in...

Or is this something else you're talking about?


Yes, I'm talking about that, and it's in 9.5 (which was released recently).


That sounds like your archive_timeout is set to 10 seconds (or close; WAL segments are 16MB pre-compression).

That's the maximum age of a WAL segment before rotation, and Postgres will force the log rotation even if there's no data in the WAL segment.

If you're worried about keeping the last n seconds of data in replication, streaming replication is a far better tactic than log-shipping. (Though log shipping is useful for longer term storage)
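In config terms, the trade-off looks something like this (the values are illustrative, not recommendations):

```shell
# postgresql.conf -- an aggressive archive_timeout forces a WAL segment
# switch (and hence an archive_command run) at that interval, even when
# the database is completely idle:
#   archive_timeout = 10       # new archive file every 10s, idle or not

# A saner split: archive on a relaxed schedule for long-term storage...
#   archive_timeout = 300

# ...and rely on streaming replication for low-lag durability; the
# standby connects back via primary_conninfo in its recovery.conf:
#   wal_level = replica
#   max_wal_senders = 3
```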


Thanks for your insight. I have no clue yet, but I'll need to set up Postgres backups in the next few months, so this helped.


Just came from a nice presentation on the PG backup fundamentals, I highly recommend watching it when the video comes online: https://fosdem.org/2017/schedule/event/postgresql_backup/


The video is online now.


Thanks!


From what I read, WAL-E is meant to run on the database server and back up directly to S3 or equivalent, while Barman typically runs on a separate server and can back up one or more remote Postgres servers.


wal-e stuffs data into 'cloud' object storage, such as S3 (and compatibles), WABS, Swift, and Google's object storage.

barman stores data on a filesystem.

I'm using wal-e; for me it was a no-brainer since I don't have elastic filesystems available but do have scalable object storage. Other sites will have the opposite situation.


We've been using WAL-E for about 4 years and it has worked well for us.

One way to continuously test that your files are being pushed up correctly is to have a hot standby pulling WAL files from S3 and check periodically that the data in the standby looks sane.

Although not a replacement for full "restore from backups" tests (because that process involves a base backup too), it's a good way to quickly notice issues preventing WAL files from being stored on S3 or from being decrypted.
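A sketch of that verification standby's configuration; the `envdir` path is an assumption, and the layout is for the pre-10 recovery.conf style current when this thread was written:

```shell
# recovery.conf on the verification standby: continuously replay WAL
# fetched from S3 via WAL-E.
#   standby_mode = 'on'
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'

# A periodic sanity check that replay is keeping up -- if this lag grows
# without bound, WAL files are probably not landing in (or decrypting
# from) S3:
psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"
```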


Have been running it in production at two different start-ups and am very happy with it. The pain really comes from setting up GPG correctly to have encryption.


Surprised to see WAL-E only supports GPG for encryption.

When backing up to S3, KMS has a ton of advantages.

Though in that vein, maybe the native S3 encryption at rest with KMS is sufficient?
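For the S3-side approach, server-side encryption under a KMS key can be requested per upload without the client handling key material at all; the bucket name, object name, and key alias below are placeholders:

```shell
# Upload with server-side encryption under a KMS key: S3 encrypts the
# object at rest and decrypts transparently for readers authorized to
# use the key.
aws s3 cp base_backup.tar.lzo s3://my-wal-bucket/backups/ \
    --sse aws:kms \
    --sse-kms-key-id alias/pg-backups
```

This protects data at rest on S3, but unlike WAL-E's GPG support it does nothing for the data in flight between Postgres and the upload, and it ties decryption to IAM/KMS permissions rather than a key you hold.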


I've used wal-e in production continuously since around March 2011, so almost 7 years, and it's been great.


Isn't that close to 6 years, not 7?



