Amazon S3 is not enough for backup

Ishtiyak Rahman
3 min read · Apr 17, 2022
Photo by Christian Wiediger on Unsplash

This isn’t just my viewpoint. S3 Versioning is insufficient for three reasons, according to Andreas Wittig:
1. Accidental deletion, since you can delete all versions at the same time,
2. Malicious deletion, which hits the same problem, and
3. At scale, recovery is going to suck for you.

I’d also add MFA Delete to that list; it makes erasing objects you don’t want to keep SUPER annoying while still not addressing any of these problems. Object Lock might work, but in compliance mode nobody, including you, can erase that data until its retention period expires. That’s a lot of money, and it’s also a lot of restrictions.
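
To make the versioning point concrete, here is a toy in-memory model of a versioned bucket. It is only a sketch of the semantics as I understand them, not the S3 API: `ToyVersionedBucket`, its method names, and the `report.csv` key are all made up for illustration. It shows that an accidental overwrite or a plain delete is recoverable, while deleting every version ID is not.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Sentinel standing in for an S3 delete marker (toy model only).
DELETE_MARKER = object()


class ToyVersionedBucket:
    """Hypothetical in-memory stand-in for a versioned bucket; not the S3 API."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, object]]] = {}
        self._ids = itertools.count(1)

    def _append(self, key: str, body: object) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        """An overwrite stacks a new version on top of the old ones."""
        return self._append(key, body)

    def delete(self, key: str) -> str:
        """A plain delete only adds a delete marker; old versions remain."""
        return self._append(key, DELETE_MARKER)

    def delete_version(self, key: str, version_id: str) -> None:
        """Deleting a specific version ID removes it for good."""
        kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
        self._versions[key] = kept

    def list_versions(self, key: str) -> List[str]:
        return [version_id for version_id, _ in self._versions.get(key, [])]

    def get(self, key: str) -> Optional[object]:
        """Return the latest version, or None if missing or delete-marked."""
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is DELETE_MARKER:
            return None
        return versions[-1][1]


def demo() -> Tuple[Optional[object], Optional[object], List[str]]:
    bucket = ToyVersionedBucket()
    bucket.put("report.csv", b"real data")
    bad_version = bucket.put("report.csv", b"")  # accidental empty overwrite
    marker = bucket.delete("report.csv")         # accidental plain delete

    # Recoverable: remove the delete marker and the bad version.
    bucket.delete_version("report.csv", marker)
    bucket.delete_version("report.csv", bad_version)
    recovered = bucket.get("report.csv")

    # Not recoverable: a mistake (or an attacker) deletes every version ID.
    for version_id in bucket.list_versions("report.csv"):
        bucket.delete_version("report.csv", version_id)
    after_purge = bucket.get("report.csv")

    return recovered, after_purge, bucket.list_versions("report.csv")
```

In this sketch the first recovery works because the older version is still sitting underneath the mistakes. The purge at the end is the case versioning can’t help with: anything that is allowed to delete individual versions can delete all of them, and nothing is left in the bucket to restore from.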

What are you defending against?

A common backup failure mode is not having a clear idea of the scale of disaster you’re trying to protect your data from. Accidentally overwriting a single file? Losing your entire AWS account? Losing a whole AWS region? All of AWS going down? All three major cloud providers erasing your data at the same time? A determined attacker, potentially from within your company?

There’s a reason AWS Backup for S3 launched despite its clownshoe pricing, which makes the backup more expensive than the original copy of the data in S3 that it protects. Yes, AWS has a ridiculous number of 9s in its durability design target for S3 (even though its SLAs don’t come close to guaranteeing that), but it’s only a design target!

People still believe that at S3’s scale and design, failing hard drives and computers are unlikely to lose your data, and they are correct! However, the same people who say on Twitter that this eliminates the need for backups are also continually requesting an “Edit Tweet” button because they make mistakes while tweeting, presumably unaware of the irony.

I can’t speak for anyone else, but I tend to make a lot of mistakes. I fat-finger buckets, overwrite files and objects with empty versions, and commit a slew of other only-amusing-to-others errors. I have numerous tiers of backups for the things that really matter: Time Machine for on-site backups, Backblaze for off-site backups, git for code management (and I’m STILL copying the directory to a .bak version before making sweeping changes, just in case!), and more. People fall into two categories: those who have lost data and those who will. And no one is more devoted to this subject than a member of the first group. I am, in fact, one of them. Obviously! I’m aware that I’m prone to making mistakes, and I prepare for it.

It’s also crucial to determine which data you’d actually miss if you lost it. To be honest, most of my data isn’t that important. My S3 access logs are a good example: we generate a lot of them, so I won’t be too concerned if they go missing. The only reason I care about them right now is that they keep appearing on my AWS bill.

The Security Trade-Off

The converse is also true: the more backup locations you have, the larger your security risk. There are more mechanisms and more sites where you have to keep track of who has access to what. Your backups should be encrypted without a doubt, but where do you keep the keys? I’m hoping you’re also backing up those!

One of the reasons the S3 bucket negligence awards have brought so much immensely damaging material to light is that S3 is frequently used as a backup target for other systems that store extremely sensitive information. It’s easier to gain access to a single S3 bucket than it is to get access to all three of the production systems backed up into it: payroll, payments, and API keys.

Finally

I’m not saying S3 is prone to data loss. I’m not implying that it’s a risky proposition. I’m not even implying that it’s cheap, and you’re obviously well aware of how uncommon that is!

I’m suggesting that back in the data center days, when we had to worry about individual hard drive failures, keeping multiple copies of important data in different locations was a mainstay of good operational practice. Just because AWS has devised a clever way to survive those failures doesn’t mean we’re safe from the other threats to our data that we’ve always had to deal with. It does, however, give us a significant advantage over the majority of malware.
