Encryption for chunks #182

folbricht · 2020-12-27T01:43:16Z

Adds chunk encryption as option to chunk stores. Supports aes-256-ctr at this point (though this could be expanded to other key sizes or algorithms). The key is generated by hashing the password with SHA256. For now the only way to configure encryption is by using the config file.

TODO:

Document how to use it
Add a way to provide encryption for HTTP chunk servers, since the store config doesn't apply to server-side HTTP. Probably a new set of options.
Perhaps a couple more tests, ideally a high-level one that uses a config.json

Any (early) feedback is welcome.

folbricht · 2020-12-27T15:43:07Z

Open questions:

Is CTR mode the right choice? Since a 16-byte random IV is used it appears to be sufficiently safe from ever repeating. Alternatively could use CBC with random IV
Should encrypted chunks get a dedicated file extension? Right now we have compressed chunks with .cacnk, uncompressed with none. Could consider appending something like .aes-256-cbc in order to avoid conflicts. Though there would still be conflicts if a different password is used for the same store.

…ing bug

charles-dyfis-net · 2020-12-29T19:17:06Z

With respect to naming convention, my own preference would be to support having a key identifier in the name for encrypted chunks. Whether that's a few bytes out of a hash of the key or a user-specified opaque value is something that would require careful design consideration (the former is an information leak, the latter depends on users to do the right thing, so neither is ideal; this is one of those places where cryptosystems that generate handles for keys alongside those keys -- or use assymetric encryption and can take the handle from a hash of the public side of the key -- are making lives easier).

charles-dyfis-net · 2020-12-29T19:27:43Z

I'd be tempted to consider GCM-CTR with the IV deriving from the chunk's identifier.

folbricht · 2020-12-30T19:56:33Z

It now calculates a key handle using the 4 leading bytes of SHA256(key). That should be acceptable and unique enough. It's also adding file extensions, together with the algorithm which should ensure that compressed, uncompressed, encrypted chunks can live next to each other in the same store.

As for using the chunkID as IV, I'm still trying to think of ways that could be abused. Especially in the HTTP server where the chunk ID is passed from the outside. There are flags to disable validation on that part for performance, though it should still get checked before it ever gets written to a store. A small mistake in this area could be fatal for encryption.

Are you mostly concerned with the possibility of an IV collision or the extra prefix that is stored? If it's mostly about collisions, I could pick CBC mode as default which is less impacted by a collision, though also slower.

charles-dyfis-net · 2020-12-30T20:09:07Z

The goal was to eliminate potential for collisions while getting the advantage of GCM-CTR's integrity protection -- and the collision protection inherent given the nature of chunk identifiers.

It would indeed be good to get feedback from someone with a stronger background there.

folbricht · 2020-12-31T21:09:44Z

After reading a lot more about the different modes and IV generation, my suggestion is this:

The default mode will be XChaCha20-Poly1305 with random IV. It's got a 24-byte IV which is large enough for random IVs.(https://godoc.org/golang.org/x/crypto/chacha20poly1305)
AES-256-GCM with random IV will be optional. Going to add a note to the docs that it is only good to 2^32 chunks as per https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf (https://golang.org/pkg/crypto/cipher/#NewGCM)

Both are supported in TLS1.3 and are AEAD, though that isn't strictly necessary in this use-case it won't hurt. Not sure yet what to do with the CTR mode, could leave it in as 3rd option perhaps, with the same restrictions as GCM.

folbricht · 2021-01-09T16:23:27Z

Removed CTR mode which leaves the more modern AEAD algorithms xchacha20-poly1305 and aes-256-gcm.

antong · 2021-01-14T12:45:23Z

Cool! Some questions/notes: Why was key=SHA256(password) chosen instead of a dedicated key derivation function such as scrypt?

The chunk file names are not encrypted and if I understand correctly, this mean that the hashes of plaintext content are visible to whoever can observe these file names (fs access or e.g., web servers). Depending on the threat model, this could be catastrophic. The same weaknesses that apply to convergent encryption would apply to this. For instance, see https://www.mail-archive.com/[email protected]/msg08949.html .

I also recommend taking a look at how restic does encryption, as it has much in common with casync.

folbricht · 2021-01-14T13:47:10Z

A couple of reasons for choosing SHA256. The goal is to have one password that is used to encrypt/decrypt consistently. With bcrypt/scrypt one needs a salt. So I'd have to either use a fixed salt, or derive the salt from the password. Neither is a good choice. Also, scrypt is mainly about protecting the original password when it's derived key value is leaked. In this case though, if the key leaked, it's sufficient to decrypt. No need to break the password.

Yes, the content hashes are not encrypted. I don't consider that an issue though, they really mean nothing to an observer. They'e also listed in the index files which aren't encrypted.

antong · 2021-01-16T15:24:35Z

Thank you for taking the time to answer!

What kind of threats is the encryption intended to protect against? I was thinking of something like the Restic threat model. For example, that the chunks are stored on a system that is not fully trusted (like cloud storage or a shared system) where outsiders may have access to the files or file names. Then encryption could be used so that these outsiders can not find out anything about the content stored.

Please take the below with a grain of salt, as I'm not a cryptographer :-)

If an attacker can get hold of a chunk and the key is simply SHA256(password), then this password can be efficiently bruteforced as both SHA2 and AES are designed to be fast, and have hardware implementation on standard CPUs and can also be done on GPUs. Scrypt on the other hand is a KDF and is designed to be slow and hard to bruteforce. Even with a static or derived salt, it should be better than just SHA256. Could a salt be saved within the chunk store or locally in the config? I think restic uses a master encryption key for a repository, and then encrypts this master key with one or several keys derived using scrypt+password, and stores these encrypted keys and scrypt salts within the repository. This has the added benefit of being able to change the password and having several passwords.

If an observer can see the chunk filenames/IDs (hashes of the content), then this observer can tell whether a plaintext blob known to the observer is stored in or fetched from the chunk store (list file names or view fetched urls in web server logs). In some cases it this may be harmless as the observer already needs to have the plain text content to know what the hash is, but there are numerous scenarios where this can be disastrous. For example, if an internal Amazon memo is leaked, then Amazon can quickly determine which users have stored the leaked memo in encrypted chunk stores in S3 by scanning for file names with the known hashes. Or, an attacker can precompute chunk hashes of vulnerable software and libraries, and mount directed attacks at IP addresses that fetch these encrypted chunks.

And, even if the observer doesn't know the plaintext, he may be able to brute force the plaintext if he knows the hashes! There may be file formats or templates where most of the content is the same, but a small portion may vary. Then, the attacker can test by putting different content in the variable part, hash the result, and determine whether such an chunk exists in the encrypted chunk store. This is the kind of attack that was explained in the email thread I linked.

But I may be misunderstanding the way the encryption is supposed to be used. For instance you said that the chunk hashes are in the index files as well. This is true, but I was thinking that these would be stored locally somewhere else and not visible to the operator of the chunk store.

folbricht · 2021-01-18T01:06:42Z

Regarding the privacy aspect you're correct. This implementation protects data that is unknown to an attacker. It does not protect against an attacker that has full access to logs and the chunk store from checking if a certain file may be contained in it. And if the attacker has access to the index file and a full access log then that could be used to confirm if that index file was used, but not what's in it. If this is a concern, then this isn't the right tool. It really only protects the data in chunks from being exposed if leaked or accessed without the password. It'll not obfuscate access.

With regards to SHA256 vs scrypt, that's a different story. It makes no difference here at all. Consider you're saying it'd be easy to brute force the password. Well, you wouldn't have to, right? Since brute forcing this would mean guessing the password for a given key (which you can't actually get, see below). But, since you have the key in this case, why would you need the password? You have the key, you can decrypt all of it, no need to know the password as well. The SHA256 here is way to generate a key of the correct length from a variable-length password. The alternative would be to ask the user to provide a key of the right size, in binary. It would be equally secure if chosen properly.

Now, to get the key you'd have to brute-force AES and that's just not feasible, even if you knew the plain text. If this was easy, then AES wouldn't be a thing at all. Basically, you won't be able to get the key.

Encrypting the chunk filenames to hide the content-hashes as well may be possible, not sure it's the right thing to do for this tool though, but will consider it. The would prevent attackers from checking if a file is in the store with trial-and-error, but it won't close all the holes, since file-size also plays a role in this, it's not as straightforward

antong · 2021-01-18T15:53:03Z

I am assuming that the attacker has an encrypted chunk file, but not the password or the AES key. I also assume that using a password implies that the effective key space may be smaller than the AES key space. If you used a randomly generated password with 256 bits of entropy (like a totally random 43 character case sensitive alphanumeric password), then yes, SHA256 vs scrypt wouldn't matter. But as the key space for user chosen passwords may be significantly smaller, then password cracking using brute force + dictionary attacks can be effective.

To mount a successful password cracking attack given the assumptions above, you don't need to know the AES key. When you test a password, you derive the key, use it to decrypt the first block in the chunk and check for the Zstandard magic number. Or since the last four bytes of the sha256 sum of the derived key are stored last in the filename, this can also be used as a check for correct password. Or do both. These operations are really quick (something like millions of tests per second on a single laptop core and billions on a consumer GPU).

folbricht · 2021-01-19T14:14:19Z

Ok, so the main issue is really short/insecure passwords. That is indeed a problem which the tool doesn't currently protect against. scrypt would make the keygen operation more expensive and perhaps allow for shorter passwords, but is that really the solution? Especially with a fixed salt it's not great either. One alternative would be to expect the whole key to be configured, all 256 bits of it. It's easy enough to generated with something like openssl rand -hex 32 and that could be documented. Then there's no chance to pick a bad password.

AES encryption for chunks

2730aba

folbricht added 2 commits December 27, 2020 09:48

Update docs

02a2534

Add encryption options to chunk-server. Also fixes an old error handl…

f7f7a16

…ing bug

folbricht added 2 commits December 30, 2020 12:37

Calculate a KeyID and use it to build chunk file extensions

9fb3014

shorten the example to better fit in the readme

05dccb7

Add XChaCha20-Poly1305 (default) and AES-256-GCM

2837ee1

folbricht changed the title ~~AES encryption for chunks~~ Encryption for chunks Dec 31, 2020

folbricht added 3 commits December 31, 2020 16:40

Update readme

1c82e63

Typo

5961612

Remove AES-256-CTR

fa43d2e

folbricht marked this pull request as ready for review January 9, 2021 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encryption for chunks #182

Encryption for chunks #182

folbricht commented Dec 27, 2020 •

edited

Loading

folbricht commented Dec 27, 2020

charles-dyfis-net commented Dec 29, 2020

charles-dyfis-net commented Dec 29, 2020

folbricht commented Dec 30, 2020

charles-dyfis-net commented Dec 30, 2020

folbricht commented Dec 31, 2020

folbricht commented Jan 9, 2021

antong commented Jan 14, 2021

folbricht commented Jan 14, 2021

antong commented Jan 16, 2021

folbricht commented Jan 18, 2021

antong commented Jan 18, 2021

folbricht commented Jan 19, 2021

Encryption for chunks #182

Are you sure you want to change the base?

Encryption for chunks #182

Conversation

folbricht commented Dec 27, 2020 • edited Loading

folbricht commented Dec 27, 2020

charles-dyfis-net commented Dec 29, 2020

charles-dyfis-net commented Dec 29, 2020

folbricht commented Dec 30, 2020

charles-dyfis-net commented Dec 30, 2020

folbricht commented Dec 31, 2020

folbricht commented Jan 9, 2021

antong commented Jan 14, 2021

folbricht commented Jan 14, 2021

antong commented Jan 16, 2021

folbricht commented Jan 18, 2021

antong commented Jan 18, 2021

folbricht commented Jan 19, 2021

folbricht commented Dec 27, 2020 •

edited

Loading