Add support for big endian platforms #81

Merged (5 commits) on Feb 8, 2024

jonathan-albrecht-ibm
Contributor

This PR adds support for big endian platforms. It adds byte swapping where needed so that all hash functions can run correctly on both big and little endian platforms. It tries to avoid any significant performance degradation on little endian platforms.

The hash families that are affected by this PR are:

  • SHA3
  • Blake
  • Scrypt

All of the other hash families already worked correctly on big endian platforms.

I have included all of the changes in this PR to hopefully make it easy to give feedback on the approach. I'm happy to split it into smaller PRs if preferred.

Most of the byte swapping is done in place on Uint32Arrays, which performed best of the approaches I tried.
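As a rough illustration of the in-place approach, here is a minimal sketch. The helper names (`byteSwap`, `byteSwap32`, `normalizeWords`) and the way `isLE` is computed are assumptions made for this example and are not quoted from the PR diff.

```typescript
// Detect platform endianness once: on a little endian machine the first
// byte of the 32-bit word 0x11223344 is 0x44.
const isLE: boolean =
  new Uint8Array(new Uint32Array([0x11223344]).buffer)[0] === 0x44;

// Reverse the byte order of a single 32-bit word.
function byteSwap(word: number): number {
  return (
    (((word << 24) & 0xff000000) |
      ((word << 8) & 0xff0000) |
      ((word >>> 8) & 0xff00) |
      ((word >>> 24) & 0xff)) >>>
    0 // force the result back to an unsigned 32-bit value
  );
}

// Swap every word of the array in place, without allocating a new buffer.
function byteSwap32(arr: Uint32Array): void {
  for (let i = 0; i < arr.length; i++) arr[i] = byteSwap(arr[i]);
}

// Hypothetical call site: only big endian platforms pay for the extra loop.
function normalizeWords(words: Uint32Array): void {
  if (!isLE) byteSwap32(words);
}
```

Because the swap is guarded by `isLE`, a little endian platform skips the loop entirely, and swapping twice restores the original words.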

In the update() function of the blake base class (_blake.ts), there is one spot where the byte swapping is done on the data32 Uint32Array, which is backed by the input message. The byte swapping is reversed before update() returns, but while update() is running the user's input message will have been mutated if they passed it in as a Uint32Array. I'm not sure if it's OK for the input to be temporarily mutated. If not, I have a different fix that avoids the mutation but performs slightly worse.
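To make the trade-off concrete, here is a sketch of the two options under stated assumptions. The function names `withTemporarySwap` and `withCopiedSwap`, the `needsSwap` flag (standing in for a `!isLE` check), and the `try`/`finally` are all hypothetical and chosen for illustration. This is not the code from _blake.ts.

```typescript
// Reverse the byte order of every 32-bit word in place.
function byteSwap32(arr: Uint32Array): void {
  for (let i = 0; i < arr.length; i++) {
    const w = arr[i];
    arr[i] =
      (((w << 24) & 0xff000000) |
        ((w << 8) & 0xff0000) |
        ((w >>> 8) & 0xff00) |
        ((w >>> 24) & 0xff)) >>>
      0;
  }
}

// Option A: swap the caller-backed words in place, do the work, swap back.
// No allocation, but the caller's buffer is in a swapped state while
// `work` runs. The finally block is an assumption of this sketch so the
// input is restored even if `work` throws.
function withTemporarySwap(
  data32: Uint32Array,
  needsSwap: boolean,
  work: (words: Uint32Array) => void
): void {
  if (needsSwap) byteSwap32(data32);
  try {
    work(data32);
  } finally {
    if (needsSwap) byteSwap32(data32);
  }
}

// Option B: swap a private copy. The caller's buffer is never touched,
// at the cost of one allocation and one copy per call.
function withCopiedSwap(
  data32: Uint32Array,
  needsSwap: boolean,
  work: (words: Uint32Array) => void
): void {
  if (!needsSwap) {
    work(data32);
    return;
  }
  const copy = data32.slice();
  byteSwap32(copy);
  work(copy);
}
```

Option A corresponds to the temporary mutation described above, and Option B to the kind of non-mutating alternative that costs some performance.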

Except for that spot, all other byte swapping should be done only on internal or output buffers.

Thanks in advance for looking at this. I'm happy to make any changes necessary.

@paulmillr
Owner

Thanks for this. Could you show how the perf is degraded after this change?

@jonathan-albrecht-ibm
Contributor Author

Yes, I'll try. I've been using the benchmarks to watch performance. I've also been using the Node.js profiler to check whether any new functions have started showing up in the CPU profile. On little endian, my changes should not run any extra byte swapping or loops over the data, since that's guarded with isLE checks.

Here is the output of npm run bench on an x86_64 Linux VM with Node.js v20.11.0 for the main branch (before) and the big-endian-port branch (after). I think there's no real difference, at least within the noise of my VM:

main
-------
Benchmarking
SHA256 32B x 214,684 ops/sec @ 4μs/op ± 2.31% (min: 2μs, max: 19ms)
SHA384 32B x 120,525 ops/sec @ 8μs/op
SHA512 32B x 120,279 ops/sec @ 8μs/op ± 1.03% (min: 4μs, max: 5ms)
SHA3-256, keccak256, shake256 32B x 42,286 ops/sec @ 23μs/op
Kangaroo12 32B x 59,708 ops/sec @ 16μs/op
Marsupilami14 32B x 53,047 ops/sec @ 18μs/op
BLAKE2b 32B x 95,084 ops/sec @ 10μs/op
BLAKE2s 32B x 126,182 ops/sec @ 7μs/op ± 1.23% (min: 4μs, max: 5ms)
BLAKE3 32B x 120,860 ops/sec @ 8μs/op ± 1.34% (min: 3μs, max: 15ms)
RIPEMD160 32B x 176,211 ops/sec @ 5μs/op ± 2.54% (min: 3μs, max: 23ms)
HMAC-SHA256 32B x 60,734 ops/sec @ 16μs/op
RAM: rss=148.3mb heap=88.5mb used=62.4mb
-------
Benchmarking
HKDF-SHA256 32 x 24,953 ops/sec @ 40μs/op
HKDF-SHA256 64 x 22,651 ops/sec @ 44μs/op
HKDF-SHA256 256 x 13,537 ops/sec @ 73μs/op ± 1.44% (min: 44μs, max: 9ms)
PBKDF2-HMAC-SHA256 16384 x 13 ops/sec @ 73ms/op ± 7.83% (min: 65ms, max: 89ms)
PBKDF2-HMAC-SHA256 65536 x 3 ops/sec @ 293ms/op ± 4.09% (min: 269ms, max: 314ms)
PBKDF2-HMAC-SHA256 262144 x 0 ops/sec @ 1163ms/op ± 6.25% (min: 1113ms, max: 1285ms)
PBKDF2-HMAC-SHA512 16384 x 5 ops/sec @ 178ms/op ± 4.70% (min: 161ms, max: 201ms)
PBKDF2-HMAC-SHA512 65536 x 1 ops/sec @ 685ms/op ± 3.40% (min: 669ms, max: 721ms)
PBKDF2-HMAC-SHA512 262144 x 0 ops/sec @ 2571ms/op ± 3.21% (min: 2291ms, max: 2659ms)
Scrypt r: 8, p: 1, n: 16384 x 7 ops/sec @ 141ms/op ± 11.25% (min: 115ms, max: 205ms)
Scrypt r: 8, p: 1, n: 65536 x 1 ops/sec @ 509ms/op ± 1.50% (min: 490ms, max: 520ms)
Scrypt r: 8, p: 1, n: 262144 x 0 ops/sec @ 2356ms/op ± 15.57% (min: 2057ms, max: 2802ms)
Scrypt Async r: 8, p: 1, n: 16384 x 6 ops/sec @ 152ms/op ± 9.14% (min: 138ms, max: 210ms)
Scrypt Async r: 8, p: 1, n: 65536 x 1 ops/sec @ 668ms/op ± 1.58% (min: 646ms, max: 681ms)
Scrypt Async r: 8, p: 1, n: 262144 x 0 ops/sec @ 2583ms/op ± 5.06% (min: 2434ms, max: 2754ms)
RAM: rss=357.7mb heap=11.2mb used=6.8mb arr=268.5mb

big-endian-port
-------
Benchmarking
SHA256 32B x 232,234 ops/sec @ 4μs/op ± 2.34% (min: 2μs, max: 18ms)
SHA384 32B x 128,766 ops/sec @ 7μs/op
SHA512 32B x 130,701 ops/sec @ 7μs/op ± 1.01% (min: 4μs, max: 4ms)
SHA3-256, keccak256, shake256 32B x 42,758 ops/sec @ 23μs/op
Kangaroo12 32B x 58,719 ops/sec @ 17μs/op
Marsupilami14 32B x 57,710 ops/sec @ 17μs/op
BLAKE2b 32B x 99,265 ops/sec @ 10μs/op ± 1.59% (min: 6μs, max: 28ms)
BLAKE2s 32B x 115,326 ops/sec @ 8μs/op ± 1.54% (min: 4μs, max: 24ms)
BLAKE3 32B x 116,049 ops/sec @ 8μs/op ± 1.02% (min: 3μs, max: 7ms)
RIPEMD160 32B x 193,948 ops/sec @ 5μs/op ± 2.20% (min: 3μs, max: 19ms)
HMAC-SHA256 32B x 64,304 ops/sec @ 15μs/op
RAM: rss=149.7mb heap=89.3mb used=66.3mb
-------
Benchmarking
HKDF-SHA256 32 x 29,782 ops/sec @ 33μs/op
HKDF-SHA256 64 x 25,902 ops/sec @ 38μs/op
HKDF-SHA256 256 x 14,370 ops/sec @ 69μs/op
PBKDF2-HMAC-SHA256 16384 x 14 ops/sec @ 68ms/op ± 9.10% (min: 61ms, max: 87ms)
PBKDF2-HMAC-SHA256 65536 x 3 ops/sec @ 261ms/op ± 2.66% (min: 249ms, max: 271ms)
PBKDF2-HMAC-SHA256 262144 x 0 ops/sec @ 1060ms/op ± 2.12% (min: 1022ms, max: 1096ms)
PBKDF2-HMAC-SHA512 16384 x 6 ops/sec @ 144ms/op ± 7.41% (min: 130ms, max: 178ms)
PBKDF2-HMAC-SHA512 65536 x 1 ops/sec @ 555ms/op ± 3.22% (min: 530ms, max: 574ms)
PBKDF2-HMAC-SHA512 262144 x 0 ops/sec @ 2239ms/op ± 3.03% (min: 2158ms, max: 2313ms)
Scrypt r: 8, p: 1, n: 16384 x 8 ops/sec @ 121ms/op ± 10.17% (min: 103ms, max: 160ms)
Scrypt r: 8, p: 1, n: 65536 x 2 ops/sec @ 492ms/op ± 3.15% (min: 465ms, max: 518ms)
Scrypt r: 8, p: 1, n: 262144 x 0 ops/sec @ 1932ms/op ± 3.19% (min: 1861ms, max: 2007ms)
Scrypt Async r: 8, p: 1, n: 16384 x 7 ops/sec @ 141ms/op ± 5.43% (min: 120ms, max: 165ms)
Scrypt Async r: 8, p: 1, n: 65536 x 1 ops/sec @ 652ms/op ± 12.74% (min: 556ms, max: 767ms)
Scrypt Async r: 8, p: 1, n: 262144 x 0 ops/sec @ 2290ms/op ± 3.89% (min: 2191ms, max: 2428ms)
RAM: rss=343.4mb heap=11.5mb used=7.5mb arr=268.5mb
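
For a quick side-by-side reading of the two runs, the relative change in ops/sec can be computed per row. The sketch below does this for the 32B rows of the hash families touched by this PR, using figures copied from the output above. The `pctChange` helper and the `rows` table are introduced here for illustration only, and each figure comes from a single run on a noisy VM.

```typescript
// Relative change from `before` to `after`, in percent.
function pctChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// ops/sec copied from the benchmark output: [name, main, big-endian-port].
const rows: Array<[string, number, number]> = [
  ["SHA3-256, keccak256, shake256 32B", 42286, 42758],
  ["BLAKE2b 32B", 95084, 99265],
  ["BLAKE2s 32B", 126182, 115326],
  ["BLAKE3 32B", 120860, 116049],
];

for (const [name, before, after] of rows) {
  const delta = pctChange(before, after);
  const sign = delta >= 0 ? "+" : "";
  console.log(`${name}: ${sign}${delta.toFixed(1)}%`);
}
```

The changes go in both directions across these rows, and unaffected hashes such as SHA256 move by similar amounts between the two runs.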

@paulmillr
Owner

Good job Jonathan. Proper pull request!

@paulmillr merged commit fa8a7c4 into paulmillr:main on Feb 8, 2024
2 checks passed
@jonathan-albrecht-ibm
Contributor Author

Thanks for reviewing and merging @paulmillr!
