Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java tink library does not scale with java threads #24

Open
uploeger opened this issue May 31, 2023 · 14 comments
Open

java tink library does not scale with java threads #24

uploeger opened this issue May 31, 2023 · 14 comments
Labels

Comments

@uploeger
Copy link

It seems that java tink library does not scale with java threads. Find attached a small java test program.

Find below test results with one and three threads, running on a unix (redhat) box with 24 vCPUs.

In both cases around 1.000.000 encryption operations are executed every second. The amount of encryption operations is not increasing when using more threads.

java AeadThread encrypt "this is a test" 10000000 1

origtext: this is a test

Thread: 1 for encryption loop: Thu May 25 13:55:22 CEST 2023

Thread: 1 after encryption loop: Thu May 25 13:55:32 CEST 2023

java AeadThread encrypt "this is a test" 10000000 3

origtext: this is a test

Thread: 2 for encryption loop: Thu May 25 13:53:35 CEST 2023

Thread: 3 for encryption loop: Thu May 25 13:53:35 CEST 2023

Thread: 1 for encryption loop: Thu May 25 13:53:35 CEST 2023

Thread: 1 after encryption loop: Thu May 25 13:54:02 CEST 2023

Thread: 2 after encryption loop: Thu May 25 13:54:02 CEST 2023

Thread: 3 after encryption loop: Thu May 25 13:54:02 CEST 2023

AeadThread.zip

@uploeger
Copy link
Author

test environment:
java version: openjdk version "1.8.0_362"
java tink library version: 1.9.0

@tholenst
Copy link
Contributor

tholenst commented Jun 5, 2023

Thanks for the report. I think the issue here is that you call "GetPrimitive" in each thread separately.

In Tink, GetPrimitive can be slow and performs optimizations. It also may use locks. You should share the Aead objects between threads instead.

Let me know if this doesn't solve the issue.

@tholenst tholenst self-assigned this Jun 5, 2023
@tholenst tholenst added the p3 label Jun 5, 2023
@tholenst
Copy link
Contributor

tholenst commented Jun 9, 2023

I will close this so I have a better overview of open work items and since I assume that this is resolved. If this is still an issue, please reopen.

@uploeger
Copy link
Author

I tested with shared aead object, but the result is the same.
Since I could not re-open a closed issue, I opened a new one.

AeadThread.zip

@tholenst tholenst reopened this Jun 12, 2023
@tholenst
Copy link
Contributor

tholenst commented Jun 20, 2023

Thanks for insisting, I can reproduce this.

Doing a thread dump with 5 threads I get that 2 of them are blocked as follows:

"4" tink-crypto/tink#18 prio=5 os_prio=0 cpu=34757.19ms elapsed=41.17s tid=0x00007f043c323b10 nid=0x5feb0 waiting for monitor entry [0x00007f03f31fc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.SecureRandom.engineNextBytes([email protected]/SecureRandom.java:222)

This is goes through at com.google.crypto.tink.subtle.Random.randBytes(Random.java:43)

See

public static byte[] randBytes(int size) {

This is confusing since this uses a ThreadLocal.

@tholenst tholenst added p2 and removed p3 labels Jun 20, 2023
@tholenst
Copy link
Contributor

Looking at it some more, a full stacktrace looks like this:

java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.SecureRandom.engineNextBytes([email protected]/SecureRandom.java:222)
- waiting to lock <0x000000060f001770> (a sun.security.provider.SecureRandom)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes([email protected]/NativePRNG.java:537)
at sun.security.provider.NativePRNG.engineNextBytes([email protected]/NativePRNG.java:221)
at java.security.SecureRandom.nextBytes([email protected]/SecureRandom.java:758)
at com.google.crypto.tink.subtle.Random.randBytes(Random.java:43)
at com.google.crypto.tink.subtle.AesGcmJce.encrypt(AesGcmJce.java:51)
at AeadThread.run(AeadThread.java:46)

In Random.randBytes we use a ThreadLocal SecureRandom, attempting to make sure that we don't block threads on each other. However, in NativePRNG:221 the object calls "INSTANCE.implNextBytes(bytes)" which goes to the static member INSTANCE -- hence destroying our efforts to avoid a global mutex.

Quick googling suggests that this is https://bugs.openjdk.org/browse/JDK-8098581 -- but this bug was fixed in 2016 and I'm using openjdk 17.0.6 :/

I think we need to do some investigating how to pick the correct randomness provider :/ See https://docs.oracle.com/en/java/javase/17/security/oracle-providers.html for some starting point.

@tholenst
Copy link
Contributor

I will try to describe my current understand here. There might be bugs, I'm mainly writing this down in the hope that this helps me understand.

Let's say we call "java.security.SecureRandom.nextBytes()" (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/security/SecureRandom.java). I am pretty sure that "threadSafe" is true, so we call "secureRandomSpi.engineNextBytes()" (separate instance for each thread).

On my system, secureRandomSpi is an object of type NativePRNG (https://github.com/openjdk/jdk/blob/master/src/java.base/unix/classes/sun/security/provider/NativePRNG.java). There, the call to engineNextBytes will be forwarded to a global instance of the private class RandomIO in this class -- it uses the method implNextBytes.

This class in turn asks an object of type "sun.security.provider.SecureRandom" (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/SecureRandom.java) for "engineNextBytes". This is a synchronized method, and since we have a global lock from the "RandomIO" object above, we now lock on a global lock.

I feel I'm missing something.

@tholenst
Copy link
Contributor

I think I understand somewhat better now.

In the end, the job of the RandomIO object above is to read from /dev/urandom or /dev/random. It produces a mix of a SHA1 PRNG stream and the results of reading from /dev/(u)random.

The above explains how we get the SHA1 PRNG stream (because we first block on this one). However, it is more interesting how to read from /dev/urandom in multiple threads, and it seems at least somewhat natural that Java read from multiple threads with a global instance.

@morambro morambro transferred this issue from tink-crypto/tink Jan 26, 2024
@uploeger
Copy link
Author

uploeger commented Apr 2, 2024

Are you using a central/static instance of "java.util.Random" ?
Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.

https://docs.oracle.com/javase/8/docs/api/java/util/Random.html

@tholenst
Copy link
Contributor

tholenst commented Apr 8, 2024

Instances of https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadLocalRandom.html are not cryptographically secure (as it explains there).

Tink already uses ThreadLocal random, but as explained in #24 (comment) above this simply forwards to a global object and then takes a lock.

@juergw
Copy link
Contributor

juergw commented Apr 29, 2024

I have now tested this by installing Conscrypt on startup, with:

Security.addProvider(Conscrypt.newProvider());

With a single thread, this makes my test run about 20% slower, but with 10 threads in parallel, it runs 4.5 times faster.

copybara-service bot pushed a commit that referenced this issue Apr 29, 2024
We already do the same thing for other key type, such as AES-GCM.

See also: #36, and #24.

I run some benchmarks for this, where I encrypt some data using AES-SIV many times.
- For 1kb data, it get 2.5x faster for a single thread, and 5.6x faster for 10 threads in parallel.
- For 32 bytes data, it gets 15x faster for a single thread, and 41x faster for 10 threads in parallel.

PiperOrigin-RevId: 629075782
Change-Id: I6735eafe4670213af15084343a7f269885d1101b
@juergw
Copy link
Contributor

juergw commented Jul 4, 2024

Tink now always tries to use Conscrypt for generating randomness, see 6409bba and 496703e. So I think this should be resolved for binaries that contain Conscrypt.

@uploeger
Copy link
Author

uploeger commented Jul 9, 2024

I can confirm that new version 1.14 is much faster and scaling with java threads.
Many thanks to the development team :)

issue can be closed

java AeadThreadDet encrypt "This is a test" 10000000 1
origtext: This is a test
Thread: 1 for encryption loop: Tue Jul 09 13:32:49 CEST 2024
Thread: 1 after encryption loop: Tue Jul 09 13:32:55 CEST 2024

java AeadThreadDet encrypt "This is a test" 10000000 2
origtext: This is a test
Thread: 1 for encryption loop: Tue Jul 09 13:34:28 CEST 2024
Thread: 2 for encryption loop: Tue Jul 09 13:34:28 CEST 2024
Thread: 1 after encryption loop: Tue Jul 09 13:34:35 CEST 2024
Thread: 2 after encryption loop: Tue Jul 09 13:34:35 CEST 2024

@tholenst
Copy link
Contributor

Thank you! I will keep reduce the priority of this, and keep it open though because AFAIK without Conscrypt Tink is still unacceptably slow.

@tholenst tholenst added p3 and removed p2 labels Jul 12, 2024
@tholenst tholenst removed their assignment Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants