debug what happened with 14031 #14042
Comments
I have a computer with a |
I've yet to do any real debugging, but my sense here is that the aggressive unrolling that we do (4x with float dot) is hurting us when we inline. I'll try to reproduce on my Intel box; this might not be AMD-specific (it might just hurt more there). I wanna run luceneutil benchmarks rather than the micro-benchmarks.
What else are we to do, though? CPUs have multiple FMA units, and the JVM won't unroll, as that would change the floating-point results.
Hotspot will unroll the loops that are using the Vector API to do floating-point arithmetic. On my Intel box
Reducing the unrolling of Linux
Do you know why we see blocks of 4 FMA insns such as |
I think we are having a communication issue over terminology. I don't care about unrolling, I care about superscalar execution. The JVM doesn't allow it, which means the hardware sits there idle and wasted. So we unroll the code to make this parallelism possible. Does it make sense? The "unrolling" the JVM does, where it uses the same registers over and over and forces serial execution, is irrelevant for this purpose. I totally see why they don't do this: besides the fact that there is no "fast math" type flag that I am aware of, it would be insanely confusing in general to start getting different answers after JIT compilation "changed".
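To make the distinction concrete, here is a minimal scalar sketch, not Lucene's actual implementation. The class and method names (DotProductSketch, dotSerial, dotUnrolled4) are hypothetical and chosen only for illustration. The first method has a single accumulator, so every FMA depends on the previous one. The second keeps four independent accumulators, which is the kind of manual unrolling discussed above: the four dependency chains can be issued in parallel on a CPU with multiple FMA units, at the cost of a different summation order.

```java
public final class DotProductSketch {

    // One accumulator: each fma must wait for the previous result,
    // so the loop forms a single serial dependency chain.
    static float dotSerial(float[] a, float[] b) {
        float acc = 0f;
        for (int i = 0; i < a.length; i++) {
            acc = Math.fma(a[i], b[i], acc);
        }
        return acc;
    }

    // Four independent accumulators: the four fma chains do not depend on
    // each other, so a superscalar CPU is free to overlap them.
    // The summation order differs from dotSerial, so the rounded result
    // may differ slightly for general inputs.
    static float dotUnrolled4(float[] a, float[] b) {
        float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
        int i = 0;
        int bound = a.length & ~3; // largest multiple of 4 <= length
        for (; i < bound; i += 4) {
            acc0 = Math.fma(a[i], b[i], acc0);
            acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
            acc2 = Math.fma(a[i + 2], b[i + 2], acc2);
            acc3 = Math.fma(a[i + 3], b[i + 3], acc3);
        }
        float res = (acc0 + acc1) + (acc2 + acc3);
        // Scalar tail for lengths that are not a multiple of 4.
        for (; i < a.length; i++) {
            res = Math.fma(a[i], b[i], res);
        }
        return res;
    }

    public static void main(String[] args) {
        float[] a = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f, 0.9f};
        float[] b = {1.1f, 1.2f, 1.3f, 1.4f, 1.5f, 1.6f, 1.7f, 1.8f, 1.9f};
        System.out.println("serial:    " + dotSerial(a, b));
        System.out.println("unrolled4: " + dotUnrolled4(a, b));
    }
}
```

Because the two methods add the products in a different order, a JIT that silently turned the first into the second could change the answer, which is why the reordering has to be written by hand.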
Description
PR #14031 makes the float vector functions inlinable, but it caused a drop in the nightly benchmark.
We reverted the PR in #14041, but we should try to figure it out, as we don't want unpredictable performance.
Maybe the problem is specific to AMD and we should test that. Also, it would be great if we could safely add some debugging to the nightly benchmark to understand what happens with the compilation (e.g. -XX:+PrintCompilation or -XX:+LogCompilation).