Please promote x86-64-v3 from experimental to regular releases with constant and timely updates #80

Open
ms178 opened this issue Mar 30, 2023 · 25 comments

Comments

@ms178

ms178 commented Mar 30, 2023

@Alex313031 I have been using your experimental x86-64-v3 build from Alex313031/thorium#112 (comment) as my daily driver ever since, and it hasn't revealed any issues. In fact, it also provides measurable advantages over the normal Thorium-AVX Linux build, as seen in the WebGL Aquarium benchmark, so I'd say it is safe to promote the x86-64-v3 Linux build to release status with regular updates going forward. What do you think?

@gz83
Collaborator

gz83 commented Mar 30, 2023

> @Alex313031 I am using your experimental x86-64-v3 build from Alex313031/thorium#112 (comment) as my daily driver since and it hasn't revealed any issues. In fact, it also provides measurable advantages over the normal Thorium-AVX Linux build as noticed in the WebGL Aquarium benchmark, hence I'd say it is safe to promote the x86-64-v3 Linux build to release status with regular updates going forward. What do you think?

@ms178

I think it's possible to push this work while ensuring compatibility with some older devices

@ms178
Author

ms178 commented Mar 30, 2023

The alternative could be a Haswell build as the baseline for AVX2 in general (as RobRich is doing with his AVX2 build). It is up for debate whether compatibility with other AVX2-capable devices that don't quite match Haswell is worth the performance trade-off of x86-64-v3 using somewhat fewer instructions. Either way is fine with me personally; my priority is to have an AVX2 Linux build at all that is kept up to date.
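For anyone unsure whether their machine meets the x86-64-v3 level discussed here, a minimal Python sketch follows. It assumes Linux and the flag names used in `/proc/cpuinfo` (for example, LZCNT is reported as `abm` and SSE3 as `pni`), and it approximates the OSXSAVE requirement with the `xsave` flag. The function names are hypothetical helpers, not part of Thorium or any toolchain.

```python
# Feature names as they appear on the "flags" line of Linux /proc/cpuinfo.
# Assumption: LZCNT shows up as "abm", SSE3 as "pni", and OSXSAVE is
# approximated by the "xsave" flag.
X86_64_V2 = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
X86_64_V3 = X86_64_V2 | {
    "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave",
}


def missing_v3_flags(flags_line):
    """Return the x86-64-v3 features absent from a space-separated flag string."""
    present = set(flags_line.split())
    return X86_64_V3 - present


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line of /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""
```

Calling `missing_v3_flags(read_cpu_flags())` returns an empty set when every listed feature is present; any names it returns are the features the CPU lacks for an x86-64-v3 build.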

@ms178
Author

ms178 commented May 11, 2023

@gz83 @Alex313031 Months have passed since the initial experimental build; what's the status here? The current AVX2 release still doesn't include an AVX2 Linux version by default.

@ms178
Author

ms178 commented Aug 11, 2023

Ping. Another quarter has passed by without an AVX2-Linux release of any sort.

@Alex313031
Owner

Alex313031 commented Aug 12, 2023

@ms178 It's not because I can't. The build files are there; you can use the same ones that the AVX2 Windows builds use. It's just that I have to build for A LOT of platforms. Adding another takes up even more time. I didn't even want to make the AVX2 Windows releases. I only did it because so many people wanted it. Only a few people have said they wanted an AVX2 Linux release. The performance difference between AVX and AVX2 builds is negligible in my testing, and on systems where AVX2 causes downclocking of the CPU it can actually cause worse system performance overall.

I make the linux SSE3 and AVX builds, the raspberry pi builds, the android arm32 and arm64 builds, and the windows 7 builds. Each build takes multiple hours, and I build it on my personal machine which I also use for other stuff.

If you have a decent machine, I would be more than happy to do a personal one on one guidance for how to compile an AVX2 release for linux yourself.

@ms178
Author

ms178 commented Aug 12, 2023

@Alex313031 Well, of course it is your project. I have no interest in building Thorium myself, as that is too involved even on CachyOS, which I use.

I have to question that narrative of yours. People want to max out their hardware capabilities, and I have seen an improvement with the last AVX2 build vs the AVX build, which is what motivates me to ask repeatedly for such an AVX2 build in the first place. AVX2-capable machines are also the vast majority of desktop machines out there today, to the point where you might even consider making the AVX2 build the main version of Thorium; even a small improvement would still be an improvement that is left on the table right now. The AVX2 downclocking issue that you mentioned is also no longer present in the relevant CPU architectures in use today. Linux desktop usage is also on the rise and recently hit the same desktop market share as Windows 7 according to Statcounter. So the facts show that an AVX2 Linux build is just as important as a Windows 7 build when using desktop market share as the metric.

I can understand that maintaining another build is a burden for the team, but build automation exists to help with that. I'd also rather see the Windows 7 and SSE3 builds retired, as Windows 7 has been EOL for some time now and supporting SSE3-only CPU architectures is less relevant than supporting AVX2 Linux systems in my eyes.

@Alex313031
Owner

Alex313031 commented Aug 14, 2023

@ms178 If it didn't take so long to compile, I would be more than happy to make avx2 linux builds with every release.

That said, I can make an AVX2 release for you (and will probably also use it on my machine). But I can't promise any sort of regularity. I will only make AVX2 builds when I feel like it, and when I'm not doing anything else.

Also, this build will not be a "beta" build, as I have not encountered any errors building or running the AVX2 version on linux.

@ms178
Author

ms178 commented Aug 14, 2023

@Alex313031 Understood - no promises. If the target audience is this limited, it might be worth considering going straight to a -march=haswell -maes build though.

@Alex313031
Owner

@ms178 Setting -march=haswell does not enable all x86-64-v3 instructions. In addition, setting -march without -mtune will set -mtune to the same thing. So this would pass -mtune=haswell to the compiler, leading the build to perform very well on Haswell CPUs, but not as well or even worse on other AVX2-capable CPUs. This is why I am following @RobRich999's advice and using -march=x86-64-v3 -mtune=generic -maes

@ms178
Author

ms178 commented Aug 16, 2023

@Alex313031 RobRich999 also changes his flags from time to time; he used to compile with -march=haswell -maes some time ago. By the way, that tuning also works and usually performs fine on Zen 1 and up. I haven't benchmarked it, but I would assume it performs significantly better than -march=x86-64-v3 -mtune=generic -maes due to the use of more instructions and compiler tuning. And if it is just the two of us using that build on a Haswell+ or Zen+ target, then we don't need to care about performance and compatibility for old AMD AVX2 CPUs, which is the only reason to use -march=x86-64-v3 -mtune=generic in the first place. I am offering to help with benchmarking a -march=haswell -maes vs -march=x86-64-v3 -mtune=generic -maes build, if you plan to put this debate to rest with numbers. ;)

@Alex313031
Owner

Alex313031 commented Aug 16, 2023

@ms178 Here you go sir. 'twas compiled with -march=x86-64-v3 -mtune=generic -maes
New repo too.

https://github.com/Alex313031/Thorium-Linux-AVX2/releases/tag/M115.0.5790.172

I might use -march=haswell -mtune=generic

@ms178
Author

ms178 commented Aug 16, 2023

@Alex313031 Thank you for your work!

[edit:] Here are some MotionMark results on my Haswell-EP 2696V3:

Thorium (AVX1):
MotionMark_AVX1_1

Thorium (AVX2):
MotionMark_AVX2_1

As I had to close some Tabs during the AVX2 test run, the first sub-test result might be skewed in favor of the AVX1 test run where I did not intervene, so a clean AVX2 result should be even higher. I've also tested Speedometer, but that showed no significant differences.

@ms178
Author

ms178 commented Aug 16, 2023

More data points, this time JetStream 2.1 where the AVX1 version wins.

AVX1:
JetStream2_AVX1_1

AVX2:
JetStream2_AVX2_1

@ms178
Author

ms178 commented Aug 16, 2023

Last but not least WebXPRT, here we've got a solid win for the AVX2 version.

AVX1:
WEBXPRT4_AVX1_1

AVX2:
WEBXPRT4_AVX2_1

@Alex313031
Owner

@ms178 Interesting.

@Alex313031
Owner

@ms178 Will probably be rolling a Haswell release. I want you to test it and see how well it performs. Also wanna use it on my haswell workstation.

@ms178
Author

ms178 commented Sep 10, 2023

@Alex313031 Great, just provide me a link to the file and I am happy to run some benchmarks on it.

@RobRich999

RobRich999 commented Sep 16, 2023

@Alex313031 Using -mtune=generic is not a good idea for AVX2 builds. The "generic" tuning model maps to SandyBridge.

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86.td

Drop the -mtune= altogether and use -march=haswell or -march=x86-64-v3 for x86-64 AVX2 builds. You can benchmark if desired to see which works better for your use case, but being realistic, it is likely in the noise since both are quite similar and use the same "Haswell" base tuning model.


You can try adding -mprefer-vector-width=128 if you want to optimize specifically for older AVX2 procs like Haswell that can be affected by AVX2 frequency scaling when encountering "heavy" instructions.

The nice part is that even when set to prefer 128-bit AVX/AVX2, LLVM for x86-64-v3 (and most similar) AVX2-class processor targets will still generate certain 256-bit "light" instructions like load/store that do not typically affect AVX2 frequency scaling. You can take a look at "AllowLight256Bit" in the X86.td file if interested.

Thing is -mprefer-vector-width=128 might be slower on newer AVX2-class processors like Skylake, Zen, etc. not affected by frequency scaling. YMMV.


Currently I am using -mprefer-vector-width=128 for my LLVM AVX2 build since my 96c/192t build server uses Broadwell procs.

https://github.com/RobRich999/LLVM_Optimized_AVX2/blob/main/llvm-avx2.patch


Also Polly seems to be working okay once again, at least in my latest browser builds. I did not delve into a deep performance evaluation, but a couple of quick browser benchmarks looked acceptable (to me) anyway. Might be something to further evaluate if interested. I just used standard Polly and invariant-load-hoisting.


BTW, "-maes" was mostly a workaround for a long-since-fixed issue. You can drop it if desired IMO. I suspect it is not doing anything anyway, otherwise it would be breaking AVX2 builds on older Celeron and Pentium AVX2-class procs lacking AES support. However, I do not have those specific procs to test against the cflag, so another YMMV situation.

https://en.wikipedia.org/wiki/AES_instruction_set
https://reviews.llvm.org/D51510

@ValeZAA

ValeZAA commented Sep 25, 2023

I have an i5-4670K. Please? I would love -march=haswell, so that older CPUs will not even work, because it removes alternate code paths for older CPUs. I agree, it is a very bad idea to tune to v3, as that assumes an old and generic CPU.

@Alex313031
Owner

v3 isn't old CPUs; -mtune=generic is.

@RobRich999

@Alex313031 Definitely do not pass -mtune=generic into an AVX2 build, or most any other x86-64 build for that matter. See here:

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86.td

Using -mtune=generic on x86-64 sets the SandyBridge scheduling model, which is already mapped into the -march=x86-64 proc default baseline LLVM uses anyway. IOW, you would do just as well to set neither -march= nor -mtune= and just leave the existing -msse3 cflag in the compiler config(s) for a standard Chromium SSE3 baseline build.


BTW, the mentioned Intel i5-4670K is Haswell, so -march=haswell or -march=x86-64-v3 would be fine there.

That said, we actually use clang-cl for Windows builds, so replacing -msse3 with /arch:AVX2 in the Windows compiler config file would suffice for a WinAVX2 build. Clang-cl internally maps /arch:AVX2 to -march=haswell. ;)

@ValeZAA

ValeZAA commented Sep 26, 2023

> Clang-cl internally maps /arch:AVX2 to -march=haswell. ;)

That would be nice. Using Visual Studio 2022 and not 2015 would be also nice.

> BTW, the mentioned Intel i5-4670K is Haswell

Yeah.

@Alex313031
Owner

@ValeZAA WDYM 2015, when did someone mention using MSVS 2015?

Also, @RobRich999 I use both -march=x86-64-v3 and /arch:AVX2 in the win compiler config.

Did you ever figure out why using /arch:AVX caused errors, while just leaving -mavx -maes didn't?

@RobRich999

RobRich999 commented Oct 4, 2023

@Alex313031 You can just set /arch:AVX2 in the win compiler config, which in turn will internally map to -march=haswell. The difference versus -march=x86-64-v3 tends to be in the noise, and that way you will not be passing multiple -march= flags into the compiler command line.
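A quick way to spot the "multiple -march= flags" situation described above is to scan the compiler argument list for every architecture-selecting flag. The following Python sketch is only illustrative: the helper names are hypothetical, and it treats clang-cl's `/arch:` switch as an architecture selector on the strength of the mapping described in this comment.

```python
def arch_selecting_flags(args):
    """Return every flag in args that selects a target architecture.

    Covers GCC/Clang-style -march= and the MSVC/clang-cl-style /arch: switch.
    """
    return [
        arg for arg in args
        if arg.startswith("-march=") or arg.lower().startswith("/arch:")
    ]


def has_conflicting_arch(args):
    """True when more than one distinct architecture-selecting flag is present."""
    return len(set(arch_selecting_flags(args))) > 1
```

For example, `has_conflicting_arch(["-march=x86-64-v3", "/arch:AVX2"])` is true, while a command line carrying only `/arch:AVX2` is reported as clean.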

Using /arch:AVX hits an issue when building components with code paths having different /arch flags. Clang-cl tries to adhere to what MSVC cl.exe does, and IIRC, cl.exe does not like /arch:AVX then /arch:AVX2 on the same command line.

https://source.chromium.org/search?q=%2Farch:AVX2%20build.g/arch:AVXn&ss=chromium

By setting -mavx we get around the conflicting /arch flags issue. Of note, the /arch:AVX flag maps to -march=sandybridge. The default x86-64 march uses the sandybridge instruction tuning model anyway, and -mavx restores the sandybridge level of SIMD support we want for AVX building.

@ms178
Author

ms178 commented Oct 4, 2023

@Alex313031 I'd also vote for -march=haswell for the AVX2 Linux build, as it offers some more instructions that are potentially of value. I'd volunteer to test such a haswell vs x86-64-v3 build if you want to see some numbers. That would need to be on the exact same version with just the -march change to have a fair apples-to-apples comparison.
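To judge whether a haswell vs x86-64-v3 difference is real or "in the noise", repeated benchmark scores from each build could be compared along these lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, scores are "higher is better", each build is run at least twice, and the two-times-spread threshold is an arbitrary rule of thumb rather than a proper significance test.

```python
from statistics import mean, stdev


def compare_runs(baseline, candidate, noise_factor=2.0):
    """Compare repeated benchmark scores from two builds.

    Returns (delta_pct, noise_pct, significant):
      delta_pct   - change of the candidate mean relative to the baseline mean
      noise_pct   - larger of the two relative standard deviations
      significant - True when |delta_pct| exceeds noise_factor * noise_pct
    Each list needs at least two scores so a standard deviation exists.
    """
    base_mean = mean(baseline)
    cand_mean = mean(candidate)
    delta_pct = (cand_mean - base_mean) / base_mean * 100.0
    noise_pct = max(stdev(baseline) / base_mean,
                    stdev(candidate) / cand_mean) * 100.0
    return delta_pct, noise_pct, abs(delta_pct) > noise_factor * noise_pct
```

Feeding it the scores of several runs per build, taken on the same browser version and the same otherwise idle machine, gives a rough indication of whether the -march change alone moved the result.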

5 participants