Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zen workaround missing for Zen 2 (which is also affected by SpecLockMap). #5

Open
eddyb opened this issue Sep 22, 2020 · 2 comments
Open

Comments

@eddyb
Copy link

eddyb commented Sep 22, 2020

While this repository doesn't mention SpecLockMap by name, FAM17H_MSR_WA_1 is MSRC001_1020[54]:

#define FAM17H_MSR_WA_1 0xc0011020
#define FAM17H_MSR_WA_1_BITS 0x40000000000000ULL

We found that MSRC001_1020[54] disables SpecLockMap (some advanced form of lock speculation which may roll back a number of post-lock instructions but doesn't correctly roll back performance counters) in rr-debugger/rr#2034 (comment).

Latest rr (from git) now contains some detection logic for SpecLockMap (execute a single atomic, check if r0825:u aka SpecLockMapCommit counted it), so you could use for testing (sadly I don't think there's a standalone version).


What we also know is that Zen 2 is just as affected by SpecLockMap's missing counter rollback as Zen 1 or Zen 1+, which leads me to believe ce475e6 should've added Zen 2 to this list:

// "Zen" core workaround. Multiple SoCs have "Zen" cores.
// 0x00-0x0f are the Ryzen/Epyc CPU-only SoCs
// 0x10-0x2f are the Raven Ridge APUs
// 0x50-0x5f also exist
if (c->x86 == 0x17 &&
((c->x86_model >= 0x0 && c->x86_model <= 0x2f) ||
(c->x86_model >= 0x50 && c->x86_model < 0x5f)))
{

Furthermore, only on Zen 2 will setting the MSR bit stick, due to the kernel using MSRC001_1020[10] as SSBD on earlier Zen (whereas Zen 2 got an architectural MSR for speculation control, and the kernel toggles that instead).

https://github.com/mozilla/rr/wiki/Zen contains more information on how to check if the workaround sticks on a given machine and what you can do to mitigate that - I suspect the tracepoint-based (kernel module) workaround by @glandium is the only one that would be readily applicable for you, since you're already operating within a kernel module (though currently you don't require tracepoints so that would be a new requirement for using the IBS Toolkit).

@jlgreathouse
Copy link
Owner

Thanks for the report, and sorry for the delay in responding. This driver's workaround section doesn't mention that bit by name because the name of MSRC001_1020[54] is not mentioned in any public manuals. Tsk tsk on ASRock for putting a name to it. :)

The reason you describe (a possible incorrect perfmon value after a lock operation rollback) is not why this driver sets the bit on Zen 1. We have other reasons for setting this bit on Zen 1, though I can't describe them in detail because they are not in any of our public manuals. As the ASRock manual notes, the BIOS setting to enable IBS on Zen 1 does set that bit (amongst others), which is why we also set it in this driver if the BIOS was not configured enable IBS. However, neither the BIOS nor this driver do this to prevent perfmon inaccuracy.

The publicly-unexplained reason for setting this bit in Zen 1 no longer exists in Zen 2. As such, we are not required to set this bit on Zen 2 for IBS to be enabled -- in fact, IBS should be enabled out of the box on Zen 2 cores. There is no longer a need for a BIOS setting to enable workarounds before allowing IBS to be used, so this driver also does not need to do these workarounds. In addition, the perfmon inaccuracies you describe should not affect IBS on Zen 2, because IBS Op sampling only samples fully retired ops. As such, you will not take any IBS samples of speculative ops that are rolled back.

Thank you for the pointer about SSDB and how it's likely affecting this driver. I'll look into this when I find some spare time. This kind of complexity pops up when you're an out-of-tree driver, especially one that's trying to augment or paper over existing mechanisms that exist in-kernel. :)

Finally, I will separately note that in this post, @glandium notes that perhaps the in-kernel driver should be setting these workarounds bits on Zen 1. I disagree. Enabling IBS in the BIOS should be sufficient on most systems, and is the AMD-recommended way to use IBS in production. Setting these workaround bits and enabling IBS after boot is not officially supported by AMD. I'm setting these bits in this out-of-tree driver because this is a research tool that is not attempting to be upstreamable or production ready. I just wanted to make sure researchers could use IBS even if their BIOS did not include the option to enable it.

My take is that setting undocumented MSR bits in the upstream kernel driver is dangerous, and I cannot recommend submitting such a patch to LKML. Researchers can deal with potentially unstable systems if I've missed something during the post-boot workarounds; production systems cannot.

@eddyb
Copy link
Author

eddyb commented Oct 14, 2020

because IBS Op sampling only samples fully retired ops

Sadly it does seem to affect the "retired instructions" counter, so I guess it's not as "fully" retired as what IBS sees.

In any case, thanks for taking the time to respond! (I believe this might be the first time we heard back from anyone at AMD regarding the whole SpecLockMap saga, at all - then again, I don't think there was any attempt at grabbing anyone's attention, so no surprises there, heh)

And you can tell I'm also bad with notifications lately, didn't notice your reply until now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants