-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop poking AMDGPU SMU I2C #194
Comments
Is this something you've actually experienced with ddcutil, or just a general heads up? In normal operation, ddcutil prefilters the i2c devices using /sys. If the device has a class attribute, it must either be x03 (display) or x0a (docking station). If no class attribute is found, it checks the device name, possibly augmented by the driver name. In particular, if the name starts with "smu", the device is ignored. (Probing smu devices was found to hang the system with Linux running on a Mac G5.) If you are seeing the problem on ddcutil, we can probe further. Normally I'd just ask for the output of ddcutil environment --very-verbose, but the environment command is bit more promiscuous in choosing which I2C devices to examine, so there's a chance that it could trigger the problem. |
It is, unfortunately, a very real issue. Output attached. The SMU bus reports class 0x03 (which is probably also wrong?), and is named "AMDGPU SMU", which is probably why it passes the "starts with smu" check. |
I've added "AMDGPU SMU" to the list of names for devices that should be ignored, and pushed the change to branch 1.1.0-dev. Let me know if it works. The problem is that the device class (e.g. x03) is tied to the device, not to individual /dev/i2c buses. (see for example line 2342 of the environment output.) This actually makes sense, since the entire device is a video adapter. So we're left with looking at the device name. Name prefixes, or driver/name combinations, are added as they are encountered, which is infrequent - "AMDGPU SMU" increases the list size from 7 to 8. |
Thanks! Bad news though, in order:
|
The current 1.1.0-dev branch contains a fix to the check if a /dev/i2c device is ignorable. First the device name is checked. If the device name is not on the explicit ignorable list, then the class is checked. It also (hopefully) addresses the stringop-overflow and stringop-truncation warnings. These were introduced in a point release of gcc 8. Modifying the code so that it compiles without warning no matter what release of gcc or clang is in use, and is still clean to read, turned out to be an ugly little problem. Even glib function g_strlcpy() gets flagged. See, for example, this long stackoverflow discussion. (I inserted "hopefully" because the full set of distribution releases and architecture variants have not yet been checked.) |
The latest version compiles cleanly and works perfectly, thanks! Any chance we can get this backported to the older stable releases as well? |
What is the context in which backporting a fix to a prior release is important? The only situation in which I could see this mattering is if an application is using the shared library and can't rebuild to use the latest version. I'm not aware of any such application. |
Distribution policies, mostly. Debian/Ubuntu generally don't allow new feature releases into their stable branches, so 1.0.x to 1.0.x+1 is OK, but 1.0.x to 1.1.y won't be. |
Hi @rockowitz, This bug is a kernel bug, specifically in the way the amdgpu driver communicated with the AMD SMU. https://gitlab.freedesktop.org/ltuikov/linux/-/commits/i2c-rework-luben ddcutil can now freely scan AMDGPU SMU busses. Filtering out "AMDGPU SMU" should be reverted, especially when we merge the tip of my branch into mainline. I'm using ddcutil-0.9.9 on the kernel above and it scans the AMDGPU SMU bus without a problem. Regards, |
ddcutil doesn't really gain anything by scanning the SMU bus anyway, and people may be running it on older kernels. I'd just keep it on the blocklist. |
It should eventually be phased out. It was unfortunate that this but was added to the kernel more than 2 years ago when some things were moved around. |
Maybe add a kernel version check? |
In general, ddcutil avoids probing SMU devices if possible. Not only can this cause hard failures, but probing I2C buses is an expensive operation, to be avoided when we know a priori that a /dev/i2c device is not a monitor. There's no harm in leaving the check in. and avoids tying ddcutil to all but recent kernels. |
@ltuikov Luben, I assume you're an appropriate person to ask the following questions. If not, can you pass it on or redirect me? I'm continuing to see problem reports with Navi GPU cards, e.g. #223, #224. What's the minimum kernel release that contains the SMU changes? I'll tell people to ensure they're at that kernel level and, if so, to post a bug report on, I assume, https://gitlab.freedesktop.org/drm/amd/-/issues. What would you like to see in the bug report? Regards, |
@rockowitz Hi, 544dcd74b7093a drm/amd/pm: Fix a bug in semaphore double-lock If you have at least these two commits, then the SMU will not get in an "inconsistent state", and you don't need filtering. It just works. You can use a previous version of ddcutil, presumably without filtering. Regards, |
As of this writing, ddcutil will attempt to poke all i2c-N devices available on the system to detect monitors. However, on recent AMD cards (specifically Navi2x variants, i.e. RX 6000 series), even trying to read from the SMU I2C bus can get the GPU into an inconsistent state (see 1). This should eventually be resolved via a more correct I2C implementation on the kernel side (again, see 1), but it might be worth it to also add a filter on the ddcutil side to make sure the SMU bus is never touched - nothing good (for ddcutil) can ever come out of it anyway.
The text was updated successfully, but these errors were encountered: