CM5 random I/O freezes with CQE enabled for eMMC #6512
Comments
Please post the complete contents of dmesg.
This looks too old to contain the CQE fixes from my issue you linked above. Those fixes were merged on 2024-10-18 (#6419), which is 10 days later than your build. FWIW, on Raspberry Pi OS 6.6.62-1+rpt1 (2024-11-25), all 3 of my RPi5 run without issue with CQE enabled.
Good spot, the first fix in that series is for a bug that will affect eMMC as well and would result in the described hang.
@P33M I was able to reproduce it on Raspberry Pi OS. Here are the steps I took after flashing the latest RPi OS Lite image and running it through the first boot:
Then I ran a pexpect test script that keeps rebooting the device, removing the created file if the boot succeeds and power-cycling the device if it doesn't boot within ~5 minutes. Here is the serial output for a failed boot: rpios-cqe-frozen-onlyoverlay.txt
I tried doing everything from scratch today, writing down my steps as I went, to rule out any other factors. Then I followed the steps again and saved the logs attached above. I don't get the hung task message in RPi OS (maybe it's enabled by some kernel option downstream?), but when it freezes, it doesn't recover even after tens of minutes. What is strange (though it may be a fluke) is that after the test has run for a long time, it kind of "stabilizes" and the freeze doesn't happen as often as on the first boot after the OS is flashed. I saw this on both HAOS and RPi OS. Also, this test was conducted on an 8GB RAM/8GB eMMC module (preproduction rev 0.2, which I used because I have access to the BCM UART there), but it was reproducible with a production CM5104032 too.
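The control flow of a reboot test like the one described above might look roughly like the sketch below. This is not the actual script: the original used pexpect on the serial console, whereas here the three callables (`wait_for_boot`, `cleanup_and_reboot`, `power_cycle`) are hypothetical stand-ins for whatever drives the console and the power supply, and the 300-second default only mirrors the "~5 minutes" mentioned in the comment.

```python
from typing import Callable


def run_reboot_cycles(
    wait_for_boot: Callable[[float], bool],
    cleanup_and_reboot: Callable[[], None],
    power_cycle: Callable[[], None],
    cycles: int,
    boot_timeout_s: float = 300.0,
) -> int:
    """Run `cycles` boot attempts and return how many timed out.

    All three callables are hypothetical hooks supplied by the caller:
    - wait_for_boot(timeout) returns True if the device reached a login
      prompt (or similar marker) within `timeout` seconds.
    - cleanup_and_reboot() removes the test file and issues a reboot.
    - power_cycle() hard-resets a device that appears frozen.
    """
    failures = 0
    for _ in range(cycles):
        if wait_for_boot(boot_timeout_s):
            # Boot succeeded: reset the state and try again.
            cleanup_and_reboot()
        else:
            # Boot did not finish in time: count it and recover by power cycle.
            failures += 1
            power_cycle()
    return failures
```

Keeping the hardware access behind callables means the loop itself can be dry-run with fakes before it is pointed at a real serial console.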
The latest download without running |
Which commit exactly do you have in mind? Backtracking through internal comms, I see that I tested for the issue with efecbda (although again with HAOS only), and it had the issue. I would like to avoid chasing ghosts, as the bug obviously has some racy trigger and it can be quite time-consuming to rule out reliably. (EDIT: Anyway, since I'm about to call it a day in a while, I will prepare one more test run from scratch and also perform an APT upgrade after the first boot.)
The Raspberry Pi OS apt package is at 6.6.62. This corresponds to:
i.e. dd23943 |
Describe the bug
When CQE is enabled for the eMMC interface on CM5 (CM5104032), I/O operations sometimes freeze completely and do not recover. We noticed this on random occasions when the system was already booted, but the most reliable trigger was heavier I/O on the first system boot, when a swapfile was created (copying a few hundred MB of zeroes to an ext4 FS). The issue is also reproducible with the bcm2712-rpi-cm5-cm5io.dtb device tree, so it is not limited to the downstream bcm2712-rpi-cm5-ha-yellow.dtb. To rule out the bootloader being involved (we're using U-Boot), we confirmed that loading the kernel image directly with the default bootloader doesn't resolve the problem. However, removing supports-cqe from the sdio1 node reliably fixes it (verified over a couple of hundred boots already).
Steps to reproduce the behaviour
Boot HA OS for the first time. If it doesn't freeze, remove the swapfile (swapoff /mnt/data/swapfile && rm /mnt/data/swapfile) and reboot.
Device (s)
Other
System
Tested on Home Assistant OS 14.0.rc2 (using 6.6.51 kernel based on the stable_20241008 tag).
Logs
Additional context
This seems to be similar to #6349, but the fixes for it were aimed only at a limited set of SD cards.
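Since the workaround described above depends on supports-cqe being absent from the sdio1 node, it can help to confirm what the running kernel actually sees. The sketch below walks the live device tree and lists every node that carries the property. It assumes the standard Linux location /sys/firmware/devicetree/base, where each property appears as a file inside its node's directory; the function name is made up for illustration, and how the sdio1 label maps to a node path is not stated in this issue, so the script simply prints all matches.

```python
import os
from typing import List


def find_nodes_with_property(root: str, prop: str = "supports-cqe") -> List[str]:
    """Return node paths (relative to `root`) that contain property `prop`.

    In the exported device tree, nodes are directories and properties are
    files, so a boolean property such as supports-cqe shows up as a file
    with that name in the node's directory.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if prop in filenames:
            matches.append(os.path.relpath(dirpath, root))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location of the live device tree on a Linux system.
    for node in find_nodes_with_property("/sys/firmware/devicetree/base"):
        print(node)
```

An empty result after removing the property would suggest the modified device tree is the one in use; any remaining match points at a node that still advertises CQE.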