Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing to the sdcard hangs suddently #2500

Open
carlonluca opened this issue Apr 9, 2018 · 10 comments
Open

Writing to the sdcard hangs suddently #2500

carlonluca opened this issue Apr 9, 2018 · 10 comments
Labels
Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator.

Comments

@carlonluca
Copy link

carlonluca commented Apr 9, 2018

Hello! I'm using a modified version of noobs for a project and I noticed that sometimes the process that writes to the sdcard hangs indefinitely. This happens an multiple sdcards and multiple devices. The process is a wget | gz | bsdtar. When it hangs, I can log in via ssh and whatever I try to run on the sdcard hangs. I added strace to the image and I get this:

strace dd if=/dev/zero of=/mnt2/test count=1024 bs=1024
execve("/bin/dd", ["dd", "if=/dev/zero", "of=/mnt2/test", "count=1024", "bs=1024"], [/* 20 vars */]) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0x76fd7000
open("/lib/libc.so.0", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=298280, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0x76fd6000
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\241\0\0004\0\0\0"..., 4096) = 4096
mmap2(NULL, 348160, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f76000
mmap2(0x76f76000, 293688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x76f76000
mmap2(0x76fc5000, 4832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x47000) = 0x76fc5000
mmap2(0x76fc7000, 14184, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76fc7000
close(3)                                = 0
munmap(0x76fd6000, 4096)                = 0
stat("/lib/ld-uClibc.so.0", {st_mode=S_IFREG|0755, st_size=25312, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0x76fd6000
set_tls(0x76fd6480, 0x76fd6b20, 0x76fd6b28, 0x76fd6480, 0x76fd8f74) = 0
mprotect(0x76fc5000, 4096, PROT_READ)   = 0
mprotect(0x76fd8000, 4096, PROT_READ)   = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
getuid32()                              = 0
brk(0)                                  = 0x1da4000
brk(0x1da5000)                          = 0x1da5000
rt_sigaction(SIGUSR1, {0x5d410, [], SA_RESTORER|SA_RESTART, 0x76f80d74}, NULL, 8) = 0
open("/dev/zero", O_RDONLY|O_LARGEFILE) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
open("/mnt2/test", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666

This is the status:

# cat /proc/242/status
Name:   dd
Umask:  0077
State:  D (disk sleep)
Tgid:   242
Ngid:   0
Pid:    242
PPid:   239
TracerPid:      239
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 32
Groups: 0 10 
VmPeak:     1176 kB
VmSize:     1176 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:       184 kB
VmRSS:       184 kB
RssAnon:              32 kB
RssFile:             152 kB
RssShmem:              0 kB
VmData:       44 kB
VmStk:       132 kB
VmExe:       640 kB
VmLib:       320 kB
VmPTE:         8 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        1
SigQ:   8/1546
SigPnd: 0000000000000000
ShdPnd: 0000000000020002
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000200
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Cpus_allowed:   f
Cpus_allowed_list:      0-3
voluntary_ctxt_switches:        59
nonvoluntary_ctxt_switches:     1

I can see no interrupt for mmc0 in /proc/interrupts. It seems someone else reproduced a similar behaviour in my report here raspberrypi/noobs#487. This is all I have from dmesg:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.9.80-rescue-v7 (luca@ubuntu) (gcc version 4.8.4 (Buildroot 2015.02-git-00027-g36c93da) ) #8 SMP PREEMPT Fri Apr 6 11:33:54 CEST 2018
[    0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt:Machine model: Raspberry Pi 3 Model B Rev 1.2
[    0.000000] cma: Reserved 8 MiB at 0x0c800000
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] On node 0 totalpages: 61440
[    0.000000] free_area_init_node: node 0, pgdat 80821ac0, node_mem_map 8d145500
[    0.000000]   Normal zone: 480 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 61440 pages, LIFO batch:15
[    0.000000] percpu: Embedded 12 pages/cpu @8d109000 s25216 r0 d23936 u49152
[    0.000000] pcpu-alloc: s25216 r0 d23936 u49152 alloc=12*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 60960
[    0.000000] Kernel command line: 8250.nr_uarts=0 bcm2708_fb.fbwidth=1280 bcm2708_fb.fbheight=1024 bcm2708_fb.fbdepth=16 bcm2708_fb.fbswap=1 vc_mem.mem_base=0xfa00000 vc_mem.mem_size=0x10000000  quiet ramdisk_size=32768 root=/dev/ram0 init=/init vt.cur_default=1 elevator=deadline loglevel=2
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 197996K/245760K available (5120K kernel code, 137K rwdata, 880K rodata, 1024K init, 358K bss, 39572K reserved, 8192K cma-reserved)
[    0.000000] Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0x8f800000 - 0xff800000   (1792 MB)
    lowmem  : 0x80000000 - 0x8f000000   ( 240 MB)
      .text : 0x80008000 - 0x80600000   (6112 kB)
      .init : 0x80700000 - 0x80800000   (1024 kB)
      .data : 0x80800000 - 0x80822480   ( 138 kB)
       .bss : 0x80824000 - 0x8087db44   ( 359 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 32.
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 19.20MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000006] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.000015] Switching to timer-based delay loop, resolution 52ns
[    0.000202] Console: colour dummy device 80x30
[    0.000217] console [tty0] enabled
[    0.000233] Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=192000)
[    0.000246] pid_max: default: 32768 minimum: 301
[    0.000342] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000348] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000950] CPU: Testing write buffer coherency: ok
[    0.001234] CPU0: update cpu_capacity 1024
[    0.001242] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.001267] Setting up static identity map for 0x100000 - 0x100034
[    0.170311] CPU1: update cpu_capacity 1024
[    0.170318] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.240377] CPU2: update cpu_capacity 1024
[    0.240383] CPU2: thread -1, cpu 2, socket 0, mpidr 80000002
[    0.310458] CPU3: update cpu_capacity 1024
[    0.310464] CPU3: thread -1, cpu 3, socket 0, mpidr 80000003
[    0.310536] Brought up 4 CPUs
[    0.310544] SMP: Total of 4 processors activated (153.60 BogoMIPS).
[    0.310548] CPU: All CPU(s) started in HYP mode.
[    0.310551] CPU: Virtualization extensions available.
[    0.311360] devtmpfs: initialized
[    0.320955] VFP support v0.3: implementor 41 architecture 3 part 40 variant 3 rev 4
[    0.321189] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.321204] futex hash table entries: 1024 (order: 4, 65536 bytes)
[    0.321571] pinctrl core: initialized pinctrl subsystem
[    0.321939] NET: Registered protocol family 16
[    0.323857] DMA: preallocated 1024 KiB pool for atomic coherent allocations
[    0.360329] cpuidle: using governor menu
[    0.360762] Serial: AMBA PL011 UART driver
[    0.362229] bcm2835-mbox 3f00b880.mailbox: mailbox enabled
[    0.362637] uart-pl011 3f201000.serial: could not find pctldev for node /soc/gpio@7e200000/uart0_pins, deferring probe
[    0.421175] bcm2835-dma 3f007000.dma: DMA legacy API manager at 8f80d000, dmachans=0x1
[    0.422857] SCSI subsystem initialized
[    0.423075] usbcore: registered new interface driver usbfs
[    0.423123] usbcore: registered new interface driver hub
[    0.423195] usbcore: registered new device driver usb
[    0.430451] raspberrypi-firmware soc:firmware: Attached to firmware from 2018-03-13 18:47
[    0.432005] clocksource: Switched to clocksource arch_sys_counter
[    0.432222] FS-Cache: Loaded
[    0.432392] CacheFiles: Loaded
[    0.442512] NET: Registered protocol family 2
[    0.443109] TCP established hash table entries: 2048 (order: 1, 8192 bytes)
[    0.443143] TCP bind hash table entries: 2048 (order: 2, 16384 bytes)
[    0.443193] TCP: Hash tables configured (established 2048 bind 2048)
[    0.443244] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.443269] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.443420] NET: Registered protocol family 1
[    0.443641] Trying to unpack rootfs image as initramfs...
[    0.443855] rootfs image is not initramfs (junk in compressed archive); looks like an initrd
[    0.532994] Freeing initrd memory: 29464K
[    0.534712] workingset: timestamp_bits=30 max_order=16 bucket_order=0
[    0.542894] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.543178] fuse init (API version 7.26)
[    0.547040] io scheduler noop registered
[    0.547046] io scheduler deadline registered (default)
[    0.547228] io scheduler cfq registered
[    0.552253] BCM2708FB: allocated DMA memory cc910000
[    0.552272] BCM2708FB: allocated DMA channel 0 @ 8f80d000
[    0.571739] Console: switching to colour frame buffer device 160x64
[    0.584248] vc-mem: phys_addr:0x00000000 mem_base=0x0fa00000 mem_size:0x10000000(256 MiB)
[    0.584625] gpiomem-bcm2835 3f200000.gpiomem: Initialised: Registers at 0x3f200000
[    0.595931] brd: module loaded
[    0.603173] loop: module loaded
[    0.603653] libphy: Fixed MDIO Bus: probed
[    0.603860] usbcore: registered new interface driver brcmfmac
[    0.603913] usbcore: registered new interface driver r8152
[    0.603961] usbcore: registered new interface driver lan78xx
[    0.604008] usbcore: registered new interface driver smsc95xx
[    0.604021] dwc_otg: version 3.00a 10-AUG-2012 (platform bus)
[    0.831948] Core Release: 2.80a
[    0.831956] Setting default values for core params
[    0.831984] Finished setting default values for core params
[    1.032304] Using Buffer DMA mode
[    1.032309] Periodic Transfer Interrupt Enhancement - disabled
[    1.032313] Multiprocessor Interrupt Enhancement - disabled
[    1.032318] OTG VER PARAM: 0, OTG VER FLAG: 0
[    1.032325] Dedicated Tx FIFOs mode
[    1.032606] WARN::dwc_otg_hcd_init:1032: FIQ DMA bounce buffers: virt = 0x8c904000 dma = 0xcc904000 len=9024
[    1.032630] FIQ FSM acceleration enabled for :
Non-periodic Split Transactions
Periodic Split Transactions
High-Speed Isochronous Endpoints
Interrupt/Control Split Transaction hack enabled
[    1.032636] dwc_otg: Microframe scheduler enabled
[    1.032677] WARN::hcd_init_fiq:459: FIQ on core 1 at 0x803f6eb0
[    1.032686] WARN::hcd_init_fiq:460: FIQ ASM at 0x803f71e8 length 36
[    1.032695] WARN::hcd_init_fiq:486: MPHI regs_base at 0x8fb25000
[    1.032736] dwc_otg 3f980000.usb: DWC OTG Controller
[    1.032765] dwc_otg 3f980000.usb: new USB bus registered, assigned bus number 1
[    1.032793] dwc_otg 3f980000.usb: irq 62, io mem 0x00000000
[    1.032835] Init: Port Power? op_state=1
[    1.032839] Init: Power Port (0)
[    1.032997] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    1.033006] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.033014] usb usb1: Product: DWC OTG Controller
[    1.033021] usb usb1: Manufacturer: Linux 4.9.80-rescue-v7 dwc_otg_hcd
[    1.033028] usb usb1: SerialNumber: 3f980000.usb
[    1.033674] hub 1-0:1.0: USB hub found
[    1.033711] hub 1-0:1.0: 1 port detected
[    1.034215] dwc_otg: FIQ enabled
[    1.034219] dwc_otg: NAK holdoff enabled
[    1.034223] dwc_otg: FIQ split-transaction FSM enabled
[    1.034233] Module dwc_common_port init
[    1.034411] usbcore: registered new interface driver usb-storage
[    1.034758] mousedev: PS/2 mouse device common for all mice
[    1.035655] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
[    1.035898] bcm2835-cpufreq: min=600000 max=1200000
[    1.037637] sdhci: Secure Digital Host Controller Interface driver
[    1.037639] sdhci: Copyright(c) Pierre Ossman
[    1.037766] sdhost-bcm2835 3f202000.sdhost: could not get clk, deferring probe
[    1.039737] mmc-bcm2835 3f300000.mmc: could not get clk, deferring probe
[    1.039782] sdhci-pltfm: SDHCI platform and OF driver helper
[    1.040833] ledtrig-cpu: registered to indicate activity on CPUs
[    1.040925] hidraw: raw HID events driver (C) Jiri Kosina
[    1.041334] usbcore: registered new interface driver usbhid
[    1.041336] usbhid: USB HID core driver
[    1.041714] vchiq: vchiq_init_state: slot_zero = 0x8c980000, is_master = 0
[    1.042770] NET: Registered protocol family 17
[    1.042941] Registering SWP/SWPB emulation handler
[    1.045204] uart-pl011 3f201000.serial: cts_event_workaround enabled
[    1.045233] 3f201000.serial: ttyAMA0 at MMIO 0x3f201000 (irq = 87, base_baud = 0) is a PL011 rev2
[    1.045849] sdhost: log_buf @ 8c907000 (cc907000)
[    1.122009] mmc0: sdhost-bcm2835 loaded - DMA enabled (>1)
[    1.123976] mmc-bcm2835 3f300000.mmc: mmc_debug:0 mmc_debug2:0
[    1.123980] mmc-bcm2835 3f300000.mmc: DMA channel allocated
[    1.189510] mmc0: host does not support reading read-only switch, assuming write-enable
[    1.192056] mmc0: new high speed SDHC card at address 1234
[    1.192290] mmcblk0: mmc0:1234 SA16G 14.4 GiB
[    1.197558]  mmcblk0: p1 p2 < p5 p6 p7 p8 >
[    1.202283] RAMDISK: squashfs filesystem found at block 0
[    1.202292] RAMDISK: Loading 29463KiB [1 disk] into ram disk... /
[    1.227253] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[    1.227312] /
[    1.228754] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.228845] |
[    1.230254] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.230297] |
[    1.232955] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
[    1.233040] \
[    1.252141] Indeed it is in host mode hprt0 = 00021501
[    1.262105] -
[    1.288323] random: fast init done
[    1.288334] \
[    1.322369] -
[    1.327776] mmc1: new high speed SDIO card at address 0001
[    1.327777] \
[    1.327874] /
[    1.331488] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.bin failed with error -2
[    1.331499] brcmfmac mmc1:0001:1: Falling back to user helper
[    1.331516] -
[    1.367800] /
[    1.382500] done.
[    1.452081] usb 1-1: new high-speed USB device number 2 using dwc_otg
[    1.452189] Indeed it is in host mode hprt0 = 00001101
[    1.457999] VFS: Mounted root (squashfs filesystem) readonly on device 1:0.
[    1.458349] devtmpfs: mounted
[    1.459193] Freeing unused kernel memory: 1024K
[    1.542998] uart-pl011 3f201000.serial: no DMA platform data
[    1.663789] brcmfmac: Firmware version = wl0: Oct 23 2017 03:55:53 version 7.45.98.38 (r674442 CY) FWID 01-e58d219f
[    1.664415] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2 Data: 7.11.15 Compiler: 1.24.2 ClmImport: 1.24.1 Creation: 2014-05-26 10:53:55 Inc Data: 9.10.39 Inc Compiler: 1.29.4 Inc ClmImport: 1.36.3 Creation: 2017-10-23 03:47:14 
[    1.682213] usb 1-1: New USB device found, idVendor=0424, idProduct=9514
[    1.682222] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    1.682671] hub 1-1:1.0: USB hub found
[    1.682761] hub 1-1:1.0: 5 ports detected
[    2.022021] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
[    2.182139] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    2.182144] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.184556] smsc95xx v1.0.5
[    2.293635] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-3f980000.usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:8b:a1:04
[    2.402013] usb 1-1.2: new low-speed USB device number 4 using dwc_otg
[    2.579907] usb 1-1.2: New USB device found, idVendor=413c, idProduct=2003
[    2.579912] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.579916] usb 1-1.2: Product: Dell USB Keyboard
[    2.579920] usb 1-1.2: Manufacturer: Dell
[    2.593736] input: Dell Dell USB Keyboard as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.2/1-1.2:1.0/0003:413C:2003.0001/input/input0
[    2.662432] hid-generic 0003:413C:2003.0001: input,hidraw0: USB HID v1.10 Keyboard [Dell Dell USB Keyboard] on usb-3f980000.usb-1.2/input0
[    2.962052] usb 1-1.4: new low-speed USB device number 5 using dwc_otg
[    3.105795] usb 1-1.4: New USB device found, idVendor=1c4f, idProduct=0048
[    3.105803] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    3.105807] usb 1-1.4: Product: Usb Mouse
[    3.105811] usb 1-1.4: Manufacturer: SIGMACHIP
[    3.105893] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
[    3.115805] input: SIGMACHIP Usb Mouse as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.0/0003:1C4F:0048.0002/input/input1
[    3.118748] EXT4-fs (mmcblk0p5): re-mounted. Opts: data=ordered
[    3.125885] EXT4-fs (mmcblk0p5): re-mounted. Opts: data=ordered
[    3.182281] hid-generic 0003:1C4F:0048.0002: input,hidraw1: USB HID v1.10 Mouse [SIGMACHIP Usb Mouse] on usb-3f980000.usb-1.4/input0
[    5.233500] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
[    5.373018] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[    6.863495] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xDDE1
[    6.886543] random: crng init done
[    8.774934] EXT4-fs (mmcblk0p5): re-mounted. Opts: data=ordered
[   11.449557] EXT4-fs (mmcblk0p5): re-mounted. Opts: data=ordered
[   11.574721]  mmcblk0: p1 p2 < p5 p6 p7 p8 >
[   11.718720]  mmcblk0: p1 p2 < p5 p6 p7 p8 >
[   12.806468] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: (null)
[   19.694824] EXT4-fs (mmcblk0p7): recovery complete
[   19.697769] EXT4-fs (mmcblk0p7): mounted filesystem with ordered data mode. Opts: (null)
[   26.801034] EXT4-fs (mmcblk0p7): mounted filesystem with ordered data mode. Opts: (null)

Rebooting fixes the problem and everything is back to normal. I have never reproduced anything similar in raspbian. Any advice on what I could do to debug further? Thanks!

@jens-maus
Copy link

This is very interesting. I am having similar issues here with one of my projects where also on heavy write operations to the SD card (mmc device) operations suddenly stop with the kernel returning "hung process" warnings after 120s. I am also able to perform a remote login to the system and every process not read/writing to the sd card returns immediately. but as soon as I try to read from the sd card the process in question hangs.

Compared to your setup I am using the latest 4.14.33 kernel. After some investigation I could slightly improve the probability that the sd card hang occurs by adding sdhci_bcm2708.enable_llm=0 to the kernel command-line disabling the low-latency mode of the sdhci. However, as said, this only seem to reduce the probability that this hang occurs as I can still see it happing from time to time.

I will try to grab some more debug information on that matter and post it here accordingly. But if a raspberrypi developer could already give some hints in how or where to investigate that issue, this would be great.

@stschake
Copy link
Contributor

Could you post your kernel configs? I've had this happen too, with the exact same symptoms (everything that needs sdcard IO just hangs), sometimes without the kernel hung task detector complaining. I can't imagine it happens under Raspbian, so maybe there is some common variable in our configs that might be causing it.

@jens-maus
Copy link

That's the kernel defconfig file my project is using and which exhibits the issue (especially if sdhci_bcm2708.enable_llm=0 is omitted):

https://github.com/jens-maus/RaspberryMatic/blob/master/buildroot-external/board/raspberrypi3/kernel_defconfig

@carlonluca
Copy link
Author

@stschake in fact I was never able to reproduce in Raspbian.

@jens-maus
Copy link

jens-maus commented Apr 17, 2018

As you are using the NOOBS provided kernel which actually is also created by buildroot like my environment/kernel I also suspect that this might be somewhat related to some kernel option or toolchain build related things causing that raspberrypi kernels compiled within buildroot might exhibit the problem. Perhaps indeed comparing kernel defconfigs between Raspbian and buildroot/NOOBS and also perhaps toolchain related differences might help in trying to pin down the potential issue. However, would also be great if some raspberrypi developer could comment on that issue and give some hints in how to debug this situation.

@pelwell
Copy link
Contributor

pelwell commented Apr 17, 2018

@jens-maus Which kernel are you building? sdhci_bcm2708 hasn't existed for very long time now (the 3.9 kernel, from what I can tell).

The sdhost driver interacts with the firmware to manage the clock divisor, but only when the MMC framework is expecting the clock to change, i.e. when the card is being probed and brought up to speed. Apart from that the only dependency should be on the DMA driver/hardware and on the SD hardware itself.

You may learn something by enabling the driver's logging feature which will record activity the kernel message log. Add dtparam=sd_debug=on to config.txt and reboot. You can also eliminate a DMA problem as being the cause (at the cost of some performance) by adding dtparam=sd_force_pio=on.

@monetny
Copy link

monetny commented Apr 17, 2018

I change root to f2fs for fix this problem (not fix, but minimaze)

@JamesH65
Copy link
Contributor

@jens-maus Did you have a chance to try out @pelwell suggestions above?

@JamesH65 JamesH65 added the Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator. label May 11, 2018
@JamesH65
Copy link
Contributor

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

@JamesH65 JamesH65 added the Close within 30 days Issue will be closed within 30 days unless requested to stay open label Jun 29, 2018
@carlonluca
Copy link
Author

I tested with the dtparam=sd_debug=on and I was able to reproduce the issue a couple of times. No error appeared in dmesg. Logs indicated that data was being written and then gradually stopped. The rest of the behavior is as described in the first post. I'm now using the watchdog to handle this issue, I couldn't find another way.

@JamesH65 JamesH65 removed the Close within 30 days Issue will be closed within 30 days unless requested to stay open label Aug 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator.
Projects
None yet
Development

No branches or pull requests

6 participants