-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RTS-GPIO not working with imx tty driver #1
Comments
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
…f fs_info::journal_info commit fcc9973 upstream. [BUG] One run of btrfs/063 triggered the following lockdep warning: ============================================ WARNING: possible recursive locking detected 5.6.0-rc7-custom+ #48 Not tainted -------------------------------------------- kworker/u24:0/7 is trying to acquire lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] but task is already holding lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_internal#2); lock(sb_internal#2); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u24:0/7: #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80 #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80 #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs] stack backtrace: CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ #48 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] Call Trace: dump_stack+0xc2/0x11a __lock_acquire.cold+0xce/0x214 lock_acquire+0xe6/0x210 __sb_start_write+0x14e/0x290 start_transaction+0x66c/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] find_free_extent+0x1504/0x1a50 [btrfs] btrfs_reserve_extent+0xd5/0x1f0 [btrfs] btrfs_alloc_tree_block+0x1ac/0x570 [btrfs] btrfs_copy_root+0x213/0x580 [btrfs] create_reloc_root+0x3bd/0x470 [btrfs] btrfs_init_reloc_root+0x2d2/0x310 [btrfs] record_root_in_trans+0x191/0x1d0 [btrfs] btrfs_record_root_in_trans+0x90/0xd0 [btrfs] start_transaction+0x16e/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs] finish_ordered_fn+0x15/0x20 [btrfs] btrfs_work_helper+0x116/0x9a0 [btrfs] process_one_work+0x632/0xb80 worker_thread+0x80/0x690 kthread+0x1a3/0x1f0 ret_from_fork+0x27/0x50 It's pretty hard to reproduce, only one hit so far. [CAUSE] This is because we're calling btrfs_join_transaction() without re-using the current running one: btrfs_finish_ordered_io() |- btrfs_join_transaction() <<< Call #1 |- btrfs_record_root_in_trans() |- btrfs_reserve_extent() |- btrfs_join_transaction() <<< Call #2 Normally such btrfs_join_transaction() call should re-use the existing one, without trying to re-start a transaction. But the problem is, in btrfs_join_transaction() call #1, we call btrfs_record_root_in_trans() before initializing current::journal_info. And in btrfs_join_transaction() call #2, we're relying on current::journal_info to avoid such deadlock. [FIX] Call btrfs_record_root_in_trans() after we have initialized current::journal_info. CC: [email protected] # 4.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 0fb0094 upstream. FDs can only be used on the ufile that created them, they cannot be mixed to other ufiles. We are lacking a check to prevent it. BUG: KASAN: null-ptr-deref in atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline] BUG: KASAN: null-ptr-deref in atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline] BUG: KASAN: null-ptr-deref in fput_many+0x1a/0x140 fs/file_table.c:336 Write of size 8 at addr 0000000000000038 by task syz-executor179/284 CPU: 0 PID: 284 Comm: syz-executor179 Not tainted 5.5.0-rc5+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x94/0xce lib/dump_stack.c:118 __kasan_report+0x18f/0x1b7 mm/kasan/report.c:510 kasan_report+0xe/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x15d/0x1b0 mm/kasan/generic.c:192 atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline] atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline] fput_many+0x1a/0x140 fs/file_table.c:336 rdma_lookup_put_uobject+0x85/0x130 drivers/infiniband/core/rdma_core.c:692 uobj_put_read include/rdma/uverbs_std_types.h:96 [inline] _ib_uverbs_lookup_comp_file drivers/infiniband/core/uverbs_cmd.c:198 [inline] create_cq+0x375/0xba0 drivers/infiniband/core/uverbs_cmd.c:1006 ib_uverbs_create_cq+0x114/0x140 drivers/infiniband/core/uverbs_cmd.c:1089 ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:769 __vfs_write+0x7c/0x100 fs/read_write.c:494 vfs_write+0x168/0x4a0 fs/read_write.c:558 ksys_write+0xc8/0x200 fs/read_write.c:611 do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x44ef99 Code: 00 b8 00 01 00 00 eb e1 e8 74 1c 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c4 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc0b74c028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ffc0b74c030 RCX: 000000000044ef99 RDX: 0000000000000040 RSI: 0000000020000040 RDI: 0000000000000005 RBP: 00007ffc0b74c038 R08: 0000000000401830 R09: 0000000000401830 R10: 00007ffc0b74c038 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000006be018 R15: 0000000000000000 Fixes: cf8966b ("IB/core: Add support for fd objects") Link: https://lore.kernel.org/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit e461bc9 ] Sed broke on some strings as it used colon as a separator. I made it more robust by using \001, which is legit POSIX AFAIK. E.g. ./config --set-str CONFIG_USBNET_DEVADDR "de:ad:be:ef:00:01" failed with: sed: -e expression #1, char 55: unknown option to `s' Signed-off-by: Jeremie Francois (on alpha) <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
The threaded interrupt handler may still be called after the usb_gadget_disconnect is called, it causes the structures used at interrupt handler was freed before it uses, eg the usb_request. This issue usually occurs we remove the udc function during the transfer. Below is the example when doing stress test for android switch function, the EP0's request is freed by .unbind (configfs_composite_unbind -> composite_dev_cleanup), but the threaded handler accesses this request during handling setup packet request. In fact, there is no protection between unbind the udc and udc interrupt handling, so we have to avoid the interrupt handler is occurred or scheduled during the .unbind flow. init: Sending signal 9 to service 'adbd' (pid 18077) process group... android_work: did not send uevent (0 0 000000007bec2039) libprocessgroup: Successfully killed process cgroup uid 0 pid 18077 in 6ms init: Service 'adbd' (pid 18077) received signal 9 init: Sending signal 9 to service 'adbd' (pid 18077) process group... libprocessgroup: Successfully killed process cgroup uid 0 pid 18077 in 0ms init: processing action (init.svc.adbd=stopped) from (/init.usb.configfs.rc:14) init: Received control message 'start' for 'adbd' from pid: 399 (/vendor/bin/hw/android.hardware.usb@1. init: starting service 'adbd'... read descriptors read strings Unable to handle kernel read from unreadable memory at virtual address 000000000000002a android_work: sent uevent USB_STATE=CONNECTED Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e97f1000 using random self ethernet address [000000000000002a] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 232 Comm: irq/68-5b110000 Not tainted 5.4.24-06075-g94a6b52b5815 #92 Hardware name: Freescale i.MX8QXP MEK (DT) pstate: 00400085 (nzcv daIf +PAN -UAO) using random host ethernet address pc : composite_setup+0x5c/0x1730 lr : android_setup+0xc0/0x148 sp : ffff80001349bba0 x29: ffff80001349bba0 x28: ffff00083a50da00 x27: ffff8000124e6000 x26: ffff800010177950 x25: 0000000000000040 x24: ffff000834e18010 x23: 0000000000000000 x22: 0000000000000000 x21: ffff00083a50da00 x20: ffff00082e75ec40 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 x11: ffff80001180fb58 x10: 0000000000000040 x9 : ffff8000120fc980 x8 : 0000000000000000 x7 : ffff00083f98df50 x6 : 0000000000000100 x5 : 00000307e8978431 x4 : ffff800011386788 x3 : 0000000000000000 x2 : ffff800012342000 x1 : 0000000000000000 x0 : ffff800010c6d3a0 Call trace: composite_setup+0x5c/0x1730 android_setup+0xc0/0x148 cdns3_ep0_delegate_req+0x64/0x90 cdns3_check_ep0_interrupt_proceed+0x384/0x738 cdns3_device_thread_irq_handler+0x124/0x6e0 cdns3_thread_irq+0x94/0xa0 irq_thread_fn+0x30/0xa0 irq_thread+0x150/0x248 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 Code: 910e8000 f9400693 12001ed7 79400f79 (3940aa61) ---[ end trace c685db37f8773fba ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,20002008 Memory Limit: none Rebooting in 5 seconds.. Reviewed-by: Jun Li <[email protected]> Signed-off-by: Peter Chen <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
And there are no multiple TRBs on EP0 and WA1 workaround, so it doesn't need to change TRB for EP0. It fixes below oops. configfs-gadget gadget: high-speed config #1: b android_work: sent uevent USB_STATE=CONFIGURED Unable to handle kernel read from unreadable memory at virtual address 0000000000000008 Mem abort info: android_work: sent uevent USB_STATE=DISCONNECTED ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b5bb7000 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 430 Comm: HwBinder:401_1 Not tainted 5.4.24-06071-g6fa8921409c1-dirty #77 Hardware name: Freescale i.MX8QXP MEK (DT) pstate: 60400085 (nZCv daIf +PAN -UAO) pc : cdns3_gadget_ep_dequeue+0x1d4/0x270 lr : cdns3_gadget_ep_dequeue+0x48/0x270 sp : ffff800012763ba0 x29: ffff800012763ba0 x28: ffff00082c653c00 x27: 0000000000000000 x26: ffff000068fa7b00 x25: ffff0000699b2000 x24: ffff00082c6ac000 x23: ffff000834f0a480 x22: ffff000834e87b9c x21: 0000000000000000 x20: ffff000834e87800 x19: ffff000069eddc00 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 x11: ffff80001180fbe8 x10: 0000000000000001 x9 : ffff800012101558 x8 : 0000000000000001 x7 : 0000000000000006 x6 : ffff000835d9c668 x5 : ffff000834f0a4c8 x4 : 0000000096000000 x3 : 0000000000001810 x2 : 0000000000000000 x1 : ffff800024bd001c x0 : 0000000000000001 Call trace: cdns3_gadget_ep_dequeue+0x1d4/0x270 usb_ep_dequeue+0x34/0xf8 composite_dev_cleanup+0x154/0x170 configfs_composite_unbind+0x6c/0xa8 usb_gadget_remove_driver+0x44/0x70 usb_gadget_unregister_driver+0x74/0xe0 unregister_gadget+0x28/0x58 gadget_dev_desc_UDC_store+0x80/0x110 configfs_write_file+0x1e0/0x2a0 __vfs_write+0x48/0x90 vfs_write+0xe4/0x1c8 ksys_write+0x78/0x100 __arm64_sys_write+0x24/0x30 el0_svc_common.constprop.0+0x74/0x168 el0_svc_handler+0x34/0xa0 el0_svc+0x8/0xc Code: 52830203 b9407660 f94042e4 11000400 (b9400841) ---[ end trace 1574516e4c1772ca ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,20002008 Memory Limit: none Rebooting in 5 seconds.. Fixes: f616c3b ("usb: cdns3: Fix dequeue implementation") Cc: stable <[email protected]> Reviewed-by: Jun Li <[email protected]> Signed-off-by: Peter Chen <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 3740d93 upstream. Commit 64e90a8 ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()") added the optiont to disable all call_usermodehelper() calls by setting STATIC_USERMODEHELPER_PATH to an empty string. When this is done, and crashdump is triggered, it will crash on null pointer dereference, since we make assumptions over what call_usermodehelper_exec() did. This has been reported by Sergey when one triggers a a coredump with the following configuration: ``` CONFIG_STATIC_USERMODEHELPER=y CONFIG_STATIC_USERMODEHELPER_PATH="" kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e ``` The way disabling the umh was designed was that call_usermodehelper_exec() would just return early, without an error. But coredump assumes certain variables are set up for us when this happens, and calls ile_start_write(cprm.file) with a NULL file. [ 2.819676] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 2.819859] #PF: supervisor read access in kernel mode [ 2.820035] #PF: error_code(0x0000) - not-present page [ 2.820188] PGD 0 P4D 0 [ 2.820305] Oops: 0000 [#1] SMP PTI [ 2.820436] CPU: 2 PID: 89 Comm: a Not tainted 5.7.0-rc1+ #7 [ 2.820680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 2.821150] RIP: 0010:do_coredump+0xd80/0x1060 [ 2.821385] Code: e8 95 11 ed ff 48 c7 c6 cc a7 b4 81 48 8d bd 28 ff ff ff 89 c2 e8 70 f1 ff ff 41 89 c2 85 c0 0f 84 72 f7 ff ff e9 b4 fe ff ff <48> 8b 57 20 0f b7 02 66 25 00 f0 66 3d 00 8 0 0f 84 9c 01 00 00 44 [ 2.822014] RSP: 0000:ffffc9000029bcb8 EFLAGS: 00010246 [ 2.822339] RAX: 0000000000000000 RBX: ffff88803f860000 RCX: 000000000000000a [ 2.822746] RDX: 0000000000000009 RSI: 0000000000000282 RDI: 0000000000000000 [ 2.823141] RBP: ffffc9000029bde8 R08: 0000000000000000 R09: ffffc9000029bc00 [ 2.823508] R10: 0000000000000001 R11: ffff88803dec90be R12: ffffffff81c39da0 [ 2.823902] R13: ffff88803de84400 R14: 0000000000000000 R15: 0000000000000000 [ 2.824285] FS: 00007fee08183540(0000) GS:ffff88803e480000(0000) knlGS:0000000000000000 [ 2.824767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.825111] CR2: 0000000000000020 CR3: 000000003f856005 CR4: 0000000000060ea0 [ 2.825479] Call Trace: [ 2.825790] get_signal+0x11e/0x720 [ 2.826087] do_signal+0x1d/0x670 [ 2.826361] ? force_sig_info_to_task+0xc1/0xf0 [ 2.826691] ? force_sig_fault+0x3c/0x40 [ 2.826996] ? do_trap+0xc9/0x100 [ 2.827179] exit_to_usermode_loop+0x49/0x90 [ 2.827359] prepare_exit_to_usermode+0x77/0xb0 [ 2.827559] ? invalid_op+0xa/0x30 [ 2.827747] ret_from_intr+0x20/0x20 [ 2.827921] RIP: 0033:0x55e2c76d2129 [ 2.828107] Code: 2d ff ff ff e8 68 ff ff ff 5d c6 05 18 2f 00 00 01 c3 0f 1f 80 00 00 00 00 c3 0f 1f 80 00 00 00 00 e9 7b ff ff ff 55 48 89 e5 <0f> 0b b8 00 00 00 00 5d c3 66 2e 0f 1f 84 0 0 00 00 00 00 0f 1f 40 [ 2.828603] RSP: 002b:00007fffeba5e080 EFLAGS: 00010246 [ 2.828801] RAX: 000055e2c76d2125 RBX: 0000000000000000 RCX: 00007fee0817c718 [ 2.829034] RDX: 00007fffeba5e188 RSI: 00007fffeba5e178 RDI: 0000000000000001 [ 2.829257] RBP: 00007fffeba5e080 R08: 0000000000000000 R09: 00007fee08193c00 [ 2.829482] R10: 0000000000000009 R11: 0000000000000000 R12: 000055e2c76d2040 [ 2.829727] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2.829964] CR2: 0000000000000020 [ 2.830149] ---[ end trace ceed83d8c68a1bf1 ]--- ``` Cc: <[email protected]> # v4.11+ Fixes: 64e90a8 ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199795 Reported-by: Tony Vroon <[email protected]> Reported-by: Sergey Kvachonok <[email protected]> Tested-by: Sergei Trofimovich <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
During the unind() procedure, drm_encoder_cleanup() will detach the downstream DSI bridge if exists. And the DSI bridge's detach() will detach itself from the DSI host. So DSI host should be unregistered later than DSI device detach. Otherwise, below kernel panic happens: [ 2.437740] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000007 [ 2.446534] Mem abort info: [ 2.449327] ESR = 0x96000004 [ 2.452401] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.457717] SET = 0, FnV = 0 [ 2.460777] EA = 0, S1PTW = 0 [ 2.463921] Data abort info: [ 2.466808] ISV = 0, ISS = 0x00000004 [ 2.470649] CM = 0, WnR = 0 [ 2.473617] [0000000000000007] user address but active_mm is swapper [ 2.479978] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2.485550] Modules linked in: [ 2.488609] CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 5.4.24-04902-g14319eb2bae3 #1683 [ 2.496871] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 2.502188] Workqueue: events deferred_probe_work_func [ 2.507328] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 2.512121] pc : mipi_dsi_detach+0x10/0x38 [ 2.516219] lr : adv7533_detach_dsi+0x18/0x30 [ 2.520574] sp : ffff80001242b6e0 [ 2.523885] x29: ffff80001242b6e0 x28: ffff8000119d6000 [ 2.529196] x27: 0000000000000000 x26: 00000000fffffdfb [ 2.534507] x25: 0000000000000001 x24: ffff8000115159f0 [ 2.539819] x23: ffff000176652120 x22: ffff800011b94000 [ 2.545132] x21: ffff000177882000 x20: ffff0001778766e8 [ 2.550444] x19: ffff000177876880 x18: 0000000000000000 [ 2.555756] x17: ffff800011e0d000 x16: 0000000000000000 [ 2.561068] x15: 0000000000000004 x14: ffffffffffffffff [ 2.566381] x13: 0000000000000000 x12: ffff0001745765c8 [ 2.571694] x11: ffff000174576480 x10: 0000000000000040 [ 2.577006] x9 : ffff000176020e98 x8 : ffff000176020e90 [ 2.582318] x7 : 0000000000000001 x6 : 0000000000000001 [ 2.587630] x5 : 0000000000000000 x4 : 0000000000000000 [ 2.592943] x3 : ffff800011a398b0 x2 : ffffffffffffffff [ 2.598255] x1 : ffff0001778d5c00 x0 : ffff0001778d6400 [ 2.603569] Call trace: [ 2.606018] mipi_dsi_detach+0x10/0x38 [ 2.609769] adv7511_bridge_detach+0x6c/0x80 [ 2.614041] drm_bridge_detach+0x2c/0x50 [ 2.617964] drm_encoder_cleanup+0x2c/0xa0 [ 2.622063] imx_sec_dsim_unbind+0x50/0x68 [ 2.626159] component_unbind.isra.0+0x2c/0x50 [ 2.630601] component_bind_all+0x1e0/0x228 [ 2.634784] imx_drm_bind+0xb8/0x150 [ 2.638357] try_to_bring_up_master+0x164/0x1c0 [ 2.642887] __component_add+0xa0/0x168 [ 2.646721] component_add+0x10/0x18 [ 2.650297] lcdifv3_crtc_probe+0x4c/0x78 [ 2.654309] platform_drv_probe+0x50/0xa0 [ 2.658317] really_probe+0xd4/0x318 [ 2.661891] driver_probe_device+0x54/0xe8 [ 2.665991] __device_attach_driver+0x80/0xb8 [ 2.670348] bus_for_each_drv+0x74/0xc0 [ 2.674183] __device_attach+0xdc/0x138 [ 2.678020] device_initial_probe+0x10/0x18 [ 2.682204] bus_probe_device+0x90/0x98 [ 2.686042] device_add+0x378/0x648 [ 2.689531] platform_device_add+0xfc/0x228 [ 2.693718] imx_lcdifv3_probe+0x2b0/0x388 [ 2.697814] platform_drv_probe+0x50/0xa0 [ 2.701825] really_probe+0xd4/0x318 [ 2.705403] driver_probe_device+0x54/0xe8 [ 2.709502] __device_attach_driver+0x80/0xb8 [ 2.713859] bus_for_each_drv+0x74/0xc0 [ 2.717696] __device_attach+0xdc/0x138 [ 2.721535] device_initial_probe+0x10/0x18 [ 2.725721] bus_probe_device+0x90/0x98 [ 2.729555] deferred_probe_work_func+0x64/0x98 [ 2.734089] process_one_work+0x198/0x320 [ 2.738100] worker_thread+0x1f0/0x420 [ 2.741850] kthread+0xf0/0x120 [ 2.744993] ret_from_fork+0x10/0x18 [ 2.748573] Code: aa0003e1 f9400000 f9400402 b4000102 (f9400442) [ 2.754667] ---[ end trace 756e3cdcc6c5557e ]--- Signed-off-by: Fancy Fang <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 90b5feb ] A userspace process holding a file descriptor to a virtio_blk device can still invoke block_device_operations after hot unplug. This leads to a use-after-free accessing vblk->vdev in virtblk_getgeo() when ioctl(HDIO_GETGEO) is invoked: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] PGD 800000003a92f067 PUD 3a930067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000 RIP: 0010:[<ffffffffc00e5450>] [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246 RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40 RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680 R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000 FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk] [<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0 [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20 [<ffffffff8d488771>] block_ioctl+0x41/0x50 [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0 [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0 A related problem is that virtblk_remove() leaks the vd_index_ida index when something still holds a reference to vblk->disk during hot unplug. This causes virtio-blk device names to be lost (vda, vdb, etc). Fix these issues by protecting vblk->vdev with a mutex and reference counting vblk so the vd_index_ida index can be removed in all cases. Fixes: 48e4043 ("virtio: add virtio disk geometry feature") Reported-by: Lance Digby <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 1449cb2 upstream. While removing the host (e.g. for USB role switch from host to device), if runtime pm is enabled by user, below oops occurs on dwc3 and cdns3 platforms. Keeping the xhci-plat device active during host removal, and disabling runtime pm before calling pm_runtime_set_suspended() fixes them. oops1: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000240 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.4.3-00107-g64d454a-dirty Hardware name: FSL i.MX8MP EVK (DT) Workqueue: pm pm_runtime_work pstate: 60000005 (nZCv daif -PAN -UAO) pc : xhci_suspend+0x34/0x698 lr : xhci_plat_runtime_suspend+0x2c/0x38 sp : ffff800011ddbbc0 Call trace: xhci_suspend+0x34/0x698 xhci_plat_runtime_suspend+0x2c/0x38 pm_generic_runtime_suspend+0x28/0x40 __rpm_callback+0xd8/0x138 rpm_callback+0x24/0x98 rpm_suspend+0xe0/0x448 rpm_idle+0x124/0x140 pm_runtime_work+0xa0/0xf8 process_one_work+0x1dc/0x370 worker_thread+0x48/0x468 kthread+0xf0/0x120 ret_from_fork+0x10/0x1c oops2: usb 2-1: USB disconnect, device number 2 xhci-hcd xhci-hcd.1.auto: remove, state 4 usb usb2: USB disconnect, device number 1 xhci-hcd xhci-hcd.1.auto: USB bus 2 deregistered xhci-hcd xhci-hcd.1.auto: remove, state 4 usb usb1: USB disconnect, device number 1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000138 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.6.0-rc4-next-20200304-03578 Hardware name: Freescale i.MX8QXP MEK (DT) Workqueue: 1-0050 tcpm_state_machine_work pstate: 20000005 (nzCv daif -PAN -UAO) pc : xhci_free_dev+0x214/0x270 lr : xhci_plat_runtime_resume+0x78/0x88 sp : ffff80001006b5b0 Call trace: xhci_free_dev+0x214/0x270 xhci_plat_runtime_resume+0x78/0x88 pm_generic_runtime_resume+0x30/0x48 __rpm_callback+0x90/0x148 rpm_callback+0x28/0x88 rpm_resume+0x568/0x758 rpm_resume+0x260/0x758 rpm_resume+0x260/0x758 __pm_runtime_resume+0x40/0x88 device_release_driver_internal+0xa0/0x1c8 device_release_driver+0x1c/0x28 bus_remove_device+0xd4/0x158 device_del+0x15c/0x3a0 usb_disable_device+0xb0/0x268 usb_disconnect+0xcc/0x300 usb_remove_hcd+0xf4/0x1dc xhci_plat_remove+0x78/0xe0 platform_drv_remove+0x30/0x50 device_release_driver_internal+0xfc/0x1c8 device_release_driver+0x1c/0x28 bus_remove_device+0xd4/0x158 device_del+0x15c/0x3a0 platform_device_del.part.0+0x20/0x90 platform_device_unregister+0x28/0x40 cdns3_host_exit+0x20/0x40 cdns3_role_stop+0x60/0x90 cdns3_role_set+0x64/0xd8 usb_role_switch_set_role.part.0+0x3c/0x68 usb_role_switch_set_role+0x20/0x30 tcpm_mux_set+0x60/0xf8 tcpm_reset_port+0xa4/0xf0 tcpm_detach.part.0+0x28/0x50 tcpm_state_machine_work+0x12ac/0x2360 process_one_work+0x1c8/0x470 worker_thread+0x50/0x428 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 Code: c8037c02 35ffffa3 17ffe7c3 f9800011 (c85f7c01) ---[ end trace 45b1a173d2679e44 ]--- [minor commit message cleanup -Mathias] Cc: Baolin Wang <[email protected]> Cc: <[email protected]> Fixes: b0c69b4 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen <[email protected]> Tested-by: Peter Chen <[email protected]> Signed-off-by: Li Jun <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 95cd7dc upstream. And there are no multiple TRBs on EP0 and WA1 workaround, so it doesn't need to change TRB for EP0. It fixes below oops. configfs-gadget gadget: high-speed config #1: b android_work: sent uevent USB_STATE=CONFIGURED Unable to handle kernel read from unreadable memory at virtual address 0000000000000008 Mem abort info: android_work: sent uevent USB_STATE=DISCONNECTED ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b5bb7000 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 430 Comm: HwBinder:401_1 Not tainted 5.4.24-06071-g6fa8921409c1-dirty #77 Hardware name: Freescale i.MX8QXP MEK (DT) pstate: 60400085 (nZCv daIf +PAN -UAO) pc : cdns3_gadget_ep_dequeue+0x1d4/0x270 lr : cdns3_gadget_ep_dequeue+0x48/0x270 sp : ffff800012763ba0 x29: ffff800012763ba0 x28: ffff00082c653c00 x27: 0000000000000000 x26: ffff000068fa7b00 x25: ffff0000699b2000 x24: ffff00082c6ac000 x23: ffff000834f0a480 x22: ffff000834e87b9c x21: 0000000000000000 x20: ffff000834e87800 x19: ffff000069eddc00 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 x11: ffff80001180fbe8 x10: 0000000000000001 x9 : ffff800012101558 x8 : 0000000000000001 x7 : 0000000000000006 x6 : ffff000835d9c668 x5 : ffff000834f0a4c8 x4 : 0000000096000000 x3 : 0000000000001810 x2 : 0000000000000000 x1 : ffff800024bd001c x0 : 0000000000000001 Call trace: cdns3_gadget_ep_dequeue+0x1d4/0x270 usb_ep_dequeue+0x34/0xf8 composite_dev_cleanup+0x154/0x170 configfs_composite_unbind+0x6c/0xa8 usb_gadget_remove_driver+0x44/0x70 usb_gadget_unregister_driver+0x74/0xe0 unregister_gadget+0x28/0x58 gadget_dev_desc_UDC_store+0x80/0x110 configfs_write_file+0x1e0/0x2a0 __vfs_write+0x48/0x90 vfs_write+0xe4/0x1c8 ksys_write+0x78/0x100 __arm64_sys_write+0x24/0x30 el0_svc_common.constprop.0+0x74/0x168 el0_svc_handler+0x34/0xa0 el0_svc+0x8/0xc Code: 52830203 b9407660 f94042e4 11000400 (b9400841) ---[ end trace 1574516e4c1772ca ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,20002008 Memory Limit: none Rebooting in 5 seconds.. Fixes: f616c3b ("usb: cdns3: Fix dequeue implementation") Cc: stable <[email protected]> Signed-off-by: Peter Chen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 1575358 upstream. FuzzUSB (a variant of syzkaller) found an illegal array access using an incorrect index while binding a gadget with UDC. Reference: https://www.spinics.net/lists/linux-usb/msg194331.html This bug occurs when a size variable used for a buffer is misused to access its strcpy-ed buffer. Given a buffer along with its size variable (taken from user input), from which, a new buffer is created using kstrdup(). Due to the original buffer containing 0 value in the middle, the size of the kstrdup-ed buffer becomes smaller than that of the original. So accessing the kstrdup-ed buffer with the same size variable triggers memory access violation. The fix makes sure no zero value in the buffer, by comparing the strlen() of the orignal buffer with the size variable, so that the access to the kstrdup-ed buffer is safe. BUG: KASAN: slab-out-of-bounds in gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266 Read of size 1 at addr ffff88806a55dd7e by task syz-executor.0/17208 CPU: 2 PID: 17208 Comm: syz-executor.0 Not tainted 5.6.8 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xce/0x128 lib/dump_stack.c:118 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:641 __asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132 gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266 flush_write_buffer fs/configfs/file.c:251 [inline] configfs_write_file+0x2f1/0x4c0 fs/configfs/file.c:283 __vfs_write+0x85/0x110 fs/read_write.c:494 vfs_write+0x1cd/0x510 fs/read_write.c:558 ksys_write+0x18a/0x220 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:620 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Kyungtae Kim <[email protected]> Reported-and-tested-by: Kyungtae Kim <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/20200510054326.GA19198@pizza01 Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
…sg list commit 3c6f8cb upstream. On platforms with IOMMU enabled, multiple SGs can be coalesced into one by the IOMMU driver. In that case the SG list processing as part of the completion of a urb on a bulk endpoint can result into a NULL pointer dereference with the below stack dump. <6> Unable to handle kernel NULL pointer dereference at virtual address 0000000c <6> pgd = c0004000 <6> [0000000c] *pgd=00000000 <6> Internal error: Oops: 5 [#1] PREEMPT SMP ARM <2> PC is at xhci_queue_bulk_tx+0x454/0x80c <2> LR is at xhci_queue_bulk_tx+0x44c/0x80c <2> pc : [<c08907c4>] lr : [<c08907bc>] psr: 000000d3 <2> sp : ca337c80 ip : 00000000 fp : ffffffff <2> r10: 00000000 r9 : 50037000 r8 : 00004000 <2> r7 : 00000000 r6 : 00004000 r5 : 00000000 r4 : 00000000 <2> r3 : 00000000 r2 : 00000082 r1 : c2c1a200 r0 : 00000000 <2> Flags: nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none <2> Control: 10c0383d Table: b412c06a DAC: 00000051 <6> Process usb-storage (pid: 5961, stack limit = 0xca336210) <snip> <2> [<c08907c4>] (xhci_queue_bulk_tx) <2> [<c0881b3c>] (xhci_urb_enqueue) <2> [<c0831068>] (usb_hcd_submit_urb) <2> [<c08350b4>] (usb_sg_wait) <2> [<c089f384>] (usb_stor_bulk_transfer_sglist) <2> [<c089f2c0>] (usb_stor_bulk_srb) <2> [<c089fe38>] (usb_stor_Bulk_transport) <2> [<c089f468>] (usb_stor_invoke_transport) <2> [<c08a11b4>] (usb_stor_control_thread) <2> [<c014a534>] (kthread) The above NULL pointer dereference is the result of block_len and the sent_len set to zero after the first SG of the list when IOMMU driver is enabled. Because of this the loop of processing the SGs has run more than num_sgs which resulted in a sg_next on the last SG of the list which has SG_END set. Fix this by check for the sg before any attributes of the sg are accessed. [modified reason for null pointer dereference in commit message subject -Mathias] Fixes: f9c589e ("xhci: TD-fragment, align the unsplittable case with a bounce buffer") Cc: [email protected] Signed-off-by: Sriharsha Allenki <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
initilize the m2m->complete in open() to avoid the NULL pointer in suspend because the suspend can be called before initilizing m2m->complete in convert [ 591.006691] Filesystems sync: 0.084 seconds [ 591.012292] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 591.020768] OOM killer disabled. [ 591.024016] Freezing remaining freezable tasks ... (elapsed 0.105 seconds) done. [ 591.809374] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 591.818160] Mem abort info: [ 591.820949] ESR = 0x96000004 [ 591.824000] EC = 0x25: DABT (current EL), IL = 32 bits [ 591.829307] SET = 0, FnV = 0 [ 591.832356] EA = 0, S1PTW = 0 [ 591.835493] Data abort info: [ 591.838369] ISV = 0, ISS = 0x00000004 [ 591.842200] CM = 0, WnR = 0 [ 591.845165] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001b033e000 [ 591.851600] [0000000000000000] pgd=0000000000000000 [ 591.856476] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 591.862043] Modules linked in: [ 591.865097] CPU: 2 PID: 867 Comm: rtcwakeup.out Not tainted 5.4.24-04892-g8b7eb66 #911 [ 591.873009] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 591.878317] pstate: a0000085 (NzCv daIf -PAN -UAO) [ 591.883110] pc : __wake_up_common+0x58/0x170 [ 591.887377] lr : __wake_up_locked+0x18/0x20 [ 591.891555] sp : ffff8000120b3a00 [ 591.894866] x29: ffff8000120b3a00 x28: ffff800011b84000 [ 591.900174] x27: 0000000000000000 x26: 0000000000000000 [ 591.905482] x25: 0000000000000003 x24: 0000000000000000 [ 591.910791] x23: 0000000000000001 x22: 0000000000000000 [ 591.916099] x21: ffff000171369420 x20: ffff000171369418 [ 591.921407] x19: ffff000171369410 x18: 0000000000000000 [ 591.926715] x17: 0000000000000000 x16: 0000000000000000 [ 591.932023] x15: 0000000000000000 x14: ffff0001760daa00 [ 591.937331] x13: ffff80016dbdc000 x12: 0000000034d4d91d [ 591.942639] x11: 0000000000000000 x10: 00000000000009c0 [ 591.947948] x9 : ffff000171a30470 x8 : 00000000ffffffff [ 591.953256] x7 : ffff8000120b3a38 x6 : ffffffffffffffe8 [ 591.958564] x5 : 0000000000000000 x4 : 0000000000000000 [ 591.963871] x3 : 0000000000000000 x2 : 0000000000000001 [ 591.969179] x1 : 0000000000000003 x0 : 0000000000000000 [ 591.974487] Call trace: [ 591.976931] __wake_up_common+0x58/0x170 [ 591.980851] __wake_up_locked+0x18/0x20 [ 591.984683] complete+0x48/0x68 [ 591.987823] fsl_easrc_suspend+0x84/0x100 [ 591.991831] dpm_run_callback.isra.0+0x38/0xd8 [ 591.996271] __device_suspend+0xfc/0x3c8 [ 592.000191] dpm_suspend+0xf0/0x1e8 [ 592.003676] dpm_suspend_start+0x98/0xa0 [ 592.007596] suspend_devices_and_enter+0xec/0x5b8 [ 592.012297] pm_suspend+0x2f8/0x340 [ 592.015781] state_store+0x88/0x108 [ 592.019267] kobj_attr_store+0x14/0x28 [ 592.023015] sysfs_kf_write+0x40/0x50 [ 592.026674] kernfs_fop_write+0xf8/0x210 [ 592.030594] __vfs_write+0x18/0x40 [ 592.033992] vfs_write+0xdc/0x1c8 [ 592.037303] ksys_write+0x68/0xf0 [ 592.040615] __arm64_sys_write+0x18/0x20 [ 592.044536] el0_svc_common.constprop.0+0x68/0x160 [ 592.049324] el0_svc_handler+0x20/0x80 [ 592.053070] el0_svc+0x8/0xc [ 592.055950] Code: 54000700 a90153f3 2a0203f7 52800018 (f9400cd3) [ 592.062040] ---[ end trace a36c1a42637c268e ]--- Signed-off-by: Shengjiu Wang <[email protected]> Reviewed-by: Daniel Baluta <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
… and ipu_ch_tbl.lock In VDOA_MODE, _get_vdoa_ipu_res() would lock ipu_ch_tbl.lock mutex first, then lock vdoa_lock mutex and finally unlock ipu_ch_tbl.lock mutex. The vdoa_lock mutex is unlocked until put_vdoa_ipu_res(). Since the two mutexes are not unlocked in a reversed order as they are locked, AB-BA deadlock issue may happen as the below warning shows, which can be detected with the debug option CONFIG_PROVE_LOCKING enabled. [ 52.398770] ====================================================== [ 52.404957] WARNING: possible circular locking dependency detected [ 52.411145] 5.4.24 #1477 Not tainted [ 52.414728] ------------------------------------------------------ [ 52.420915] ipu1_task/92 is trying to acquire lock: [ 52.425800] 81f02274 (&ipu_ch_tbl.lock){+.+.}, at: get_res_do_task+0x144/0x77c [ 52.433050] [ 52.433050] but task is already holding lock: [ 52.438888] 8183189c (vdoa_lock){+.+.}, at: vdoa_get_handle+0x64/0x158 [ 52.445434] [ 52.445434] which lock already depends on the new lock. [ 52.445434] [ 52.453615] [ 52.453615] the existing dependency chain (in reverse order) is: [ 52.461101] [ 52.461101] -> #1 (vdoa_lock){+.+.}: [ 52.466175] __mutex_lock+0xb8/0xaa8 [ 52.470283] mutex_lock_nested+0x2c/0x34 [ 52.474735] vdoa_get_handle+0x64/0x158 [ 52.479100] _get_vdoa_ipu_res+0x2b4/0x338 [ 52.483726] get_res_do_task+0x34/0x77c [ 52.488092] ipu_task_thread+0x148/0xeb0 [ 52.492551] kthread+0x168/0x170 [ 52.496310] ret_from_fork+0x14/0x20 [ 52.500414] 0x0 [ 52.502779] [ 52.502779] -> #0 (&ipu_ch_tbl.lock){+.+.}: [ 52.508457] __lock_acquire+0x15d0/0x2588 [ 52.512995] lock_acquire+0xdc/0x280 [ 52.517103] __mutex_lock+0xb8/0xaa8 [ 52.521210] mutex_lock_nested+0x2c/0x34 [ 52.525662] get_res_do_task+0x144/0x77c [ 52.530113] ipu_task_thread+0x148/0xeb0 [ 52.534566] kthread+0x168/0x170 [ 52.538322] ret_from_fork+0x14/0x20 [ 52.542425] 0x0 [ 52.544790] [ 52.544790] other info that might help us debug this: [ 52.544790] [ 52.552797] Possible unsafe locking scenario: [ 52.552797] [ 52.558721] CPU0 CPU1 [ 52.563256] ---- ---- [ 52.567790] lock(vdoa_lock); [ 52.570853] lock(&ipu_ch_tbl.lock); [ 52.577040] lock(vdoa_lock); [ 52.582619] lock(&ipu_ch_tbl.lock); [ 52.586289] [ 52.586289] *** DEADLOCK *** [ 52.586289] [ 52.592215] 1 lock held by ipu1_task/92: [ 52.596142] #0: 8183189c (vdoa_lock){+.+.}, at: vdoa_get_handle+0x64/0x158 [ 52.603123] [ 52.603123] stack backtrace: [ 52.607493] CPU: 0 PID: 92 Comm: ipu1_task Not tainted 5.4.24 #1477 [ 52.613765] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 52.620298] Backtrace: [ 52.622764] [<8010f094>] (dump_backtrace) from [<8010f3ac>] (show_stack+0x20/0x24) [ 52.630345] r7:818641a8 r6:00000000 r5:600f0193 r4:818641a8 [ 52.636026] [<8010f38c>] (show_stack) from [<80f534d8>] (dump_stack+0xbc/0xe8) [ 52.643261] [<80f5341c>] (dump_stack) from [<8019c8f4>] (print_circular_bug+0x1c4/0x214) [ 52.651361] r10:a8330000 r9:ffffffff r8:a8330000 r7:a8330550 r6:81d13b1c r5:81d13ac0 [ 52.659197] r4:81d13ac0 r3:46e9f8d1 [ 52.662783] [<8019c730>] (print_circular_bug) from [<8019cb48>] (check_noncircular+0x204/0x21c) [ 52.671492] r9:00000001 r8:81708f50 r7:00000000 r6:a8423a98 r5:a8330530 r4:a8330550 [ 52.679245] [<8019c944>] (check_noncircular) from [<8019f274>] (__lock_acquire+0x15d0/0x2588) [ 52.687778] r8:81708f50 r7:81d13ac0 r6:00000001 r5:81e93d7c r4:a8330530 [ 52.694491] [<8019dca4>] (__lock_acquire) from [<801a0b84>] (lock_acquire+0xdc/0x280) [ 52.702334] r10:00000000 r9:00000000 r8:00000000 r7:81f02274 r6:600f0113 r5:81708724 [ 52.710169] r4:00000000 [ 52.712717] [<801a0aa8>] (lock_acquire) from [<80f71938>] (__mutex_lock+0xb8/0xaa8) [ 52.720384] r10:81e93d7c r9:0000f6d0 r8:81f022e8 r7:00008f50 r6:00000000 r5:ffffe000 [ 52.728219] r4:81f02240 [ 52.730765] [<80f71880>] (__mutex_lock) from [<80f72354>] (mutex_lock_nested+0x2c/0x34) [ 52.738778] r10:00000000 r9:a8423ccc r8:81f022e8 r7:000002d0 r6:8188b6f8 r5:81f02234 [ 52.746613] r4:a4540400 [ 52.749162] [<80f72328>] (mutex_lock_nested) from [<80b8c830>] (get_res_do_task+0x144/0x77c) [ 52.757612] [<80b8c6ec>] (get_res_do_task) from [<80b8d6a0>] (ipu_task_thread+0x148/0xeb0) [ 52.765886] r10:a8139bd0 r9:a8423ccc r8:81f022e8 r7:a4540400 r6:81831604 r5:a454053c [ 52.773721] r4:600f0013 [ 52.776269] [<80b8d558>] (ipu_task_thread) from [<801684c8>] (kthread+0x168/0x170) [ 52.783849] r10:a8139bd0 r9:80b8d558 r8:81f022e8 r7:a8422000 r6:00000000 r5:a840cd00 [ 52.791684] r4:a83cc080 [ 52.794231] [<80168360>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20) [ 52.801460] Exception stack(0xa8423fb0 to 0xa8423ff8) [ 52.806521] 3fa0: 00000000 00000000 00000000 00000000 [ 52.814711] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 52.822898] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 52.829521] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:80168360 [ 52.837356] r4:a840cd00 This patch corrects the locking/unlocking sequence for the two mutexes to fix the deadlock issue. Reviewed-by: Sandor Yu <[email protected]> Signed-off-by: Liu Ying <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
The reason is that we access the register when power is disabled reproduce command: echo 8 > /sys/devices/platform/soc@0/30c00000.bus/30c00000.spba-bus/30ca0000.micfil/hwvad/enable [ 332.838518] SError Interrupt on CPU2, code 0xbf000002 -- SError [ 332.838521] CPU: 2 PID: 1383 Comm: sh Tainted: G O 5.4.24-2.1.0+g2ad925d15481 #1 [ 332.838523] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 332.838525] pstate: 20000085 (nzCv daIf -PAN -UAO) [ 332.838526] pc : regcache_sync_block+0x7c/0x258 [ 332.838528] lr : regcache_sync_block+0xf0/0x258 [ 332.838531] sp : ffff800012bf3be0 [ 332.838532] x29: ffff800012bf3be0 x28: 0000000000000000 [ 332.838535] x27: 0000000000000000 x26: 0000000000000001 [ 332.838541] x25: ffff00017753ed00 x24: 0000000000000000 [ 332.838546] x23: ffff00017706d400 x22: 000000000000002b [ 332.838550] x21: ffff00017753f100 x20: ffff000177538c00 [ 332.838554] x19: 0000000000000001 x18: 0000000000000000 [ 332.838559] x17: 0000000000000000 x16: 0000000000000000 [ 332.838564] x15: 0000000000000000 x14: 0000000000000000 [ 332.838567] x13: 0000000000000000 x12: 0000000000000000 [ 332.838572] x11: ffff800012bf3d50 x10: ffff000173c0e900 [ 332.838577] x9 : ffff00017706d870 x8 : 0000000000000000 [ 332.838582] x7 : 00000000000000a8 x6 : 0000000000000000 [ 332.838587] x5 : 0000000000000000 x4 : ffff800010769d88 [ 332.838592] x3 : ffff80001076d278 x2 : 0000000008000000 [ 332.838597] x1 : ffff800023c10000 x0 : 0000000000000000 [ 332.838603] Kernel panic - not syncing: Asynchronous SError Interrupt [ 332.838605] CPU: 2 PID: 1383 Comm: sh Tainted: G O 5.4.24-2.1.0+g2ad925d15481 #1 [ 332.838606] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 332.838608] Call trace: [ 332.838609] dump_backtrace+0x0/0x140 [ 332.838612] show_stack+0x14/0x20 [ 332.838613] dump_stack+0xb4/0xf8 [ 332.838617] panic+0x158/0x324 [ 332.838618] nmi_panic+0x84/0x88 [ 332.838621] arm64_serror_panic+0x74/0x80 [ 332.838622] do_serror+0x80/0x138 [ 332.838623] el1_error+0x84/0xf8 [ 332.838627] regcache_sync_block+0x7c/0x258 [ 332.838628] regcache_rbtree_sync+0x60/0xb0 [ 332.838629] regcache_sync+0xac/0x140 [ 332.838633] disable_hwvad+0x3c/0x1f0 [ 332.838634] micfil_hwvad_handler+0x78/0x170 [ 332.838635] kobj_attr_store+0x14/0x28 [ 332.838637] sysfs_kf_write+0x40/0x50 [ 332.838640] kernfs_fop_write+0xf8/0x210 [ 332.838641] __vfs_write+0x18/0x40 [ 332.838642] vfs_write+0xdc/0x1c8 [ 332.838646] ksys_write+0x68/0xf0 [ 332.838647] __arm64_sys_write+0x18/0x20 [ 332.838648] el0_svc_common.constprop.0+0x68/0x160 [ 332.838651] el0_svc_handler+0x20/0x80 [ 332.838653] el0_svc+0x8/0xc [ 332.838972] SMP: stopping secondary CPUs [ 332.838974] Kernel Offset: disabled [ 332.838975] CPU features: 0x0002,2000200c [ 332.838976] Memory Limit: none Signed-off-by: Shengjiu Wang <[email protected]> Reviewed-by: Viorel Suman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
When do suspend/resume test, there have WARN_ON() log dump from stmmac_xmit() funciton, the code logic: entry = tx_q->cur_tx; first_entry = entry; WARN_ON(tx_q->tx_skbuff[first_entry]); In normal case, tx_q->tx_skbuff[txq->cur_tx] should be NULL because the skb should be handled and freed in stmmac_tx_clean(). But stmmac_resume() reset queue parameters like below, skb buffers may not be freed. tx_q->cur_tx = 0; tx_q->dirty_tx = 0; So free tx skb buffer in stmmac_resume() to avoid warning and memory leak. log: [ 46.139824] ------------[ cut here ]------------ [ 46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0 [ 46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev [ 46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.4.24-2.1.0+g2ad925d15481 #1 [ 46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 46.175677] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 46.180465] pc : stmmac_xmit+0x7a0/0x9d0 [ 46.184387] lr : dev_hard_start_xmit+0x94/0x158 [ 46.188913] sp : ffff800010003cc0 [ 46.192224] x29: ffff800010003cc0 x28: ffff000177e2a100 [ 46.197533] x27: ffff000176ef0840 x26: ffff000176ef0090 [ 46.202842] x25: 0000000000000000 x24: 0000000000000000 [ 46.208151] x23: 0000000000000003 x22: ffff8000119ddd30 [ 46.213460] x21: ffff00017636f000 x20: ffff000176ef0cc0 [ 46.218769] x19: 0000000000000003 x18: 0000000000000000 [ 46.224078] x17: 0000000000000000 x16: 0000000000000000 [ 46.229386] x15: 0000000000000079 x14: 0000000000000000 [ 46.234695] x13: 0000000000000003 x12: 0000000000000003 [ 46.240003] x11: 0000000000000010 x10: 0000000000000010 [ 46.245312] x9 : ffff00017002b140 x8 : 0000000000000000 [ 46.250621] x7 : ffff00017636f000 x6 : 0000000000000010 [ 46.255930] x5 : 0000000000000001 x4 : ffff000176ef0000 [ 46.261238] x3 : 0000000000000003 x2 : 00000000ffffffff [ 46.266547] x1 : ffff000177e2a000 x0 : 0000000000000000 [ 46.271856] Call trace: [ 46.274302] stmmac_xmit+0x7a0/0x9d0 [ 46.277874] dev_hard_start_xmit+0x94/0x158 [ 46.282056] sch_direct_xmit+0x11c/0x338 [ 46.285976] __qdisc_run+0x118/0x5f0 [ 46.289549] net_tx_action+0x110/0x198 [ 46.293297] __do_softirq+0x120/0x23c [ 46.296958] irq_exit+0xb8/0xd8 [ 46.300098] __handle_domain_irq+0x64/0xb8 [ 46.304191] gic_handle_irq+0x5c/0x148 [ 46.307936] el1_irq+0xb8/0x180 [ 46.311076] cpuidle_enter_state+0x84/0x360 [ 46.315256] cpuidle_enter+0x34/0x48 [ 46.318829] call_cpuidle+0x18/0x38 [ 46.322314] do_idle+0x1e0/0x280 [ 46.325539] cpu_startup_entry+0x24/0x40 [ 46.329460] rest_init+0xd4/0xe0 [ 46.332687] arch_call_rest_init+0xc/0x14 [ 46.336695] start_kernel+0x420/0x44c [ 46.340353] ---[ end trace bc1ee695123cbacd ]--- Reviewed-by: Richard Zhu <[email protected]> Signed-off-by: Fugang Duan <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 20a785a ] This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e #5 [f418de68] sctp_do_sm at f8db8c77 [sctp] #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] #7 [f418df48] inet_shutdown at c080baf9 #8 [f418df5c] sys_shutdown at c079eedf #9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <[email protected]> CC: Vlad Yasevich <[email protected]> CC: "David S. Miller" <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 6988f31 ] Replace superfluous VM_BUG_ON() with comment about correct usage. Technically reverts commit 1d148e2 ("mm: add VM_BUG_ON_PAGE() to page_mapcount()"), but context lines have changed. Function isolate_migratepages_block() runs some checks out of lru_lock when choose pages for migration. After checking PageLRU() it checks extra page references by comparing page_count() and page_mapcount(). Between these two checks page could be removed from lru, freed and taken by slab. As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). Race window is tiny. For certain workload this happens around once a year. page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0 flags: 0x500000000008100(slab|head) raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:628! invalid opcode: 0000 [#1] SMP NOPTI CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 RIP: 0010:isolate_migratepages_block+0x986/0x9b0 The code in isolate_migratepages_block() was added in commit 119d6d5 ("mm, compaction: avoid isolating pinned pages") before adding VM_BUG_ON into page_mapcount(). This race has been predicted in 2015 by Vlastimil Babka (see link below). [[email protected]: comment tweaks, per Hugh] Fixes: 1d148e2 ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1) Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit bf71bc1 ] The Debian kernel v5.6 triggers this kernel panic: Kernel panic - not syncing: Bad Address (null pointer deref?) Bad Address (null pointer deref?): Code=26 (Data memory access rights trap) at addr 0000000000000000 CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-2-parisc64 #1 Debian 5.6.14-1 IAOQ[0]: mem_init+0xb0/0x150 IAOQ[1]: mem_init+0xb4/0x150 RP(r2): start_kernel+0x6c8/0x1190 Backtrace: [<0000000040101ab4>] start_kernel+0x6c8/0x1190 [<0000000040108574>] start_parisc+0x158/0x1b8 on a HP-PARISC rp3440 machine with this memory layout: Memory Ranges: 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB 1) Start 0x0000004040000000 End 0x00000040ffdfffff Size 3070 MB Fix the crash by avoiding virt_to_page() and similar functions in mem_init() until the memory zones have been fully set up. Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # v5.0+ Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit e2d4a80 upstream. On a non-forwarding 802.11s link between two fairly busy neighboring nodes (iperf with -P 16 at ~850MBit/s TCP; 1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with frequent PREQ retries, usually after around 30-40 seconds the following crash would occur: [ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000 [ 1110.830786] Mem abort info: [ 1110.833573] Exception class = IABT (current EL), IL = 32 bits [ 1110.839494] SET = 0, FnV = 0 [ 1110.842546] EA = 0, S1PTW = 0 [ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000 [ 1110.852204] [0000000000000000] *pgd=00000000f6322003, *pud=00000000f62de003, *pmd=0000000000000000 [ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP [ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common [ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000) [ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0 [ 1110.944965] Hardware name: LS1043A RGW Board (DT) [ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000 [ 1110.955568] PC is at 0x0 [ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78 [ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145 [ 1110.970440] sp : ffff00000801be10 [ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018 [ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0 [ 1110.984352] x25: 0000000000000000 x24: 0000000000000000 [ 1110.989657] x23: dead000000000200 x22: 0000000000000000 [ 1110.994959] x21: 0000000000000000 x20: 0000000000000101 [ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000 [ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058 [ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000 [ 1111.016172] x13: ffff000008916788 x12: 0000000000000040 [ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001 [ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004 [ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8 [ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010 [ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614 [ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000 [ 1111.053290] Call trace: [ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10) [ 1111.062158] bcc0: 0000000000000000 0000000000000000 [ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0 [ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0 [ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788 [ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0 [ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000 [ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000 [ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10 [ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145 [ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8 [ 1111.140353] be00: ffff00000801be10 0000000000000000 [ 1111.145220] [< (null)>] (null) [ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 [ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70) [ 1111.183642] be20: 0000000000000020 0000000000000000 [ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000 [ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0 [ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000 [ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0 [ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000 [ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000 [ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70 [ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145 [ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574 [ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0 [ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c [ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18 [ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8 [ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28 [ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110 [ 1111.293662] Code: bad PC value [ 1111.296710] ---[ end trace 555b6ca4363c3edd ]--- [ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt [ 1111.307661] SMP: stopping secondary CPUs [ 1111.311574] Kernel Offset: disabled [ 1111.315053] CPU features: 0x0002000 [ 1111.318530] Memory Limit: none [ 1111.321575] Rebooting in 3 seconds.. With some added debug output / delays we were able to push the crash from the timer callback runner into the callback function and by that shedding some light on which object holding the timer gets corrupted: [ 401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868 [...] [ 402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48 [ 402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211] [ 402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78 [ 402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 The issue happens due to the following sequence of events: 1) mesh_path_start_discovery(): -> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx() 2) mesh_path_free_rcu() -> del_timer_sync(&mpath->timer) [...] -> kfree_rcu(mpath) 3) mesh_path_start_discovery(): -> mod_timer(&mpath->timer, ...) [...] -> rcu_read_unlock() 4) mesh_path_free_rcu()'s kfree_rcu(): -> kfree(mpath) 5) mesh_path_timer() starts after timeout, using freed mpath object So a use-after-free issue due to a timer re-arming bug caused by an early spin-unlocking. This patch fixes this issue by re-checking if mpath is about to be free'd and if so bails out of re-arming the timer. Cc: [email protected] Fixes: 050ac52 ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Cc: Simon Wunderlich <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 8874347 upstream. The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03e ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03e ("x86, NUMA: Enable emulation on 32bit too") Reported-by: Alan Jenkins <[email protected]> Suggested-by: Robin Murphy <[email protected]> Signed-off-by: Alexander Dahl <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit c95c5f5 upstream. Here is the steps to reproduce the problem: ip netns add foo ip netns add bar ip -n foo link add xfrmi0 type xfrm dev lo if_id 42 ip -n foo link set xfrmi0 netns bar ip netns del foo ip netns del bar Which results to: [ 186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI [ 186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1 [ 186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 186.689420] Workqueue: netns cleanup_net [ 186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface] [ 186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 <48> 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08 [ 186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286 [ 186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0 [ 186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000 [ 186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040 [ 186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80 [ 186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8 [ 186.697738] FS: 0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000 [ 186.698705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0 [ 186.700221] Call Trace: [ 186.700508] rollback_registered_many+0x32b/0x3fd [ 186.701058] ? __rtnl_unlock+0x20/0x3d [ 186.701494] ? arch_local_irq_save+0x11/0x17 [ 186.702012] unregister_netdevice_many+0x12/0x55 [ 186.702594] default_device_exit_batch+0x12b/0x150 [ 186.703160] ? prepare_to_wait_exclusive+0x60/0x60 [ 186.703719] cleanup_net+0x17d/0x234 [ 186.704138] process_one_work+0x196/0x2e8 [ 186.704652] worker_thread+0x1a4/0x249 [ 186.705087] ? cancel_delayed_work+0x92/0x92 [ 186.705620] kthread+0x105/0x10f [ 186.706000] ? __kthread_bind_mask+0x57/0x57 [ 186.706501] ret_from_fork+0x35/0x40 [ 186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy [ 186.710423] ---[ end trace 463bba18105537e5 ]--- The problem is that x-netns xfrm interface are not removed when the link netns is removed. This causes later this oops when thoses interfaces are removed. Let's add a handler to remove all interfaces related to a netns when this netns is removed. Fixes: f203b76 ("xfrm: Add virtual xfrm interfaces") Reported-by: Christophe Gouault <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit f6a23d8 upstream. This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit 628e341 ("xfrm: make local error reporting more robust"), but brought back by Commit 844d487 ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: 844d487 ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 2b86cb8 upstream. Be there a platform with the following layout: Regular NIC | +----> DSA master for switch port | +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit 1a33e10 ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] *** DEADLOCK *** [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit ab92d68 ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: ab92d68 ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 770f605 ] This patch fixes the following warning and few other instances of traversal of evm_config_xattrnames list: [ 32.848432] ============================= [ 32.848707] WARNING: suspicious RCU usage [ 32.848966] 5.7.0-rc1-00006-ga8d5875ce5f0b #1 Not tainted [ 32.849308] ----------------------------- [ 32.849567] security/integrity/evm/evm_main.c:231 RCU-list traversed in non-reader section!! Since entries are only added to the list and never deleted, use list_for_each_entry_lockless() instead of list_for_each_entry_rcu for traversing the list. Also, add a relevant comment in evm_secfs.c to indicate this fact. Reported-by: kernel test robot <[email protected]> Suggested-by: Paul E. McKenney <[email protected]> Signed-off-by: Madhuparna Bhowmik <[email protected]> Acked-by: Paul E. McKenney <[email protected]> (RCU viewpoint) Signed-off-by: Mimi Zohar <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit e274832 ] In null_init_zone_dev() check if the zone size is larger than device capacity, return error if needed. This also fixes the following oops :- null_blk: changed the number of conventional zones to 4294967295 BUG: kernel NULL pointer dereference, address: 0000000000000010 PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0 Oops: 0002 [#1] SMP NOPTI CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4 RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk] RSP: 0018:ffffc90007007e00 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007 RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510 RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000 R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0 Call Trace: null_add_dev+0x534/0x71b [null_blk] nullb_device_power_store.cold.41+0x8/0x2e [null_blk] configfs_write_file+0xe6/0x150 vfs_write+0xba/0x1e0 ksys_write+0x5f/0xe0 do_syscall_64+0x60/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fcb51c71840 Signed-off-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 02c71b1 ] syzbot recently found a way to crash the kernel [1] Issue here is that inet_hash() & inet_unhash() are currently only meant to be used by TCP & DCCP, since only these protocols provide the needed hashinfo pointer. L2TP uses a single list (instead of a hash table) This old bug became an issue after commit 6102365 ("bpf: Add new cgroup attach type to enable sock modifications") since after this commit, sk_common_release() can be called while the L2TP socket is still considered 'hashed'. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 7063 Comm: syz-executor654 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sk_common_release+0xba/0x370 net/core/sock.c:3210 inet_create net/ipv4/af_inet.c:390 [inline] inet_create+0x966/0xe00 net/ipv4/af_inet.c:248 __sock_create+0x3cb/0x730 net/socket.c:1428 sock_create net/socket.c:1479 [inline] __sys_socket+0xef/0x200 net/socket.c:1521 __do_sys_socket net/socket.c:1530 [inline] __se_sys_socket net/socket.c:1528 [inline] __x64_sys_socket+0x6f/0xb0 net/socket.c:1528 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x441e29 Code: e8 fc b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdce184148 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441e29 RDX: 0000000000000073 RSI: 0000000000000002 RDI: 0000000000000002 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000402c30 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace 23b6578228ce553e ]--- RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 0d76751 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Signed-off-by: Eric Dumazet <[email protected]> Cc: James Chapman <[email protected]> Cc: Andrii Nakryiko <[email protected]> Reported-by: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit b86dab0 upstream. When k_ascii is invoked several times in a row there is a potential for signed integer overflow: UBSAN: Undefined behaviour in drivers/tty/vt/keyboard.c:888:19 signed integer overflow: 10 * 1111111111 cannot be represented in type 'int' CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.11 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xce/0x128 lib/dump_stack.c:118 ubsan_epilogue+0xe/0x30 lib/ubsan.c:154 handle_overflow+0xdc/0xf0 lib/ubsan.c:184 __ubsan_handle_mul_overflow+0x2a/0x40 lib/ubsan.c:205 k_ascii+0xbf/0xd0 drivers/tty/vt/keyboard.c:888 kbd_keycode drivers/tty/vt/keyboard.c:1477 [inline] kbd_event+0x888/0x3be0 drivers/tty/vt/keyboard.c:1495 While it can be worked around by using check_mul_overflow()/ check_add_overflow(), it is better to introduce a separate flag to signal that number pad is being used to compose a symbol, and change type of the accumulator from signed to unsigned, thus avoiding undefined behavior when it overflows. Reported-by: Kyungtae Kim <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/20200525232740.GA262061@dtor-ws Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 3866f21 ] call_undef_hook() in traps.c applies the same instr_mask for both 16-bit and 32-bit thumb instructions. If instr_mask then is only 16 bits wide (0xffff as opposed to 0xffffffff), the first half-word of 32-bit thumb instructions will be masked out. This makes the function match 32-bit thumb instructions where the second half-word is equal to instr_val, regardless of the first half-word. The result in this case is that all undefined 32-bit thumb instructions with the second half-word equal to 0xde01 (udf #1) work as breakpoints and will raise a SIGTRAP instead of a SIGILL, instead of just the one intended 16-bit instruction. An example of such an instruction is 0xeaa0de01, which is unallocated according to Arm ARM and should raise a SIGILL, but instead raises a SIGTRAP. This patch fixes the issue by setting all the bits in instr_mask, which will still match the intended 16-bit thumb instruction (where the upper half is always 0), but not any 32-bit thumb instructions. Cc: Oleg Nesterov <[email protected]> Signed-off-by: Fredrik Strupe <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit e18035c upstream. Add and use snd_pcm_stream_lock_nested() in snd_pcm_link/unlink implementation. The code is fine, but generates a lockdep complaint: ============================================ WARNING: possible recursive locking detected 5.7.1mq+ #381 Tainted: G O -------------------------------------------- pulseaudio/4180 is trying to acquire lock: ffff888402d6f508 (&group->lock){-...}-{2:2}, at: snd_pcm_common_ioctl+0xda8/0xee0 [snd_pcm] but task is already holding lock: ffff8883f7a8cf18 (&group->lock){-...}-{2:2}, at: snd_pcm_common_ioctl+0xe4e/0xee0 [snd_pcm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&group->lock); lock(&group->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by pulseaudio/4180: #0: ffffffffa1a05190 (snd_pcm_link_rwsem){++++}-{3:3}, at: snd_pcm_common_ioctl+0xca0/0xee0 [snd_pcm] #1: ffff8883f7a8cf18 (&group->lock){-...}-{2:2}, at: snd_pcm_common_ioctl+0xe4e/0xee0 [snd_pcm] [...] Cc: [email protected] Fixes: f57f3df ("ALSA: pcm: More fine-grained PCM link locking") Signed-off-by: Michał Mirosław <[email protected]> Link: https://lore.kernel.org/r/37252c65941e58473b1219ca9fab03d48f47e3e3.1591610330.git.mirq-linux@rere.qmqm.pl Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit a194c33 upstream. Will reported a UBSAN warning: UBSAN: null-ptr-deref in arch/arm64/kernel/smp.c:596:6 member access within null pointer of type 'struct acpi_madt_generic_interrupt' CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc6-00124-g96bc42ff0a82 #1 Call trace: dump_backtrace+0x0/0x384 show_stack+0x28/0x38 dump_stack+0xec/0x174 handle_null_ptr_deref+0x134/0x174 __ubsan_handle_type_mismatch_v1+0x84/0xa4 acpi_parse_gic_cpu_interface+0x60/0xe8 acpi_parse_entries_array+0x288/0x498 acpi_table_parse_entries_array+0x178/0x1b4 acpi_table_parse_madt+0xa4/0x110 acpi_parse_and_init_cpus+0x38/0x100 smp_init_cpus+0x74/0x258 setup_arch+0x350/0x3ec start_kernel+0x98/0x6f4 This is from the use of the ACPI_OFFSET in arch/arm64/include/asm/acpi.h. Replace its use with offsetof from include/linux/stddef.h which should implement the same logic using __builtin_offsetof, so that UBSAN wont warn. Reported-by: Will Deacon <[email protected]> Suggested-by: Ard Biesheuvel <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Jeremy Linton <[email protected]> Acked-by: Lorenzo Pieralisi <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/20200521100952.GA5360@willie-the-truck/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 8301c71 upstream. After commit c3aab9a ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages"), the following null pointer dereference has been reported on nilfs2: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... RIP: 0010:percpu_counter_add_batch+0xa/0x60 ... Call Trace: __test_set_page_writeback+0x2d3/0x330 nilfs_segctor_do_construct+0x10d3/0x2110 [nilfs2] nilfs_segctor_construct+0x168/0x260 [nilfs2] nilfs_segctor_thread+0x127/0x3b0 [nilfs2] kthread+0xf8/0x130 ... This crash turned out to be caused by set_page_writeback() call for segment summary buffers at nilfs_segctor_prepare_write(). set_page_writeback() can call inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK) where inode_to_wb(inode) is NULL if the inode of underlying block device does not have an associated wb. This fixes the issue by calling inode_attach_wb() in advance to ensure to associate the bdev inode with its wb. Fixes: c3aab9a ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages") Reported-by: Walton Hoops <[email protected]> Reported-by: Tomas Hlavaty <[email protected]> Reported-by: ARAI Shun-ichi <[email protected]> Reported-by: Hideki EIRAKU <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> [5.4+] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
[ Upstream commit 42ea9f1 ] In case there is a work in the health WQ when we teardown the driver, in driver load error flow, the health work will try to read dev->iseg, which was already unmap in mlx5_pci_close(). Fix it by draining the health workqueue first thing in mlx5_pci_close(). Trace of the error: BUG: unable to handle page fault for address: ffffb5b141c18014 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 1fe95d067 P4D 1fe95d067 PUD 1fe95e067 PMD 1b7823067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 6755 Comm: kworker/u128:2 Not tainted 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? mlx5_health_try_recover+0x4d/0x270 [mlx5_core] mlx5_fw_fatal_reporter_recover+0x16/0x20 [mlx5_core] devlink_health_reporter_recover+0x1c/0x50 devlink_health_report+0xfb/0x240 mlx5_fw_fatal_reporter_err_work+0x65/0xd0 [mlx5_core] process_one_work+0x1fb/0x4e0 ? process_one_work+0x16b/0x4e0 worker_thread+0x4f/0x3d0 kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 ? kthread_cancel_delayed_work_sync+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache 8021q garp mrp stp llc ipmi_devintf ipmi_msghandler rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core mlx5_core sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mlxfw crypto_simd cryptd glue_helper input_leds hyperv_fb intel_rapl_perf joydev serio_raw pci_hyperv pci_hyperv_mini mac_hid hv_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 hv_utils hid_generic hv_storvsc ptp hid_hyperv hid hv_netvsc hyperv_keyboard pps_core scsi_transport_fc psmouse hv_vmbus i2c_piix4 floppy pata_acpi CR2: ffffb5b141c18014 ---[ end trace b12c5503157cad24 ]--- RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:38 in_atomic(): 0, irqs_disabled(): 1, pid: 6755, name: kworker/u128:2 INFO: lockdep is turned off. CPU: 3 PID: 6755 Comm: kworker/u128:2 Tainted: G D 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: dump_stack+0x63/0x88 ___might_sleep+0x10a/0x130 __might_sleep+0x4a/0x80 exit_signals+0x33/0x230 ? blocking_notifier_call_chain+0x16/0x20 do_exit+0xb1/0xc30 ? kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 Fixes: 52c368d ("net/mlx5: Move health and page alloc init to mdev_init") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
commit 2bbcaae upstream. In ath9k_hif_usb_rx_cb interface number is assumed to be 0. usb_ifnum_to_if(urb->dev, 0) But it isn't always true. The case reported by syzbot: https://lore.kernel.org/linux-usb/[email protected] usb 2-1: new high-speed USB device number 2 using dummy_hcd usb 2-1: config 1 has an invalid interface number: 2 but max is 0 usb 2-1: config 1 has no interface number 0 usb 2-1: New USB device found, idVendor=0cf3, idProduct=9271, bcdDevice= 1.08 usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc5-syzkaller #0 Call Trace __usb_hcd_giveback_urb+0x29a/0x550 drivers/usb/core/hcd.c:1650 usb_hcd_giveback_urb+0x368/0x420 drivers/usb/core/hcd.c:1716 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x5f9/0x1500 kernel/time/timer.c:1786 __do_softirq+0x21e/0x950 kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x178/0x1a0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x141/0x540 arch/x86/kernel/apic/apic.c:1146 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 Reported-and-tested-by: [email protected] Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
$ rmmod mx6s_capture $ rmmod mxc_mipi_csi ------------[ cut here ]------------ WARNING: CPU: 0 PID: 31970 at drivers/clk/clk.c:954 clk_core_disable+0xc4/0xcc mipi_csi_root_clk already disabled Modules linked in: cdc_acm 8021q brcmfmac brcmutil ov5640_camera_mipi_v2 mxc_mipi_csi(-) [last unloaded: mx6s_capture] CPU: 0 PID: 31970 Comm: modprobe Tainted: G O 5.4.24-2.1.0+g2ad925d15481 #1 Hardware name: Freescale i.MX7 Dual (Device Tree) [<801107b8>] (unwind_backtrace) from [<8010b684>] (show_stack+0x10/0x14) [<8010b684>] (show_stack) from [<80bd6b0c>] (dump_stack+0x90/0xa4) [<80bd6b0c>] (dump_stack) from [<801305e4>] (__warn+0xbc/0xd8) [<801305e4>] (__warn) from [<80130698>] (warn_slowpath_fmt+0x98/0xc4) [<80130698>] (warn_slowpath_fmt) from [<805257d0>] (clk_core_disable+0xc4/0xcc) [<805257d0>] (clk_core_disable) from [<805258c8>] (clk_core_disable_lock+0x18/0x24) [<805258c8>] (clk_core_disable_lock) from [<7f00b834>] (mipi_csis_clk_disable+0x14/0x6c [mxc_mipi_csi]) [<7f00b834>] (mipi_csis_clk_disable [mxc_mipi_csi]) from [<7f00b9b4>] (mipi_csis_remove+0x44/0x58 [mxc_mipi_csi]) [<7f00b9b4>] (mipi_csis_remove [mxc_mipi_csi]) from [<8063138c>] (platform_drv_remove+0x24/0x3c) [<8063138c>] (platform_drv_remove) from [<8062fd30>] (device_release_driver_internal+0xec/0x1bc) [<8062fd30>] (device_release_driver_internal) from [<8062fe5c>] (driver_detach+0x44/0x80) [<8062fe5c>] (driver_detach) from [<8062ea30>] (bus_remove_driver+0x4c/0xa4) [<8062ea30>] (bus_remove_driver) from [<801ace34>] (sys_delete_module+0x134/0x1f8) [<801ace34>] (sys_delete_module) from [<80101000>] (ret_fast_syscall+0x0/0x54) Exception stack(0xa4979fa8 to 0xa4979ff0) 9fa0: 00ecb688 00ecb6c4 00ecb6c4 00000800 00000000 00000000 9fc0: 00ecb688 00ecb6c4 00000001 00000081 00000000 7efc7bb0 00ecb688 7efc7cd9 9fe0: 0049ff70 7efc67f4 0048667b 76f49468 ---[ end trace f2007e3990192dab ]--- Fix the kernel dump by removing unbalanced clock disablement in remove function. Signed-off-by: Robby Cai <[email protected]> Reviewed-by: G.n. Zhou <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
use existing standard function instead of mx6s_csi_mmap(). this could also fix possible deadlock issue from mx6s_csi_mmap as below. ====================================================== WARNING: possible circular locking dependency detected 5.7.0-next-20200612-lts-next+g2a193301c3f1 #1 Tainted: G O ------------------------------------------------------ v4l2_capture_em/1430 is trying to acquire lock: d933281c (&mm->mmap_lock){++++}-{3:3}, at: get_vaddr_frames+0x5c/0x234 but task is already holding lock: d94d6b50 (&csi_dev->lock){+.+.}-{3:3}, at: __video_do_ioctl+0xec/0x44c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&csi_dev->lock){+.+.}-{3:3}: mutex_lock_interruptible_nested+0x1c/0x24 mx6s_csi_mmap+0x20/0x54 [mx6s_capture] v4l2_mmap+0x54/0x90 mmap_region+0x3ac/0x67c do_mmap+0x3c0/0x50c vm_mmap_pgoff+0x94/0xe8 ksys_mmap_pgoff+0x7c/0xb4 ret_fast_syscall+0x0/0x28 0xbe925524 -> #0 (&mm->mmap_lock){++++}-{3:3}: lock_acquire+0xe0/0x524 down_read+0x38/0x1f4 get_vaddr_frames+0x5c/0x234 vb2_create_framevec+0x48/0x84 vb2_dc_get_userptr+0x7c/0x3e0 __prepare_userptr+0x17c/0x3d0 __buf_prepare+0x190/0x230 vb2_core_qbuf+0x3c8/0x68c vb2_qbuf+0x7c/0xd4 __video_do_ioctl+0x214/0x44c video_usercopy+0x140/0x860 ksys_ioctl+0xec/0xb8c ret_fast_syscall+0x0/0x28 0xbeb775a4 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&csi_dev->lock); lock(&mm->mmap_lock); lock(&csi_dev->lock); lock(&mm->mmap_lock); *** DEADLOCK *** 1 lock held by v4l2_capture_em/1430: #0: d94d6b50 (&csi_dev->lock){+.+.}-{3:3}, at: __video_do_ioctl+0xec/0x44c stack backtrace: CPU: 0 PID: 1430 Comm: v4l2_capture_em Tainted: G O 5.7.0-next-20200612-lts-next+g2a193301c3f1 #1 Hardware name: Freescale i.MX6 SoloX (Device Tree) [<c01127b0>] (unwind_backtrace) from [<c010c3b4>] (show_stack+0x10/0x14) [<c010c3b4>] (show_stack) from [<c05b8054>] (dump_stack+0xd8/0x10c) [<c05b8054>] (dump_stack) from [<c0193608>] (check_noncircular+0x130/0x1e4) [<c0193608>] (check_noncircular) from [<c0196580>] (__lock_acquire+0x15e8/0x33dc) [<c0196580>] (__lock_acquire) from [<c0198d0c>] (lock_acquire+0xe0/0x524) [<c0198d0c>] (lock_acquire) from [<c0e3f140>] (down_read+0x38/0x1f4) [<c0e3f140>] (down_read) from [<c02ba824>] (get_vaddr_frames+0x5c/0x234) [<c02ba824>] (get_vaddr_frames) from [<c09bbc48>] (vb2_create_framevec+0x48/0x84) [<c09bbc48>] (vb2_create_framevec) from [<c09bcf0c>] (vb2_dc_get_userptr+0x7c/0x3e0) [<c09bcf0c>] (vb2_dc_get_userptr) from [<c09b5368>] (__prepare_userptr+0x17c/0x3d0) [<c09b5368>] (__prepare_userptr) from [<c09b5e70>] (__buf_prepare+0x190/0x230) [<c09b5e70>] (__buf_prepare) from [<c09b80d4>] (vb2_core_qbuf+0x3c8/0x68c) [<c09b80d4>] (vb2_core_qbuf) from [<c09bb8bc>] (vb2_qbuf+0x7c/0xd4) [<c09bb8bc>] (vb2_qbuf) from [<c0988d8c>] (__video_do_ioctl+0x214/0x44c) [<c0988d8c>] (__video_do_ioctl) from [<c0989778>] (video_usercopy+0x140/0x860) [<c0989778>] (video_usercopy) from [<c02d6210>] (ksys_ioctl+0xec/0xb8c) [<c02d6210>] (ksys_ioctl) from [<c0100080>] (ret_fast_syscall+0x0/0x28) Signed-off-by: Robby Cai <[email protected]> Reviewed-by: G.n. Zhou <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
Add Over sample ratio kcontrol interface, allow user to specify over sample ratio amixer -c1 cget numid=1 numid=1,iface=MIXER,name='over sampling ratio' ; type=ENUMERATED,access=rw------,values=1,items=5 ; Item #0 'OSR_4x12' ; Item #1 'OSR_4x16' ; Item #2 'OSR_4x24' ; Item #3 'OSR_4x32' ; Item #4 'OSR_4x48' : values=0 Set different OSR amixer -c1 cset numid=1 2 Signed-off-by: Adrian Alonso <[email protected]> Reviewed-by: Viorel Suman <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
…orrectly At udc core, it sets vbus flag as true and call usb_gadget_connect by default. But for chipidea udc driver, it depends on VBUS status to call usb_gadget_connect, so it needs to set vbus flag according to VBUS at the .udc_start. With this change, the un-necessary usb_gadget_connect when load gadget function could be avoided. It could fix below oops as well: [ 17.175905] BUG: scheduling while atomic: v4l_id/649/0x00000002 [ 17.182016] Modules linked in: crct10dif_ce brcmfmac brcmutil synaptics_dsx_i2c fsl_imx8_ddr_perf galcore(O) [ 17.191893] CPU: 1 PID: 649 Comm: v4l_id Tainted: G O 5.4.3-lts-lf-5.4.y+gf8118585ee3c #1 [ 17.201369] Hardware name: FSL i.MX8MM DDR4 EVK RevB board (DT) [ 17.207286] Call trace: [ 17.209739] dump_backtrace+0x0/0x140 [ 17.213401] show_stack+0x14/0x20 [ 17.216717] dump_stack+0xb4/0xf8 [ 17.220033] __schedule_bug+0x5c/0x70 [ 17.223695] __schedule+0x4b4/0x560 [ 17.227181] schedule+0x40/0xe0 [ 17.230321] schedule_hrtimeout_range_clock+0x9c/0x118 [ 17.235455] schedule_hrtimeout_range+0x10/0x18 [ 17.239985] usleep_range+0x74/0xa0 [ 17.243473] ci_controller_resume+0x90/0xd8 [ 17.247652] ci_runtime_resume+0xc/0x18 [ 17.251488] pm_generic_runtime_resume+0x28/0x40 [ 17.256103] __rpm_callback+0x88/0x140 [ 17.259848] rpm_callback+0x20/0x80 [ 17.263334] rpm_resume+0x39c/0x580 [ 17.266819] __pm_runtime_resume+0x38/0x80 [ 17.270914] ci_udc_pullup+0x40/0xf8 [ 17.274489] usb_gadget_activate+0x48/0x78 [ 17.278582] usb_function_activate+0x68/0xb8 [ 17.282851] uvc_function_connect+0x18/0x50 [ 17.287031] uvc_v4l2_open+0x5c/0x78 [ 17.290607] v4l2_open+0xa0/0x118 [ 17.293922] chrdev_open+0xa0/0x198 [ 17.297408] do_dentry_open+0x110/0x3b0 [ 17.301240] vfs_open+0x28/0x30 [ 17.304381] path_openat+0x4a0/0x1240 [ 17.308040] do_filp_open+0x74/0xf8 [ 17.311525] do_sys_open+0x168/0x218 [ 17.315097] __arm64_sys_openat+0x20/0x28 [ 17.319107] el0_svc_common.constprop.0+0x68/0x160 [ 17.323898] el0_svc_handler+0x20/0x80 [ 17.327647] el0_svc+0x8/0xc Reviewed-by: Jun Li <[email protected]> Signed-off-by: Peter Chen <[email protected]>
advrisc
pushed a commit
that referenced
this issue
Dec 17, 2020
…tion According to the define: typedef irqreturn_t (*irq_handler_t)(int, void *); correct the two parameter definition of tpd_eint_handler. If not, Android will get the following dump since Android set the CONFIG_CFI_CLANG [ 49.905828] ------------[ cut here ]------------ [1088/49494] [ 49.910457] CFI failure (target: tpd_eint_handler+0x0/0x4 [synaptics_dsx_i2c]): [ 49.917785] WARNING: CPU: 0 PID: 0 at kernel/cfi.c:29 __ubsan_handle_cfi_check_fail+0x4c/0x54 [ 49.926299] Modules linked in: vvcam(O) synaptics_dsx_i2c pcie8xxx mlan [ 49.932912] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 5.4.47-00119-gfd9c870e583c #1 [ 49.941948] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 49.947252] pstate: 60400085 (nZCv daIf +PAN -UAO) [ 49.952036] pc : __ubsan_handle_cfi_check_fail+0x4c/0x54 [ 49.957340] lr : __ubsan_handle_cfi_check_fail+0x4c/0x54 [ 49.962642] sp : ffff800010003d80 [ 49.965956] x29: ffff800010003d80 x28: ffff80001154e76c [ 49.971268] x27: ffff800012560140 x26: ffff000177c28c00 [ 49.976572] x25: 0000000000000003 x24: ffff8000096fb394 [ 49.981876] x23: 0000000000000059 x22: ffff8000096fb000 [ 49.987180] x21: c6783f1d911d68d7 x20: ffff8000126a90c0 [ 49.992483] x19: ffff8000096fb394 x18: ffff00017f338058 [ 49.997787] x17: 0000000000000041 x16: ffff800011526a2c [ 50.003099] x15: ffff800011e308e7 x14: 0000000000000050 [ 50.008403] x13: 0000000000003537 x12: 0000000000000000 [ 50.013706] x11: 0000000000000000 x10: 00000000ffffffff [ 50.019010] x9 : 6320576a23b1fd00 x8 : 6320576a23b1fd00 [ 50.024314] x7 : 735b203478302f30 x6 : ffff800012772273 [ 50.029617] x5 : 0000000000000000 x4 : 0000000000000008 [ 50.034921] x3 : ffff80001155978c x2 : fffffffffffffc04 [ 50.040225] x1 : 0000000000000000 x0 : 0000000000000043 [ 50.045529] Call trace: [ 50.047978] __ubsan_handle_cfi_check_fail+0x4c/0x54 [ 50.052940] __cfi_check_fail+0x30/0x38 [synaptics_dsx_i2c] [ 50.058509] __cfi_check+0x32c/0x358 [synaptics_dsx_i2c] [ 50.063817] __handle_irq_event_percpu+0x330/0x37c [ 50.068600] handle_irq_event+0x60/0xd8 [ 50.072428] handle_level_irq+0x178/0x2dc [ 50.076430] generic_handle_irq+0x44/0x8c [ 50.080433] mxc_gpio_irq_handler+0x68/0x10c [ 50.084695] mx3_gpio_irq_handler+0xf8/0x200 [ 50.088957] __handle_domain_irq+0xa0/0x108 [ 50.093132] efi_header_end+0xb8/0x15c [ 50.096873] el1_irq+0x104/0x200 [ 50.100103] cpuidle_enter_state+0x178/0x314 [ 50.104365] cpuidle_enter+0x38/0x50 [ 50.107933] do_idle.llvm.13790858479135427060+0x1a4/0x294 [ 50.113410] cpu_startup_entry+0x24/0x28 [ 50.117327] kernel_init+0x0/0x2ac [ 50.120723] start_kernel+0x0/0x41c [ 50.124211] start_kernel+0x3a4/0x41c [ 50.127865] ---[ end trace b7a14580e01cd817 ]--- [ 50.136267] synaptics_rmi4_sensor_report: spontaneous reset detected [ 50.139705] imx2-wdt 30280000.watchdog: Device shutdown: Expect reboot! [ 50.149756] reboot: Restarting system with command 'bootloader' Reported-by: Jindong Yue <[email protected]> Signed-off-by: Haibo Chen <[email protected]> Reviewed-by: Fugang Duan <[email protected]> (cherry picked from commit 7bb33ed)
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
[ Upstream commit 06c4da8 ] Otherwise there may be race between module removal and the handling of netlink command, which can lead to the oops as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000098 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:down_write+0x1a/0x50 Call Trace: start_creating+0x89/0x130 debugfs_create_dir+0x1b/0x130 nbd_start_device+0x13d/0x390 [nbd] nbd_genl_connect+0x42f/0x748 [nbd] genl_family_rcv_msg_doit.isra.0+0xec/0x150 genl_rcv_msg+0xe5/0x1e0 netlink_rcv_skb+0x55/0x100 genl_rcv+0x29/0x40 netlink_unicast+0x1a8/0x250 netlink_sendmsg+0x21b/0x430 ____sys_sendmsg+0x2a4/0x2d0 ___sys_sendmsg+0x81/0xc0 __sys_sendmsg+0x62/0xb0 __x64_sys_sendmsg+0x1f/0x30 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Modules linked in: nbd(E-) Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
[ Upstream commit c55b2b9 ] When nbd module is being removing, nbd_alloc_config() may be called concurrently by nbd_genl_connect(), although try_module_get() will return false, but nbd_alloc_config() doesn't handle it. The race may lead to the leak of nbd_config and its related resources (e.g, recv_workq) and oops in nbd_read_stat() due to the unload of nbd module as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000040 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: knbd16-recv recv_work [nbd] RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd] Call Trace: recv_work+0x3b/0xb0 [nbd] process_one_work+0x1ed/0x390 worker_thread+0x4a/0x3d0 kthread+0x12a/0x150 ret_from_fork+0x22/0x30 Fixing it by checking the return value of try_module_get() in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV), assign nbd->config only when nbd_alloc_config() succeeds to ensure the value of nbd->config is binary (valid or NULL). Also adding a debug message to check the reference counter of nbd_config during module removal. Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
commit ba3beec upstream. Fix a crash that happens if an Rx only socket is created first, then a second socket is created that is Tx only and bound to the same umem as the first socket and also the same netdev and queue_id together with the XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page pool was not created by the first socket as it was an Rx only socket. When the second socket is bound it needs this tx_descs array of this shared page pool as it has a Tx component, but unfortunately it was never allocated, leading to a crash. Note that this array is only used for zero-copy drivers using the batched Tx APIs, currently only ice and i40e. [ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 5511.158419] #PF: supervisor write access in kernel mode [ 5511.164472] #PF: error_code(0x0002) - not-present page [ 5511.170416] PGD 0 P4D 0 [ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI [ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 5.18.0-rc1+ #97 [ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310 [ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 <48> 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b [ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246 [ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000 [ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0 [ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48 [ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000 [ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100 [ 5511.274387] FS: 0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000 [ 5511.283746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0 [ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5511.315142] Call Trace: [ 5511.317972] <IRQ> [ 5511.320301] ice_xmit_zc+0x68/0x2f0 [ice] [ 5511.324977] ? ktime_get+0x38/0xa0 [ 5511.328913] ice_napi_poll+0x7a/0x6a0 [ice] [ 5511.333784] __napi_poll+0x2c/0x160 [ 5511.337821] net_rx_action+0xdd/0x200 [ 5511.342058] __do_softirq+0xe6/0x2dd [ 5511.346198] irq_exit_rcu+0xb5/0x100 [ 5511.350339] common_interrupt+0xa4/0xc0 [ 5511.354777] </IRQ> [ 5511.357201] <TASK> [ 5511.359625] asm_common_interrupt+0x1e/0x40 [ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360 [ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 <0f> 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49 [ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202 [ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f [ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046 [ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008 [ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700 [ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000 [ 5511.480300] cpuidle_enter+0x29/0x40 [ 5511.494329] do_idle+0x1c7/0x250 [ 5511.507610] cpu_startup_entry+0x19/0x20 [ 5511.521394] start_kernel+0x649/0x66e [ 5511.534626] secondary_startup_64_no_verify+0xc3/0xcb [ 5511.549230] </TASK> Detect such case during bind() and allocate this memory region via newly introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as for other buffer pool allocations, so that it matches the kvfree() from xp_destroy(). Fixes: d1bc532 ("i40e: xsk: Move tmp desc array from driver to pool") Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
[ Upstream commit a1b29ba ] The following KASAN warning was reported in our kernel. BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 Read of size 4 at addr d216f958 by task ps/14437 CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 Call Trace: [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 [daa63948] [c00496e8] get_wchan+0x188/0x250 [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 [daa63d68] [c037fc94] vfs_read+0x164/0x510 [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 --- interrupt: c00 at 0x8fa8f4 LR = 0x8fa8cc The buggy address belongs to the page: page:98ebcdd2 refcount:0 mapcount:0 mapping:00000000 index:0x2 pfn:0x1216f flags: 0x0() raw: 00000000 00000000 01010122 00000000 00000002 00000000 ffffffff 00000000 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 ^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 After looking into this issue, I find the buggy address belongs to the task stack region. It seems KASAN has something wrong. I look into the code of __get_wchan in x86 architecture and find the same issue has been resolved by the commit f7d27c3 ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to powerpc architecture too. As Andrey Ryabinin said, get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Wanming Hu <[email protected]> Signed-off-by: He Ying <[email protected]> Signed-off-by: Chen Jingwen <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
[ Upstream commit c9b576d ] Fix our pointer offset usage in error_state_read when there is no i915_gpu_coredump but buf offset is non-zero. This fixes a kernel page fault can happen when multiple tests are running concurrently in a loop and one is producing engine resets and consuming the i915 error_state dump while the other is forcing full GT resets. (takes a while to trigger). The dmesg call trace: [ 5590.803000] BUG: unable to handle page fault for address: ffffffffa0b0e000 [ 5590.803009] #PF: supervisor read access in kernel mode [ 5590.803013] #PF: error_code(0x0000) - not-present page [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 5.17.0-rc5-ups69-guc-err-capt-rev6+ #136 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-M LP4x RVP, BIOS ADLPFWI1.R00. 3031.A02.2201171222 01/17/2022 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 0000000000000f8f [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: ffff888107ee9071 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000019 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: ffff888107ee9000 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 0000000000770ee0 [ 5590.803089] PKRU: 55555554 [ 5590.803091] Call Trace: [ 5590.803093] <TASK> [ 5590.803096] error_state_read+0xa1/0xd0 [i915] [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0 [ 5590.803180] new_sync_read+0x116/0x1a0 [ 5590.803185] vfs_read+0x114/0x1b0 [ 5590.803189] ksys_read+0x63/0xe0 [ 5590.803193] do_syscall_64+0x38/0xc0 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 5590.803201] RIP: 0033:0x7f62aaea5912 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007f62aaea5912 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 0000000000000006 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 0000000000101000 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000006 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 0000000000200000 [ 5590.803238] </TASK> [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers nfnetlink br_netfilter overlay mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei fuse ip_tables x_tables crct10dif_pclmul e1000e crc32_pclmul ptp i2c_i801 ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: ttm] [ 5590.803277] CR2: ffffffffa0b0e000 [ 5590.803280] ---[ end trace 0000000000000000 ]--- Fixes: 0e39037 ("drm/i915: Cache the error string") Signed-off-by: Alan Previn <[email protected]> Reviewed-by: John Harrison <[email protected]> Signed-off-by: John Harrison <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 3304033) Signed-off-by: Jani Nikula <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
commit aeaadcd upstream. Currently the back pointer from a queue to the vhost adapter isn't set until after subcrq interrupt registration. The value is available when a queue is first allocated and can/should be also set for primary and async queues as well as subcrqs. This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS interrupt controller where a pending subcrq interrupt from the previous kernel can be replayed immediately upon IRQ registration resulting in dereference of a garbage backpointer in ibmvfc_interrupt_scsi(). Kernel attempted to read user page (58) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000058 Faulting instruction address: 0xc008000003216a08 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c008000003216a08] ibmvfc_interrupt_scsi+0x40/0xb0 [ibmvfc] LR [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270 Call Trace: [c000000047fa3d80] [c0000000123e6180] 0xc0000000123e6180 (unreliable) [c000000047fa3df0] [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270 [c000000047fa3ea0] [c000000008207d18] handle_irq_event+0x98/0x188 [c000000047fa3ef0] [c00000000820f564] handle_fasteoi_irq+0xc4/0x310 [c000000047fa3f40] [c000000008205c60] generic_handle_irq+0x50/0x80 [c000000047fa3f60] [c000000008015c40] __do_irq+0x70/0x1a0 [c000000047fa3f90] [c000000008016d7c] __do_IRQ+0x9c/0x130 [c000000014622f60] [0000000020000000] 0x20000000 [c000000014622ff0] [c000000008016e50] do_IRQ+0x40/0xa0 [c000000014623020] [c000000008017044] replay_soft_interrupts+0x194/0x2f0 [c000000014623210] [c0000000080172a8] arch_local_irq_restore+0x108/0x170 [c000000014623240] [c000000008eb1008] _raw_spin_unlock_irqrestore+0x58/0xb0 [c000000014623270] [c00000000820b12c] __setup_irq+0x49c/0x9f0 [c000000014623310] [c00000000820b7c0] request_threaded_irq+0x140/0x230 [c000000014623380] [c008000003212a50] ibmvfc_register_scsi_channel+0x1e8/0x2f0 [ibmvfc] [c000000014623450] [c008000003213d1c] ibmvfc_init_sub_crqs+0xc4/0x1f0 [ibmvfc] [c0000000146234d0] [c0080000032145a8] ibmvfc_reset_crq+0x150/0x210 [ibmvfc] [c000000014623550] [c0080000032147c8] ibmvfc_init_crq+0x160/0x280 [ibmvfc] [c0000000146235f0] [c00800000321a9cc] ibmvfc_probe+0x2a4/0x530 [ibmvfc] Link: https://lore.kernel.org/r/[email protected] Fixes: 3034ebe ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels") Cc: [email protected] Reviewed-by: Brian King <[email protected]> Signed-off-by: Tyrel Datwyler <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
commit 72ea7fe upstream. Currently, the sub-queues and event pool resources are allocated/freed for every CRQ connection event such as reset and LPM. This exposes the driver to a couple issues. First the inefficiency of freeing and reallocating memory that can simply be resued after being sanitized. Further, a system under memory pressue runs the risk of allocation failures that could result in a crippled driver. Finally, there is a race window where command submission/compeletion can try to pull/return elements from/to an event pool that is being deleted or already has been deleted due to the lack of host state around freeing/allocating resources. The following is an example of list corruption following a live partition migration (LPM): Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: vfat fat isofs cdrom ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth vmx_crypto dm_multipath dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse CPU: 0 PID: 2108 Comm: ibmvfc_0 Kdump: loaded Not tainted 5.14.0-70.9.1.el9_0.ppc64le #1 NIP: c0000000007c4bb0 LR: c0000000007c4bac CTR: 00000000005b9a10 REGS: c00000025c10b760 TRAP: 0700 Not tainted (5.14.0-70.9.1.el9_0.ppc64le) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 2800028f XER: 0000000f CFAR: c0000000001f55bc IRQMASK: 0 GPR00: c0000000007c4bac c00000025c10ba00 c000000002a47c00 000000000000004e GPR04: c0000031e3006f88 c0000031e308bd00 c00000025c10b768 0000000000000027 GPR08: 0000000000000000 c0000031e3009dc0 00000031e0eb0000 0000000000000000 GPR12: c0000031e2ffffa8 c000000002dd0000 c000000000187108 c00000020fcee2c0 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000002f81300 GPR24: 5deadbeef0000100 5deadbeef0000122 c000000263ba6910 c00000024cc88000 GPR28: 000000000000003c c0000002430a0000 c0000002430ac300 000000000000c300 NIP [c0000000007c4bb0] __list_del_entry_valid+0x90/0x100 LR [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 Call Trace: [c00000025c10ba00] [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 (unreliable) [c00000025c10ba60] [c008000002f42284] ibmvfc_free_queue+0xec/0x210 [ibmvfc] [c00000025c10bb10] [c008000002f4246c] ibmvfc_deregister_scsi_channel+0xc4/0x160 [ibmvfc] [c00000025c10bba0] [c008000002f42580] ibmvfc_release_sub_crqs+0x78/0x130 [ibmvfc] [c00000025c10bc20] [c008000002f4f6cc] ibmvfc_do_work+0x5c4/0xc70 [ibmvfc] [c00000025c10bce0] [c008000002f4fdec] ibmvfc_work+0x74/0x1e8 [ibmvfc] [c00000025c10bda0] [c0000000001872b8] kthread+0x1b8/0x1c0 [c00000025c10be10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62fe7a 7d254b78 3863b590 f8010070 4ba309cd 60000000 <0fe00000> 7c0802a6 3c62fe7a 3863b640 ---[ end trace 11a2b65a92f8b66c ]--- ibmvfc 30000003: Send warning. Receive queue closed, will retry. Add registration/deregistration helpers that are called instead during connection resets to sanitize and reconfigure the queues. Link: https://lore.kernel.org/r/[email protected] Fixes: 3034ebe ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels") Cc: [email protected] Reviewed-by: Brian King <[email protected]> Signed-off-by: Tyrel Datwyler <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
[ Upstream commit f9710c3 ] When a VBD is not fully created and then closed, the kernel can have a NULL pointer dereference: The reproducer is trivial: [user@dom0 ~]$ sudo xl block-attach work backend=sys-usb vdev=xvdi target=/dev/sdz [user@dom0 ~]$ xl block-list work Vdev BE handle state evt-ch ring-ref BE-path 51712 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51712 51728 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51728 51744 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51744 51760 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51760 51840 3 241 3 -1 -1 /local/domain/3/backend/vbd/241/51840 ^ note state, the /dev/sdz doesn't exist in the backend [user@dom0 ~]$ sudo xl block-detach work xvdi [user@dom0 ~]$ xl block-list work Vdev BE handle state evt-ch ring-ref BE-path work is an invalid domain identifier And its console has: BUG: kernel NULL pointer dereference, address: 0000000000000050 PGD 80000000edebb067 P4D 80000000edebb067 PUD edec2067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 52 Comm: xenwatch Not tainted 5.16.18-2.43.fc32.qubes.x86_64 #1 RIP: 0010:blk_mq_stop_hw_queues+0x5/0x40 Code: 00 48 83 e0 fd 83 c3 01 48 89 85 a8 00 00 00 41 39 5c 24 50 77 c0 5b 5d 41 5c 41 5d c3 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <8b> 47 50 85 c0 74 32 41 54 49 89 fc 55 53 31 db 49 8b 44 24 48 48 RSP: 0018:ffffc90000bcfe98 EFLAGS: 00010293 RAX: ffffffffc0008370 RBX: 0000000000000005 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 RBP: ffff88800775f000 R08: 0000000000000001 R09: ffff888006e620b8 R10: ffff888006e620b0 R11: f000000000000000 R12: ffff8880bff39000 R13: ffff8880bff39000 R14: 0000000000000000 R15: ffff88800604be00 FS: 0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000050 CR3: 00000000e932e002 CR4: 00000000003706e0 Call Trace: <TASK> blkback_changed+0x95/0x137 [xen_blkfront] ? read_reply+0x160/0x160 xenwatch_thread+0xc0/0x1a0 ? do_wait_intr_irq+0xa0/0xa0 kthread+0x16b/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack nft_counter nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront pcspkr xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn ipmi_devintf ipmi_msghandler fuse bpf_preload ip_tables overlay xen_blkfront CR2: 0000000000000050 ---[ end trace 7bc9597fd06ae89d ]--- RIP: 0010:blk_mq_stop_hw_queues+0x5/0x40 Code: 00 48 83 e0 fd 83 c3 01 48 89 85 a8 00 00 00 41 39 5c 24 50 77 c0 5b 5d 41 5c 41 5d c3 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <8b> 47 50 85 c0 74 32 41 54 49 89 fc 55 53 31 db 49 8b 44 24 48 48 RSP: 0018:ffffc90000bcfe98 EFLAGS: 00010293 RAX: ffffffffc0008370 RBX: 0000000000000005 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 RBP: ffff88800775f000 R08: 0000000000000001 R09: ffff888006e620b8 R10: ffff888006e620b0 R11: f000000000000000 R12: ffff8880bff39000 R13: ffff8880bff39000 R14: 0000000000000000 R15: ffff88800604be00 FS: 0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000050 CR3: 00000000e932e002 CR4: 00000000003706e0 Kernel panic - not syncing: Fatal exception Kernel Offset: disabled info->rq and info->gd are only set in blkfront_connect(), which is called for state 4 (XenbusStateConnected). Guard against using NULL variables in blkfront_closing() to avoid the issue. The rest of blkfront_closing looks okay. If info->nr_rings is 0, then for_each_rinfo won't do anything. blkfront_remove also needs to check for non-NULL pointers before cleaning up the gendisk and request queue. Fixes: 05d69d9 "xen-blkfront: sanitize the removal state machine" Reported-by: Marek Marczykowski-Górecki <[email protected]> Signed-off-by: Jason Andryuk <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
The call trace show the kernel panic occurs in memset_io. Seems dma_alloc_coherent() return an error code on fail instead of NULL. So we use the IS_ERR_OR_NULL to check the return value. But it's strangely as the document has described the function dma_alloc_coherent returns a pointer to the allocated region (in the processor's virtual address space) or NULL if the allocation failed. 2022-05-24T04:06:15 [ 745.193344] Unable to handle kernel paging request at virtual address ffffffffffffffe8 2022-05-24T04:06:15 [ 745.201296] Mem abort info: 2022-05-24T04:06:15 [ 745.204101] ESR = 0x86000004 2022-05-24T04:06:15 [ 745.207161] EC = 0x21: IABT (current EL), IL = 32 bits 2022-05-24T04:06:15 [ 745.212503] SET = 0, FnV = 0 2022-05-24T04:06:15 [ 745.215596] EA = 0, S1PTW = 0 2022-05-24T04:06:15 [ 745.218740] FSC = 0x04: level 0 translation fault 2022-05-24T04:06:15 [ 745.224070] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081b03000 2022-05-24T04:06:15 [ 745.231018] [ffffffffffffffe8] pgd=0000000000000000, p4d=0000000000000000 2022-05-24T04:06:15 [ 745.237851] Internal error: Oops: 86000004 [#1] PREEMPT SMP 2022-05-24T04:06:16 [ 745.287814] Hardware name: Freescale i.MX8QM MEK (DT) 2022-05-24T04:06:16 [ 745.292862] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) 2022-05-24T04:06:16 [ 745.299819] pc : 0xffffffffffffffe8 2022-05-24T04:06:16 [ 745.303303] lr : alloc_dma_buffer+0x5c/0xdc 2022-05-24T04:06:16 [ 745.307499] sp : ffff80001686ba50 2022-05-24T04:06:16 [ 745.310809] x29: ffff80001686ba50 x28: ffff00081e934008 x27: 0000000000000000 2022-05-24T04:06:16 [ 745.317950] x26: ffff00081e934680 x25: ffff00081e934900 x24: ffff00081e935000 2022-05-24T04:06:16 [ 745.325091] x23: ffff800009d3b5b0 x22: ffff0008197d9800 x21: fffffffffffff9b0 2022-05-24T04:06:16 [ 745.332233] x20: ffff00081e934000 x19: ffff00081e935db0 x18: 0000000000000000 2022-05-24T04:06:16 [ 745.339374] x17: 0000000000002000 x16: fffffc00010fffc8 x15: 0000000000000001 2022-05-24T04:06:16 [ 745.346515] x14: 0000000000000002 x13: 00000000000c5a87 x12: ffff0008ff4458c0 2022-05-24T04:06:16 [ 745.353656] x11: ffff0008f95f9668 x10: fffffc0020453a08 x9 : 0000000000000000 2022-05-24T04:06:16 [ 745.360797] x8 : ffff80001c8be000 x7 : 0000000000000000 x6 : 000000000000003f 2022-05-24T04:06:16 [ 745.367938] x5 : 0000000000000040 x4 : 0000000000000000 x3 : ffff80001c8be000 2022-05-24T04:06:16 [ 745.375079] x2 : 0000000000c00000 x1 : 0000000000000000 x0 : ffff80001c5ef198 2022-05-24T04:06:16 [ 745.382221] Call trace: 2022-05-24T04:06:16 [ 745.384662] 0xffffffffffffffe8 2022-05-24T04:06:16 [ 745.387798] vpu_buf_queue+0x314/0x580 2022-05-24T04:06:16 [ 745.391551] __enqueue_in_driver+0x3c/0x6c 2022-05-24T04:06:16 [ 745.395651] vb2_core_qbuf+0x44c/0x5a0 2022-05-24T04:06:16 [ 745.399396] vb2_qbuf+0x94/0xf0 2022-05-24T04:06:16 [ 745.402532] v4l2_ioctl_qbuf+0x1a4/0x29c 2022-05-24T04:06:16 [ 745.406450] v4l_qbuf+0x48/0x60 2022-05-24T04:06:16 [ 745.409586] __video_do_ioctl+0x178/0x3dc 2022-05-24T04:06:16 [ 745.413591] video_usercopy+0x374/0x6a0 2022-05-24T04:06:16 [ 745.417422] video_ioctl2+0x18/0x24 2022-05-24T04:06:16 [ 745.420906] v4l2_ioctl+0x44/0x64 2022-05-24T04:06:16 [ 745.424216] __arm64_sys_ioctl+0xa8/0xf0 2022-05-24T04:06:16 [ 745.428142] invoke_syscall+0x48/0x114 2022-05-24T04:06:16 [ 745.431895] el0_svc_common.constprop.0+0xd4/0xfc 2022-05-24T04:06:16 [ 745.436595] do_el0_svc+0x28/0x90 2022-05-24T04:06:16 [ 745.439905] el0_svc+0x34/0xb0 2022-05-24T04:06:16 [ 745.442955] el0t_64_sync_handler+0xa4/0x130 2022-05-24T04:06:16 [ 745.447220] el0t_64_sync+0x18c/0x190 2022-05-24T04:06:16 [ 745.450884] Code: bad PC value 2022-05-24T04:06:16 [ 745.453940] ---[ end trace 0000000000000000 ]--- 2022-05-24T04:06:16 [ 745.458550] Kernel panic - not syncing: Oops: Fatal exception 2022-05-24T04:06:16 [ 745.464292] SMP: stopping secondary CPUs Signed-off-by: Ming Qian <[email protected]> Reviewed-by: Zhou Peng <[email protected]>
clayderhua
pushed a commit
that referenced
this issue
Jun 15, 2023
When running the following test on an 8QM using HDMI: systemctl stop weston; dmesg -n 1; while true; do (sleep 5; echo) | modetest -M imx-drm -s 148:1024x768-60.00@XR24; sleep 2; done the following synchronous external abort will be randomly thrown, resulting in the board freezing up: --- SError Interrupt on CPU4, code 0xbf000002 -- SError CPU: 4 PID: 287 Comm: cdns-mhdp-cec Tainted: G C 5.15.32-lts-next+ge8a7b095be1b #1 Hardware name: Freescale i.MX8QM MEK (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mhdp_cec_read+0x90/0xd0 lr : mhdp_cec_read+0x20/0xd0 sp : ffff80000c4fbde0 x29: ffff80000c4fbde0 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0008144d96c0 x24: ffff0008142bcdb0 x23: 0000000000033850 x22: ffff800009eab858 x21: ffff000811e98dc0 x20: ffff0008142bcdb0 x19: 0000000000000000 x18: 0000000000000001 x17: 0000000000000000 x16: 00000000000000a6 x15: 00000000000000a0 x14: 000000000000001e x13: 0000000000000001 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000960 x9 : ffff80000c4fbcf0 x8 : ffff000811e99780 x7 : ffff0008f95e0e80 x6 : 0000000000000000 x5 : 00000000410fd080 x4 : 0000000000000000 x3 : ffff0008142bcd88 x2 : ffff000811e98dc0 x1 : 0000000000000033 x0 : ffff80000afa594c Kernel panic - not syncing: Asynchronous SError Interrupt CPU: 4 PID: 287 Comm: cdns-mhdp-cec Tainted: G C 5.15.32-lts-next+ge8a7b095be1b #1 Hardware name: Freescale i.MX8QM MEK (DT) Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x1c/0x70 dump_stack_lvl+0x68/0x84 dump_stack+0x1c/0x38 panic+0x15c/0x31c add_taint+0x0/0xb0 arm64_serror_panic+0x70/0x80 do_serror+0x5c/0x60 el1h_64_error_handler+0x34/0x50 el1h_64_error+0x78/0x7c mhdp_cec_read+0x90/0xd0 mhdp_cec_poll_worker+0x64/0x27c kthread+0x154/0x160 ret_from_fork+0x10/0x20 SMP: stopping secondary CPUs Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP --- The reason for the SError is because we're trying to read/write from MHDP controller, in the HDMI CEC worker thread, while clocks are off in the middle of a mode set pixel clock change. The simplest solution is to guard the pixel clock changes with the iolock mutex and all read/writes from MHDP controller will then have to wait until the clocks are back on. Signed-off-by: Laurentiu Palcu <[email protected]> Reviewed-by: Sandor Yu <[email protected]> Acked-by: Robert Chiras <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I'm trying to use the
rts-gpio
devicetree setting to emulate RTS modem control signals for an RS485 serial bus.I can NOT use hardware RTS because the Advantech RSB-3430 does not have those signals exposed on the 40-pin connectors (CN1 and CN2).
I'm doing manual gpio sets for RTS from user space (using gpiod) but it is not reliable, especially under processor load. The timing is not reliable and clobbers responses from other devices on the RS485 bus.
The imx tty driver does support the
rts-gpio
devicetree setting. When I specify it I can see that the gpiod system reports that it is in use (by consumer "rts") and I cannot manually set it high/low anymore from user space.HOWEVER, the gpio does not toggle when sending data. Looking at the imx tty driver, I can see that there are
#ifdef CONFIG_ARCH_ADVANTECH
blocks that skip the RS485 handling of RTS. i.e. the functionsimx_port_rts_active()
andimx_port_rts_inactive()
are never called (see code example below).Why is there special code for Advantech to disable this feature ??
Can it be fixed/patched please so I can use RS485 with my Advantech RSB-3430 board please.
I'm using the
adv_4.14.98_2.0.0_ga
kernel.https://github.com/ADVANTECH-Corp/linux-imx/blob/adv_4.14.98_2.0.0_ga/drivers/tty/serial/imx.c
The text was updated successfully, but these errors were encountered: