forked from vvfedorenko/linux-dpll
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dpll: first changes to dpll logic in ice driver #6
Open
mwolech
wants to merge
1
commit into
kubalewski:base
Choose a base branch
from
mwolech:molech_basic_ice_changes
base: base
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Milena Olech <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Dec 23, 2022
KASAN reported a UAF bug when I was running xfs/235: BUG: KASAN: use-after-free in xlog_recover_process_intents+0xa77/0xae0 [xfs] Read of size 8 at addr ffff88804391b360 by task mount/5680 CPU: 2 PID: 5680 Comm: mount Not tainted 6.0.0-xfsx #6.0.0 77e7b52a4943a975441e5ac90a5ad7748b7867f6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2cc/0x682 kasan_report+0xa3/0x120 xlog_recover_process_intents+0xa77/0xae0 [xfs fb841c7180aad3f8359438576e27867f5795667e] xlog_recover_finish+0x7d/0x970 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_log_mount_finish+0x2d7/0x5d0 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_mountfs+0x11d4/0x1d10 [xfs fb841c7180aad3f8359438576e27867f5795667e] xfs_fs_fill_super+0x13d5/0x1a80 [xfs fb841c7180aad3f8359438576e27867f5795667e] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff5bc069eae Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe433fd448 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff5bc069eae RDX: 00005575d7213290 RSI: 00005575d72132d0 RDI: 00005575d72132b0 RBP: 00005575d7212fd0 R08: 00005575d7213230 R09: 00005575d7213fe0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00005575d7213290 R14: 00005575d72132b0 R15: 00005575d7212fd0 </TASK> Allocated by task 5680: kasan_save_stack+0x1e/0x40 __kasan_slab_alloc+0x66/0x80 kmem_cache_alloc+0x152/0x320 xfs_rui_init+0x17a/0x1b0 [xfs] xlog_recover_rui_commit_pass2+0xb9/0x2e0 [xfs] xlog_recover_items_pass2+0xe9/0x220 [xfs] xlog_recover_commit_trans+0x673/0x900 [xfs] xlog_recovery_process_trans+0xbe/0x130 [xfs] xlog_recover_process_data+0x103/0x2a0 [xfs] xlog_do_recovery_pass+0x548/0xc60 [xfs] xlog_do_log_recovery+0x62/0xc0 [xfs] xlog_do_recover+0x73/0x480 [xfs] xlog_recover+0x229/0x460 [xfs] xfs_log_mount+0x284/0x640 [xfs] xfs_mountfs+0xf8b/0x1d10 [xfs] xfs_fs_fill_super+0x13d5/0x1a80 [xfs] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x144/0x1b0 slab_free_freelist_hook+0xab/0x180 kmem_cache_free+0x1f1/0x410 xfs_rud_item_release+0x33/0x80 [xfs] xfs_trans_free_items+0xc3/0x220 [xfs] xfs_trans_cancel+0x1fa/0x590 [xfs] xfs_rui_item_recover+0x913/0xd60 [xfs] xlog_recover_process_intents+0x24e/0xae0 [xfs] xlog_recover_finish+0x7d/0x970 [xfs] xfs_log_mount_finish+0x2d7/0x5d0 [xfs] xfs_mountfs+0x11d4/0x1d10 [xfs] xfs_fs_fill_super+0x13d5/0x1a80 [xfs] get_tree_bdev+0x3da/0x6e0 vfs_get_tree+0x7d/0x240 path_mount+0xdd3/0x17d0 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff88804391b300 which belongs to the cache xfs_rui_item of size 688 The buggy address is located 96 bytes inside of 688-byte region [ffff88804391b300, ffff88804391b5b0) The buggy address belongs to the physical page: page:ffffea00010e4600 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888043919320 pfn:0x43918 head:ffffea00010e4600 order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff) raw: 04fff80000010200 0000000000000000 dead000000000122 ffff88807f0eadc0 raw: ffff888043919320 0000000080140010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804391b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88804391b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88804391b300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88804391b380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88804391b400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The test fuzzes an rmap btree block and starts writer threads to induce a filesystem shutdown on the corrupt block. When the filesystem is remounted, recovery will try to replay the committed rmap intent item, but the corruption problem causes the recovery transaction to fail. Cancelling the transaction frees the RUD, which frees the RUI that we recovered. When we return to xlog_recover_process_intents, @lip is now a dangling pointer, and we cannot use it to find the iop_recover method for the tracepoint. Hence we must store the item ops before calling ->iop_recover if we want to give it to the tracepoint so that the trace data will tell us exactly which intent item failed. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Dec 23, 2022
Andrii Nakryiko says: ==================== This patch set fixes and improves BPF verifier's precision tracking logic for SCALAR registers. Patches #1 and #2 are bug fixes discovered while working on these changes. Patch #3 enables precision tracking for BPF programs that contain subprograms. This was disabled before and prevent any modern BPF programs that use subprograms from enjoying the benefits of SCALAR (im)precise logic. Patch #4 is few lines of code changes and many lines of explaining why those changes are correct. We establish why ignoring precise markings in current state is OK. Patch #5 build on explanation in patch #4 and pushes it to the limit by forcefully forgetting inherited precise markins. Patch #4 by itself doesn't prevent current state from having precise=true SCALARs, so patch #5 is necessary to prevent such stray precise=true registers from creeping in. Patch #6 adjusts test_align selftests to work around BPF verifier log's limitations when it comes to interactions between state output and precision backtracking output. Overall, the goal of this patch set is to make BPF verifier's state tracking a bit more efficient by trying to preserve as much generality in checkpointed states as possible. v1->v2: - adjusted patch #1 commit message to make it clear we are fixing forward step, not precision backtracking (Alexei); - moved last_idx/first_idx verbose logging up to make it clear when global func reaches the first empty state (Alexei). ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Dec 23, 2022
Petr Machata says: ==================== mlxsw: Add 802.1X and MAB offload support This patchset adds 802.1X [1] and MAB [2] offload support in mlxsw. Patches #1-#3 add the required switchdev interfaces. Patches #4-#5 add the required packet traps for 802.1X. Patches #6-#10 are small preparations in mlxsw. Patch #11 adds locked bridge port support in mlxsw. Patches #12-#15 add mlxsw selftests. The patchset was also tested with the generic forwarding selftest ('bridge_locked_port.sh'). [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a21d9a670d81103db7f788de1a4a4a6e4b891a0b [2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a35ec8e38cdd1766f29924ca391a01de20163931 ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Apr 4, 2023
In xfs_buffered_write_iomap_begin, @icur is the iext cursor for the data fork and @CCur is the cursor for the cow fork. Pass in whichever cursor corresponds to allocfork, because otherwise the xfs_iext_prev_extent call can use the data fork cursor to walk off the end of the cow fork structure. Best case it returns the wrong results, worst case it does this: stack segment: 0000 [#1] PREEMPT SMP CPU: 2 PID: 3141909 Comm: fsstress Tainted: G W 6.3.0-rc2-xfsx #6.3.0-rc2 7bf5cc2e98997627cae5c930d890aba3aeec65dd Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014 RIP: 0010:xfs_iext_prev+0x71/0x150 [xfs] RSP: 0018:ffffc90002233aa8 EFLAGS: 00010297 RAX: 000000000000000f RBX: 000000000000000e RCX: 000000000000000c RDX: 0000000000000002 RSI: 000000000000000e RDI: ffff8883d0019ba0 RBP: 989642409af8a7a7 R08: ffffea0000000001 R09: 0000000000000002 R10: 0000000000000000 R11: 000000000000000c R12: ffffc90002233b00 R13: ffff8883d0019ba0 R14: 989642409af8a6bf R15: 000ffffffffe0000 FS: 00007fdf8115f740(0000) GS:ffff88843fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf8115e000 CR3: 0000000357256000 CR4: 00000000003506e0 Call Trace: <TASK> xfs_iomap_prealloc_size.constprop.0.isra.0+0x1a6/0x410 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] xfs_buffered_write_iomap_begin+0xa87/0xc60 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] iomap_iter+0x132/0x2f0 iomap_file_buffered_write+0x92/0x330 xfs_file_buffered_write+0xb1/0x330 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] vfs_write+0x2eb/0x410 ksys_write+0x65/0xe0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Found by xfs/538 in alwayscow mode, but this doesn't seem particular to that test. Fixes: 590b165 ("xfs: refactor xfs_iomap_prealloc_size") Actually-Fixes: 66ae56a ("xfs: introduce an always_cow mode") Signed-off-by: Darrick J. Wong <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
May 24, 2023
In the function ieee80211_tx_dequeue() there is a particular locking sequence: begin: spin_lock(&local->queue_stop_reason_lock); q_stopped = local->queue_stop_reasons[q]; spin_unlock(&local->queue_stop_reason_lock); However small the chance (increased by ftracetest), an asynchronous interrupt can occur in between of spin_lock() and spin_unlock(), and the interrupt routine will attempt to lock the same &local->queue_stop_reason_lock again. This will cause a costly reset of the CPU and the wifi device or an altogether hang in the single CPU and single core scenario. The only remaining spin_lock(&local->queue_stop_reason_lock) that did not disable interrupts was patched, which should prevent any deadlocks on the same CPU/core and the same wifi device. This is the probable trace of the deadlock: kernel: ================================ kernel: WARNING: inconsistent lock state kernel: 6.3.0-rc6-mt-20230401-00001-gf86822a1170f #4 Tainted: G W kernel: -------------------------------- kernel: inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kernel: kworker/5:0/25656 [HC0[0]:SC0[0]:HE1:SE1] takes: kernel: ffff9d6190779478 (&local->queue_stop_reason_lock){+.?.}-{2:2}, at: return_to_handler+0x0/0x40 kernel: {IN-SOFTIRQ-W} state was registered at: kernel: lock_acquire+0xc7/0x2d0 kernel: _raw_spin_lock+0x36/0x50 kernel: ieee80211_tx_dequeue+0xb4/0x1330 [mac80211] kernel: iwl_mvm_mac_itxq_xmit+0xae/0x210 [iwlmvm] kernel: iwl_mvm_mac_wake_tx_queue+0x2d/0xd0 [iwlmvm] kernel: ieee80211_queue_skb+0x450/0x730 [mac80211] kernel: __ieee80211_xmit_fast.constprop.66+0x834/0xa50 [mac80211] kernel: __ieee80211_subif_start_xmit+0x217/0x530 [mac80211] kernel: ieee80211_subif_start_xmit+0x60/0x580 [mac80211] kernel: dev_hard_start_xmit+0xb5/0x260 kernel: __dev_queue_xmit+0xdbe/0x1200 kernel: neigh_resolve_output+0x166/0x260 kernel: ip_finish_output2+0x216/0xb80 kernel: __ip_finish_output+0x2a4/0x4d0 kernel: ip_finish_output+0x2d/0xd0 kernel: ip_output+0x82/0x2b0 kernel: ip_local_out+0xec/0x110 kernel: igmpv3_sendpack+0x5c/0x90 kernel: igmp_ifc_timer_expire+0x26e/0x4e0 kernel: call_timer_fn+0xa5/0x230 kernel: run_timer_softirq+0x27f/0x550 kernel: __do_softirq+0xb4/0x3a4 kernel: irq_exit_rcu+0x9b/0xc0 kernel: sysvec_apic_timer_interrupt+0x80/0xa0 kernel: asm_sysvec_apic_timer_interrupt+0x1f/0x30 kernel: _raw_spin_unlock_irqrestore+0x3f/0x70 kernel: free_to_partial_list+0x3d6/0x590 kernel: __slab_free+0x1b7/0x310 kernel: kmem_cache_free+0x52d/0x550 kernel: putname+0x5d/0x70 kernel: do_sys_openat2+0x1d7/0x310 kernel: do_sys_open+0x51/0x80 kernel: __x64_sys_openat+0x24/0x30 kernel: do_syscall_64+0x5c/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: irq event stamp: 5120729 kernel: hardirqs last enabled at (5120729): [<ffffffff9d149936>] trace_graph_return+0xd6/0x120 kernel: hardirqs last disabled at (5120728): [<ffffffff9d149950>] trace_graph_return+0xf0/0x120 kernel: softirqs last enabled at (5069900): [<ffffffff9cf65b60>] return_to_handler+0x0/0x40 kernel: softirqs last disabled at (5067555): [<ffffffff9cf65b60>] return_to_handler+0x0/0x40 kernel: other info that might help us debug this: kernel: Possible unsafe locking scenario: kernel: CPU0 kernel: ---- kernel: lock(&local->queue_stop_reason_lock); kernel: <Interrupt> kernel: lock(&local->queue_stop_reason_lock); kernel: *** DEADLOCK *** kernel: 8 locks held by kworker/5:0/25656: kernel: #0: ffff9d618009d138 ((wq_completion)events_freezable){+.+.}-{0:0}, at: process_one_work+0x1ca/0x530 kernel: #1: ffffb1ef4637fe68 ((work_completion)(&local->restart_work)){+.+.}-{0:0}, at: process_one_work+0x1ce/0x530 kernel: #2: ffffffff9f166548 (rtnl_mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40 kernel: #3: ffff9d6190778728 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: return_to_handler+0x0/0x40 kernel: #4: ffff9d619077b480 (&mvm->mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40 kernel: #5: ffff9d61907bacd8 (&trans_pcie->mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40 kernel: #6: ffffffff9ef9cda0 (rcu_read_lock){....}-{1:2}, at: iwl_mvm_queue_state_change+0x59/0x3a0 [iwlmvm] kernel: #7: ffffffff9ef9cda0 (rcu_read_lock){....}-{1:2}, at: iwl_mvm_mac_itxq_xmit+0x42/0x210 [iwlmvm] kernel: stack backtrace: kernel: CPU: 5 PID: 25656 Comm: kworker/5:0 Tainted: G W 6.3.0-rc6-mt-20230401-00001-gf86822a1170f #4 kernel: Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN51WW 11/16/2022 kernel: Workqueue: events_freezable ieee80211_restart_work [mac80211] kernel: Call Trace: kernel: <TASK> kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: dump_stack_lvl+0x5f/0xa0 kernel: dump_stack+0x14/0x20 kernel: print_usage_bug.part.46+0x208/0x2a0 kernel: mark_lock.part.47+0x605/0x630 kernel: ? sched_clock+0xd/0x20 kernel: ? trace_clock_local+0x14/0x30 kernel: ? __rb_reserve_next+0x5f/0x490 kernel: ? _raw_spin_lock+0x1b/0x50 kernel: __lock_acquire+0x464/0x1990 kernel: ? mark_held_locks+0x4e/0x80 kernel: lock_acquire+0xc7/0x2d0 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: ? ftrace_return_to_handler+0x8b/0x100 kernel: ? preempt_count_add+0x4/0x70 kernel: _raw_spin_lock+0x36/0x50 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: ieee80211_tx_dequeue+0xb4/0x1330 [mac80211] kernel: ? prepare_ftrace_return+0xc5/0x190 kernel: ? ftrace_graph_func+0x16/0x20 kernel: ? 0xffffffffc02ab0b1 kernel: ? lock_acquire+0xc7/0x2d0 kernel: ? iwl_mvm_mac_itxq_xmit+0x42/0x210 [iwlmvm] kernel: ? ieee80211_tx_dequeue+0x9/0x1330 [mac80211] kernel: ? __rcu_read_lock+0x4/0x40 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_mvm_mac_itxq_xmit+0xae/0x210 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_mvm_queue_state_change+0x311/0x3a0 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_mvm_wake_sw_queue+0x17/0x20 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_txq_gen2_unmap+0x1c9/0x1f0 [iwlwifi] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_txq_gen2_free+0x55/0x130 [iwlwifi] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_txq_gen2_tx_free+0x63/0x80 [iwlwifi] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: _iwl_trans_pcie_gen2_stop_device+0x3f3/0x5b0 [iwlwifi] kernel: ? _iwl_trans_pcie_gen2_stop_device+0x9/0x5b0 [iwlwifi] kernel: ? mutex_lock_nested+0x4/0x30 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_trans_pcie_gen2_stop_device+0x5f/0x90 [iwlwifi] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_mvm_stop_device+0x78/0xd0 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: __iwl_mvm_mac_start+0x114/0x210 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: iwl_mvm_mac_start+0x76/0x150 [iwlmvm] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: drv_start+0x79/0x180 [mac80211] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: ieee80211_reconfig+0x1523/0x1ce0 [mac80211] kernel: ? synchronize_net+0x4/0x50 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: ieee80211_restart_work+0x108/0x170 [mac80211] kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: process_one_work+0x250/0x530 kernel: ? ftrace_regs_caller_end+0x66/0x66 kernel: worker_thread+0x48/0x3a0 kernel: ? __pfx_worker_thread+0x10/0x10 kernel: kthread+0x10f/0x140 kernel: ? __pfx_kthread+0x10/0x10 kernel: ret_from_fork+0x29/0x50 kernel: </TASK> Fixes: 4444bc2 ("wifi: mac80211: Proper mark iTXQs for resumption") Link: https://lore.kernel.org/all/[email protected]/ Reported-by: Mirsad Goran Todorovac <[email protected]> Cc: Gregory Greenman <[email protected]> Cc: Johannes Berg <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Cc: David S. Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Alexander Wetzel <[email protected]> Signed-off-by: Mirsad Goran Todorovac <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: tag, or it goes automatically? Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 5, 2023
…ting Currently ffa_drv->remove() is called unconditionally from ffa_device_remove(). Since the driver registration doesn't check for it and allows it to be registered without .remove callback, we need to check for the presence of it before executing it from ffa_device_remove() to above a NULL pointer dereference like the one below: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | Mem abort info: | ESR = 0x0000000086000004 | EC = 0x21: IABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | user pgtable: 4k pages, 48-bit VAs, pgdp=0000000881cc8000 | [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 | Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP | CPU: 3 PID: 130 Comm: rmmod Not tainted 6.3.0-rc7 #6 | Hardware name: FVP Base RevC (DT) | pstate: 63402809 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=-c) | pc : 0x0 | lr : ffa_device_remove+0x20/0x2c | Call trace: | 0x0 | device_release_driver_internal+0x16c/0x260 | driver_detach+0x90/0xd0 | bus_remove_driver+0xdc/0x11c | driver_unregister+0x30/0x54 | ffa_driver_unregister+0x14/0x20 | cleanup_module+0x18/0xeec | __arm64_sys_delete_module+0x234/0x378 | invoke_syscall+0x40/0x108 | el0_svc_common+0xb4/0xf0 | do_el0_svc+0x30/0xa4 | el0_svc+0x2c/0x7c | el0t_64_sync_handler+0x84/0xf0 | el0t_64_sync+0x190/0x194 Fixes: 244f5d5 ("firmware: arm_ffa: Add missing remove callback to ffa_bus_type") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sudeep Holla <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 5, 2023
Magnus Karlsson says: ==================== Prepare the AF_XDP selftests test framework code for the upcoming multi-buffer support in AF_XDP. This so that the multi-buffer patch set does not become way too large. In that upcoming patch set, we are only including the multi-buffer tests together with any framework code that depends on the new options bit introduced in the AF_XDP multi-buffer implementation itself. Currently, the test framework is based on the premise that a packet consists of a single fragment and thus occupies a single buffer and a single descriptor. Multi-buffer breaks this assumption, as that is the whole purpose of it. Now, a packet can consist of multiple buffers and therefore consume multiple descriptors. The patch set starts with some clean-ups and simplifications followed by patches that make sure that the current code works even when a packet occupies multiple buffers. The actual code for sending and receiving multi-buffer packets will be included in the AF_XDP multi-buffer patch set as it depends on a new bit being used in the options field of the descriptor. Patch set anatomy: 1: The XDP program was unnecessarily changed many times. Fixes this. 2: There is no reason to generate a full UDP/IPv4 packet as it is never used. Simplify the code by just generating a valid Ethernet frame. 3: Introduce a more complicated payload pattern that can detect fragments out of bounds in a multi-buffer packet and other errors found in single-fragment packets. 4: As a convenience, dump the content of the faulty packet at error. 5: To simplify the code, make the usage of the packet stream for Tx and Rx more similar. 6: Store the offset of the packet in the buffer in the struct pkt definition instead of the address in the umem itself and introduce a simple buffer allocator. The address only made sense when all packets consumed a single buffer. Now, we do not know beforehand how many buffers a packet will consume, so we instead just allocate a buffer from the allocator and specify the offset within that buffer. 7: Test for huge pages only once instead of before each test that needs it. 8: Populate the fill ring based on how many frags are needed for each packet. 9: Change the data generation code so it can generate data for multi-buffer packets too. 10: Adjust the packet pacing algorithm so that it can cope with multi-buffer packets. The pacing algorithm is present so that Tx does not send too many packets/frames to Rx that it starts to drop packets. That would ruin the tests. v1 -> v2: * Fixed spelling error in patch #6 [Simon] * Fixed compilation error with llvm in patch #7 [Daniel] Thanks: Magnus ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 5, 2023
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 vvfedorenko#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 vvfedorenko#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f vvfedorenko#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 vvfedorenko#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 5, 2023
Jiri Pirko says: ==================== devlink: move port ops into separate structure In devlink, some of the objects have separate ops registered alongside with the object itself. Port however have ops in devlink_ops structure. For drivers what register multiple kinds of ports with different ops this is not convenient. This patchset changes does following changes: 1) Introduces devlink_port_ops with functions that allow devlink port to be registered passing a pointer to driver port ops. (patch #1) 2) Converts drivers to define port_ops and register ports passing the ops pointer. (patches #2, #3, #4, #6, #8, and #9) 3) Moves ops from devlink_ops struct to devlink_port_ops. (patches #5, #7, #10-15) No functional changes. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 5, 2023
Ido Schimmel says: ==================== Add layer 2 miss indication and filtering tl;dr ===== This patchset adds a single bit to the tc skb extension to indicate that a packet encountered a layer 2 miss in the bridge and extends flower to match on this metadata. This is required for non-DF (Designated Forwarder) filtering in EVPN multi-homing which prevents decapsulated BUM packets from being forwarded multiple times to the same multi-homed host. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches in a rack. These switches act as VTEPs and are not directly connected (as opposed to MLAG), but can communicate with each other (as well as with VTEPs in remote racks) via spine switches over L3. When a host sends a BUM packet over ES1 to VTEP1, the VTEP will flood it to other VTEPs in the network, including those connected to the host over ES1. The receiving VTEPs must drop the packet and not forward it back to the host. This is called "split-horizon filtering" (SPH) [1]. FRR configures SPH filtering using two tc filters. The first, an ingress filter that matches on packets received from VTEP1 and marks them using a fwmark (firewall mark). The second, an egress filter configured on the LAG interface connected to the host that matches on the fwmark and drops the packets. Example: # tc filter add dev vxlan0 ingress pref 1 proto all flower enc_src_ip $VTEP1_IP action skbedit mark 101 # tc filter add dev bond0 egress pref 1 handle 101 fw action drop Motivation ========== For each ES, only one VTEP is elected by the control plane as the DF. The DF is responsible for forwarding decapsulated BUM traffic to the host over the ES. The non-DF VTEPs must drop such traffic as otherwise the host will receive multiple copies of BUM traffic. This is called "non-DF filtering" [2]. Filtering of multicast and broadcast traffic can be achieved using the following flower filter: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop Unlike broadcast and multicast traffic, it is not currently possible to filter unknown unicast traffic. The classification into unknown unicast is performed by the bridge driver, but is not visible to other layers. Implementation ============== The proposed solution is to add a single bit to the tc skb extension that is set by the bridge for packets that encountered an FDB or MDB miss. The flower classifier is extended to be able to match on this new metadata bit in a similar fashion to existing metadata options such as 'indev'. A bit that is set for every flooded packet would also work, but it does not allow us to differentiate between registered and unregistered multicast traffic which might be useful in the future. A relatively generic name is chosen for this bit - 'l2_miss' - to allow its use to be extended to other layer 2 devices such as VXLAN, should a use case arise. With the above, the control plane can implement a non-DF filter using the following tc filters: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop # tc filter add dev bond0 egress pref 2 proto all flower indev vxlan0 l2_miss true action drop The first drops broadcast and multicast traffic and the second drops unknown unicast traffic. Testing ======= A test exercising the different permutations of the 'l2_miss' bit is added in patch #8. Patchset overview ================= Patch #1 adds the new bit to the tc skb extension and sets it in the bridge driver for packets that encountered a miss. The marking of the packets and the use of this extension is protected by the 'tc_skb_ext_tc' static key in order to keep performance impact to a minimum when the feature is not in use. Patch #2 extends the flow dissector to dissect this information from the tc skb extension into the 'FLOW_DISSECTOR_KEY_META' key. Patch #3 extends the flower classifier to be able to match on the new layer 2 miss metadata. The classifier enables the 'tc_skb_ext_tc' static key upon the installation of the first filter that matches on 'l2_miss' and disables the key upon the removal of the last filter that matches on it. Patch #4 rejects matching on the new metadata in drivers that already support the 'FLOW_DISSECTOR_KEY_META' key. Patches #5-#6 are small preparations in mlxsw. Patch #7 extends mlxsw to be able to match on layer 2 miss. Patch #8 adds a selftest. iproute2 patches can be found here [3]. [1] https://datatracker.ietf.org/doc/html/rfc7432#section-8.3 [2] https://datatracker.ietf.org/doc/html/rfc7432#section-8.5 [3] https://github.com/idosch/iproute2/tree/submit/non_df_filter_v1 [4] https://lore.kernel.org/netdev/[email protected]/ [5] https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 19, 2023
Petr Machata says: ==================== mlxsw, selftests: Cleanups This patchset consolidates a number of disparate items that can all be considered cleanups. They are all related to mlxsw in that they are directly in mlxsw code, or in selftests that mlxsw heavily uses. - patch #1 fixes a comment, patch #2 propagates an extack - patches #3 and #4 tweak several loops to query a resource once and cache in a local variable instead of querying on each iteration - patches #5 and #6 fix selftest diagrams, and #7 adds a missing diagram into an existing test - patch #8 disables a PVID on a bridge in a selftest that should not need said PVID ==================== Signed-off-by: David S. Miller <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 19, 2023
Currently, the per cpu upcall counters are allocated after the vport is created and inserted into the system. This could lead to the datapath accessing the counters before they are allocated resulting in a kernel Oops. Here is an example: PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd" #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4 #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60 #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58 #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388 #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68 #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch] ... PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3" #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758 #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994 #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8 #3 [ffff80000a5d3120] die at ffffb70f0628234c #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8 #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4 #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4 #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710 #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74 #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24 #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch] #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch] #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch] #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch] #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch] #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90 We moved the per cpu upcall counter allocation to the existing vport alloc and free functions to solve this. Fixes: 95637d9 ("net: openvswitch: release vport resources on failure") Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets") Signed-off-by: Eelco Chaudron <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 19, 2023
Petr Machata says: ==================== mlxsw: Cleanups in router code This patchset moves some router-related code from spectrum.c to spectrum_router.c where it should be. It also simplifies handlers of netevent notifications. - Patch #1 caches router pointer in a dedicated variable. This obviates the need to access the same as mlxsw_sp->router, making lines shorter, and permitting a future patch to add code that fits within 80 character limit. - Patch #2 moves IP / IPv6 validation notifier blocks from spectrum.c to spectrum_router, where the handlers are anyway. - In patch #3, pass router pointer to scheduler of deferred work directly, instead of having it deduce it on its own. - This makes the router pointer available in the handler function mlxsw_sp_router_netevent_event(), so in patch #4, use it directly, instead of finding it through mlxsw_sp_port. - In patch #5, extend mlxsw_sp_router_schedule_work() so that the NETEVENT_NEIGH_UPDATE handler can use it directly instead of inlining equivalent code. - In patches #6 and #7, add helpers for two common operations involving a backing netdev of a RIF. This makes it unnecessary for the function mlxsw_sp_rif_dev() to be visible outside of the router module, so in patch #8, hide it. ==================== Signed-off-by: David S. Miller <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 19, 2023
Petr Machata says: ==================== mlxsw: Preparations for out-of-order-operations patches The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to a bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. Over the course of the following several patchsets, mlxsw code is going to be adjusted to diminish the space of wrongly offloaded configurations. Ideally the offload state will reflect the actual state, regardless of the sequence of operation used to construct that state. No functional changes are intended in this patchset yet. Rather the patches prepare the codebase for easier introduction of functional changes in later patchsets. - In patch #1, extract a helper to join a RIF of a given port, if there is one. In patch #2, use it in a newly-added helper to join a LAG interface. - In patches #3, #4 and #5, add helpers that abstract away the rif->dev access. This will make it simpler in the future to change the way the deduction is done. In patch #6, do this for deduction from nexthop group info to RIF. - In patch #7, add a helper to destroy a RIF. So far RIF was destroyed simply by kfree'ing it. - In patch #8, add a helper to check if any IP addresses are configured on a netdevice. This helper will be useful later. - In patch #9, add a helper to migrate a RIF. This will be a convenient place to put extensions later on. - Patch #10 move IPIP initialization up to make ipip_ops_arr available earlier. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jun 27, 2023
Petr Machata says: ==================== mlxsw: Maintain candidate RIFs The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. The situation is going to be made better by implementing a range of replays and post-hoc offloads. This patch set lays the ground for replay of next hops. The particular issue that it deals with is that currently, driver-specific bookkeeping for next hops is hooked off RIF objects, which come and go across the lifetime of a netdevice. We would rather keep these objects at an entity that mirrors the lifetime of the netdevice itself. That way they are at hand and can be offloaded when a RIF is eventually created. To that end, with this patchset, mlxsw keeps a hash table of CRIFs: candidate RIFs, persistent handles for netdevices that mlxsw deems potentially interesting. The lifetime of a CRIF matches that of the underlying netdevice, and thus a RIF can always assume a CRIF exists. A CRIF is where next hops are kept, and when RIF is created, these next hops can be easily offloaded. (Previously only the next hops created after the RIF was created were offloaded.) - Patches #1 and #2 are minor adjustments. - In patches #3 and #4, add CRIF bookkeeping. - In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF, the corresponding CRIF is easy to look up. - Patch #6 is a clean-up allowed by the previous patches - Patches #7 and #8 move next hop tracking to CRIFs No observable effects are intended as of yet. This will be useful once there is support for RIF creation for netdevices that become mlxsw uppers, which will come in following patch sets. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Jul 28, 2023
Petr Machata says: ==================== mlxsw: Permit enslavement to netdevices with uppers The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. Similarly, detaching a front panel port from a configured topology means unoffloading of this whole topology -- VLAN uppers, next hops, etc. Attaching the port back is then not permitted at all. If it were, it would not result in a working configuration, because much of mlxsw is written to react to changes in immediate configuration. There is nothing that would go visit netdevices in the attached-to topology and offload existing routes and VLAN memberships, for example. In this patchset, introduce a number of replays to be invoked so that this sort of post-hoc offload is supported. Then remove the vetoes that disallowed enslavement of front panel ports to other netdevices with uppers. The patchset progresses as follows: - In patch #1, fix an issue in the bridge driver. To my knowledge, the issue could not have resulted in a buggy behavior previously, and thus is packaged with this patchset instead of being sent separately to net. - In patch #2, add a new helper to the switchdev code. - In patch #3, drop mlxsw selftests that will not be relevant after this patchset anymore. - Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother introduction of the rest of the code. - Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper configuration when a front panel port is introduced into a topology. Individual patches take care of bridge and LAG RIF memberships, switchdev replay, nexthop and neighbors replay, and MACVLAN offload. - Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a front panel port is enslaved (in which case all uppers are newly relevant), or, respectively, deslaved (in which case the newly-relevant netdevice is the one being deslaved). - Up until this point, the introduced scaffolding was not really used, because mlxsw still forbids enslavement of mlxsw netdevices to uppers with uppers. In patch #17, this condition is finally relaxed. A sizable selftest suite is available to test all this new code. That will be sent in a separate patchset. ==================== Signed-off-by: David S. Miller <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Aug 4, 2023
The cited commit holds encap tbl lock unconditionally when setting up dests. But it may cause the following deadlock: PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8" #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91 #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528 #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core] #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core] #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core] #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core] #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core] #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core] #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core] #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core] #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower] #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower] #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0 [1031218.028143] wait_for_completion+0x24/0x30 [1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core] [1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core] [1031218.029885] process_one_work+0x24e/0x510 Actually no need to hold encap tbl lock if there is no encap action. Fix it by checking if encap action exists or not before holding encap tbl lock. Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Aug 4, 2023
Petr Machata says: ==================== selftests: New selftests for out-of-order-operations patches in mlxsw In the past, the mlxsw driver made the assumption that the user applies configuration in a bottom-up manner. Thus netdevices needed to be added to the bridge before IP addresses were configured on that bridge or SVI added on top of it, because whatever happened before a netdevice was mlxsw upper was generally ignored by mlxsw. Recently, several patch series were pushed to introduce the bookkeeping and replays necessary to offload the full state, not just the immediate configuration step. In this patchset, introduce new selftests that directly exercise the out of order code paths in mlxsw. - Patch #1 adds new tests into the existing selftest router_bridge.sh. - Patches #2-#5 add new generic selftests. - Patches #6-#8 add new mlxsw-specific selftests. ==================== Signed-off-by: David S. Miller <[email protected]>
kubalewski
pushed a commit
that referenced
this pull request
Aug 22, 2023
…l/git/netfilter/nf Florisn Westphal says: ==================== These are netfilter fixes for the *net* tree. First patch resolves a false-positive lockdep splat: rcu_dereference is used outside of rcu read lock. Let lockdep validate that the transaction mutex is locked. Second patch fixes a kdoc warning added in previous PR. Third patch fixes a memory leak: The catchall element isn't disabled correctly, this allows userspace to deactivate the element again. This results in refcount underflow which in turn prevents memory release. This was always broken since the feature was added in 5.13. Patch 4 fixes an incorrect change in the previous pull request: Adding a duplicate key to a set should work if the duplicate key has expired, restore this behaviour. All from myself. Patch #5 resolves an old historic artifact in sctp conntrack: a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long. Patch #6 fixes a sysctl data race in ipvs, two threads can clobber the sysctl value, from Sishuai Gong. This is a day-0 bug that predates git history. Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups for the previous GC rework in nf_tables: The netlink notifier and the netns exit path must both increment the gc worker seqcount, else worker may encounter stale (free'd) pointers. ================ Signed-off-by: David S. Miller <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Signed-off-by: Milena Olech [email protected]