Jakub Kicinski
2020-Dec-03 18:28 UTC
[Bridge] [PATCH] bridge: Fix a deadlock when enabling multicast snooping
On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:> When enabling multicast snooping, bridge module deadlocks on multicast_lock > if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 > network. > > The deadlock was caused by the following sequence: While holding the lock, > br_multicast_open calls br_multicast_join_snoopers, which eventually causes > IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). > Since the destination Ethernet address is a multicast address, br_dev_xmit > feeds the packet back to the bridge via br_multicast_rcv, which in turn > calls br_multicast_add_group, which then deadlocks on multicast_lock. > > The fix is to move the call br_multicast_join_snoopers outside of the > critical section. This works since br_multicast_join_snoopers only deals > with IP and does not modify any multicast data structures of the bridge, > so there's no need to hold the lock. > > Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") > > Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>Nik, Linus - how does this one look?
Nikolay Aleksandrov
2020-Dec-03 20:46 UTC
[Bridge] [PATCH] bridge: Fix a deadlock when enabling multicast snooping
On 03/12/2020 20:28, Jakub Kicinski wrote:> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote: >> When enabling multicast snooping, bridge module deadlocks on multicast_lock >> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 >> network. >> >> The deadlock was caused by the following sequence: While holding the lock, >> br_multicast_open calls br_multicast_join_snoopers, which eventually causes >> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). >> Since the destination Ethernet address is a multicast address, br_dev_xmit >> feeds the packet back to the bridge via br_multicast_rcv, which in turn >> calls br_multicast_add_group, which then deadlocks on multicast_lock. >> >> The fix is to move the call br_multicast_join_snoopers outside of the >> critical section. This works since br_multicast_join_snoopers only deals >> with IP and does not modify any multicast data structures of the bridge, >> so there's no need to hold the lock. >> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") >> >> Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com> > > Nik, Linus - how does this one look? >Hi, Thanks, somehow I missed this one too. Need to check my email config. :) I believe I see how it can happen, although it's not straight-forward to follow. A selftest for this case would be great, and any traces (e.g. hung task) would help a lot as well. Correct me if I'm wrong but the sequence is something like: br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> igmp6_group_added -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode igmp6_join_group() -> igmp6_join_group -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge mcast processing code which uses the multicast_lock spinlock. Right? One question - shouldn't leaving have the same problem? I.e. br_multicast_toggle -> br_multicast_leave_snoopers -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> igmp6_group_dropped -> igmp6_leave_group -> MLDv1 mode && last reporter -> igmp6_send() ? I think it was saved by the fact that !br_opt_get(br, BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that because it's not really trivial to find out. :) Anyhow, the patch is fine as-is too: Acked-by: Nikolay Aleksandrov <nikolay at nvidia.com> Thanks, Nik