On 4/22/20 01:45, Kristof Provost wrote:
> On 22 Apr 2020, at 10:20, Xin Li wrote:
>> Hi,
>>
>> On 4/14/20 02:51, Kristof Provost wrote:
>>> Hi,
>>>
>>> Thanks to support from The FreeBSD Foundation I've been able to work on
>>> improving the throughput of if_bridge.
>>> It changes the (data path) locking to use the NET_EPOCH infrastructure.
>>> Benchmarking shows substantial improvements (x5 in test setups).
>>>
>>> This work is ready for wider testing now.
>>>
>>> It's under review here: https://reviews.freebsd.org/D24250
>>>
>>> Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
>>> Patches for stable/12:
>>> https://people.freebsd.org/~kp/if_bridge/stable_12/
>>>
>>> I'm not currently aware of any panics or issues resulting from these
>>> patches.
>>
>> I have observed the following panic with the latest stable/12 after
>> applying the stable_12 patchset. It looks like a race-condition-related
>> NULL pointer dereference, but I haven't taken a deeper look yet.
>>
>> The box has 7 igb(4) NICs, with several bridges and VLANs configured,
>> acting as a router. Please let me know if you need additional
>> information; I can try -CURRENT as well, but it would take some time as
>> the box is relatively slow (it's a ZFS based system so I can create a
>> separate boot environment for -CURRENT if needed, but that would take
>> some time as I might have to upgrade the packages, should there be any
>> ABI breakages).
>>
> Thanks for the report. I don't immediately see how this could happen.
>
> Are you running an L2 firewall on that bridge by any chance? An earlier
> version of the patch had issues with a stray unlock in that code path.

I don't think I have an L2 firewall (I assume that means filtering based on
MAC address, like what can be done with e.g. ipfw? The bridges were created
on vlan interfaces though; do they count as an L2 firewall?). The system is
using pf with a few NAT rules:

$ sudo pfctl -s rules
anchor "miniupnpd" all
pass in quick inet6 proto tcp from <myv6> to any flags S/SA keep state
block drop in quick inet6 proto tcp from ! <myv6> to <myv6> flags S/SA
block drop in quick proto tcp from any os "Linux" to any port = ssh
pass out on igb6 inet proto tcp from (igb6) to any port = domain flags S/SA keep state queue dns
pass out on igb6 inet proto udp from (igb6) to any port = domain keep state queue dns
pass in on igb6 proto tcp from any to (igb6) port = http flags S/SA modulate state queue(web, ack)
pass in on igb6 proto tcp from any to (igb6) port = https flags S/SA modulate state queue(web, ack)
pass out on igb6 inet proto tcp from (igb6) to any flags S/SA modulate state queue bulk
block drop in quick on igb6 proto tcp from <sshguard> to any port = ssh label "ssh bruteforce"
block drop in on igb6 from <badhosts> to any

Cheers,

Just using pf is enough to provoke this panic. I had the same back trace.
This patch from Kristof fixed it for me.

diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
index 373fa096d70..83c453090bb 100644
--- a/sys/net/if_bridge.c
+++ b/sys/net/if_bridge.c
@@ -2529,7 +2529,6 @@ bridge_input(struct ifnet *ifp, struct mbuf *m)
 	    OR_PFIL_HOOKED_INET6)) {				\
 		if (bridge_pfil(&m, NULL, ifp,			\
 		    PFIL_IN) != 0 || m == NULL) {		\
-			BRIDGE_UNLOCK(sc);			\
 			return (NULL);				\
 		}						\
 		eh = mtod(m, struct ether_header *);		\

On 22 Apr 2020, at 18:15, Xin Li wrote:
> On 4/22/20 01:45, Kristof Provost wrote:
>> Thanks for the report. I don't immediately see how this could happen.
>>
>> Are you running an L2 firewall on that bridge by any chance? An earlier
>> version of the patch had issues with a stray unlock in that code path.
>
> I don't think I have an L2 firewall (I assume that means filtering based on
> MAC address, like what can be done with e.g. ipfw? The bridges were
> created on vlan interfaces though; do they count as an L2 firewall?). The
> system is using pf with a few NAT rules:
>

That backtrace looks identical to the one Peter reported, up to and
including the offset in the bridge_input() function. Given that there's no
likely way to end up with a NULL mutex either, I have to assume that it's a
case of trying to unlock a mutex that is not locked, and the most likely
reason is that you ran into the same problem Peter ran into.

The current version of the patch should resolve it.

Best regards,
Kristof