thr3ads.net - Linux Ethernet Bridging - [Bridge] 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2] [Oct 2010]

Patrick Ringl

2010-Oct-16 12:11 UTC

[Bridge] 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]

Hi,

Okay, I narrowed down the issue. I watched all function calls of the
'bridge' module with the help of a small SystemTap probe of mine. I
first traced a timespan where the issue did not occur, then one where it
did, and took the difference between the two:

br_fdb_cleanup
br_flood
br_flood_forward
br_ip4_multicast_add_group
br_ip4_multicast_alloc_query
br_ip4_multicast_leave_group
br_ip6_multicast_alloc_query
br_mdb_get
br_multicast_alloc_query
br_multicast_flood
br_multicast_forward
br_multicast_ipv4_rcv
br_multicast_port_query_expired
br_multicast_query_expired
br_multicast_rcv
__br_multicast_send_query
br_multicast_send_query

igmp_hdr
ip_hdrlen
ipv6_addr_copy
ipv6_addr_set
ipv6_eth_mc_map
ipv6_hdr

maybe_deliver
netdev_alloc_skb
netdev_alloc_skb_ip_align

skb_checksum_complete
__skb_pull
__skb_push
skb_reserve
skb_reset_transport_header
skb_set_network_header
skb_set_transport_header

These are the functions that are called only during the
'nonfunctional' timespan.
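
The comparison of the two traces can be sketched as a set difference. This is only an illustration, not the probe or script actually used: it assumes each trace has been reduced to one function name per line, and the helper names and sample data are hypothetical.

```python
def functions_in_trace(lines):
    """Collect the distinct function names from trace lines.

    Assumption: one function name per line; blank lines are ignored.
    """
    names = set()
    for line in lines:
        name = line.strip()
        if name:
            names.add(name)
    return names


def exclusive_to_problematic(functional_lines, problematic_lines):
    """Return the functions seen only in the problematic trace, sorted."""
    functional = functions_in_trace(functional_lines)
    problematic = functions_in_trace(problematic_lines)
    return sorted(problematic - functional)


# Made-up sample data, not taken from the attached traces.
functional_sample = ["br_handle_frame", "br_forward"]
problematic_sample = ["br_handle_frame", "br_forward",
                      "br_multicast_rcv", "br_flood"]

print(exclusive_to_problematic(functional_sample, problematic_sample))
```

With real trace files, the two arguments could simply be the open file objects, since iterating over a file yields its lines.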

That gave me the idea to use tcpdump and watch for IGMP and IPv6
traffic, and that is indeed where the issue is coming from.

Once a multicast membership query (IGMP) arrives, a multicast listener
query (ICMPv6) is sent.
From my understanding of the bridge code, br_flood will propagate the
packet to all nodes (simple multicast), and this is also where things
stop working. SystemTap itself, and thus in my case the function calls
of the bridge module, are not delayed, but something must be wrong in
the multicast handling of the bridge interface since, as pointed out in
my previous email, everything works fine with 2.6.32.
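
For anyone trying to spot the same two packet types in a capture, the following sketch shows what distinguishes them at the byte level. It is not part of the original investigation, which used tcpdump. It assumes raw, untagged Ethernet frames as input, and the function and constant names are chosen here for illustration. An IGMP membership query is an IPv4 packet with protocol 2 and IGMP type 0x11. An MLD listener query is an ICMPv6 message of type 130, normally preceded by a Hop-by-Hop options header.

```python
import struct

ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
IPPROTO_HOPOPTS = 0           # IPv6 Hop-by-Hop options header
IPPROTO_IGMP = 2
IPPROTO_ICMPV6 = 58
IGMP_MEMBERSHIP_QUERY = 0x11  # IGMP message type
MLD_LISTENER_QUERY = 130      # ICMPv6 message type


def classify_query(frame: bytes) -> str:
    """Return 'igmp-query', 'mld-query' or 'other' for a raw Ethernet frame.

    Assumption: no VLAN tag, so the EtherType sits at bytes 12-13.
    """
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload = frame[14:]

    if ethertype == ETH_P_IP and len(payload) >= 20:
        ihl = (payload[0] & 0x0F) * 4  # IPv4 header length in bytes
        if payload[9] == IPPROTO_IGMP and len(payload) > ihl:
            if payload[ihl] == IGMP_MEMBERSHIP_QUERY:
                return "igmp-query"
        return "other"

    if ethertype == ETH_P_IPV6 and len(payload) >= 40:
        next_header = payload[6]
        offset = 40  # fixed IPv6 header length
        # Skip a Hop-by-Hop header (carries the Router Alert option for MLD).
        if next_header == IPPROTO_HOPOPTS and len(payload) >= offset + 2:
            next_header = payload[offset]
            offset += (payload[offset + 1] + 1) * 8
        if next_header == IPPROTO_ICMPV6 and len(payload) > offset:
            if payload[offset] == MLD_LISTENER_QUERY:
                return "mld-query"
    return "other"
```

Only a Hop-by-Hop header directly after the fixed IPv6 header is handled; other extension header chains fall through to 'other'.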

Can anyone confirm this issue, or lend a hand with how to
proceed further?

PS: I have attached two files (functional and 
nonfunctional/problematic-trace of the bridge module)

PPS: Again please CC back to me, since I am not subscribed

regards,
Patrick
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: functional-trace
Url:
http://lists.linux-foundation.org/pipermail/bridge/attachments/20101016/d8a2bc0c/attachment-0002.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nonfunctional-trace
Url:
http://lists.linux-foundation.org/pipermail/bridge/attachments/20101016/d8a2bc0c/attachment-0003.txt

Patrick Ringl

2010-Oct-16 18:15 UTC

[Bridge] 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]

Hi,

Okay, I narrowed down the issue. I watched all function calls of the
'bridge' module with the help of a small SystemTap probe of mine. I
first traced a timespan where the issue did not occur, then one where it
did, and took the difference between the two:

br_fdb_cleanup
br_flood
br_flood_forward
br_ip4_multicast_add_group
br_ip4_multicast_alloc_query
br_ip4_multicast_leave_group
br_ip6_multicast_alloc_query
br_mdb_get
br_multicast_alloc_query
br_multicast_flood
br_multicast_forward
br_multicast_ipv4_rcv
br_multicast_port_query_expired
br_multicast_query_expired
br_multicast_rcv
__br_multicast_send_query
br_multicast_send_query

igmp_hdr
ip_hdrlen
ipv6_addr_copy
ipv6_addr_set
ipv6_eth_mc_map
ipv6_hdr

maybe_deliver
netdev_alloc_skb
netdev_alloc_skb_ip_align

skb_checksum_complete
__skb_pull
__skb_push
skb_reserve
skb_reset_transport_header
skb_set_network_header
skb_set_transport_header

These are the functions that are called exclusively during the
'nonfunctional' timespan.

That gave me the idea to use tcpdump and watch for IGMP and IPv6
traffic, and that is indeed where the issue is coming from.

Once a multicast membership query (IGMP) arrives, a multicast listener
query (ICMPv6) is sent.
From my understanding of the bridge code, br_flood will propagate the
packet to all nodes (simple multicast), and this is also where things
stop working. SystemTap itself, and thus in my case the function calls
of the bridge module, are not delayed, but something must be wrong in
the multicast handling of the bridge interface since, as pointed out in
my previous email, everything works fine with 2.6.32.

Can anyone confirm this issue, or lend a hand with how to
proceed further?

PS: Herbert, I've seen your changes for 2.6.34, which I think are
responsible for this behavior (even 2.6.33 works fine here; anything
containing your multicast-related fixes breaks here).
Could you take a look at it and/or tell me how I can help
you?

PPS: Again please CC back to me, since I am not subscribed

regards,
Patrick
