Darren Reed
2008-Sep-17 18:37 UTC
[crossbow-discuss] Related packets not all on the same CPU?
In tracking down a test suite failure that *sometimes* happens with ipfilter, it appears that *sometimes* packets are passed up into IP in a different order from the one in which they appear on the wire.

For example, with snoop I see:

    13   2.39264 netvirt-c0 -> netvirt-c1 UDP IP fragment ID=17681 Offset=0    MF=1 TOS=0x0 TTL=255
    14   0.00009 netvirt-c0 -> netvirt-c1 UDP IP fragment ID=17681 Offset=1480 MF=1 TOS=0x0 TTL=255
    15   0.00006 netvirt-c0 -> netvirt-c1 UDP IP fragment ID=17681 Offset=2960 MF=0 TOS=0x0 TTL=255
    16   0.00323 netvirt-c1 -> netvirt-c0 UDP IP fragment ID=14803 Offset=0    MF=1 TOS=0x0 TTL=255
    17   0.00000 netvirt-c1 -> netvirt-c0 UDP IP fragment ID=14803 Offset=1480 MF=1 TOS=0x0 TTL=255
    18   0.00000 netvirt-c1 -> netvirt-c0 UDP IP fragment ID=14803 Offset=2960 MF=0 TOS=0x0 TTL=255

But with dtrace I see this:

      0  <- ipfr_newfrag   new(17681) = 6001da5db18
      0  -> fr_fraglookup  fin(2a100706bb8).flx 2024 id 17681 off 1480
      0  <- fr_fraglookup  6001da5db18
      0  -> fr_fraglookup  fin(2a100706bb8).flx 2024 id 17681 off 2960
      0  <- fr_fraglookup  6001da5db18
      8  -> fr_fraglookup  fin(2a100ee74e8).flx 2024 id 14803 off 1480
      8  <- fr_fraglookup  0
      8  -> fr_fraglookup  fin(2a100ee74e8).flx 2024 id 14803 off 2960
      8  <- fr_fraglookup  6001d806140
     10  -> fr_fraglookup  fin(2a101fc14e8).flx 24 id 14803 off 0
     10  <- fr_fraglookup  0
     10  <- ipfr_newfrag   new(14803) = 6001d806140

For reference, the values next to flx are "bits/ip_id", where the values for "bits" are 0x2000 = fragment body, 0x4 = fragment, 0x20 = tcp/udp (so 2024 is a tcp/udp fragment body and 24 is a tcp/udp fragment that is not a body, i.e. the first fragment), and "off" is the offset from ip_off.

Now I know there is no guaranteed order of delivery of packets with IP, but I find it somewhat strange that we are reordering them ourselves. Further, I don't know if it is related, but where the ordering is mixed up, the processing of the packet is split across different CPUs: see all on #0 vs split over #8 & #10. I suspect the dtrace output is not quite so easy to interpret and that the packet on cpu#10 is processed between the first and second packets on cpu#8.
This is being observed on a T-1000 running snv_96. While this is pre-crossbow, is someone from crossbow knowledgeable enough about what is going on to say whether this problem/bug is likely to exist with crossbow, or whether crossbow will handle this properly?

My gut feeling is that if a packet is fragmented then we need to exclude the layer 4 headers (even if it is a packet with fragment offset #0) when deciding which CPU a packet should be scheduled on...something that might be useful input for the crossbow project...

btw, I'm seemingly able to reproduce this relatively easily (but not every time), so if someone from crossbow would like to engage in some testing, we can probably do that...

Darren
(filed as CR#6749500)
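
To illustrate the suggestion above about leaving the layer 4 headers out of the CPU selection for fragmented packets, here is a minimal sketch in C. It is not the Solaris or crossbow fanout code. The struct `pkt_info`, the functions `pkt_is_fragment()`, `mix32()` and `fanout_cpu()`, and the `FRAG_*` constants are hypothetical names made up for this example, and the hash itself is an arbitrary mixing function. The only point it shows is that any packet with MF set or a non-zero fragment offset is hashed on addresses and protocol alone, so the first fragment (which carries the ports) lands on the same CPU as the fragment bodies (which do not).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what a fanout decision would look at. */
struct pkt_info {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint8_t  proto;
    uint16_t frag_field;  /* IPv4 flags + fragment offset, host byte order */
    uint16_t src_port;    /* only meaningful if an L4 header is present */
    uint16_t dst_port;
};

#define FRAG_MF       0x2000u  /* "more fragments" bit of the IPv4 field */
#define FRAG_OFFMASK  0x1fffu  /* fragment offset, in units of 8 bytes */

/* True for every piece of a fragmented datagram, including offset 0. */
bool pkt_is_fragment(const struct pkt_info *p)
{
    return (p->frag_field & (FRAG_MF | FRAG_OFFMASK)) != 0;
}

/* Arbitrary 32-bit mixing step; any reasonable hash would do here. */
uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Pick a CPU index in [0, ncpus); ports are used only for whole packets. */
unsigned int fanout_cpu(const struct pkt_info *p, unsigned int ncpus)
{
    uint32_t h;

    if (ncpus == 0)
        return 0;

    h = p->src_ip ^ (p->dst_ip * 31u) ^ p->proto;

    /* Fragments: skip the ports even if this fragment happens to have them. */
    if (!pkt_is_fragment(p))
        h ^= ((uint32_t)p->src_port << 16) | p->dst_port;

    return mix32(h) % ncpus;
}
```

With this arrangement, three fragments like those with ID=14803 in the snoop trace (Offset=0/MF=1, Offset=1480/MF=1, Offset=2960/MF=0) would all produce the same index. The trade-off is that all fragmented traffic between a given pair of hosts for one protocol collapses onto one CPU, regardless of port.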