bugzilla-daemon at netfilter.org
2024-Aug-26 09:08 UTC
[Bug 1766] New: nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Bug ID: 1766
Summary: nfqueue randomly drops packets with same tuple
Product: netfilter/iptables
Version: unspecified
Hardware: x86_64
OS: All
Status: NEW
Severity: major
Priority: P5
Component: netfilter hooks
Assignee: netfilter-buglog at lists.netfilter.org
Reporter: antonio.ojea.garcia at gmail.com
I was puzzled by this problem for a long time; it was first reported in
https://github.com/kubernetes-sigs/kube-network-policies/issues/12 and
now again in https://github.com/kubernetes-sigs/kind/issues/3713
It looks like the same symptom described in
https://www.spinics.net/lists/netfilter/msg58296.html but that seems to
have been fixed back in the day.
I was able to narrow down the scenario. To describe it better, I will
translate the Kubernetes constructs into namespaces and nodes.
2 nodes: N1 and N2
N1 contains two containers:
- client C1 (10.244.1.3)
- DNS server D1 (10.244.1.5)
N2 contains the second DNS server D2 (10.244.2.4)
One rule sends the packets to nfqueue in postrouting, although the
problem has also happened in other hooks. We can assume the set matches
the packet and that the nfqueue userspace program always accepts the packet:
> chain postrouting {
>     type filter hook postrouting priority srcnat - 5; policy accept;
>     icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert } accept
>     meta skuid 0 accept
>     ct state established,related accept
>     ip saddr @podips-v4 queue flags bypass to 100 comment "process IPv4 traffic with network policy enforcement"
>     ip daddr @podips-v4 queue flags bypass to 100 comment "process IPv4 traffic with network policy enforcement"
>     ip6 saddr @podips-v6 queue flags bypass to 100 comment "process IPv6 traffic with network policy enforcement"
>     ip6 daddr @podips-v6 queue flags bypass to 100 comment "process IPv6 traffic with network policy enforcement"
> }
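The userspace program on queue 100 is the kube-network-policies agent; for the purposes of this report it can be assumed to behave like a trivial accept-everything verdict loop. A minimal stand-in sketch with libnetfilter_queue (queue number 100 as in the rules above; this is an illustration, not the actual agent) would be:

/* accept_all.c - minimal NFQUEUE consumer for queue 100 that issues
 * NF_ACCEPT for every packet. A stand-in for the real enforcement agent;
 * the drops described in this report happen even with an
 * accept-everything policy.
 * Build: gcc accept_all.c -lnetfilter_queue -o accept_all
 */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/netfilter.h>            /* NF_ACCEPT */
#include <libnetfilter_queue/libnetfilter_queue.h>

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
        struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
        uint32_t id = ph ? ntohl(ph->packet_id) : 0;

        /* always accept the packet */
        return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
        char buf[65536];
        struct nfq_handle *h = nfq_open();
        struct nfq_q_handle *qh;
        int fd, rv;

        if (!h) { perror("nfq_open"); return 1; }
        qh = nfq_create_queue(h, 100, &cb, NULL);   /* queue 100 as in the ruleset */
        if (!qh) { perror("nfq_create_queue"); return 1; }
        nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);

        fd = nfq_fd(h);
        while ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
                nfq_handle_packet(h, buf, rv);

        nfq_destroy_queue(qh);
        nfq_close(h);
        return 0;
}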
The containerized DNS servers are abstracted behind the virtual IP 10.96.0.10 via DNAT:
> meta l4proto udp ip daddr 10.96.0.10 udp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-TCOU7JCQXEZGVUNU
> chain KUBE-SVC-TCOU7JCQXEZGVUNU {
>     meta l4proto udp ip saddr != 10.244.0.0/16 ip daddr 10.96.0.10 udp dport 53 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
>     meta random & 2147483647 < 1073741824 counter packets 38 bytes 2280 jump KUBE-SEP-CEYPGFB7VCORONY3
>     counter packets 32 bytes 1920 jump KUBE-SEP-RJHMR3QLYGJVBWVL
> }
> chain KUBE-SEP-CEYPGFB7VCORONY3 {
>     ip saddr 10.244.1.5 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
>     meta l4proto udp counter packets 38 bytes 2280 dnat to 10.244.1.5:53
> }
C1 sends a DNS request to the virtual IP 10.96.0.10. Because of the
happy-eyeballs behavior, it sends two packets with the same tuple, one
for the A record and one for the AAAA record.
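For illustration only (a sketch of the client behavior, not code from this report; the payloads are placeholders rather than real DNS messages), the resolver behavior boils down to one UDP socket, hence one local port, and two datagrams sent back to back to the virtual IP, so both share the 4-tuple 10.244.1.3:<port> -> 10.96.0.10:53:

/* two_queries.c - send two datagrams from the same UDP socket to the
 * service VIP; both packets share source IP, source port, destination
 * IP and destination port, i.e. the same conntrack tuple. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in vip = { .sin_family = AF_INET,
                                   .sin_port = htons(53) };
        inet_pton(AF_INET, "10.96.0.10", &vip.sin_addr);

        /* connect() pins the local port, so both packets reuse it */
        connect(fd, (struct sockaddr *)&vip, sizeof(vip));

        const char a_query[]    = "placeholder A query";    /* not a real DNS message */
        const char aaaa_query[] = "placeholder AAAA query"; /* not a real DNS message */

        send(fd, a_query, sizeof(a_query), 0);       /* packet 1 */
        send(fd, aaaa_query, sizeof(aaaa_query), 0); /* packet 2, same tuple */

        close(fd);
        return 0;
}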
The symptom is that one of the packets does not come back; see the
tcpdump trace below. Both queries go out at 22:49:07 but only the A
answer comes back; the client retries the AAAA at 22:49:10 and this time
it comes back.
22:49:07.632846 vetha5c90841 In IP (tos 0x0, ttl 64, id 52468, offset
0, flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.96.0.10.53: 60169+ A? www.google.com. (32)
22:49:07.632909 vetha5c90841 In IP (tos 0x0, ttl 64, id 52469, offset
0, flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.96.0.10.53: 60459+ AAAA? www.google.com. (32)
22:49:07.633080 veth271ea3e0 Out IP (tos 0x0, ttl 63, id 52468, offset
0, flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.244.1.5.53: 60169+ A? www.google.com. (32)
22:49:07.633210 eth0 Out IP (tos 0x0, ttl 63, id 52469, offset 0,
flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:07.633352 eth0 In IP (tos 0x0, ttl 62, id 52469, offset 0,
flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:07.653981 veth271ea3e0 In IP (tos 0x0, ttl 64, id 28750, offset
0, flags [DF], proto UDP (17), length 240)
10.244.1.5.53 > 10.244.1.3.48199: 60169 6/0/0 www.google.com. A
172.217.218.104, www.google.com. A 172.217.218.99, www.google.com. A
172.217.218.106, www.google.com. A 172.217.218.147, www.google.com. A
172.217.218.105, www.google.com. A 172.217.218.103 (212)
22:49:07.654012 vetha5c90841 Out IP (tos 0x0, ttl 63, id 28750, offset
0, flags [DF], proto UDP (17), length 240)
10.96.0.10.53 > 10.244.1.3.48199: 60169 6/0/0 www.google.com. A
172.217.218.104, www.google.com. A 172.217.218.99, www.google.com. A
172.217.218.106, www.google.com. A 172.217.218.147, www.google.com. A
172.217.218.105, www.google.com. A 172.217.218.103 (212)
22:49:10.135710 vetha5c90841 In IP (tos 0x0, ttl 64, id 52470, offset
0, flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.96.0.10.53: 60459+ AAAA? www.google.com. (32)
22:49:10.135740 veth271ea3e0 Out IP (tos 0x0, ttl 63, id 52470, offset
0, flags [DF], proto UDP (17), length 60)
10.244.1.3.48199 > 10.244.1.5.53: 60459+ AAAA? www.google.com. (32)
22:49:10.136635 veth271ea3e0 In IP (tos 0x0, ttl 64, id 28842, offset
0, flags [DF], proto UDP (17), length 228)
10.244.1.5.53 > 10.244.1.3.48199: 60459 4/0/0 www.google.com. AAAA
2a00:1450:4013:c08::6a, www.google.com. AAAA 2a00:1450:4013:c08::67,
www.google.com. AAAA 2a00:1450:4013:c08::63, www.google.com. AAAA
2a00:1450:4013:c08::68 (200)
22:49:10.136669 vetha5c90841 Out IP (tos 0x0, ttl 63, id 28842, offset
0, flags [DF], proto UDP (17), length 228)
10.96.0.10.53 > 10.244.1.3.48199: 60459 4/0/0 www.google.com. AAAA
2a00:1450:4013:c08::6a, www.google.com. AAAA 2a00:1450:4013:c08::67,
www.google.com. AAAA 2a00:1450:4013:c08::63, www.google.com. AAAA
2a00:1450:4013:c08::68 (200)
^C
23 packets captured
When tracing the packets I could observe two different drop reasons,
depending on the destination of the DNAT rule: if it is local, the packet
is dropped with SKB_DROP_REASON_IP_RPFILTER; if it is on the other node,
it is dropped with SKB_DROP_REASON_NEIGH_FAILED.
0xffff9527290acb00 3 [<empty>(3178406)] kfree_skb_reason(SKB_DROP_REASON_IP_RPFILTER) 1289 netns=4026533244 mark=0x0 iface=52(eth0) proto=0x0800 mtu=1500 len=60 10.244.1.3:48199->10.244.1.5:53(udp)
and
3:24:37.411 0xffff9534c19a3d00 7 [<empty>(0)] kfree_skb_reason(SKB_DROP_REASON_NEIGH_FAILED) 1583194220087332 netns=4026533244 mark=0x0 iface=5(veth271ea3e0) proto=0x0800 mtu=1500 len=60 10.244.1.3:58611->10.244.2.4:53(udp)
If I enable martian logging (net.ipv4.conf.all.log_martians=1), the
kernel also reports these packets as martians when the destination is on
the same node:
[1581593.716839] IPv4: martian source 10.244.1.5 from 10.244.1.3, on dev eth0
[1581593.723848] ll header: 00000000: 02 42 c0 a8 08 05 02 42 c0 a8 08 03 08 00
An interesting detail is that it only seems to happen with DNS (two
packets with the same tuple) and when there is more than one replica
behind the virtual IP (more than one DNAT rule). When there is only one
DNAT rule it does not happen; this is a fact.
Since the behavior is not deterministic but is reproducible, it makes me
think there is some kind of race where the nfqueue system is not able to
correctly handle the two packets with the same tuple on the return path,
and they get dropped.
I would like some help on two fronts:
- advice on the next steps for debugging further, or on how I can provide
more information that can help the maintainers
- advice on how to work around this problem temporarily
bugzilla-daemon at netfilter.org
2024-Aug-26 16:17 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Pablo Neira Ayuso <pablo at netfilter.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pablo at netfilter.org
--- Comment #1 from Pablo Neira Ayuso <pablo at netfilter.org> ---
Could you have a look at
# conntrack -S
to check whether clash_resolve= gets bumped?
I suspect the martians happen because packets go through without being
mangled by NAT.
bugzilla-daemon at netfilter.org
2024-Aug-27 17:15 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #2 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
These are the only entries that are bumped when a DNS timeout happens:
# conntrack -S > cs6.log
# diff cs5.log cs6.log
45c45
< cpu=44 found=0 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 clash_resolve=0 chaintoolong=0
---
> cpu=44 found=3 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 clash_resolve=0 chaintoolong=0
No other stats are bumped if there are no DNS timeouts.
bugzilla-daemon at netfilter.org
2024-Aug-28 06:31 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #3 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
I was tracing the packets with this tool from the cilium folks,
https://github.com/cilium/pwru/tree/main
And I think the problem is here; bear in mind 10.244.1.5 and 10.244.2.4
are the DNATed addresses for 10.96.0.10.
SKB CPU PROCESS NETNS MARK/x IFACE PROTO MTU LEN TUPLE FUNC
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.96.0.10:53(udp) skb_ensure_writable
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.96.0.10:53(udp) inet_proto_csum_replace4
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.96.0.10:53(udp) inet_proto_csum_replace4
SKB 0xffff9523b3207080 is DNATed from 10.96.0.10 to 10.244.1.5 on CPU 1:
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) udp_v4_early_demux
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) ip_route_input_noref
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) ip_route_input_slow
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) fib_validate_source
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) ip_forward
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) nf_hook_slow
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) ip_forward_finish
0xffff9523b3207080 1 <empty>:913286 4026533244 0 vetha5c90841:3 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) ip_output
0xffff9523b3207080 1 <empty>:913286 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) nf_hook_slow
0xffff9523b3207080 1 <empty>:913286 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) apparmor_ip_postroute
0xffff9523b3207080 1 <empty>:913286 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) nf_queue
0xffff9523b3207080 1 <empty>:913286 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) __nf_queue
SEND 0xffff9523b3207080 to the queue:
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) skb_ensure_writable
0xffff9523b3207080 is returned on CPU 16:
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) skb_ensure_writable
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) inet_proto_csum_replace4
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) inet_proto_csum_replace4
and DNATed again?? to 10.244.2.4 (we drop 10.244.1.5 here):
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) nf_reroute
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) ip_finish_output
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) __ip_finish_output
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) ip_finish_output2
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) neigh_resolve_output
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) __neigh_event_send
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) skb_clone
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) arp_solicit
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) consume_skb
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) skb_release_head_state
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) skb_release_data
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) kfree_skbmem
bugzilla-daemon at netfilter.org
2024-Sep-01 20:13 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #4 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
An interesting observation: the problem only seems to happen when at
least one of the DNAT destinations is in the same namespace where the
nfqueue program runs. I imagine this causes the packet to follow a
different codepath than when the packet is sent out.
What puzzles me is why the packet gets DNATed twice, after __nf_queue and
before nf_reroute:
1500 60 10.244.1.3:45957->10.244.1.5:53(udp) __nf_queue
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) skb_ensure_writable
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) skb_ensure_writable
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) inet_proto_csum_replace4
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.1.5:53(udp) inet_proto_csum_replace4
0xffff9523b3207080 16 <empty>:3178406 4026533244 0 veth271ea3e0:5 0x0800 1500 60 10.244.1.3:45957->10.244.2.4:53(udp) nf_reroute
bugzilla-daemon at netfilter.org
2024-Sep-01 20:22 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Antonio Ojea <antonio.ojea.garcia at gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5                          |P2
bugzilla-daemon at netfilter.org
2024-Sep-01 20:40 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #5 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
The nftables rule does not detect the two packets from the same tuple as
the same connection:
> ct state established,related accept
So it seems the problem is that the same tuple gets DNATed to a different
address for each packet, but there is only one conntrack entry, so the
return packet cannot be handled and is discarded.
bugzilla-daemon at netfilter.org
2024-Sep-01 22:03 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #6 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
Testcase:
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240901220228.4157482-1-aojea at google.com/
bugzilla-daemon at netfilter.org
2024-Sep-02 07:54 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #7 from Pablo Neira Ayuso <pablo at netfilter.org> ---
Thanks for your testcase.
The issue is related to
368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks")
which collides with the new approach to clash resolution.
Let me get back to you with a remedy for this situation.
bugzilla-daemon at netfilter.org
2024-Sep-12 19:41 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Pablo Neira Ayuso <pablo at netfilter.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
--- Comment #8 from Pablo Neira Ayuso <pablo at netfilter.org> ---
Patch attempt to fix this:
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240912185832.11962-1-pablo at netfilter.org/
bugzilla-daemon at netfilter.org
2024-Sep-18 08:41 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #9 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
Still failing; I captured two traces to show the differences.
GOOD ONE
0xffff8dd044679180 0 <empty>:3760 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) nf_queue nf_hook_slow
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_conntrack_update nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_nat_ipv4_out nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_nat_inet_fn nf_nat_ipv4_out
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nft_nat_do_chain nf_nat_inet_fn
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) selinux_ip_postroute nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) selinux_ip_postroute_compat selinux_ip_postroute
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_confirm nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) __nf_conntrack_confirm nf_confirm
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) ip_finish_output nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) __ip_finish_output nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) ip_finish_output2 nfqnl_reinject
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) __dev_queue_xmit ip_finish_output2
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) netdev_core_pick_tx __dev_queue_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) validate_xmit_skb __dev_queue_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) netif_skb_features validate_xmit_skb
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) passthru_features_check netif_skb_features
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) skb_network_protocol netif_skb_features
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) skb_csum_hwoffload_help validate_xmit_skb
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) validate_xmit_xfrm __dev_queue_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) dev_hard_start_xmit __dev_queue_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) veth_xmit dev_hard_start_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) __dev_forward_skb veth_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) __dev_forward_skb2 veth_xmit
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) skb_scrub_packet __dev_forward_skb2
0xffff8dd044679680 0 <empty>:1635 4026532316 0 veth60925af6:7 0x0800 1500 73 10.244.0.2:32942->10.244.0.4:53(udp) eth_type_trans __dev_forward_skb2
BAD ONE
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) nf_conntrack_update nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) nf_nat_manip_pkt nf_conntrack_update
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) nf_nat_ipv4_manip_pkt nf_nat_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) skb_ensure_writable nf_nat_ipv4_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) l4proto_manip_pkt nf_nat_ipv4_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) skb_ensure_writable l4proto_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) nf_csum_update l4proto_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) inet_proto_csum_replace4 l4proto_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.3:53(udp) inet_proto_csum_replace4 l4proto_manip_pkt
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_nat_ipv4_out nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_nat_inet_fn nf_nat_ipv4_out
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) selinux_ip_postroute nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) selinux_ip_postroute_compat selinux_ip_postroute
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) nf_confirm nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) ip_finish_output nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) __ip_finish_output nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) ip_finish_output2 nfqnl_reinject
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) neigh_resolve_output ip_finish_output2
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) __neigh_event_send neigh_resolve_output
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) skb_clone neigh_probe
0xffff8dd044679180 0 <empty>:1635 4026532316 0 veth539c3d56:6 0x0800 1500 59 10.244.0.2:32942->10.244.0.4:53(udp) arp_solicit neigh_probe
bugzilla-daemon at netfilter.org
2024-Sep-18 09:33 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
--- Comment #10 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
@pablo I'm rereading 368982cd7d1bd41cd39049c794990aca3770db44, and the
problem comes with:
> NAT mangling for the packet losing race is corrected by using the conntrack information that won race.
I don't have enough knowledge of the codebase to fully understand all the
logic, but I think the problem comes because the packet is enqueued in
postrouting and the NAT is redone ... but IIUIC it is not considering the
hook from which the function is called, so it redoes all of the NAT, in
this case the PREROUTING NAT.
What if it ONLY redoes the part of the NAT that belongs to the hook from
which it is called? Is that possible? Does it make sense?
bugzilla-daemon at netfilter.org
2024-Oct-03 10:43 UTC
[Bug 1766] nfqueue randomly drops packets with same tuple
https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Antonio Ojea <antonio.ojea.garcia at gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
--- Comment #12 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
Fixed by
https://github.com/torvalds/linux/commit/8af79d3edb5fd2dce35ea0a71595b6d4f9962350