bugzilla-daemon at netfilter.org
2024-Apr-04 20:55 UTC
[Bug 1743] New: Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

            Bug ID: 1743
           Summary: Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
           Product: nftables
           Version: 1.0.x
          Hardware: x86_64
                OS: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: kernel
          Assignee: pablo at netfilter.org
          Reporter: tim at muppetz.com

Created attachment 739
  --> https://bugzilla.netfilter.org/attachment.cgi?id=739&action=edit
Session where Conntrack Changed to 300

Kernel: 6.6.21

I have a TCP flow between an Android phone and Google's Firebase Cloud Messaging (FCM). FCM uses TCP port 5228 and is a very low-traffic connection: up to 28 minutes can pass before a keepalive packet is sent over it. It is used for push messaging (and probably a lot of other things too).

First, with flowtable disabled, I watch the FCM flow in conntrack like this:

watch -n 1 "sudo conntrack -L -p TCP -s 192.168.0.128 -d 142.251.12.188 --dport 5228"

I quite often see the flow's timeout change from ~432000 down to 300. To determine whether this was nf_conntrack_tcp_timeout_unacknowledged or nf_conntrack_tcp_timeout_max_retrans, I altered both sysctl entries and found that when I changed nf_conntrack_tcp_timeout_unacknowledged to 400, the timeout changed to 400 seconds whenever the drop occurred.

So my first question is: why is a flow in the ESTABLISHED state being switched to the unacknowledged timeout? It only changes for a second, though; I assume another packet then comes in and the timeout jumps back to 5 days. This appears odd to me, but it is probably normal behaviour that I just don't understand. [I tested on OpenWrt with kernel 5.15.150 (also using nftables) and it does the same thing: the conntrack timeout drops to 300 for a second before bouncing back to $nf_conntrack_tcp_timeout_established, so this must be expected behaviour.]

My real issue comes about when I enable flow offload.
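The `watch` command above makes a one-second drop easy to miss on screen. As a minimal sketch (not part of the report), the same query can be sampled from Python and the remaining-timeout column checked for sudden drops. It assumes the default `conntrack -L` text layout (protocol name, protocol number, remaining seconds, TCP state, then `key=value` tokens and bracketed flags such as `[ASSURED]` or `[OFFLOAD]`); all function and class names are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class CtEntry:
    proto: str
    timeout: int            # remaining seconds, third column of `conntrack -L`
    state: Optional[str]    # e.g. "ESTABLISHED"; None if there is no state column
    fields: Dict[str, str]  # first occurrence of each key=value token
    flags: Set[str]         # bracketed flags such as ASSURED or OFFLOAD


def parse_conntrack_line(line: str) -> CtEntry:
    """Parse one line of default `conntrack -L` text output (assumed layout)."""
    tokens = line.split()
    proto, timeout = tokens[0], int(tokens[2])
    state = None
    if len(tokens) > 3 and "=" not in tokens[3] and not tokens[3].startswith("["):
        state = tokens[3]
    fields: Dict[str, str] = {}
    flags: Set[str] = set()
    for tok in tokens[3:]:
        if tok.startswith("[") and tok.endswith("]"):
            flags.add(tok[1:-1])
            continue
        key, sep, value = tok.partition("=")
        # Keep the first occurrence of each key; for src/dst/sport/dport
        # that is the original direction, the second set is the reply.
        if sep and key not in fields:
            fields[key] = value
    return CtEntry(proto, timeout, state, fields, flags)


def sample_fcm_flow() -> List[CtEntry]:
    """Run the same query as the watch command once (needs root and conntrack-tools)."""
    cmd = ["conntrack", "-L", "-p", "tcp", "-s", "192.168.0.128",
           "-d", "142.251.12.188", "--dport", "5228"]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    entries = []
    for line in out.splitlines():
        tokens = line.split()
        if len(tokens) > 2 and tokens[2].isdigit():  # skip anything that isn't a flow line
            entries.append(parse_conntrack_line(line))
    return entries


def find_drops(samples: List[int], interval: int = 1,
               tolerance: int = 2) -> List[Tuple[int, int, int]]:
    """Return (index, previous, current) wherever the timer fell faster than wall-clock time."""
    drops = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev - cur > interval + tolerance:
            drops.append((i, prev, cur))
    return drops
```

Calling `sample_fcm_flow()` once per second, appending each entry's `timeout` to a list and passing that list to `find_drops()` would record the transient drop to 300 with its position, instead of relying on catching it in the `watch` output.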
With the same sort of packet flow, I see the following: the flow enters the OFFLOAD state in conntrack. When it comes out of OFFLOAD, it has one of three timeouts:

- ~432000 (seems odd; I expected ~86400)
- ~86400 (this is what I expect)
- 300 ($nf_conntrack_tcp_timeout_unacknowledged) minus anywhere up to 30 seconds, so values such as 260, 274 and 283 are all values I've seen.

A major problem arises when the flow re-enters the table with the nf_conntrack_tcp_timeout_unacknowledged timeout of ~300. Because there is so little traffic on this session, it often ages out and leaves the conntrack table. When this happens, the FCM session dies and Android devices on the network no longer receive push messages until they wake up, realise the session is dead and establish a new one.

Attached is a tcpdump of a Google FCM session where I saw the timeout drop to $nf_conntrack_tcp_timeout_unacknowledged at approximately packet 23.

I have tried watching conntrack with -E, but I see no events generated for this session when the timeouts change.

Other details:

This is happening on a VyOS 1.4.0-epa2 release router. My WAN interface is a PPPoE interface; my LAN interface is an Ethernet interface (virtio, as the router is virtualised). There are two "non-standard" patches in the VyOS kernel. I have looked at them and can't see how they could interfere with offload; here is the link to them:
https://github.com/vyos/vyos-build/tree/sagitta/packages/linux-kernel/patches/kernel

Please let me know what other details I can provide that might help locate the issue.

Thank you very much.
Tim

-- 
You are receiving this mail because:
You are watching all bug changes.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.netfilter.org/pipermail/netfilter-buglog/attachments/20240404/da85a292/attachment.html>
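Because nf_conntrack_tcp_timeout_unacknowledged and nf_conntrack_tcp_timeout_max_retrans both default to 300 seconds, an observed value such as 283 cannot be attributed to either one until one of the sysctls is changed, which is what the report did by setting the former to 400. The sketch below makes that reasoning explicit: it lists which configured timeouts an observed remaining value could have come from, using the "up to 30 seconds" window from the report as the slack. The defaults dictionary reflects stock kernel values; the function names are hypothetical, and on a live system `read_tcp_timeouts()` would read the actual values from /proc/sys/net/netfilter instead.

```python
from pathlib import Path
from typing import Dict, Iterable, List

SYSCTL_DIR = "/proc/sys/net/netfilter"

# Stock defaults for the three TCP timeouts discussed in the report (seconds).
DEFAULT_TCP_TIMEOUTS = {
    "established": 432000,   # 5 days
    "unacknowledged": 300,
    "max_retrans": 300,
}


def read_tcp_timeouts(names: Iterable[str], base: str = SYSCTL_DIR) -> Dict[str, int]:
    """Read nf_conntrack_tcp_timeout_<name> for each name from the sysctl tree."""
    return {
        name: int((Path(base) / f"nf_conntrack_tcp_timeout_{name}").read_text().strip())
        for name in names
    }


def candidate_timeouts(remaining: int, timeouts: Dict[str, int], slack: int = 30) -> List[str]:
    """Names whose configured value T satisfies T - slack <= remaining <= T.

    A freshly armed timer reads close to T and only counts down, so an
    observed remaining value can only come from a timeout at or above it.
    """
    return sorted(name for name, t in timeouts.items() if t - slack <= remaining <= t)
```

With the defaults, 283 matches both 300-second timeouts; after raising unacknowledged to 400, an observed 390 points only at unacknowledged, and 283 only at max_retrans. A value around 86390 matches nothing with stock settings.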
bugzilla-daemon at netfilter.org
2024-Apr-08 03:17 UTC
[Bug 1743] Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

--- Comment #1 from Tim Harman <tim at muppetz.com> ---
In fact, on further reading and investigation, I don't know why I thought a timeout of ~86400 was expected. I don't see this value anywhere in /proc/sys/net/netfilter.

Also, I have

ct state { established, related } meta l4proto { tcp, udp }

as my offload rule. Should that match ALL traffic, or is established+related correct?
bugzilla-daemon at netfilter.org
2024-Apr-16 01:46 UTC
[Bug 1743] Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

--- Comment #2 from Tim Harman <tim at muppetz.com> ---
Further testing shows that it doesn't matter whether I use established/related or just accept everything; the same odd timeouts persist.
bugzilla-daemon at netfilter.org
2024-Apr-30 22:29 UTC
[Bug 1743] Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

--- Comment #3 from Tim Harman <tim at muppetz.com> ---
I have recently moved ISPs. My old ISP required PPPoE; my new ISP doesn't (it uses DHCP).

Since moving to my new ISP, I have been completely unable to reproduce this problem. I have run my previously easy-to-reproduce test 100 times and can't reproduce it.

I wonder if the issue I was encountering was related to this fix I see in 6.6.29, and moving away from PPPoE has stopped the problem from appearing?

---- begin ----
commit 4ed82dd368ad883dc4284292937b882f044e625d
Author: Pablo Neira Ayuso <pablo at netfilter.org>
Date:   Thu Apr 11 00:09:00 2024 +0200

    netfilter: flowtable: incorrect pppoe tuple

    [ Upstream commit 6db5dc7b351b9569940cd1cf445e237c42cd6d27 ]

    pppoe traffic reaching ingress path does not match the flowtable entry
    because the pppoe header is expected to be at the network header offset.

    This bug causes a mismatch in the flow table lookup, so pppoe packets
    enter the classical forwarding path.
---- end ----
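The commit message above describes a flowtable lookup miss caused by expecting a header at the wrong offset. The userspace sketch below (not the kernel code) illustrates why an offset mistake of this kind turns into a miss: a PPPoE session frame carries 8 extra bytes (the 6-byte PPPoE header plus the 2-byte PPP protocol field) between the Ethernet header and the IPv4 header, so a parser that does not account for them reads PPPoE bytes as IP and derives a different 5-tuple. The WAN address 198.51.100.7, port 40000 and the dictionary standing in for the flowtable are hypothetical.

```python
import socket
import struct
from typing import Tuple

ETH_HLEN = 14
ETH_P_PPP_SES = 0x8864   # PPPoE session-stage EtherType
PPPOE_SES_HLEN = 8       # 6-byte PPPoE header + 2-byte PPP protocol field
PPP_IP = 0x0021          # PPP protocol number for IPv4

FlowTuple = Tuple[str, str, int, int, int]


def build_pppoe_tcp_frame(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build a minimal Ethernet/PPPoE/PPP/IPv4/TCP frame (checksums left at zero)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, socket.IPPROTO_TCP, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    tcp = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 0x50, 0x10, 65535, 0, 0)
    ppp = struct.pack("!H", PPP_IP) + ip + tcp
    # PPPoE: ver/type 0x11, code 0 (session data), session id, payload length.
    pppoe = struct.pack("!BBHH", 0x11, 0x00, 0x1234, len(ppp))
    eth = b"\xff" * 6 + b"\x02" * 6 + struct.pack("!H", ETH_P_PPP_SES)
    return eth + pppoe + ppp


def ipv4_tuple(frame: bytes, skip_pppoe: bool) -> FlowTuple:
    """Extract (src, dst, proto, sport, dport), optionally skipping the PPPoE header."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    off = ETH_HLEN
    if skip_pppoe and ethertype == ETH_P_PPP_SES:
        off += PPPOE_SES_HLEN
    ihl = (frame[off] & 0x0F) * 4
    proto = frame[off + 9]
    src = socket.inet_ntoa(frame[off + 12:off + 16])
    dst = socket.inet_ntoa(frame[off + 16:off + 20])
    sport, dport = struct.unpack_from("!HH", frame, off + ihl)
    return (src, dst, proto, sport, dport)


# Hypothetical flow table keyed by the reply-direction tuple of the FCM flow.
FRAME = build_pppoe_tcp_frame("142.251.12.188", "198.51.100.7", 5228, 40000)
FLOW_TABLE = {ipv4_tuple(FRAME, skip_pppoe=True): "fast path"}


def lookup(frame: bytes, skip_pppoe: bool) -> str:
    """A miss falls back to the classic forwarding path, as the commit message describes."""
    return FLOW_TABLE.get(ipv4_tuple(frame, skip_pppoe), "classic path")
```

Only the parse that skips the 8 PPPoE bytes reproduces the stored tuple; the other one silently yields garbage addresses and ports, so the entry exists but never matches, which is the symptom discussed in the next comment.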
bugzilla-daemon at netfilter.org
2024-May-02 07:36 UTC
[Bug 1743] Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

--- Comment #4 from Pablo Neira Ayuso <pablo at netfilter.org> ---
Hi,

flowtable PPPoE was broken in software mode. The flow entry was created in the flowtable, but it did not match. That is, listing with conntrack -L shows an entry with the OFFLOAD flag, but it never matched, and you still see packets hitting the forward chain, which is not correct. Once the flowtable fast path is set up, packets are only seen at the ingress and egress hooks.

Basically, PPPoE-encapsulated packets were pushed back to the classic path because the tuple was not correctly set up, so only one direction of the flow followed the fast path.

I managed to reproduce this in a small testbed with a PPPoE server/client, hence the fix I posted. I have a more permanent testbed to test PPPoE; it would be good to integrate this into a script that can run in nftables tests/shell with containers to make sure this does not break again in the future. I have to look into this.

Please note that this patch is also convenient to have for those who require PPPoE:

From: Pablo Neira Ayuso <pablo at netfilter.org>

[ Upstream commit 87b3593bed1868b2d9fe096c01bcdf0ea86cbebf ]

Ensure there is sufficient room to access the protocol field of the
PPPoe header. Validate it once before the flowtable lookup, then use a
helper function to access protocol field.

Reported-by: syzbot+b6f07e1c07ef40199081 at syzkaller.appspotmail.com
Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Pablo Neira Ayuso <pablo at netfilter.org>
Signed-off-by: Sasha Levin <sashal at kernel.org>

These two patches have been enqueued to -stable kernels.
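The second patch's description (validate the PPPoE header once before the flowtable lookup, then read the protocol field through a helper) can be mirrored in userspace as below. This is a minimal sketch of the pattern, not the kernel implementation; the function names are hypothetical, and the constants come from the PPPoE and PPP specifications (EtherType 0x8864 for session frames, PPP protocol numbers 0x0021 for IPv4 and 0x0057 for IPv6).

```python
import struct
from typing import Optional

ETH_HLEN = 14
ETH_P_PPP_SES = 0x8864
PPPOE_HLEN = 6           # ver/type, code, session id, length
PPPOE_SES_HLEN = 8       # PPPoE header plus the 2-byte PPP protocol field
PPP_FAMILIES = {0x0021: "ipv4", 0x0057: "ipv6"}


def pppoe_header_ok(frame: bytes) -> bool:
    """Validate once, up front: PPPoE session EtherType and room for the protocol field."""
    if len(frame) < ETH_HLEN:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ethertype == ETH_P_PPP_SES and len(frame) >= ETH_HLEN + PPPOE_SES_HLEN


def pppoe_proto(frame: bytes) -> int:
    """Read the PPP protocol field; only call after pppoe_header_ok() succeeded."""
    (proto,) = struct.unpack_from("!H", frame, ETH_HLEN + PPPOE_HLEN)
    return proto


def inner_family(frame: bytes) -> Optional[str]:
    """Return "ipv4"/"ipv6" for a usable PPPoE frame, else None (leave it to the normal path)."""
    if not pppoe_header_ok(frame):
        return None
    return PPP_FAMILIES.get(pppoe_proto(frame))


def make_frame(ppp_proto: int) -> bytes:
    """Hypothetical test frame: Ethernet + PPPoE session header + PPP protocol only."""
    eth = b"\xff" * 6 + b"\x02" * 6 + struct.pack("!H", ETH_P_PPP_SES)
    pppoe = struct.pack("!BBHH", 0x11, 0x00, 0x1234, 2)
    return eth + pppoe + struct.pack("!H", ppp_proto)
```

Keeping the length check in one place means the helper never has to re-check bounds, and a truncated frame is rejected before any read past its end rather than somewhere inside the lookup.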
bugzilla-daemon at netfilter.org
2024-May-02 18:11 UTC
[Bug 1743] Flowtable: Flows exiting OFFLOAD State being assigned value of nf_conntrack_tcp_timeout_unacknowledged
https://bugzilla.netfilter.org/show_bug.cgi?id=1743

Tim Harman <tim at muppetz.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Tim Harman <tim at muppetz.com> ---
Hi Pablo,

Thank you very much for your detailed explanation of the issue, and of course for the patches you've issued that have made it into the latest stable kernel.

When I get a chance, I will test PPPoE with a v6.6.29+ kernel, but I think it's pretty safe to say this can be closed.

Thanks again for all your hard work on the Netfilter system.