bugzilla-daemon at netfilter.org
2024-Sep-11 15:40 UTC
[Bug 1773] New: tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

            Bug ID: 1773
           Summary: tproxy with nftables collides with nat entries
           Product: netfilter/iptables
           Version: unspecified
          Hardware: x86_64
                OS: All
            Status: NEW
          Severity: major
          Priority: P5
         Component: netfilter hooks
          Assignee: netfilter-buglog at lists.netfilter.org
          Reporter: antonio.ojea.garcia at gmail.com

It took me a while to figure out the problem when working with nftables
tproxy for UDP in a Kubernetes environment, until I found the explanation in
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20180628164258.25646-1-ecklm94 at gmail.com/

> - tproxy statement is not terminal here

When trying to implement a transparent proxy for a DNS server, I observed
that only the first connection was redirected; subsequent connections with
the same tuple were sent to the actual DNAT destination.

Assume:

- A Pod/Namespace client with IP 100.96.1.14
- A virtual IP for the DNS service, 100.64.0.10, that forwards traffic to
  100.96.1. and 10.96.1.12
- A UDP transparent proxy server listening on port 12345

Rules to intercept the DNS traffic (the fwmark rule and the anyip route are
ok):

    table inet kindnet-dnscache {
            comment "rules for kindnet dnscache"
            chain prerouting {
                    type filter hook prerouting priority dstnat - 10; policy accept;
                    meta mark 0x0000000c return
                    ip saddr 100.96.1.0/24 ip daddr 100.64.0.10 udp dport 53 tproxy ip to 127.0.0.1:12345 accept comment "DNS IPv4 pod originated traffic"
            }
    }

When connecting from 100.96.1.14:8888 to 100.64.0.10:53, the tproxy rule
redirects the connection to localhost:12345.

When tracing the nftables rules with "nft monitor" I could see that the
verdict of the rule with the tproxy statement was accept, but the rules in
subsequent chains were still matched and the corresponding conntrack entries
for NAT were added:

    [NEW] udp 17 30 src=100.96.1.14 dst=100.64.0.10 sport=42345 dport=53 [UNREPLIED] src=100.96.1.12 dst=100.96.1.14 sport=53 dport=42345
    [NEW] udp 17 30 src=100.64.0.10 dst=100.96.1.14 sport=53 dport=42345 [UNREPLIED] src=100.96.1.14 dst=100.64.0.10 sport=42345 dport=81
    [DESTROY] udp 17 src=100.64.0.10 dst=100.64.0.10 sport=34943 dport=53 src=100.96.1.12 dst=100.96.1.1 sport=53 dport=58102

A workaround that seems to work fine is to not track the tproxy traffic:

    table inet kindnet-dnscache {
            comment "rules for kindnet dnscache"
            chain prerouting {
                    type filter hook prerouting priority raw - 10; policy accept;
                    meta mark 0x0000000c return
                    ip saddr 100.96.1.0/24 ip daddr 100.64.0.10 udp dport 53 tproxy ip to 127.0.0.1:54136 meta mark set 0x0000000b notrack accept comment "DNS IPv4 pod originated traffic"
            }
    }

I personally found it surprising that the tproxy action is not terminal, as
the user's intent is actually to steal the traffic and redirect it to the
port. I don't know at this point whether it is feasible to break this
behavior; it sounds strange that somebody would rely on it, although I'm
sure we have seen worse :) https://imgs.xkcd.com/comics/workflow.png

It would be desirable for the tproxy action to be terminal; if not by
default, then at least with some knob that lets the user opt in to making it
terminal.

-- 
You are receiving this mail because:
You are watching all bug changes.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.netfilter.org/pipermail/netfilter-buglog/attachments/20240911/49d8ef3c/attachment.html>
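The report mentions the fwmark rule and the anyip route without showing them. Below is a minimal sketch of what they might look like. It assumes the mark 0xb set by the workaround ruleset and a routing table id of 100, which is a hypothetical choice (the report gives neither command). MARK, TABLE, DRY_RUN and the helper names are illustrative. By default the script only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: policy routing commonly paired with tproxy.
# MARK mirrors "meta mark set 0x0000000b" from the workaround ruleset;
# TABLE=100 is an assumed, otherwise unused routing table id.
MARK="${MARK:-0xb}"
TABLE="${TABLE:-100}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise (needs root).
run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

tproxy_routing() {
	# Packets carrying the mark are looked up in the dedicated table ...
	run ip rule add fwmark "$MARK" lookup "$TABLE"
	# ... whose single route delivers any IPv4 destination locally ("anyip").
	run ip route add local 0.0.0.0/0 dev lo table "$TABLE"
}

tproxy_routing
```

Running it with DRY_RUN=0 as root would apply the two commands. The mark has to match whatever the nftables rule sets after the tproxy statement, otherwise the redirected packets are routed normally.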
bugzilla-daemon at netfilter.org
2024-Sep-11 17:22 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

Phil Sutter <phil at nwl.cc> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |phil at nwl.cc

--- Comment #1 from Phil Sutter <phil at nwl.cc> ---
Hmm. The sample usage from nft.8 does not indicate the described behaviour.
Looking at nft_tproxy.c, the eval function's behaviour seems to differ based
on the socket's transparent state. Does your socket have IP_TRANSPARENT set
or not?
bugzilla-daemon at netfilter.org
2024-Sep-11 20:00 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

Antonio Ojea <antonio.ojea.garcia at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5                          |P2
bugzilla-daemon at netfilter.org
2024-Sep-11 22:30 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

Pablo Neira Ayuso <pablo at netfilter.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pablo at netfilter.org

--- Comment #2 from Pablo Neira Ayuso <pablo at netfilter.org> ---
(In reply to Antonio Ojea from comment #0)
> I personally found surprising the tproxy action is not terminal, as it is
> actually the intent of the user to steal the traffic and redirect to the
> port.

For the record, the action is terminal in xt_TPROXY.
bugzilla-daemon at netfilter.org
2024-Sep-12 20:34 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

--- Comment #3 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
> For the record, action is terminal in xt_TPROXY.

Yeah, exactly. I think this is a change in behavior that can make the
migration difficult.

> Does your socket have IP_TRANSPARENT set or not?

Phil, thanks for looking into it. I have IP_TRANSPARENT set, and the anyip
route and the ip rule are configured correctly. I can see in the logs of the
transparent proxy that the first connection is sent there, but since it is
UDP, subsequent connections with the same tuple are sent directly to the
DNAT destinations recorded in the conntrack entries.

I've created a kselftest with a reproducer. However, the behavior I get with
this test is different: once I enable the DNAT rules, those always take
precedence. When the DNAT rules are removed from the test, the connection is
correctly proxied.

Production behaves differently, though Kubernetes adds many more nftables
rules and the deployment is more complicated.
bugzilla-daemon at netfilter.org
2024-Sep-12 20:36 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

--- Comment #4 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
Test case using the selftest framework to reproduce the problem. You can see
that if you remove the DNAT rules, it "transparently proxies" the connection.

diff --git a/tools/testing/selftests/net/netfilter/nft_tproxy_udp_nat.sh b/tools/testing/selftests/net/netfilter/nft_tproxy_udp_nat.sh
new file mode 100755
index 000000000000..f0b261a019a3
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nft_tproxy_udp_nat.sh
@@ -0,0 +1,283 @@
+#!/bin/bash
+#
+# This tests tproxy on the following scenario:
+#
+#                         +------------+
+# +-------+               |  nsrouter  |                  +-------+
+# |ns1    |.99          .1|            |.1             .99|    ns2|
+# |   eth0|---------------|veth0  veth1|------------------|eth0   |
+# |       |  10.0.1.0/24  |            |   10.0.2.0/24    |       |
+# +-------+  dead:1::/64  |    veth2   |   dead:2::/64    +-------+
+#                         +------------+
+#                                |.1
+#                                |
+#                                |
+#                                |                        +-------+
+#                                |                     .99|    ns3|
+#                                +------------------------|eth0   |
+#                                       10.0.3.0/24       |       |
+#                                       dead:3::/64       +-------+
+#
+# The tproxy implementation acts as an echo server so the client
+# must receive the same message it sent if it has been proxied.
+# If is not proxied the servers return PONG_NS# with the number
+# of the namespace the server is running.
+# shellcheck disable=SC2162,SC2317
+
+source lib.sh
+ret=0
+# UDP is slow
+timeout=30
+
+cleanup()
+{
+	ip netns pids "$ns1" | xargs kill 2>/dev/null
+	ip netns pids "$ns2" | xargs kill 2>/dev/null
+	ip netns pids "$ns3" | xargs kill 2>/dev/null
+	ip netns pids "$nsrouter" | xargs kill 2>/dev/null
+
+	cleanup_all_ns
+}
+
+checktool "nft --version" "test without nft tool"
+checktool "socat -h" "run test without socat"
+
+trap cleanup EXIT
+setup_ns ns1 ns2 ns3 nsrouter
+
+if ! ip link add veth0 netns "$nsrouter" type veth peer name eth0 netns "$ns1" > /dev/null 2>&1; then
+	echo "SKIP: No virtual ethernet pair device support in kernel"
+	exit $ksft_skip
+fi
+ip link add veth1 netns "$nsrouter" type veth peer name eth0 netns "$ns2"
+ip link add veth2 netns "$nsrouter" type veth peer name eth0 netns "$ns3"
+
+ip -net "$nsrouter" link set veth0 up
+ip -net "$nsrouter" addr add 10.0.1.1/24 dev veth0
+ip -net "$nsrouter" addr add dead:1::1/64 dev veth0 nodad
+
+ip -net "$nsrouter" link set veth1 up
+ip -net "$nsrouter" addr add 10.0.2.1/24 dev veth1
+ip -net "$nsrouter" addr add dead:2::1/64 dev veth1 nodad
+
+ip -net "$nsrouter" link set veth2 up
+ip -net "$nsrouter" addr add 10.0.3.1/24 dev veth2
+ip -net "$nsrouter" addr add dead:3::1/64 dev veth2 nodad
+
+ip -net "$ns1" link set eth0 up
+ip -net "$ns2" link set eth0 up
+ip -net "$ns3" link set eth0 up
+
+ip -net "$ns1" addr add 10.0.1.99/24 dev eth0
+ip -net "$ns1" addr add dead:1::99/64 dev eth0 nodad
+ip -net "$ns1" route add default via 10.0.1.1
+ip -net "$ns1" route add default via dead:1::1
+
+ip -net "$ns2" addr add 10.0.2.99/24 dev eth0
+ip -net "$ns2" addr add dead:2::99/64 dev eth0 nodad
+ip -net "$ns2" route add default via 10.0.2.1
+ip -net "$ns2" route add default via dead:2::1
+
+ip -net "$ns3" addr add 10.0.3.99/24 dev eth0
+ip -net "$ns3" addr add dead:3::99/64 dev eth0 nodad
+ip -net "$ns3" route add default via 10.0.3.1
+ip -net "$ns3" route add default via dead:3::1
+
+ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
+ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth1.forwarding=1 > /dev/null
+ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth2.forwarding=1 > /dev/null
+
+test_ping() {
+	if ! ip netns exec "$ns1" ping -c 1 -q 10.0.2.99 > /dev/null; then
+		return 1
+	fi
+
+	if ! ip netns exec "$ns1" ping -c 1 -q dead:2::99 > /dev/null; then
+		return 2
+	fi
+
+	if ! ip netns exec "$ns1" ping -c 1 -q 10.0.3.99 > /dev/null; then
+		return 1
+	fi
+
+	if ! ip netns exec "$ns1" ping -c 1 -q dead:3::99 > /dev/null; then
+		return 2
+	fi
+
+	return 0
+}
+
+test_ping_router() {
+	if ! ip netns exec "$ns1" ping -c 1 -q 10.0.2.1 > /dev/null; then
+		return 3
+	fi
+
+	if ! ip netns exec "$ns1" ping -c 1 -q dead:2::1 > /dev/null; then
+		return 4
+	fi
+
+	return 0
+}
+
+
+listener_ready()
+{
+	local ns="$1"
+	local port="$2"
+	local proto="$3"
+	ss -N "$ns" -ln "$proto" -o "sport = :$port" | grep -q "$port"
+}
+
+test_tproxy_udp_forward_nat()
+{
+	local ip_proto="$1"
+
+	local expect_ns1_ns2="I_M_PROXIED"
+	local expect_ns1_ns3="PONG_NS3"
+	local expect_nsrouter_ns2="PONG_NS2"
+	local expect_nsrouter_ns3="PONG_NS3"
+
+	# derived variables
+	local testname="test_${ip_proto}_udp_forward_dnat"
+	local socat_ipproto
+	local virtual_ip
+	local virtual_ip_port
+	local ns1_ip
+	local ns2_ip
+	local ns3_ip
+	local ns1_ip_port
+	local ns2_ip_port
+	local ns3_ip_port
+	local ip_command
+
+	# socat 1.8.0 has a bug that requires to specify the IP family to bind (fixed in 1.8.0.1)
+	case $ip_proto in
+	"ip")
+		socat_ipproto="-4"
+		virtual_ip=10.100.100.10
+		virtual_ip_port="$virtual_ip:8080"
+		ns1_ip=10.0.1.99
+		ns2_ip=10.0.2.99
+		ns3_ip=10.0.3.99
+		ns1_ip_port="$ns1_ip:18888"
+		ns2_ip_port="$ns2_ip:8080"
+		ns3_ip_port="$ns3_ip:8080"
+		ip_command="ip"
+	;;
+	"ip6")
+		socat_ipproto="-6"
+		virtual_ip=dead:100:100::10
+		virtual_ip_port="[$virtual_ip]:8080"
+		ns1_ip=dead:1::99
+		ns2_ip=dead:2::99
+		ns3_ip=dead:3::99
+		ns1_ip_port="[$ns1_ip]:18888"
+		ns2_ip_port="[$ns2_ip]:8080"
+		ns3_ip_port="[$ns3_ip]:8080"
+		ip_command="ip -6"
+	;;
+	*)
+	echo "FAIL: unsupported protocol"
+	exit 255
+	;;
+	esac
+
+	# shellcheck disable=SC2046 # Intended splitting of ip_command
+	ip netns exec "$nsrouter" $ip_command rule add fwmark 1 table 100
+	ip netns exec "$nsrouter" $ip_command route add local "$virtual_ip" dev lo table 100
+	ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet filter {
+	chain divert {
+		type filter hook prerouting priority -100; policy accept;
+		$ip_proto daddr $virtual_ip udp dport 8080 tproxy $ip_proto to :12345 meta mark set 1 accept
+	}
+	# Removing this chain makes the first connection to succeed
+	chain PREROUTING {
+		type nat hook prerouting priority 1; policy accept;
+		$ip_proto daddr $virtual_ip udp dport 8080 dnat to numgen inc mod 2 map { 0 : $ns2_ip , 1: $ns3_ip }
+	}
+}
+EOF
+
+	timeout "$timeout" ip netns exec "$nsrouter" socat -u "$socat_ipproto" udp-listen:12345,fork,ip-transparent,reuseport udp:"$ns1_ip_port",ip-transparent,reuseport,bind="$virtual_ip_port" 2>/dev/null &
+	local tproxy_pid=$!
+
+	timeout "$timeout" ip netns exec "$ns2" socat "$socat_ipproto" udp-listen:8080,fork SYSTEM:"echo PONG_NS2" 2>/dev/null &
+	local server2_pid=$!
+
+	timeout "$timeout" ip netns exec "$ns3" socat "$socat_ipproto" udp-listen:8080,fork SYSTEM:"echo PONG_NS3" 2>/dev/null &
+	local server3_pid=$!
+
+	busywait "$BUSYWAIT_TIMEOUT" listener_ready "$nsrouter" 12345 "-u"
+	busywait "$BUSYWAIT_TIMEOUT" listener_ready "$ns2" 8080 "-u"
+	busywait "$BUSYWAIT_TIMEOUT" listener_ready "$ns3" 8080 "-u"
+
+	local result
+	# request from ns1 to ns2 (forwarded traffic)
+	result=$(echo I_M_PROXIED | ip netns exec "$ns1" socat -t 2 -T 2 STDIO udp:"$virtual_ip_port",sourceport=18888)
+	if [ "$result" == "$expect_ns1_ns2" ] ;then
+		echo "PASS: tproxy test $testname: ns1 got reply \"$result\" connecting to $virtual_ip_port"
+	else
+		echo "ERROR: tproxy test $testname: ns1 got reply \"$result\" connecting to $virtual_ip_port, not \"${expect_ns1_ns2}\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to ns3 (forwarded traffic)
+	result=$(echo I_M_PROXIED | ip netns exec "$ns1" socat -t 2 -T 2 STDIO udp:"$ns3_ip_port")
+	if [ "$result" = "$expect_ns1_ns3" ] ;then
+		echo "PASS: tproxy test $testname: ns1 got reply \"$result\" connecting to ns3"
+	else
+		echo "ERROR: tproxy test $testname: ns1 got reply \"$result\" connecting to ns3, not \"$expect_ns1_ns3\" as intended"
+		ret=1
+	fi
+
+	# request from nsrouter to ns2 (localy originated traffic)
+	result=$(echo I_M_PROXIED | ip netns exec "$nsrouter" socat -t 2 -T 2 STDIO udp:"$ns2_ip_port")
+	if [ "$result" == "$expect_nsrouter_ns2" ] ;then
+		echo "PASS: tproxy test $testname: nsrouter got reply \"$result\" connecting to ns2"
+	else
+		echo "ERROR: tproxy test $testname: nsrouter got reply \"$result\" connecting to ns2, not \"$expect_nsrouter_ns2\" as intended"
+		ret=1
+	fi
+
+	# request from nsrouter to ns3 (localy originated traffic)
+	result=$(echo I_M_PROXIED | ip netns exec "$nsrouter" socat -t 2 -T 2 STDIO udp:"$ns3_ip_port")
+	if [ "$result" = "$expect_nsrouter_ns3" ] ;then
+		echo "PASS: tproxy test $testname: nsrouter got reply \"$result\" connecting to ns3"
+	else
+		echo "ERROR: tproxy test $testname: nsrouter got reply \"$result\" connecting to ns3, not \"$expect_nsrouter_ns3\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to ns2 (forwarded traffic)
+	result=$(echo I_M_PROXIED | ip netns exec "$ns1" socat -t 2 -T 2 STDIO udp:"$virtual_ip_port",sourceport=18888)
+	if [ "$result" == "$expect_ns1_ns2" ] ;then
+		echo "PASS: tproxy test $testname: ns1 got reply \"$result\" connecting to $virtual_ip_port"
+	else
+		echo "ERROR: tproxy test $testname: ns1 got reply \"$result\" connecting to $virtual_ip_port, not \"${expect_ns1_ns2}\" as intended"
+		ret=1
+	fi
+
+
+	# cleanup
+	kill "$tproxy_pid" "$server2_pid" "$server3_pid" 2>/dev/null
+	# shellcheck disable=SC2046 # Intended splitting of ip_command
+	ip netns exec "$nsrouter" $ip_command rule del fwmark 1 table 100
+	ip netns exec "$nsrouter" $ip_command route flush table 100
+}
+
+
+if test_ping; then
+	# queue bypass works (rules were skipped, no listener)
+	echo "PASS: ${ns1} can reach ${ns2}"
+else
+	echo "FAIL: ${ns1} cannot reach ${ns2}: $ret" 1>&2
+	exit $ret
+fi
+
+test_tproxy_udp_forward_nat "ip"
+test_tproxy_udp_forward_nat "ip6"
+
+exit $ret
bugzilla-daemon at netfilter.org
2024-Sep-13 10:23 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

--- Comment #5 from Pablo Neira Ayuso <pablo at netfilter.org> ---
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240913102023.3948-1-pablo at netfilter.org/
bugzilla-daemon at netfilter.org
2024-Sep-13 12:39 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

--- Comment #6 from Pablo Neira Ayuso <pablo at netfilter.org> ---
Not applicable; for the record, see:

https://lore.kernel.org/netfilter-devel/ZuQpbnjAoutXEFUj at orbyte.nwl.cc/T/

A patch to document this behaviour has been proposed instead.

I remembered that tproxy is not terminal in nftables in order to fix the
hack in xt_TPROXY for mangling the packet mark. nftables is more flexible in
this regard, because the user may want to perform more actions on the packet
after validating that the socket is transparent.
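Since tproxy is not terminal in nftables, the statements after it on the same rule still run. An accept verdict only ends evaluation of the current base chain, so later base chains on the same hook, such as a nat chain doing DNAT, still see the packet. Below is a minimal sketch that prints a table combining tproxy with an explicit mark, notrack and accept, at a priority that runs before conntrack. The table name, the function name and its parameters are hypothetical. The defaults are taken from the report, and the rule shape follows the workaround in comment #0.

```shell
#!/bin/bash
# Sketch only: print an nftables table in which tproxy is followed by the
# statements that keep the redirect in place. Names are illustrative;
# the address, port and mark default to the values used in the report.
render_tproxy_table() {
	local vip="${1:-100.64.0.10}"
	local proxy_port="${2:-12345}"
	local mark="${3:-0xb}"

	cat <<EOF
table inet tproxy-example {
	chain prerouting {
		# raw - 10 runs before conntrack (priority -200), so notrack takes effect
		type filter hook prerouting priority raw - 10; policy accept;
		# tproxy only assigns the socket; mark, notrack and accept are separate statements
		ip daddr $vip udp dport 53 tproxy ip to 127.0.0.1:$proxy_port meta mark set $mark notrack accept
	}
}
EOF
}

render_tproxy_table
```

The printed ruleset could be piped to `nft -c -f -` for a syntax check before loading it, which typically requires root. The mark it sets has to match the fwmark used by the policy routing rule.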
bugzilla-daemon at netfilter.org
2024-Sep-16 08:04 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

--- Comment #7 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
Created attachment 746
  --> https://bugzilla.netfilter.org/attachment.cgi?id=746&action=edit
testcase UDP TPROXY DNAT
bugzilla-daemon at netfilter.org
2024-Oct-03 10:45 UTC
[Bug 1773] tproxy with nftables collides with nat entries
https://bugzilla.netfilter.org/show_bug.cgi?id=1773

Antonio Ojea <antonio.ojea.garcia at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #8 from Antonio Ojea <antonio.ojea.garcia at gmail.com> ---
WAI (working as intended), see:
https://lore.kernel.org/netfilter-devel/CABhP=taSX5Ka=Xa98RpnQj2Rx3E+gemUPPCs0c66yHAFh0=NxA at mail.gmail.com/T/#t