Rundong Ge
2019-Jul-30 12:25 UTC
[Bridge] [PATCH] bridge:fragmented packets dropped by bridge
Given the following setup:
-modprobe br_netfilter
-echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
-brctl addbr br0
-brctl addif br0 enp2s0
-brctl addif br0 enp3s0
-brctl addif br0 enp6s0
-ifconfig enp2s0 mtu 1300
-ifconfig enp3s0 mtu 1500
-ifconfig enp6s0 mtu 1500
-ifconfig br0 up

                 multi-port
mtu1500 - mtu1500|bridge|1500 - mtu1500
  A                  |            B
                   mtu1300

With netfilter defragmentation/conntrack enabled, fragmented
packets from A will be defragmented in prerouting, and refragmented
at postrouting.
But in this scenario the bridge finds that frag_max_size (1500) is
larger than the dst mtu stored in the fake_rtable, which is
always equal to the bridge's mtu (1300), so the packets will be dropped.

This modifies ip_skb_dst_mtu to use the out dev's mtu instead
of the bridge's mtu in bridge refragment.

Signed-off-by: Rundong Ge <rdong.ge at gmail.com>
---
 include/net/ip.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
 					   const struct sk_buff *skb)
 {
+	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+		return min(skb->dev->mtu, IP_MAX_MTU);
 	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
 		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;

-- 
1.8.3.1
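The drop condition described in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the function name refrag_would_drop_model and its parameters are hypothetical, and the only logic modelled is the comparison of frag_max_size against the MTU handed to the refragment step.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model (not kernel code).
 *
 * frag_max_size: size of the largest fragment seen when the packet was
 *                defragmented, or 0 if the packet never was fragmented.
 * mtu:           the MTU the refragment step compares against.
 *
 * Returns true when the model would drop the packet instead of
 * refragmenting it.
 */
bool refrag_would_drop_model(unsigned int frag_max_size, unsigned int mtu)
{
	return frag_max_size != 0 && frag_max_size > mtu;
}
```

With the numbers from the setup, refrag_would_drop_model(1500, 1300) is true (the MTU taken from the fake_rtable), while refrag_would_drop_model(1500, 1500) is false (the MTU of a 1500-byte egress port), which is the difference the patch is aiming at.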
Florian Westphal
2019-Jul-30 12:35 UTC
[Bridge] [PATCH] bridge:fragmented packets dropped by bridge
Rundong Ge <rdong.ge at gmail.com> wrote:
> Given the following setup:
> -modprobe br_netfilter
> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> -brctl addbr br0
> -brctl addif br0 enp2s0
> -brctl addif br0 enp3s0
> -brctl addif br0 enp6s0
> -ifconfig enp2s0 mtu 1300
> -ifconfig enp3s0 mtu 1500
> -ifconfig enp6s0 mtu 1500
> -ifconfig br0 up
>
>                  multi-port
> mtu1500 - mtu1500|bridge|1500 - mtu1500
>   A                  |            B
>                    mtu1300

How can a bridge forward a frame from A/B to mtu1300?

> With netfilter defragmentation/conntrack enabled, fragmented
> packets from A will be defragmented in prerouting, and refragmented
> at postrouting.

Yes, but I don't see how that relates to the problem at hand.

> But in this scenario the bridge finds that frag_max_size (1500) is
> larger than the dst mtu stored in the fake_rtable, which is
> always equal to the bridge's mtu (1300), so the packets will be dropped.

What happens without netfilter, or with non-fragmented packets?

> This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> of the bridge's mtu in bridge refragment.

It seems quite a hack? The above setup should use a router, not a bridge.
Nikolay Aleksandrov
2019-Jul-30 12:41 UTC
[Bridge] [PATCH] bridge:fragmented packets dropped by bridge
On 30/07/2019 15:25, Rundong Ge wrote:
> Given the following setup:
> -modprobe br_netfilter
> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> -brctl addbr br0
> -brctl addif br0 enp2s0
> -brctl addif br0 enp3s0
> -brctl addif br0 enp6s0
> -ifconfig enp2s0 mtu 1300
> -ifconfig enp3s0 mtu 1500
> -ifconfig enp6s0 mtu 1500
> -ifconfig br0 up
>
>                  multi-port
> mtu1500 - mtu1500|bridge|1500 - mtu1500
>   A                  |            B
>                    mtu1300
>
> With netfilter defragmentation/conntrack enabled, fragmented
> packets from A will be defragmented in prerouting, and refragmented
> at postrouting.
> But in this scenario the bridge finds that frag_max_size (1500) is
> larger than the dst mtu stored in the fake_rtable, which is
> always equal to the bridge's mtu (1300), so the packets will be dropped.
>
> This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> of the bridge's mtu in bridge refragment.
>
> Signed-off-by: Rundong Ge <rdong.ge at gmail.com>
> ---
>  include/net/ip.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>  					   const struct sk_buff *skb)
>  {
> +	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> +		return min(skb->dev->mtu, IP_MAX_MTU);
>  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>
>

I don't think this is correct. There's a reason why the bridge chooses the
smallest possible MTU out of its members, and this is simply a hack to
circumvent it. If you really want to do so, just set the bridge MTU manually;
we've added support so it won't change automatically to the smallest. But then
how do you pass packets 1500 -> 1300 in this setup?

You're talking about the frag_size check in br_nf_ip_fragment(), right?
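The "smallest possible MTU out of its members" rule mentioned in the reply above can be pictured with a short userspace sketch. It is not the bridge driver's code: the helper name bridge_min_mtu_model and the fallback parameter are hypothetical, and a manually set bridge MTU (which the reply says is supported) is deliberately left out of the model.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model (not kernel code): return the smallest
 * MTU among the bridge's member ports. When the bridge has no ports,
 * return the caller-supplied fallback instead.
 */
unsigned int bridge_min_mtu_model(const unsigned int *port_mtus, size_t n,
				  unsigned int fallback)
{
	unsigned int mtu;
	size_t i;

	if (n == 0)
		return fallback;

	mtu = port_mtus[0];
	for (i = 1; i < n; i++) {
		if (port_mtus[i] < mtu)
			mtu = port_mtus[i];
	}
	return mtu;
}
```

For the three ports in the setup (1300, 1500, 1500) the model returns 1300, which matches the bridge MTU the original message reports.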
Rundong Ge
2019-Aug-26 02:45 UTC
[Bridge] [PATCH] bridge:fragmented packets dropped by bridge
On Tue, Jul 30, 2019 at 8:41 PM Nikolay Aleksandrov <nikolay at cumulusnetworks.com> wrote:
>
> On 30/07/2019 15:25, Rundong Ge wrote:
> > Given the following setup:
> > -modprobe br_netfilter
> > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> > -brctl addbr br0
> > -brctl addif br0 enp2s0
> > -brctl addif br0 enp3s0
> > -brctl addif br0 enp6s0
> > -ifconfig enp2s0 mtu 1300
> > -ifconfig enp3s0 mtu 1500
> > -ifconfig enp6s0 mtu 1500
> > -ifconfig br0 up
> >
> >                  multi-port
> > mtu1500 - mtu1500|bridge|1500 - mtu1500
> >   A                  |            B
> >                    mtu1300
> >
> > With netfilter defragmentation/conntrack enabled, fragmented
> > packets from A will be defragmented in prerouting, and refragmented
> > at postrouting.
> > But in this scenario the bridge finds that frag_max_size (1500) is
> > larger than the dst mtu stored in the fake_rtable, which is
> > always equal to the bridge's mtu (1300), so the packets will be dropped.
> >
> > This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> > of the bridge's mtu in bridge refragment.
> >
> > Signed-off-by: Rundong Ge <rdong.ge at gmail.com>
> > ---
> >  include/net/ip.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/net/ip.h b/include/net/ip.h
> > index 29d89de..0512de3 100644
> > --- a/include/net/ip.h
> > +++ b/include/net/ip.h
> > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> >  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> >  					   const struct sk_buff *skb)
> >  {
> > +	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> > +		return min(skb->dev->mtu, IP_MAX_MTU);
> >  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> >  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> >
> >
>
> I don't think this is correct. There's a reason why the bridge chooses the
> smallest possible MTU out of its members, and this is simply a hack to
> circumvent it. If you really want to do so, just set the bridge MTU manually;
> we've added support so it won't change automatically to the smallest. But then
> how do you pass packets 1500 -> 1300 in this setup?
>
> You're talking about the frag_size check in br_nf_ip_fragment(), right?
>

Hi Nikolay

My setup may not be common. May I know if there is any reason to use the
output port's MTU to do the re-fragment check but then use the bridge's MTU
to do the re-fragment?

Is it the expected behavior that the bridge's MTU affects re-fragmentation of
FORWARD traffic? I used to think the bridge's MTU would only affect the OUTPUT
traffic sent from "br0".

The modification in this patch will replace the MTU in the fake_rtable, which
is only used in the FORWARD re-fragment, and won't affect the local traffic
from "br0".

Thanks,
Raydodn
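The two-step behaviour asked about above (one MTU for deciding whether to re-fragment, another for the re-fragment itself) can be sketched as a userspace model. This is an illustration under assumptions, not the br_netfilter code: all names are hypothetical, and capping the first limit by frag_max_size is an assumption about the first step rather than something stated in this thread.

```c
/* Hypothetical verdicts for the model below (not kernel identifiers). */
enum xmit_verdict_model {
	XMIT_MODEL_AS_IS,      /* packet fits, sent without refragmenting */
	XMIT_MODEL_REFRAGMENT, /* packet is split again before sending */
	XMIT_MODEL_DROP        /* packet is discarded */
};

/*
 * Hypothetical userspace model (not kernel code).
 *
 * Step 1 decides whether the packet needs refragmenting, using the
 * output port's MTU (assumed here to be capped by frag_max_size).
 * Step 2 compares frag_max_size against dst_mtu, the MTU stored in the
 * dst entry, which in the thread's scenario is the bridge's MTU.
 */
enum xmit_verdict_model forward_verdict_model(unsigned int pkt_len,
					      unsigned int frag_max_size,
					      unsigned int out_dev_mtu,
					      unsigned int dst_mtu)
{
	unsigned int limit = out_dev_mtu;

	if (frag_max_size != 0 && frag_max_size < limit)
		limit = frag_max_size;
	if (pkt_len <= limit)
		return XMIT_MODEL_AS_IS;

	if (frag_max_size != 0 && frag_max_size > dst_mtu)
		return XMIT_MODEL_DROP;
	return XMIT_MODEL_REFRAGMENT;
}
```

In this model a reassembled 3000-byte packet with frag_max_size 1500 leaving a 1500-byte port is dropped when dst_mtu is 1300 and re-fragmented when dst_mtu is 1500, so the bridge's MTU does influence forwarded traffic once the packet has been defragmented.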
Jan Engelhardt
2019-Aug-26 07:59 UTC
[Bridge] [PATCH] bridge:fragmented packets dropped by bridge
On Tuesday 2019-07-30 14:35, Florian Westphal wrote:
>Rundong Ge <rdong.ge at gmail.com> wrote:
>> Given the following setup:
>> -modprobe br_netfilter
>> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
>> -brctl addbr br0
>> -brctl addif br0 enp2s0
>> -brctl addif br0 enp3s0
>> -brctl addif br0 enp6s0
>> -ifconfig enp2s0 mtu 1300
>> -ifconfig enp3s0 mtu 1500
>> -ifconfig enp6s0 mtu 1500
>> -ifconfig br0 up
>>
>>                  multi-port
>> mtu1500 - mtu1500|bridge|1500 - mtu1500
>>   A                  |            B
>>                    mtu1300
>
>How can a bridge forward a frame from A/B to mtu1300?

There might be a misunderstanding here judging from the shortness of this
thread.

I understood it such that the bridge ports (eth0, eth1) have MTU 1500, yet br0
(in essence the third bridge port if you so wish) itself has MTU 1300.

Therefore, frame forwarding from eth0 to eth1 should succeed, since the
1300-byte MTU is only relevant if the bridge decides the packet needs to be
locally delivered.