Fredrik Markström
2017-May-11 19:10 UTC
[Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
<stephen at networkplumber.org> wrote:
> On Thu, 11 May 2017 15:46:27 +0200
> Fredrik Markstrom <fredrik.markstrom at gmail.com> wrote:
>
>> From: Fredrik Markström <fredrik.markstrom at gmail.com>
>>
>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>> the receiving interface. This is not consistent with most of the hardware
>> ethernet drivers that happily receive packets larger than MTU.
>
> Wrong.

What is "Wrong"? I was initially skeptical about implementing this patch,
since it feels odd to have different MTUs set on the two sides of a
link. After consulting some IP people and the RFCs I changed
my mind and thought I'd give it a shot. In the RFCs I couldn't find
anything that defines when a received packet should or should not be
dropped.

> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> The actual limit is a function of the hardware. Some hardware can only limit by
> power of 2; some can only limit frames larger than 1500; some have no limiting at all.

Agreed. The purpose of these patches is to be able to configure a
veth interface to mimic these different behaviors. None of the Ethernet
interfaces I have access to drop packets for being larger
than the configured MTU, as veth does.

Being able to mimic real Ethernet hardware is useful when
consolidating hardware using containers/namespaces.

In a reply to a comment from David Miller on my previous version of
the patch I attached the example below to demonstrate the case in
detail.

This works with all Ethernet hardware setups I have access to:

---- 8< ------
# Host A eth2 and Host B eth0 are on the same network.

# On HOST A
% ip address add 1.2.3.4/24 dev eth2
% ip link set eth2 mtu 300 up

% # HOST B
% ip address add 1.2.3.5/24 dev eth0
% ip link set eth0 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
---- 8< ------

But it doesn't work with veth:

---- 8< ------
# veth0 and veth1 are a veth pair and veth1 has been moved to a separate
# network namespace.
% # NS A
% ip address add 1.2.3.4/24 dev veth0
% ip link set veth0 mtu 300 up

% # NS B
% ip address add 1.2.3.5/24 dev veth1
% ip link set veth1 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
---- 8< ------

--
/Fredrik
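The veth result above can be checked with plain arithmetic. The following is a user-space sketch, not kernel code. It assumes that the receive-side check compares the frame length against the receiving device's MTU plus the 14-byte Ethernet header plus a 4-byte VLAN allowance, which is believed to match is_skb_forwardable() in kernels of this period; the GSO exception and the interface-up test are left out. The function names frame_len and veth_verdict are invented for illustration.

```shell
#!/bin/sh
# User-space model of the receive-side length check (illustrative only).
# Assumption: limit = receiver MTU + Ethernet header (14) + one VLAN tag (4).

ETH_HLEN=14   # Ethernet header length in bytes
VLAN_HLEN=4   # allowance for one VLAN tag
IP_HLEN=20    # IPv4 header without options
ICMP_HLEN=8   # ICMP echo header

# frame_len PING_SIZE
# Prints the Ethernet frame length (without FCS) produced by `ping -s PING_SIZE`.
frame_len() {
    echo $(( $1 + ICMP_HLEN + IP_HLEN + ETH_HLEN ))
}

# veth_verdict FRAME_LEN RX_MTU
# Prints "forward" if the frame fits the modelled limit, otherwise "drop".
veth_verdict() {
    if [ "$1" -le $(( $2 + ETH_HLEN + VLAN_HLEN )) ]; then
        echo forward
    else
        echo drop
    fi
}

len=$(frame_len 400)
echo "frame length: $len"
echo "receiver mtu 300:  $(veth_verdict "$len" 300)"
echo "receiver mtu 1000: $(veth_verdict "$len" 1000)"
```

Under these assumptions `ping -s 400` yields a 442-byte frame, while a receiver MTU of 300 gives a limit of 318 bytes, so the model reports a drop, consistent with the 100% packet loss seen on veth0.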
Stephen Hemminger
2017-May-11 19:44 UTC
[Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
On Thu, 11 May 2017 21:10:11 +0200
Fredrik Markström <fredrik.markstrom at gmail.com> wrote:

> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
> > On Thu, 11 May 2017 15:46:27 +0200
> > Fredrik Markstrom <fredrik.markstrom at gmail.com> wrote:
> >
> >> From: Fredrik Markström <fredrik.markstrom at gmail.com>
> >>
> >> is_skb_forwardable() currently checks if the packet size is <= mtu of
> >> the receiving interface. This is not consistent with most of the hardware
> >> ethernet drivers that happily receive packets larger than MTU.
> >
> > Wrong.
>
> What is "Wrong"? I was initially skeptical about implementing this patch,
> since it feels odd to have different MTUs set on the two sides of a
> link. After consulting some IP people and the RFCs I changed
> my mind and thought I'd give it a shot. In the RFCs I couldn't find
> anything that defines when a received packet should or should not be
> dropped.
>
> > Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> > The actual limit is a function of the hardware. Some hardware can only limit by
> > power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>
> Agreed. The purpose of these patches is to be able to configure a
> veth interface to mimic these different behaviors. None of the Ethernet
> interfaces I have access to drop packets for being larger
> than the configured MTU, as veth does.
>
> Being able to mimic real Ethernet hardware is useful when
> consolidating hardware using containers/namespaces.
>
> In a reply to a comment from David Miller on my previous version of
> the patch I attached the example below to demonstrate the case in
> detail.
>
> This works with all Ethernet hardware setups I have access to:
>

Why not just use an iptables rule to enforce whatever semantics you want?
Teco Boot
2017-May-12 08:05 UTC
[Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
IP MTU and L2 MTU are different animals. IMHO the IP MTU governs
fragmentation at the sending side of a link. There is no need to drop IP
packets at the receiver because their size exceeds the configured IP MTU.
IP packets larger than the receiver's L2 MTU will be dropped at the sub-IP
layer.

For this patch: if veth has some notion of an L2 MTU (e.g. buffer size
limits), there have to be checks for it. I don't see how configuring an MRU
helps: more config, more mistakes. If there is no need to drop the packet,
don't.

Teco

> On 11 May 2017, at 21:10, Fredrik Markström <fredrik.markstrom at gmail.com> wrote:
>
> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
>> On Thu, 11 May 2017 15:46:27 +0200
>> Fredrik Markstrom <fredrik.markstrom at gmail.com> wrote:
>>
>>> From: Fredrik Markström <fredrik.markstrom at gmail.com>
>>>
>>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>>> the receiving interface. This is not consistent with most of the hardware
>>> ethernet drivers that happily receive packets larger than MTU.
>>
>> Wrong.
>
> What is "Wrong"? I was initially skeptical about implementing this patch,
> since it feels odd to have different MTUs set on the two sides of a
> link. After consulting some IP people and the RFCs I changed
> my mind and thought I'd give it a shot. In the RFCs I couldn't find
> anything that defines when a received packet should or should not be
> dropped.
>
>> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>> The actual limit is a function of the hardware. Some hardware can only limit by
>> power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>
> Agreed. The purpose of these patches is to be able to configure a
> veth interface to mimic these different behaviors. None of the Ethernet
> interfaces I have access to drop packets for being larger
> than the configured MTU, as veth does.
>
> Being able to mimic real Ethernet hardware is useful when
> consolidating hardware using containers/namespaces.
>
> In a reply to a comment from David Miller on my previous version of
> the patch I attached the example below to demonstrate the case in
> detail.
>
> This works with all Ethernet hardware setups I have access to:
>
> ---- 8< ------
> # Host A eth2 and Host B eth0 are on the same network.
>
> # On HOST A
> % ip address add 1.2.3.4/24 dev eth2
> % ip link set eth2 mtu 300 up
>
> % # HOST B
> % ip address add 1.2.3.5/24 dev eth0
> % ip link set eth0 mtu 1000 up
> % ping -c 1 -W 1 -s 400 1.2.3.4
> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
> 408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms
>
> --- 1.2.3.4 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
> ---- 8< ------
>
> But it doesn't work with veth:
>
> ---- 8< ------
> # veth0 and veth1 are a veth pair and veth1 has been moved to a separate
> # network namespace.
> % # NS A
> % ip address add 1.2.3.4/24 dev veth0
> % ip link set veth0 mtu 300 up
>
> % # NS B
> % ip address add 1.2.3.5/24 dev veth1
> % ip link set veth1 mtu 1000 up
> % ping -c 1 -W 1 -s 400 1.2.3.4
> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
>
> --- 1.2.3.4 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
> ---- 8< ------
>
> --
> /Fredrik
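
The point that the IP MTU matters on the sending side can be illustrated with a small calculation of how many IPv4 packets the sender emits for a given payload. This is a rough sketch under stated assumptions: IPv4 with a 20-byte header and no options, the DF bit not set, and every non-final fragment carrying a multiple of 8 payload bytes. The function name fragments_needed is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Sender-side model: number of IPv4 packets needed for one IP payload.
# Assumptions: 20-byte IPv4 header, no options, fragmentation allowed.

IP_HLEN=20

# fragments_needed IP_PAYLOAD SENDER_MTU
# Prints how many IPv4 packets the sender emits for IP_PAYLOAD bytes.
fragments_needed() {
    # Largest payload a single packet (or the final fragment) can carry.
    max_payload=$(( $2 - IP_HLEN ))
    # Non-final fragments carry a multiple of 8 bytes.
    per_frag=$(( max_payload / 8 * 8 ))
    if [ "$1" -le "$max_payload" ]; then
        echo 1
    else
        # One final fragment plus enough full-size fragments for the rest.
        echo $(( 1 + ($1 - max_payload + per_frag - 1) / per_frag ))
    fi
}

# `ping -s 400` carries 408 bytes of IP payload (8-byte ICMP header + 400 data bytes).
echo "sender mtu 1000: $(fragments_needed 408 1000) packet(s)"
echo "sender mtu 300:  $(fragments_needed 408 300) packet(s)"
```

With the sender's MTU at 1000 the 428-byte packet leaves unfragmented, whatever MTU is configured on the receiving interface; only a sender MTU of 300 would split it into two fragments. Whether the receiver then accepts the unfragmented packet is the separate, lower-layer question discussed in this thread.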