Arnd Bergmann
2009-Nov-17 22:39 UTC
[Bridge] [PATCH 0/3] macvlan: add vepa and bridge mode
This is based on an earlier patch from Eric Biederman adding
forwarding between macvlans. I extended his approach to
allow the administrator to choose the mode for each macvlan,
and to implement a functional VEPA between macvlan.

Still missing from this is support for communication between
the lower device that the macvlans are based on. This would
be extremely useful but as others have found out before me
requires significant changes not only to macvlan but also
to the common transmit path.

I've seen one panic during testing this that I still need
to track down, but it generally does what is advertised.
I've tested VEPA operation with the hairpin support
added to the bridge driver by Anna Fischer.

My current plan is to submit this for inclusion in 2.6.33
when people are happy with it and I tracked down any
remaining bugs, and possibly to do the communication with
the lower device one release later.

	Arnd <><

---

Arnd Bergmann (3):
      macvlan: implement VEPA and private mode
      macvlan: export macvlan mode through netlink
      iplink: add macvlan options for bridge mode

Eric Biederman (1):
      macvlan: Reflect macvlan packets meant for other macvlan devices

 linux/drivers/net/macvlan.c   |  170 +++++++++++++++++++++++++++++++++-----
 linux/include/linux/if_link.h |   15 +++
 2 files changed, 161 insertions(+), 24 deletions(-)

 iproute2/include/linux/if_link.h |   15 +++
 iproute2/ip/Makefile             |    3 +-
 iproute2/ip/iplink_macvlan.c     |   93 ++++++++++++++++++
 3 files changed, 110 insertions(+), 1 deletions(-)
 create mode 100644 ip/iplink_macvlan.c
Arnd Bergmann
2009-Nov-17 22:39 UTC
[Bridge] [PATCH 1/3] macvlan: Reflect macvlan packets meant for other macvlan devices
From: Eric Biederman <ebiederm at xmission.com>

Switch ports do not send packets back out the same port they came
in on.  This causes problems when using a macvlan device inside
of a network namespace as it becomes impossible to talk to
other macvlan devices.

Signed-off-by: Eric Biederman <ebiederm at xmission.com>
Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---
 drivers/net/macvlan.c |   91 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 67 insertions(+), 24 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 3aabfd9..406b8b5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -29,6 +29,7 @@
 #include <linux/if_link.h>
 #include <linux/if_macvlan.h>
 #include <net/rtnetlink.h>
+#include <net/xfrm.h>
 
 #define MACVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
 
@@ -102,7 +103,8 @@ static int macvlan_addr_busy(const struct macvlan_port *port,
 }
 
 static void macvlan_broadcast(struct sk_buff *skb,
-			      const struct macvlan_port *port)
+			      const struct macvlan_port *port,
+			      struct net_device *src)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
@@ -118,6 +120,9 @@ static void macvlan_broadcast(struct sk_buff *skb,
 		hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
 			dev = vlan->dev;
 
+			if (dev == src)
+				continue;
+
 			nskb = skb_clone(skb, GFP_ATOMIC);
 			if (nskb == NULL) {
 				dev->stats.rx_errors++;
@@ -140,20 +145,45 @@ static void macvlan_broadcast(struct sk_buff *skb,
 	}
 }
 
+static int macvlan_unicast(struct sk_buff *skb, const struct macvlan_dev *dest)
+{
+	struct net_device *dev = dest->dev;
+
+	if (unlikely(!dev->flags & IFF_UP)) {
+		kfree_skb(skb);
+		return NET_XMIT_DROP;
+	}
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb) {
+		dev->stats.rx_errors++;
+		dev->stats.rx_dropped++;
+		return NET_XMIT_DROP;
+	}
+
+	dev->stats.rx_bytes += skb->len + ETH_HLEN;
+	dev->stats.rx_packets++;
+
+	skb->dev = dev;
+	skb->pkt_type = PACKET_HOST;
+	netif_rx(skb);
+	return NET_XMIT_SUCCESS;
+}
+
+
 /* called under rcu_read_lock() from netif_receive_skb */
 static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_port *port;
 	const struct macvlan_dev *vlan;
-	struct net_device *dev;
 
 	port = rcu_dereference(skb->dev->macvlan_port);
 	if (port == NULL)
 		return skb;
 
 	if (is_multicast_ether_addr(eth->h_dest)) {
-		macvlan_broadcast(skb, port);
+		macvlan_broadcast(skb, port, NULL);
 		return skb;
 	}
 
@@ -161,27 +191,43 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 	if (vlan == NULL)
 		return skb;
 
-	dev = vlan->dev;
-	if (unlikely(!(dev->flags & IFF_UP))) {
-		kfree_skb(skb);
-		return NULL;
-	}
+	macvlan_unicast(skb, vlan);
+	return NULL;
+}
 
-	skb = skb_share_check(skb, GFP_ATOMIC);
-	if (skb == NULL) {
-		dev->stats.rx_errors++;
-		dev->stats.rx_dropped++;
-		return NULL;
-	}
+static int macvlan_xmit_world(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct macvlan_dev *vlan = netdev_priv(dev);
+	__skb_push(skb, skb->data - skb_mac_header(skb));
+	skb->dev = vlan->lowerdev;
+	return dev_queue_xmit(skb);
+}
 
-	dev->stats.rx_bytes += skb->len + ETH_HLEN;
-	dev->stats.rx_packets++;
+static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct macvlan_dev *vlan = netdev_priv(dev);
+	const struct macvlan_port *port = vlan->port;
+	const struct macvlan_dev *dest;
+	const struct ethhdr *eth;
 
-	skb->dev = dev;
-	skb->pkt_type = PACKET_HOST;
+	skb->protocol = eth_type_trans(skb, dev);
+	eth = eth_hdr(skb);
 
-	netif_rx(skb);
-	return NULL;
+	skb_dst_drop(skb);
+	skb->mark = 0;
+	secpath_reset(skb);
+	nf_reset(skb);
+
+	if (is_multicast_ether_addr(eth->h_dest)) {
+		macvlan_broadcast(skb, port, dev);
+		return macvlan_xmit_world(skb, dev);
+	}
+
+	dest = macvlan_hash_lookup(port, eth->h_dest);
+	if (dest)
+		return macvlan_unicast(skb, dest);
+
+	return macvlan_xmit_world(skb, dev);
 }
 
 static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
@@ -189,13 +235,10 @@ static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 {
 	int i = skb_get_queue_mapping(skb);
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-	const struct macvlan_dev *vlan = netdev_priv(dev);
 	unsigned int len = skb->len;
 	int ret;
 
-	skb->dev = vlan->lowerdev;
-	ret = dev_queue_xmit(skb);
-
+	ret = macvlan_queue_xmit(skb, dev);
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		txq->tx_packets++;
 		txq->tx_bytes += len;
-- 
1.6.3.3
Arnd Bergmann
2009-Nov-17 22:39 UTC
[Bridge] [PATCH 2/3] macvlan: implement VEPA and private mode
This allows each macvlan slave device to be in one
of three modes, depending on the use case:

MACVLAN_MODE_PRIVATE:
	The device never communicates with any other device
	on the same upper_dev. This even includes frames
	coming back from a reflective relay, where supported
	by the adjacent bridge.

MACVLAN_MODE_VEPA:
	The new Virtual Ethernet Port Aggregator (VEPA) mode,
	we assume that the adjacent bridge returns all frames
	where both source and destination are local to the
	macvlan port, i.e. the bridge is set up as a reflective
	relay.
	Broadcast frames coming in from the upper_dev get
	flooded to all macvlan interfaces in VEPA mode.
	We never deliver any frames locally.

MACVLAN_MODE_BRIDGE:
	We provide the behavior of a simple bridge between
	different macvlan interfaces on the same port. Frames
	from one interface to another one get delivered directly
	and are not sent out externally. Broadcast frames get
	flooded to all other bridge ports and to the external
	interface, but when they come back from a reflective
	relay, we don't deliver them again.
	Since we know all the MAC addresses, the macvlan bridge
	mode does not require learning or STP like the bridge
	module does.

Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---
 drivers/net/macvlan.c |   46 +++++++++++++++++++++++++++++++++++++---------
 1 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 406b8b5..fa8b568 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -33,6 +33,12 @@
 
 #define MACVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
 
+enum macvlan_type {
+	MACVLAN_PRIVATE	= 1,
+	MACVLAN_VEPA	= 2,
+	MACVLAN_BRIDGE	= 4,
+};
+
 struct macvlan_port {
 	struct net_device	*dev;
 	struct hlist_head	vlan_hash[MACVLAN_HASH_SIZE];
@@ -45,6 +51,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	enum macvlan_mode	mode;
 };
 
 
@@ -104,7 +111,8 @@ static int macvlan_addr_busy(const struct macvlan_port *port,
 
 static void macvlan_broadcast(struct sk_buff *skb,
 			      const struct macvlan_port *port,
-			      struct net_device *src)
+			      struct net_device *src,
+			      enum macvlan_mode mode)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
@@ -123,6 +131,9 @@ static void macvlan_broadcast(struct sk_buff *skb,
 			if (dev == src)
 				continue;
 
+			if (!(vlan->mode & mode))
+				continue;
+
 			nskb = skb_clone(skb, GFP_ATOMIC);
 			if (nskb == NULL) {
 				dev->stats.rx_errors++;
@@ -177,13 +188,27 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_port *port;
 	const struct macvlan_dev *vlan;
+	const struct macvlan_dev *src;
 
 	port = rcu_dereference(skb->dev->macvlan_port);
 	if (port == NULL)
 		return skb;
 
 	if (is_multicast_ether_addr(eth->h_dest)) {
-		macvlan_broadcast(skb, port, NULL);
+		src = macvlan_hash_lookup(port, eth->h_source);
+		if (!src)
+			/* frame comes from an external address */
+			macvlan_broadcast(skb, port, NULL, MACVLAN_MODE_PRIVATE
+				| MACVLAN_MODE_VEPA | MACVLAN_MODE_BRIDGE);
+		else if (src->mode == MACVLAN_MODE_VEPA)
+			/* flood to everyone except source */
+			macvlan_broadcast(skb, port, src->dev,
+				MACVLAN_MODE_VEPA | MACVLAN_MODE_BRIDGE);
+		else if (src->mode == MACVLAN_MODE_BRIDGE)
+			/* flood only to VEPA ports, bridge ports
+			   already saw the frame */
+			macvlan_broadcast(skb, port, src->dev,
+				MACVLAN_MODE_VEPA);
 		return skb;
 	}
 
@@ -218,14 +243,17 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	secpath_reset(skb);
 	nf_reset(skb);
 
-	if (is_multicast_ether_addr(eth->h_dest)) {
-		macvlan_broadcast(skb, port, dev);
-		return macvlan_xmit_world(skb, dev);
-	}
+	if (vlan->mode == MACVLAN_MODE_BRIDGE) {
+		/* send to other bridge ports directly */
+		if (is_multicast_ether_addr(eth->h_dest)) {
+			macvlan_broadcast(skb, port, dev, MACVLAN_MODE_BRIDGE);
+			return macvlan_xmit_world(skb, dev);
+		}
 
-	dest = macvlan_hash_lookup(port, eth->h_dest);
-	if (dest)
-		return macvlan_unicast(skb, dest);
+		dest = macvlan_hash_lookup(port, eth->h_dest);
+		if (dest && dest->mode == MACVLAN_MODE_BRIDGE)
+			return macvlan_unicast(skb, dest);
+	}
 
 	return macvlan_xmit_world(skb, dev);
 }
-- 
1.6.3.3
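
The per-mode flooding rules for multicast frames arriving on the lower device can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions: `struct port`, `flood_mask()` and `should_deliver()` are hypothetical names, not driver code, and the mask for externally sourced frames assumes that ports in all three modes are meant to receive them. The mode values mirror the ones used in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mode bits, mirroring the values used in this series. */
enum mode {
	MODE_PRIVATE = 1,
	MODE_VEPA    = 2,
	MODE_BRIDGE  = 4,
};

/* Hypothetical stand-in for one macvlan device on a port. */
struct port {
	int id;
	enum mode mode;
};

/*
 * Which receiver modes should see a multicast frame that arrives on
 * the lower device?  "src" is the local sender found by looking up the
 * source MAC address, or NULL if the frame has an external source.
 */
static unsigned int flood_mask(const struct port *src)
{
	if (src == NULL)
		return MODE_PRIVATE | MODE_VEPA | MODE_BRIDGE;
	if (src->mode == MODE_VEPA)
		return MODE_VEPA | MODE_BRIDGE;
	if (src->mode == MODE_BRIDGE)
		return MODE_VEPA;	/* bridge ports saw it on transmit */
	return 0;			/* private sender: reflect nothing */
}

/* A frame is never handed back to its own sender. */
static bool should_deliver(const struct port *dst, const struct port *src)
{
	if (dst == src)
		return false;
	return (dst->mode & flood_mask(src)) != 0;
}
```

Because the modes are distinct bits, one mask argument is enough to describe any set of receivers, which is the reason for the values 1, 2 and 4 rather than 0, 1 and 2.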
Arnd Bergmann
2009-Nov-17 22:39 UTC
[Bridge] [PATCH 3/3] macvlan: export macvlan mode through netlink
In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.

This depends on a matching patch to iproute2
in order to become accessible in user land.

Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---
 drivers/net/macvlan.c   |   67 +++++++++++++++++++++++++++++++++++++++++-----
 include/linux/if_link.h |   15 ++++++++++
 2 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index fa8b568..731017e 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -33,12 +33,6 @@
 
 #define MACVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
 
-enum macvlan_type {
-	MACVLAN_PRIVATE	= 1,
-	MACVLAN_VEPA	= 2,
-	MACVLAN_BRIDGE	= 4,
-};
-
 struct macvlan_port {
 	struct net_device	*dev;
 	struct hlist_head	vlan_hash[MACVLAN_HASH_SIZE];
@@ -51,7 +45,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
-	enum macvlan_mode	mode;
+	enum ifla_macvlan_mode	mode;
 };
 
 
@@ -112,7 +106,7 @@ static int macvlan_addr_busy(const struct macvlan_port *port,
 static void macvlan_broadcast(struct sk_buff *skb,
 			      const struct macvlan_port *port,
 			      struct net_device *src,
-			      enum macvlan_mode mode)
+			      enum ifla_macvlan_mode mode)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
@@ -553,6 +547,18 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
 			return -EADDRNOTAVAIL;
 	}
+
+	if (data && data[IFLA_MACVLAN_MODE]) {
+		u32 mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+		switch (mode) {
+		case MACVLAN_MODE_PRIVATE:
+		case MACVLAN_MODE_VEPA:
+		case MACVLAN_MODE_BRIDGE:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
 	return 0;
 }
 
@@ -617,6 +623,13 @@ static int macvlan_newlink(struct net_device *dev,
 	vlan->dev      = dev;
 	vlan->port     = port;
 
+	vlan->mode     = MACVLAN_MODE_VEPA;
+	if (data && data[IFLA_MACVLAN_MODE]) {
+		u32 mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+
+		vlan->mode = mode;
+	}
+
 	err = register_netdevice(dev);
 	if (err < 0)
 		return err;
@@ -638,6 +651,39 @@ static void macvlan_dellink(struct net_device *dev)
 		macvlan_port_destroy(port->dev);
 }
 
+static int macvlan_changelink(struct net_device *dev,
+		struct nlattr *tb[], struct nlattr *data[])
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	if (data && data[IFLA_MACVLAN_MODE]) {
+		u32 mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+		vlan->mode = mode;
+	}
+
+	return 0;
+}
+
+static size_t macvlan_get_size(const struct net_device *dev)
+{
+	return nla_total_size(4);
+}
+
+static int macvlan_fill_info(struct sk_buff *skb,
+				const struct net_device *dev)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+
+	NLA_PUT_U32(skb, IFLA_MACVLAN_MODE, vlan->mode);
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
+	[IFLA_MACVLAN_MODE] = { .type = NLA_U32 },
+};
+
 static struct rtnl_link_ops macvlan_link_ops __read_mostly = {
 	.kind		= "macvlan",
 	.priv_size	= sizeof(struct macvlan_dev),
@@ -646,6 +692,11 @@ static struct rtnl_link_ops macvlan_link_ops __read_mostly = {
 	.validate	= macvlan_validate,
 	.newlink	= macvlan_newlink,
 	.dellink	= macvlan_dellink,
+	.maxtype	= IFLA_MACVLAN_MAX,
+	.policy		= macvlan_policy,
+	.changelink	= macvlan_changelink,
+	.get_size	= macvlan_get_size,
+	.fill_info	= macvlan_fill_info,
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 176c518..ef70ebc 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -190,4 +190,19 @@ struct ifla_vlan_qos_mapping
 	__u32 to;
 };
 
+/* MACVLAN section */
+enum {
+	IFLA_MACVLAN_UNSPEC,
+	IFLA_MACVLAN_MODE,
+	__IFLA_MACVLAN_MAX,
+};
+
+#define IFLA_MACVLAN_MAX (__IFLA_MACVLAN_MAX - 1)
+
+enum ifla_macvlan_mode {
+	MACVLAN_MODE_PRIVATE = 1, /* don't talk to other macvlans */
+	MACVLAN_MODE_VEPA    = 2, /* talk to other ports through ext bridge */
+	MACVLAN_MODE_BRIDGE  = 4, /* talk to bridge ports directly */
+};
+
 #endif /* _LINUX_IF_LINK_H */
-- 
1.6.3.3
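
The mode attribute carries exactly one of three bit values, so a combined value such as 3 has to be rejected even though each of its bits is known. A minimal userspace sketch of that check follows; `mode_is_valid()` and `MODE_ALL` are hypothetical names introduced for illustration, and the bit trick is an alternative formulation of the same test, not what the patch does.

```c
#include <errno.h>
#include <stdint.h>

/* Mode bits, mirroring the values used in this series. */
#define MODE_PRIVATE	1u
#define MODE_VEPA	2u
#define MODE_BRIDGE	4u
#define MODE_ALL	(MODE_PRIVATE | MODE_VEPA | MODE_BRIDGE)

/*
 * Accept a mode only if it is exactly one known bit.
 * Returns 0 on success or -EINVAL, like a netlink validate hook would.
 */
static int mode_is_valid(uint32_t mode)
{
	if (mode == 0 || (mode & ~MODE_ALL))
		return -EINVAL;		/* no bit set, or an unknown bit */
	if (mode & (mode - 1))
		return -EINVAL;		/* more than one bit set */
	return 0;
}
```

A switch over the three constants, as in the patch, is easier to read for three values; the mask form only starts to pay off if more modes are added.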
Arnd Bergmann
2009-Nov-17 22:39 UTC
[Bridge] [PATCH] iplink: add macvlan options for bridge mode
Macvlan can now optionally support forwarding between its
ports, if they are in "bridge" mode. This adds support
for this option to "ip link add", "ip link set" and "ip
-d link show".

The default mode in the kernel is now "vepa" mode, meaning
"virtual ethernet port aggregator". This mode is used
together with the "hairpin" mode of an ethernet bridge
that the parent of the macvlan device is connected to.
All frames still get sent out to the external interface,
but the adjacent bridge is able to send them back on
the same wire in hairpin mode, so the macvlan ports
are able to see each other, and the bridge can be
configured to monitor and control traffic between
all macvlan instances. Multicast traffic coming in
from the external interface is checked for the source
MAC address and only delivered to ports that have not
yet seen it.

In bridge mode, macvlan will send all multicast traffic
to other interfaces that are also in bridge mode but
not to those in vepa mode, which get them on the way
back from the hairpin.

The third supported mode is "private", which prevents
communication between macvlans even if the adjacent
bridge is in hairpin mode. This behavior is closer to
the original implementation of macvlan but strictly
maintains isolation.

Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---
 include/linux/if_link.h |   15 ++++++++
 ip/Makefile             |    3 +-
 ip/iplink_macvlan.c     |   93 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 1 deletions(-)
 create mode 100644 ip/iplink_macvlan.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b0b9e8a..425c489 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -188,4 +188,19 @@ struct ifla_vlan_qos_mapping
 	__u32 to;
 };
 
+/* MACVLAN section */
+enum {
+	IFLA_MACVLAN_UNSPEC,
+	IFLA_MACVLAN_MODE,
+	__IFLA_MACVLAN_MAX,
+};
+
+enum ifla_macvlan_mode {
+	MACVLAN_MODE_PRIVATE = 1, /* don't talk to other macvlans */
+	MACVLAN_MODE_VEPA    = 2, /* talk to other ports through ext bridge */
+	MACVLAN_MODE_BRIDGE  = 4, /* talk to bridge ports directly */
+};
+
+#define IFLA_MACVLAN_MAX (__IFLA_MACVLAN_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/Makefile b/ip/Makefile
index 51914e8..46a9836 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -2,7 +2,8 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o \
     rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
-    iplink_vlan.o link_veth.o link_gre.o iplink_can.o
+    iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
+    iplink_macvlan.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink_macvlan.c b/ip/iplink_macvlan.c
new file mode 100644
index 0000000..307f559
--- /dev/null
+++ b/ip/iplink_macvlan.c
@@ -0,0 +1,93 @@
+/*
+ * iplink_macvlan.c	macvlan device support
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Patrick McHardy <kaber at trash.net>
+ *		Arnd Bergmann <arnd at arndb.de>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <linux/if_link.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void explain(void)
+{
+	fprintf(stderr,
+		"Usage: ... macvlan mode { private | vepa | bridge }\n"
+	);
+}
+
+static int mode_arg(void)
+{
+	fprintf(stderr, "Error: argument of \"mode\" must be \"private\", "
+		"\"vepa\" or \"bridge\"\n");
+	return -1;
+}
+
+static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
+			  struct nlmsghdr *n)
+{
+	while (argc > 0) {
+		if (matches(*argv, "mode") == 0) {
+			__u32 mode = 0;
+			NEXT_ARG();
+
+			if (strcmp(*argv, "private") == 0)
+				mode = MACVLAN_MODE_PRIVATE;
+			else if (strcmp(*argv, "vepa") == 0)
+				mode = MACVLAN_MODE_VEPA;
+			else if (strcmp(*argv, "bridge") == 0)
+				mode = MACVLAN_MODE_BRIDGE;
+			else
+				return mode_arg();
+
+			addattr32(n, 1024, IFLA_MACVLAN_MODE, mode);
+		} else if (matches(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "macvlan: what is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--, argv++;
+	}
+
+	return 0;
+}
+
+static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+	__u32 mode;
+
+	if (!tb)
+		return;
+
+	if (!tb[IFLA_MACVLAN_MODE] ||
+	    RTA_PAYLOAD(tb[IFLA_MACVLAN_MODE]) < sizeof(__u32))
+		return;
+
+	mode = *(__u32 *)RTA_DATA(tb[IFLA_MACVLAN_MODE]);
+	fprintf(f, " mode %s ",
+		  mode == MACVLAN_MODE_PRIVATE ? "private"
+		: mode == MACVLAN_MODE_VEPA    ? "vepa"
+		: mode == MACVLAN_MODE_BRIDGE  ? "bridge"
+		:				 "unknown");
+}
+
+struct link_util macvlan_link_util = {
+	.id		= "macvlan",
+	.maxattr	= IFLA_MACVLAN_MAX,
+	.parse_opt	= macvlan_parse_opt,
+	.print_opt	= macvlan_print_opt,
+};
-- 
1.6.3.3
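
The parse and print paths both map between the mode keyword and its numeric value, so the mapping could live in one table that both directions share. The following is a standalone sketch and not iproute2 code: `mode_names`, `mode_from_name()` and `mode_to_name()` are hypothetical helpers, and the numeric values mirror the enum added by this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One table shared by the parser and the printer. */
static const struct {
	const char *name;
	uint32_t value;
} mode_names[] = {
	{ "private", 1 },	/* MACVLAN_MODE_PRIVATE */
	{ "vepa",    2 },	/* MACVLAN_MODE_VEPA */
	{ "bridge",  4 },	/* MACVLAN_MODE_BRIDGE */
};

#define N_MODES (sizeof(mode_names) / sizeof(mode_names[0]))

/* Returns the mode value for a keyword, or 0 if it is not recognised. */
static uint32_t mode_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < N_MODES; i++)
		if (strcmp(name, mode_names[i].name) == 0)
			return mode_names[i].value;
	return 0;
}

/* Returns the keyword for a mode value, or "unknown". */
static const char *mode_to_name(uint32_t value)
{
	size_t i;

	for (i = 0; i < N_MODES; i++)
		if (mode_names[i].value == value)
			return mode_names[i].name;
	return "unknown";
}
```

Zero works as the "not recognised" return value here only because no valid mode is 0; the netlink attribute enum reserves 0 in the same way with IFLA_MACVLAN_UNSPEC.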
Arnd Bergmann
2009-Nov-17 22:56 UTC
[Bridge] [PATCH 0/3] macvlan: add vepa and bridge mode
Sorry, I used the wrong address for the virtualization mailing list at first.
Please correct this to <virtualization at lists.linux-foundation.org>
when replying to the other mails.

For people only subscribed to virtualization, you can find the
actual patches at

http://patchwork.kernel.org/patch/60810/
http://patchwork.kernel.org/patch/60811/
http://patchwork.kernel.org/patch/60813/
http://patchwork.kernel.org/patch/60814/

	Arnd <><

On Tuesday 17 November 2009, Arnd Bergmann wrote:
> This is based on an earlier patch from Eric Biederman adding
> forwarding between macvlans. I extended his approach to
> allow the administrator to choose the mode for each macvlan,
> and to implement a functional VEPA between macvlan.
>
> Still missing from this is support for communication between
> the lower device that the macvlans are based on. This would
> be extremely useful but as others have found out before me
> requires significant changes not only to macvlan but also
> to the common transmit path.
>
> I've seen one panic during testing this that I still need
> to track down, but it generally does what is advertised.
> I've tested VEPA operation with the hairpin support
> added to the bridge driver by Anna Fischer.
>
> My current plan is to submit this for inclusion in 2.6.33
> when people are happy with it and I tracked down any
> remaining bugs, and possibly to do the communication with
> the lower device one release later.
>
> 	Arnd <><
>
> ---
>
> Arnd Bergmann (3):
>       macvlan: implement VEPA and private mode
>       macvlan: export macvlan mode through netlink
>       iplink: add macvlan options for bridge mode
>
> Eric Biederman (1):
>       macvlan: Reflect macvlan packets meant for other macvlan devices
>
>  linux/drivers/net/macvlan.c   |  170 +++++++++++++++++++++++++++++++++-----
>  linux/include/linux/if_link.h |   15 +++
>  2 files changed, 161 insertions(+), 24 deletions(-)
>
>  iproute2/include/linux/if_link.h |   15 +++
>  iproute2/ip/Makefile             |    3 +-
>  iproute2/ip/iplink_macvlan.c     |   93 ++++++++++++++++++
>  3 files changed, 110 insertions(+), 1 deletions(-)
>  create mode 100644 ip/iplink_macvlan.c
>
Eric Dumazet
2009-Nov-18 06:30 UTC
[Bridge] [PATCH 1/3] macvlan: Reflect macvlan packets meant for other macvlan devices
Arnd Bergmann wrote:
> From: Eric Biederman <ebiederm at xmission.com>
>
> Switch ports do not send packets back out the same port they came
> in on.  This causes problems when using a macvlan device inside
> of a network namespace as it becomes impossible to talk to
> other macvlan devices.

This patch is very welcome. I reviewed it and found one oddity.

>
> Signed-off-by: Eric Biederman <ebiederm at xmission.com>
> Signed-off-by: Arnd Bergmann <arnd at arndb.de>
> ---
> +static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	const struct macvlan_dev *vlan = netdev_priv(dev);
> +	const struct macvlan_port *port = vlan->port;
> +	const struct macvlan_dev *dest;
> +	const struct ethhdr *eth;
>
> -	skb->dev = dev;
> -	skb->pkt_type = PACKET_HOST;
> +	skb->protocol = eth_type_trans(skb, dev);
> +	eth = eth_hdr(skb);
>
> -	netif_rx(skb);
> -	return NULL;
> +	skb_dst_drop(skb);

Why do you drop dst here?

It seems strange, since this driver specifically masks out
IFF_XMIT_DST_RELEASE in its macvlan_setup():

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;

If we really want to drop dst, it could be done by the caller, if
IFF_XMIT_DST_RELEASE was not masked in macvlan_setup().

> +	skb->mark = 0;
> +	secpath_reset(skb);
> +	nf_reset(skb);
> +
> +	if (is_multicast_ether_addr(eth->h_dest)) {
> +		macvlan_broadcast(skb, port, dev);
> +		return macvlan_xmit_world(skb, dev);
> +	}
> +
> +	dest = macvlan_hash_lookup(port, eth->h_dest);
> +	if (dest)
> +		return macvlan_unicast(skb, dest);
> +
> +	return macvlan_xmit_world(skb, dev);
>  }

# find net drivers/net|xargs grep -n IFF_XMIT_DST_RELEASE
net/8021q/vlan_dev.c:837:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
net/atm/clip.c:561:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
net/core/dev.c:1778:	if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
net/core/dev.c:5287:	dev->priv_flags = IFF_XMIT_DST_RELEASE;
net/ipv4/ip_gre.c:1236:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
net/ipv4/ipip.c:717:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
net/ipv6/sit.c:1104:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/appletalk/ipddp.c:76:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/bonding/bond_main.c:4534:	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/eql.c:197:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/ifb.c:162:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/loopback.c:174:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/macvlan.c:421:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/ppp_generic.c:1057:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
drivers/net/wan/hdlc_fr.c:1057:	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
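
The interaction between the flag and the dst drop can be modelled outside the kernel. In this sketch everything is hypothetical apart from the idea taken from the grep output above: `struct fake_dev`, `struct fake_skb`, `core_xmit()` and the bit value of `XMIT_DST_RELEASE` are stand-ins, chosen only to show that the core releases the dst for a device that keeps the flag and leaves it alone for a device that has masked it.

```c
#include <stdbool.h>

/* Hypothetical bit value, for illustration only. */
#define XMIT_DST_RELEASE 0x1u

struct fake_dev {
	unsigned int priv_flags;
};

struct fake_skb {
	bool has_dst;	/* stands in for a non-NULL dst reference */
};

/* Models the check the core makes before calling the driver. */
static void core_xmit(const struct fake_dev *dev, struct fake_skb *skb)
{
	if (dev->priv_flags & XMIT_DST_RELEASE)
		skb->has_dst = false;	/* core drops dst for the driver */
	/* otherwise the driver still sees the dst and must handle it */
}

/* Convenience wrapper: does the dst survive the core for this device? */
static bool dst_survives(unsigned int priv_flags)
{
	struct fake_dev dev = { priv_flags };
	struct fake_skb skb = { true };

	core_xmit(&dev, &skb);
	return skb.has_dst;
}
```

Under this model a driver that masks the flag and then drops the dst itself ends up in the same state as one that simply leaves the flag set, which is the oddity raised in the review.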
On Tue, 17 Nov 2009 22:39:07 +0000
Arnd Bergmann <arnd at arndb.de> wrote:

> This is based on an earlier patch from Eric Biederman adding
> forwarding between macvlans. I extended his approach to
> allow the administrator to choose the mode for each macvlan,
> and to implement a functional VEPA between macvlan.
>
> Still missing from this is support for communication between
> the lower device that the macvlans are based on. This would
> be extremely useful but as others have found out before me
> requires significant changes not only to macvlan but also
> to the common transmit path.

If this means that the "children" macvlans can't communicate with their
"parent" interface as though they were all attached to the same virtual
ethernet segment, I think that is a reasonable limitation. On other
networking equipment I've used, the moment "sub-interfaces" are created,
their parent interface can't be used for any communications, only for
setting link related parameters e.g. for ethernet interfaces, speed and
duplex etc.

>
> I've seen one panic during testing this that I still need
> to track down, but it generally does what is advertised.
> I've tested VEPA operation with the hairpin support
> added to the bridge driver by Anna Fischer.
>
> My current plan is to submit this for inclusion in 2.6.33
> when people are happy with it and I tracked down any
> remaining bugs, and possibly to do the communication with
> the lower device one release later.
>
> 	Arnd <><
>
> ---
>
> Arnd Bergmann (3):
>       macvlan: implement VEPA and private mode
>       macvlan: export macvlan mode through netlink
>       iplink: add macvlan options for bridge mode
>
> Eric Biederman (1):
>       macvlan: Reflect macvlan packets meant for other macvlan devices
>
>  linux/drivers/net/macvlan.c   |  170 +++++++++++++++++++++++++++++++++-----
>  linux/include/linux/if_link.h |   15 +++
>  2 files changed, 161 insertions(+), 24 deletions(-)
>
>  iproute2/include/linux/if_link.h |   15 +++
>  iproute2/ip/Makefile             |    3 +-
>  iproute2/ip/iplink_macvlan.c     |   93 ++++++++++++++++++
>  3 files changed, 110 insertions(+), 1 deletions(-)
>  create mode 100644 ip/iplink_macvlan.c
roel kluin
2009-Nov-18 10:00 UTC
[Bridge] [PATCH 1/3] macvlan: Reflect macvlan packets meant for other macvlan devices
On Tue, Nov 17, 2009 at 11:39 PM, Arnd Bergmann <arnd at arndb.de> wrote:
> From: Eric Biederman <ebiederm at xmission.com>
>
> Switch ports do not send packets back out the same port they came
> in on.  This causes problems when using a macvlan device inside
> of a network namespace as it becomes impossible to talk to
> other macvlan devices.
>
> Signed-off-by: Eric Biederman <ebiederm at xmission.com>
> Signed-off-by: Arnd Bergmann <arnd at arndb.de>

I found a problem:

> @@ -140,20 +145,45 @@ static void macvlan_broadcast(struct sk_buff *skb,
>  	}
>  }
>
> +static int macvlan_unicast(struct sk_buff *skb, const struct macvlan_dev *dest)
> +{
> +	struct net_device *dev = dest->dev;
> +
> +	if (unlikely(!dev->flags & IFF_UP)) {

parentheses are missing:

	if (unlikely(!(dev->flags & IFF_UP))) {

> +		kfree_skb(skb);
> +		return NET_XMIT_DROP;
> +	}
> +
> +	skb = skb_share_check(skb, GFP_ATOMIC);
> +	if (!skb) {
> +		dev->stats.rx_errors++;
> +		dev->stats.rx_dropped++;
> +		return NET_XMIT_DROP;
> +	}
> +
> +	dev->stats.rx_bytes += skb->len + ETH_HLEN;
> +	dev->stats.rx_packets++;
> +
> +	skb->dev = dev;
> +	skb->pkt_type = PACKET_HOST;
> +	netif_rx(skb);
> +	return NET_XMIT_SUCCESS;
> +}
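
The effect of the missing parentheses is easy to reproduce in userspace, because `!` binds more tightly than `&`. The sketch below uses hypothetical helper names (`down_buggy()`, `down_fixed()`); the value of IFF_UP matches the kernel's, and 0x1002 is just an example flag word with other bits set and IFF_UP clear.

```c
#include <stdbool.h>

#define IFF_UP 0x1	/* same value as the kernel's IFF_UP */

/* As posted: parsed as (!flags) & IFF_UP. */
static bool down_buggy(unsigned int flags)
{
	return !flags & IFF_UP;
}

/* As suggested in the review: tests the IFF_UP bit itself. */
static bool down_fixed(unsigned int flags)
{
	return !(flags & IFF_UP);
}
```

With the unparenthesised form, `!flags` is 1 only when no flag at all is set, so a device that is down but has any other flag set is treated as up and still receives frames.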
Arnd Bergmann
2009-Nov-27 10:57 UTC
[Bridge] [PATCH, resend] iproute2/iplink: add macvlan options for bridge mode
Resending, the kernel patches have gone into net-next, so a version of this should go into iproute2. --- Macvlan can now optionally support forwarding between its ports, if they are in "bridge" mode. This adds support for this option to "ip link add", "ip link set" and "ip -d link show". The default mode in the kernel is now "vepa" mode, meaning "virtual ethernet port aggregator". This mode is used together with the "hairpin" mode of an ethernet bridge that the parent of the macvlan device is connected to. All frames still get sent out to the external interface, but the adjacent bridge is able to send them back on the same wire in hairpin mode, so the macvlan ports are able to see each other, which the bridge can be configured to monitor and control traffic between all macvlan instances. Multicast traffic coming in from the external interface is checked for the source MAC address and only delivered to ports that have not yet seen it. In bridge mode, macvlan will send all multicast traffic to other interfaces that are also in bridge mode but not to those in vepa mode, which get them on the way back from the hairpin. The third supported mode is "private", which prevents communication between macvlans even if the adjacent bridge is in hairpin mode. This behavior is closer to the original implementation of macvlan but stricly maintains isolation. 
Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---
 include/linux/if_link.h |   15 ++++++++
 ip/Makefile             |    3 +-
 ip/iplink_macvlan.c     |   93 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+), 1 deletions(-)
 create mode 100644 ip/iplink_macvlan.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b0b9e8a..425c489 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -188,4 +188,19 @@ struct ifla_vlan_qos_mapping
 	__u32 to;
 };
 
+/* MACVLAN section */
+enum {
+	IFLA_MACVLAN_UNSPEC,
+	IFLA_MACVLAN_MODE,
+	__IFLA_MACVLAN_MAX,
+};
+
+enum ifla_macvlan_mode {
+	MACVLAN_MODE_PRIVATE = 1, /* don't talk to other macvlans */
+	MACVLAN_MODE_VEPA    = 2, /* talk to other ports through ext bridge */
+	MACVLAN_MODE_BRIDGE  = 4, /* talk to bridge ports directly */
+};
+
+#define IFLA_MACVLAN_MAX (__IFLA_MACVLAN_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/Makefile b/ip/Makefile
index 51914e8..46a9836 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -2,7 +2,8 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o \
     rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
-    iplink_vlan.o link_veth.o link_gre.o iplink_can.o
+    iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
+    iplink_macvlan.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink_macvlan.c b/ip/iplink_macvlan.c
new file mode 100644
index 0000000..307f559
--- /dev/null
+++ b/ip/iplink_macvlan.c
@@ -0,0 +1,93 @@
+/*
+ * iplink_macvlan.c	macvlan device support
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Patrick McHardy <kaber at trash.net>
+ *		Arnd Bergmann <arnd at arndb.de>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <linux/if_link.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void explain(void)
+{
+	fprintf(stderr,
+		"Usage: ... macvlan mode { private | vepa | bridge }\n"
+	);
+}
+
+static int mode_arg(void)
+{
+	fprintf(stderr, "Error: argument of \"mode\" must be \"private\", "
+		"\"vepa\" or \"bridge\"\n");
+	return -1;
+}
+
+static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
+			     struct nlmsghdr *n)
+{
+	while (argc > 0) {
+		if (matches(*argv, "mode") == 0) {
+			__u32 mode = 0;
+			NEXT_ARG();
+
+			if (strcmp(*argv, "private") == 0)
+				mode = MACVLAN_MODE_PRIVATE;
+			else if (strcmp(*argv, "vepa") == 0)
+				mode = MACVLAN_MODE_VEPA;
+			else if (strcmp(*argv, "bridge") == 0)
+				mode = MACVLAN_MODE_BRIDGE;
+			else
+				return mode_arg();
+
+			addattr32(n, 1024, IFLA_MACVLAN_MODE, mode);
+		} else if (matches(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "macvlan: what is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--, argv++;
+	}
+
+	return 0;
+}
+
+static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+	__u32 mode;
+
+	if (!tb)
+		return;
+
+	if (!tb[IFLA_MACVLAN_MODE] ||
+	    RTA_PAYLOAD(tb[IFLA_MACVLAN_MODE]) < sizeof(__u32))
+		return;
+
+	mode = *(__u32 *)RTA_DATA(tb[IFLA_MACVLAN_MODE]);
+	fprintf(f, " mode %s ",
+		  mode == MACVLAN_MODE_PRIVATE ? "private"
+		: mode == MACVLAN_MODE_VEPA    ? "vepa"
+		: mode == MACVLAN_MODE_BRIDGE  ? "bridge"
+		:				 "unknown");
+}
+
+struct link_util macvlan_link_util = {
+	.id		= "macvlan",
+	.maxattr	= IFLA_MACVLAN_MAX,
+	.parse_opt	= macvlan_parse_opt,
+	.print_opt	= macvlan_print_opt,
+};
-- 
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
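For readers who want to try the keyword handling outside of iproute2, the mapping done in macvlan_parse_opt() and macvlan_print_opt() can be sketched as a standalone table lookup. This is only an illustration: the numeric values are copied from the if_link.h hunk above, while the SKETCH_ names, the table, and the two helper functions are hypothetical and not part of the patch, which uses an if/else chain and a conditional expression instead.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from enum ifla_macvlan_mode in the if_link.h hunk. */
enum sketch_macvlan_mode {
	SKETCH_MODE_PRIVATE = 1,
	SKETCH_MODE_VEPA    = 2,
	SKETCH_MODE_BRIDGE  = 4,
};

/* Hypothetical keyword table; not present in the patch itself. */
static const struct {
	const char *name;
	unsigned int mode;
} sketch_mode_table[] = {
	{ "private", SKETCH_MODE_PRIVATE },
	{ "vepa",    SKETCH_MODE_VEPA },
	{ "bridge",  SKETCH_MODE_BRIDGE },
};

#define SKETCH_MODE_TABLE_LEN \
	(sizeof(sketch_mode_table) / sizeof(sketch_mode_table[0]))

/* Return the numeric mode for a keyword, or 0 if it is not recognised. */
static unsigned int mode_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < SKETCH_MODE_TABLE_LEN; i++)
		if (strcmp(name, sketch_mode_table[i].name) == 0)
			return sketch_mode_table[i].mode;
	return 0;
}

/* Return the keyword for a numeric mode, or "unknown" if none matches. */
static const char *mode_to_name(unsigned int mode)
{
	size_t i;

	for (i = 0; i < SKETCH_MODE_TABLE_LEN; i++)
		if (sketch_mode_table[i].mode == mode)
			return sketch_mode_table[i].name;
	return "unknown";
}
```

With these helpers, mode_from_name("vepa") yields 2, and mode_to_name(3) falls through to "unknown" because 3 is not one of the defined single-bit values.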
Arnd Bergmann
2009-Dec-18 13:45 UTC
[Bridge] [PATCH] iplink: add macvlan options for bridge mode
Ping!

Stephen, I submitted this twice but never heard back from you. The changes to macvlan have been merged in 2.6.33-rc1, so it would be good to have this included as well.

Arnd

On Tuesday 17 November 2009, Arnd Bergmann wrote:
> Macvlan can now optionally support forwarding between its
> ports, if they are in "bridge" mode. This adds support
> for this option to "ip link add", "ip link set" and "ip
> -d link show".
>
> The default mode in the kernel is now "vepa" mode, meaning
> "virtual ethernet port aggregator". This mode is used
> together with the "hairpin" mode of an ethernet bridge
> that the parent of the macvlan device is connected to.
> All frames still get sent out to the external interface,
> but the adjacent bridge is able to send them back on
> the same wire in hairpin mode, so the macvlan ports
> are able to see each other, and the bridge can be
> configured to monitor and control traffic between
> all macvlan instances. Multicast traffic coming in
> from the external interface is checked for the source
> MAC address and only delivered to ports that have not
> yet seen it.
>
> In bridge mode, macvlan will send all multicast traffic
> to other interfaces that are also in bridge mode but
> not to those in vepa mode, which get them on the way
> back from the hairpin.
>
> The third supported mode is "private", which prevents
> communication between macvlans even if the adjacent
> bridge is in hairpin mode. This behavior is closer to
> the original implementation of macvlan but strictly
> maintains isolation.
>
> Signed-off-by: Arnd Bergmann <arnd at arndb.de>
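The three modes in the quoted description can be summarised as a small decision function. The following is a rough model of that text only, not the kernel's transmit or receive path: it considers two macvlan ports on the same lower device that are both in the same mode, and asks how a frame from one would reach the other. The enum names and the peer_path() helper are hypothetical; mixed-mode pairs are deliberately left out because the description above does not fully specify them.

```c
#include <stdbool.h>

/* Numeric values mirror enum ifla_macvlan_mode from the patch. */
enum sketch_mode {
	SKETCH_PRIVATE = 1,
	SKETCH_VEPA    = 2,
	SKETCH_BRIDGE  = 4,
};

/* How a frame travels between two macvlans in the same mode. */
enum sketch_path {
	PATH_NONE,	/* the peer never sees the frame */
	PATH_DIRECT,	/* forwarded inside the macvlan driver */
	PATH_HAIRPIN,	/* sent out and reflected by the adjacent bridge */
};

/*
 * Model of the commit message for two ports that share one mode.
 * 'hairpin' says whether the adjacent external bridge reflects frames
 * back on the port they arrived on.
 */
static enum sketch_path peer_path(enum sketch_mode mode, bool hairpin)
{
	switch (mode) {
	case SKETCH_BRIDGE:
		/* bridge mode forwards between its ports locally */
		return PATH_DIRECT;
	case SKETCH_VEPA:
		/* vepa relies entirely on the external bridge */
		return hairpin ? PATH_HAIRPIN : PATH_NONE;
	case SKETCH_PRIVATE:
	default:
		/* private stays isolated even with hairpin enabled */
		return PATH_NONE;
	}
}
```

Under this model, two vepa ports cannot reach each other while hairpin is disabled on the adjacent bridge, two bridge ports can reach each other regardless, and two private ports never do.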
Stephen Hemminger
2009-Dec-26 19:24 UTC
[Bridge] [PATCH, resend] iproute2/iplink: add macvlan options for bridge mode
On Fri, 27 Nov 2009 10:57:25 +0000 Arnd Bergmann <arnd at arndb.de> wrote:
> Macvlan can now optionally support forwarding between its
> ports, if they are in "bridge" mode. This adds support
> for this option to "ip link add", "ip link set" and "ip
> -d link show".
>
> The default mode in the kernel is now "vepa" mode, meaning
> "virtual ethernet port aggregator". This mode is used
> together with the "hairpin" mode of an ethernet bridge
> that the parent of the macvlan device is connected to.
> All frames still get sent out to the external interface,
> but the adjacent bridge is able to send them back on
> the same wire in hairpin mode, so the macvlan ports
> are able to see each other, and the bridge can be
> configured to monitor and control traffic between
> all macvlan instances. Multicast traffic coming in
> from the external interface is checked for the source
> MAC address and only delivered to ports that have not
> yet seen it.
>
> In bridge mode, macvlan will send all multicast traffic
> to other interfaces that are also in bridge mode but
> not to those in vepa mode, which get them on the way
> back from the hairpin.
>
> The third supported mode is "private", which prevents
> communication between macvlans even if the adjacent
> bridge is in hairpin mode. This behavior is closer to
> the original implementation of macvlan but strictly
> maintains isolation.
>
> Signed-off-by: Arnd Bergmann <arnd at arndb.de>

Okay, applied for next version