Arnd Bergmann
2009-Nov-26 16:07 UTC
[Bridge] [PATCHv3 0/4] macvlan: add vepa and bridge mode
Not many changes this time, just integrated a bug fix and all the coding style feedback from Eric Dumazet and Patrick McHardy. I'll keep the patch for network namespaces on the tx path out of this series for now, because the discussion is still ongoing and it addresses an unrelated issue. --- Version 2 description: The patch to iproute2 has not changed, so I'm not including it this time. Patch 4/4 (the netlink interface) is basically unchanged as well but included for completeness. The other changes have moved forward a bit, to the point where I find them a lot cleaner and am more confident in the code being ready for inclusion. The implementation hardly resembles Erics original patch now, so I've dropped his signed-off-by. Please take a look and ack if you are happy so we can get it into 2.6.33. --- Version 1 description: This is based on an earlier patch from Eric Biederman adding forwarding between macvlans. I extended his approach to allow the administrator to choose the mode for each macvlan, and to implement a functional VEPA between macvlan. Still missing from this is support for communication between the lower device that the macvlans are based on. This would be extremely useful but as others have found out before me requires significant changes not only to macvlan but also to the common transmit path. I've tested VEPA operation with the hairpin support added to the bridge driver by Anna Fischer. --- Arnd Bergmann (4): veth: move loopback logic to common location macvlan: cleanup rx statistics macvlan: implement bridge, VEPA and private mode macvlan: export macvlan mode through netlink drivers/net/macvlan.c | 188 +++++++++++++++++++++++++++++++++++++-------- drivers/net/veth.c | 17 +---- include/linux/if_link.h | 15 ++++ include/linux/netdevice.h | 2 + net/core/dev.c | 40 ++++++++++ 5 files changed, 214 insertions(+), 48 deletions(-)
Arnd Bergmann
2009-Nov-26 16:07 UTC
[Bridge] [PATCHv3 1/4] veth: move loopback logic to common location
The veth driver contains code to forward an skb from the start_xmit function of one network device into the receive path of another device. Moving that code into a common location lets us reuse the code for direct forwarding of data between macvlan ports, and possibly in other drivers. Signed-off-by: Arnd Bergmann <arnd at arndb.de> --- drivers/net/veth.c | 17 +++-------------- include/linux/netdevice.h | 2 ++ net/core/dev.c | 40 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 45 insertions(+), 14 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 2d657f2..6c4b5a2 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -155,8 +155,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) struct veth_net_stats *stats, *rcv_stats; int length, cpu; - skb_orphan(skb); - priv = netdev_priv(dev); rcv = priv->peer; rcv_priv = netdev_priv(rcv); @@ -168,20 +166,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) if (!(rcv->flags & IFF_UP)) goto tx_drop; - if (skb->len > (rcv->mtu + MTU_PAD)) - goto rx_drop; - - skb->tstamp.tv64 = 0; - skb->pkt_type = PACKET_HOST; - skb->protocol = eth_type_trans(skb, rcv); if (dev->features & NETIF_F_NO_CSUM) skb->ip_summed = rcv_priv->ip_summed; - skb->mark = 0; - secpath_reset(skb); - nf_reset(skb); - - length = skb->len; + length = skb->len + ETH_HLEN; + if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS) + goto rx_drop; stats->tx_bytes += length; stats->tx_packets++; @@ -189,7 +179,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) rcv_stats->rx_bytes += length; rcv_stats->rx_packets++; - netif_rx(skb); return NETDEV_TX_OK; tx_drop: diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 97873e3..9428793 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1562,6 +1562,8 @@ extern int dev_set_mac_address(struct net_device *, extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq); +extern int dev_forward_skb(struct net_device *dev, + struct sk_buff *skb); extern int netdev_budget; diff --git a/net/core/dev.c b/net/core/dev.c index ccefa24..f8baa15 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -105,6 +105,7 @@ #include <net/dst.h> #include <net/pkt_sched.h> #include <net/checksum.h> +#include <net/xfrm.h> #include <linux/highmem.h> #include <linux/init.h> #include <linux/kmod.h> @@ -1419,6 +1420,45 @@ static inline void net_timestamp(struct sk_buff *skb) skb->tstamp.tv64 = 0; } +/** + * dev_forward_skb - loopback an skb to another netif + * + * @dev: destination network device + * @skb: buffer to forward + * + * return values: + * NET_RX_SUCCESS (no congestion) + * NET_RX_DROP (packet was dropped) + * + * dev_forward_skb can be used for injecting an skb from the + * start_xmit function of one device into the receive queue + * of another device. + * + * The receiving device may be in another namespace, so + * we have to clear all information in the skb that could + * impact namespace isolation. + */ +int dev_forward_skb(struct net_device *dev, struct sk_buff *skb) +{ + skb_orphan(skb); + + if (!(dev->flags & IFF_UP)) + return NET_RX_DROP; + + if (skb->len > (dev->mtu + dev->hard_header_len)) + return NET_RX_DROP; + + skb_dst_drop(skb); + skb->tstamp.tv64 = 0; + skb->pkt_type = PACKET_HOST; + skb->protocol = eth_type_trans(skb, dev); + skb->mark = 0; + secpath_reset(skb); + nf_reset(skb); + return netif_rx(skb); +} +EXPORT_SYMBOL_GPL(dev_forward_skb); + /* * Support routine. Sends outgoing frames to any network * taps currently in use. -- 1.6.3.3
We have very similar code for rx statistics in two places in the macvlan driver, with a third one being added in the next patch. Consolidate them into one function to improve overall readability of the driver. Signed-off-by: Arnd Bergmann <arnd at arndb.de> --- drivers/net/macvlan.c | 70 ++++++++++++++++++++++++++++-------------------- 1 files changed, 41 insertions(+), 29 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index ae2b5c7..1e7faf9 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -116,42 +116,58 @@ static int macvlan_addr_busy(const struct macvlan_port *port, return 0; } +static inline void macvlan_count_rx(const struct macvlan_dev *vlan, + unsigned int len, bool success, + bool multicast) +{ + struct macvlan_rx_stats *rx_stats; + + rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id()); + if (likely(success)) { + rx_stats->rx_packets++;; + rx_stats->rx_bytes += len; + if (multicast) + rx_stats->multicast++; + } else { + rx_stats->rx_errors++; + } +} + +static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev, + const struct ethhdr *eth) +{ + if (!skb) + return NET_RX_DROP; + + skb->dev = dev; + if (!compare_ether_addr_64bits(eth->h_dest, + dev->broadcast)) + skb->pkt_type = PACKET_BROADCAST; + else + skb->pkt_type = PACKET_MULTICAST; + + return netif_rx(skb); +} + static void macvlan_broadcast(struct sk_buff *skb, const struct macvlan_port *port) { const struct ethhdr *eth = eth_hdr(skb); const struct macvlan_dev *vlan; struct hlist_node *n; - struct net_device *dev; struct sk_buff *nskb; unsigned int i; - struct macvlan_rx_stats *rx_stats; + int err; if (skb->protocol == htons(ETH_P_PAUSE)) return; for (i = 0; i < MACVLAN_HASH_SIZE; i++) { hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) { - dev = vlan->dev; - rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id()); - nskb = skb_clone(skb, GFP_ATOMIC); - if (nskb == NULL) { - rx_stats->rx_errors++; - continue; - } - - rx_stats->rx_bytes += skb->len + ETH_HLEN; - rx_stats->rx_packets++; - rx_stats->multicast++; - - nskb->dev = dev; - if (!compare_ether_addr_64bits(eth->h_dest, dev->broadcast)) - nskb->pkt_type = PACKET_BROADCAST; - else - nskb->pkt_type = PACKET_MULTICAST; - - netif_rx(nskb); + err = macvlan_broadcast_one(nskb, vlan->dev, eth); + macvlan_count_rx(vlan, skb->len + ETH_HLEN, + err == NET_RX_SUCCESS, 1); } } } @@ -163,7 +179,7 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb) const struct macvlan_port *port; const struct macvlan_dev *vlan; struct net_device *dev; - struct macvlan_rx_stats *rx_stats; + unsigned int len; port = rcu_dereference(skb->dev->macvlan_port); if (port == NULL) @@ -183,15 +199,11 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb) kfree_skb(skb); return NULL; } - rx_stats = per_cpu_ptr(vlan->rx_stats, smp_processor_id()); + len = skb->len + ETH_HLEN; skb = skb_share_check(skb, GFP_ATOMIC); - if (skb == NULL) { - rx_stats->rx_errors++; + macvlan_count_rx(vlan, len, skb != NULL, 0); + if (!skb) return NULL; - } - - rx_stats->rx_bytes += skb->len + ETH_HLEN; - rx_stats->rx_packets++; skb->dev = dev; skb->pkt_type = PACKET_HOST; -- 1.6.3.3
Arnd Bergmann
2009-Nov-26 16:07 UTC
[Bridge] [PATCHv3 3/4] macvlan: implement bridge, VEPA and private mode
This allows each macvlan slave device to be in one of three modes, depending on the use case: MACVLAN_PRIVATE: The device never communicates with any other device on the same upper_dev. This even includes frames coming back from a reflective relay, where supported by the adjacent bridge. MACVLAN_VEPA: The new Virtual Ethernet Port Aggregator (VEPA) mode, we assume that the adjacent bridge returns all frames where both source and destination are local to the macvlan port, i.e. the bridge is set up as a reflective relay. Broadcast frames coming in from the upper_dev get flooded to all macvlan interfaces in VEPA mode. We never deliver any frames locally. MACVLAN_BRIDGE: We provide the behavior of a simple bridge between different macvlan interfaces on the same port. Frames from one interface to another one get delivered directly and are not sent out externally. Broadcast frames get flooded to all other bridge ports and to the external interface, but when they come back from a reflective relay, we don't deliver them again. Since we know all the MAC addresses, the macvlan bridge mode does not require learning or STP like the bridge module does. Based on an earlier patch "macvlan: Reflect macvlan packets meant for other macvlan devices" by Eric Biederman. Signed-off-by: Arnd Bergmann <arnd at arndb.de> Cc: Eric Biederman <ebiederm at xmission.com> --- drivers/net/macvlan.c | 80 ++++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 72 insertions(+), 8 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 1e7faf9..d6bd843 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -29,9 +29,16 @@ #include <linux/if_link.h> #include <linux/if_macvlan.h> #include <net/rtnetlink.h> +#include <net/xfrm.h> #define MACVLAN_HASH_SIZE (1 << BITS_PER_BYTE) +enum macvlan_mode { + MACVLAN_MODE_PRIVATE = 1, + MACVLAN_MODE_VEPA = 2, + MACVLAN_MODE_BRIDGE = 4, +}; + struct macvlan_port { struct net_device *dev; struct hlist_head vlan_hash[MACVLAN_HASH_SIZE]; @@ -59,6 +66,7 @@ struct macvlan_dev { struct macvlan_port *port; struct net_device *lowerdev; struct macvlan_rx_stats *rx_stats; + enum macvlan_mode mode; }; @@ -134,11 +142,14 @@ static inline void macvlan_count_rx(const struct macvlan_dev *vlan, } static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev, - const struct ethhdr *eth) + const struct ethhdr *eth, bool local) { if (!skb) return NET_RX_DROP; + if (local) + return dev_forward_skb(dev, skb); + skb->dev = dev; if (!compare_ether_addr_64bits(eth->h_dest, dev->broadcast)) @@ -150,7 +161,9 @@ static int macvlan_broadcast_one(struct sk_buff *skb, struct net_device *dev, } static void macvlan_broadcast(struct sk_buff *skb, - const struct macvlan_port *port) + const struct macvlan_port *port, + struct net_device *src, + enum macvlan_mode mode) { const struct ethhdr *eth = eth_hdr(skb); const struct macvlan_dev *vlan; @@ -164,8 +177,12 @@ static void macvlan_broadcast(struct sk_buff *skb, for (i = 0; i < MACVLAN_HASH_SIZE; i++) { hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) { + if (vlan->dev == src || !(vlan->mode & mode)) + continue; + nskb = skb_clone(skb, GFP_ATOMIC); - err = macvlan_broadcast_one(nskb, vlan->dev, eth); + err = macvlan_broadcast_one(nskb, vlan->dev, eth, + mode == MACVLAN_MODE_BRIDGE); macvlan_count_rx(vlan, skb->len + ETH_HLEN, err == NET_RX_SUCCESS, 1); } @@ -178,6 +195,7 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb) const struct ethhdr *eth = eth_hdr(skb); const struct macvlan_port *port; const struct macvlan_dev *vlan; + const struct macvlan_dev *src; struct net_device *dev; unsigned int len; @@ -186,7 +204,25 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb) return skb; if (is_multicast_ether_addr(eth->h_dest)) { - macvlan_broadcast(skb, port); + src = macvlan_hash_lookup(port, eth->h_source); + if (!src) + /* frame comes from an external address */ + macvlan_broadcast(skb, port, NULL, + MACVLAN_MODE_PRIVATE | + MACVLAN_MODE_VEPA | + MACVLAN_MODE_BRIDGE); + else if (src->mode == MACVLAN_MODE_VEPA) + /* flood to everyone except source */ + macvlan_broadcast(skb, port, src->dev, + MACVLAN_MODE_VEPA | + MACVLAN_MODE_BRIDGE); + else if (src->mode == MACVLAN_MODE_BRIDGE) + /* + * flood only to VEPA ports, bridge ports + * already saw the frame on the way out. + */ + macvlan_broadcast(skb, port, src->dev, + MACVLAN_MODE_VEPA); return skb; } @@ -212,18 +248,46 @@ static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb) return NULL; } +static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) +{ + const struct macvlan_dev *vlan = netdev_priv(dev); + const struct macvlan_port *port = vlan->port; + const struct macvlan_dev *dest; + + if (vlan->mode == MACVLAN_MODE_BRIDGE) { + const struct ethhdr *eth = (void *)skb->data; + + /* send to other bridge ports directly */ + if (is_multicast_ether_addr(eth->h_dest)) { + macvlan_broadcast(skb, port, dev, MACVLAN_MODE_BRIDGE); + goto xmit_world; + } + + dest = macvlan_hash_lookup(port, eth->h_dest); + if (dest && dest->mode == MACVLAN_MODE_BRIDGE) { + unsigned int length = skb->len + ETH_HLEN; + int ret = dev_forward_skb(dest->dev, skb); + macvlan_count_rx(dest, length, + ret == NET_RX_SUCCESS, 0); + + return NET_XMIT_SUCCESS; + } + } + +xmit_world: + skb->dev = vlan->lowerdev; + return dev_queue_xmit(skb); +} + static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb, struct net_device *dev) { int i = skb_get_queue_mapping(skb); struct netdev_queue *txq = netdev_get_tx_queue(dev, i); - const struct macvlan_dev *vlan = netdev_priv(dev); unsigned int len = skb->len; int ret; - skb->dev = vlan->lowerdev; - ret = dev_queue_xmit(skb); - + ret = macvlan_queue_xmit(skb, dev); if (likely(ret == NET_XMIT_SUCCESS)) { txq->tx_packets++; txq->tx_bytes += len; -- 1.6.3.3
Arnd Bergmann
2009-Nov-26 16:07 UTC
[Bridge] [PATCHv3 4/4] macvlan: export macvlan mode through netlink
In order to support all three modes of macvlan at runtime, extend the existing netlink protocol to allow choosing the mode per macvlan slave interface. This depends on a matching patch to iproute2 in order to become accessible in user land. Signed-off-by: Arnd Bergmann <arnd at arndb.de> --- drivers/net/macvlan.c | 56 +++++++++++++++++++++++++++++++++++++++++----- include/linux/if_link.h | 15 ++++++++++++ 2 files changed, 65 insertions(+), 6 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index d6bd843..322112c 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -33,12 +33,6 @@ #define MACVLAN_HASH_SIZE (1 << BITS_PER_BYTE) -enum macvlan_mode { - MACVLAN_MODE_PRIVATE = 1, - MACVLAN_MODE_VEPA = 2, - MACVLAN_MODE_BRIDGE = 4, -}; - struct macvlan_port { struct net_device *dev; struct hlist_head vlan_hash[MACVLAN_HASH_SIZE]; @@ -614,6 +608,17 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[]) if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) return -EADDRNOTAVAIL; } + + if (data && data[IFLA_MACVLAN_MODE]) { + switch (nla_get_u32(data[IFLA_MACVLAN_MODE])) { + case MACVLAN_MODE_PRIVATE: + case MACVLAN_MODE_VEPA: + case MACVLAN_MODE_BRIDGE: + break; + default: + return -EINVAL; + } + } return 0; } @@ -678,6 +683,10 @@ static int macvlan_newlink(struct net *src_net, struct net_device *dev, vlan->dev = dev; vlan->port = port; + vlan->mode = MACVLAN_MODE_VEPA; + if (data && data[IFLA_MACVLAN_MODE]) + vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]); + err = register_netdevice(dev); if (err < 0) return err; @@ -699,6 +708,36 @@ static void macvlan_dellink(struct net_device *dev, struct list_head *head) macvlan_port_destroy(port->dev); } +static int macvlan_changelink(struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct macvlan_dev *vlan = netdev_priv(dev); + if (data && data[IFLA_MACVLAN_MODE]) + vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]); + return 0; +} + +static size_t macvlan_get_size(const struct net_device *dev) +{ + return nla_total_size(4); +} + +static int macvlan_fill_info(struct sk_buff *skb, + const struct net_device *dev) +{ + struct macvlan_dev *vlan = netdev_priv(dev); + + NLA_PUT_U32(skb, IFLA_MACVLAN_MODE, vlan->mode); + return 0; + +nla_put_failure: + return -EMSGSIZE; +} + +static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = { + [IFLA_MACVLAN_MODE] = { .type = NLA_U32 }, +}; + static struct rtnl_link_ops macvlan_link_ops __read_mostly = { .kind = "macvlan", .priv_size = sizeof(struct macvlan_dev), @@ -707,6 +746,11 @@ static struct rtnl_link_ops macvlan_link_ops __read_mostly = { .validate = macvlan_validate, .newlink = macvlan_newlink, .dellink = macvlan_dellink, + .maxtype = IFLA_MACVLAN_MAX, + .policy = macvlan_policy, + .changelink = macvlan_changelink, + .get_size = macvlan_get_size, + .fill_info = macvlan_fill_info, }; static int macvlan_device_event(struct notifier_block *unused, diff --git a/include/linux/if_link.h b/include/linux/if_link.h index 1d3b242..6674791 100644 --- a/include/linux/if_link.h +++ b/include/linux/if_link.h @@ -181,4 +181,19 @@ struct ifla_vlan_qos_mapping { __u32 to; }; +/* MACVLAN section */ +enum { + IFLA_MACVLAN_UNSPEC, + IFLA_MACVLAN_MODE, + __IFLA_MACVLAN_MAX, +}; + +#define IFLA_MACVLAN_MAX (__IFLA_MACVLAN_MAX - 1) + +enum macvlan_mode { + MACVLAN_MODE_PRIVATE = 1, /* don't talk to other macvlans */ + MACVLAN_MODE_VEPA = 2, /* talk to other ports through ext bridge */ + MACVLAN_MODE_BRIDGE = 4, /* talk to bridge ports directly */ +}; + #endif /* _LINUX_IF_LINK_H */ -- 1.6.3.3
Patrick McHardy
2009-Nov-26 16:26 UTC
[Bridge] [PATCHv3 0/4] macvlan: add vepa and bridge mode
Arnd Bergmann wrote:> Version 2 description: > The patch to iproute2 has not changed, so I'm not including > it this time. Patch 4/4 (the netlink interface) is basically > unchanged as well but included for completeness. > > The other changes have moved forward a bit, to the point where > I find them a lot cleaner and am more confident in the code > being ready for inclusion. The implementation hardly resembles > Erics original patch now, so I've dropped his signed-off-by. > > Please take a look and ack if you are happy so we can get it > into 2.6.33.Looks good to me, nice work. Acked-by: Patrick McHardy <kaber at trash.net> for the entire series.
Apparently Analagous Threads
- [Bridge] [PATCHv2 0/4] macvlan: add vepa and bridge mode
- [Bridge] [PATCHv2 0/4] macvlan: add vepa and bridge mode
- [Bridge] [PATCHv2 0/4] macvlan: add vepa and bridge mode
- [Bridge] [PATCHv3 0/4] macvlan: add vepa and bridge mode
- [Bridge] [PATCHv3 0/4] macvlan: add vepa and bridge mode