Jiri Pirko
2009-Mar-13 18:33 UTC
[Bridge] [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge
Hi all. This is only a draft of patch to consult. I'm aware that it should be divided into multiple patches. I want to know opinion from you folks. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. Therefore I introduce another function pointer in struct net_device_ops - ndo_check_mac_address. This function when it's implemented should check passed mac address against the one set in device. I'm using this in bonding driver when the bond is in mode balance-alb to walk thru all slaves and checking if any of them equals passed address. Then in bridge function br_handle_frame_finish() I'm using ndo_check_mac_address to recognize the destination mac address as local. Please look at this and tell me what you think about it. Thanks Jirka Signed-off-by: Jiri Pirko <jpirko at redhat.com> drivers/net/bonding/bond_alb.c | 17 +++++++++++++++++ drivers/net/bonding/bond_alb.h | 1 + drivers/net/bonding/bond_main.c | 11 +++++++++++ include/linux/netdevice.h | 7 +++++++ net/bridge/br_input.c | 5 ++++- 5 files changed, 40 insertions(+), 1 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..b7bcee0 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,23 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +int bond_alb_check_mac_address(struct net_device *bond_dev, void *addr) +{ + struct bonding *bond = netdev_priv(bond_dev); + struct slave *slave = NULL; + int ret = !0; + int i; + + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + ret = compare_ether_addr(slave->perm_hwaddr, addr); + if (!ret) + break; + } + read_unlock(&bond->lock); + return ret; +} + void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id) { if (bond->alb_info.current_alb_vlan && diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h index 50968f8..5e39bda 100644 --- a/drivers/net/bonding/bond_alb.h +++ b/drivers/net/bonding/bond_alb.h @@ -127,6 +127,7 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); +int bond_alb_check_mac_address(struct net_device *bond_dev, void *addr); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); #endif /* __BOND_ALB_H__ */ diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index e0578fe..fbff338 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4279,6 +4279,16 @@ unwind: return res; } +static int bond_check_mac_address(struct net_device *bond_dev, void *addr) +{ + struct bonding *bond = netdev_priv(bond_dev); + + if (bond->params.mode == BOND_MODE_ALB) + return bond_alb_check_mac_address(bond_dev, addr); + + return compare_ether_addr(bond_dev->dev_addr, addr); +} + static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); @@ -4576,6 +4586,7 @@ static const struct net_device_ops bond_netdev_ops = { .ndo_set_multicast_list = bond_set_multicast_list, .ndo_change_mtu = bond_change_mtu, .ndo_set_mac_address = bond_set_mac_address, + .ndo_check_mac_address = bond_check_mac_address, .ndo_neigh_setup = bond_neigh_setup, .ndo_vlan_rx_register = bond_vlan_rx_register, .ndo_vlan_rx_add_vid = bond_vlan_rx_add_vid, diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6593667..e75f691 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -491,6 +491,10 @@ struct netdev_queue { * needs to be changed. If not this interface is not defined, the * mac address can not be changed. * + * int (*ndo_check_mac_address)(struct net_device *dev, void *addr); + * This function is called when the given Media Access Control address + * needs to compared to the one set to the device. + * * int (*ndo_validate_addr)(struct net_device *dev); * Test if Media Access Control address is valid for the device. * @@ -554,6 +558,9 @@ struct net_device_ops { #define HAVE_SET_MAC_ADDR int (*ndo_set_mac_address)(struct net_device *dev, void *addr); +#define HAVE_CHECK_MAC_ADDR + int (*ndo_check_mac_address)(struct net_device *dev, + void *addr); #define HAVE_VALIDATE_ADDR int (*ndo_validate_addr)(struct net_device *dev); #define HAVE_PRIVATE_IOCTL diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 30b8877..b071169 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -39,6 +39,7 @@ int br_handle_frame_finish(struct sk_buff *skb) { const unsigned char *dest = eth_hdr(skb)->h_dest; struct net_bridge_port *p = rcu_dereference(skb->dev->br_port); + struct net_device *dev = p->dev; struct net_bridge *br; struct net_bridge_fdb_entry *dst; struct sk_buff *skb2; @@ -64,7 +65,9 @@ int br_handle_frame_finish(struct sk_buff *skb) if (is_multicast_ether_addr(dest)) { br->dev->stats.multicast++; skb2 = skb; - } else if ((dst = __br_fdb_get(br, dest)) && dst->is_local) { + } else if (((dst = __br_fdb_get(br, dest)) && dst->is_local) || + (dev->netdev_ops->ndo_check_mac_address && + !dev->netdev_ops->ndo_check_mac_address(dev, (unsigned char *) dest))) { skb2 = skb; /* Do not forward the packet since it's local. */ skb = NULL;
Stephen Hemminger
2009-Mar-14 05:39 UTC
[Bridge] [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge
On Fri, 13 Mar 2009 19:33:04 +0100 Jiri Pirko <jpirko at redhat.com> wrote:> Hi all. > > This is only a draft of patch to consult. I'm aware that it should be divided > into multiple patches. I want to know opinion from you folks. > > The problem is described in following bugzilla: > https://bugzilla.redhat.com/show_bug.cgi?id=487763 > > Basically here's what's going on. In every mode, bonding interface uses the same > mac address for all enslaved devices. Except for mode balance-alb. When you put > this kind of bond device into a bridge it will only add one of mac adresses into > a hash list of mac addresses, say X. This mac address is marked as local. But > this bonding interface also has mac address Y. Now then packet arrives with > destination address Y, this address is not marked as local and the packed looks > like it needs to be forwarded. This packet is then lost which is wrong. > > Notice that interfaces can be added and removed from bond while it is in bridge. > Therefore I introduce another function pointer in struct net_device_ops - > ndo_check_mac_address. This function when it's implemented should check passed > mac address against the one set in device. I'm using this in bonding driver when > the bond is in mode balance-alb to walk thru all slaves and checking if any of > them equals passed address. > > Then in bridge function br_handle_frame_finish() I'm using ndo_check_mac_address > to recognize the destination mac address as local. > > Please look at this and tell me what you think about it. > > Thanks > > Jirka >A better and more general way to do this have the dev_set_mac_address function check the return of the notifier and unwind. Then any protocol can easily prevent address from changing.
Jiri Pirko
2009-Mar-25 13:04 UTC
[Bridge] [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try2
(resend) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko at redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..2838be0 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,26 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!memcmp(dest, bond_dev->dev_addr, ETH_ALEN)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + if (!memcmp(slave->dev->dev_addr, dest, ETH_ALEN)) { + memcpy(dest, bond_dev->dev_addr, ETH_ALEN); + break; + } + } + read_unlock(&bond->lock); +} + void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id) { if (bond->alb_info.current_alb_vlan && diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h index 50968f8..77f36fb 100644 --- a/drivers/net/bonding/bond_alb.h +++ b/drivers/net/bonding/bond_alb.h @@ -127,6 +127,7 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); +void bond_alb_change_dest(struct sk_buff *skb); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); #endif /* __BOND_ALB_H__ */ diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3d76686..b62fdc4 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4294,6 +4294,19 @@ unwind: return res; } +/* + * Called via bond_change_dest_hook. + * note: already called with rcu_read_lock (preempt_disabled) + */ +void bond_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + + if (bond->params.mode == BOND_MODE_ALB) + bond_alb_change_dest(skb); +} + static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); @@ -5243,6 +5256,8 @@ static int __init bonding_init(void) register_inetaddr_notifier(&bond_inetaddr_notifier); bond_register_ipv6_notifier(); + bond_change_dest_hook = bond_change_dest; + goto out; err: list_for_each_entry(bond, &bond_dev_list, bond_list) { @@ -5266,6 +5281,8 @@ static void __exit bonding_exit(void) unregister_inetaddr_notifier(&bond_inetaddr_notifier); bond_unregister_ipv6_notifier(); + bond_change_dest_hook = NULL; + bond_destroy_sysfs(); rtnl_lock(); diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index ca849d2..df92b70 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -375,5 +375,7 @@ static inline void bond_unregister_ipv6_notifier(void) } #endif +extern void (*bond_change_dest_hook)(struct sk_buff *skb); + #endif /* _LINUX_BONDING_H */ diff --git a/net/core/dev.c b/net/core/dev.c index e3fe5c7..abe68d9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2061,6 +2061,13 @@ static inline int deliver_skb(struct sk_buff *skb, return pt_prev->func(skb, skb->dev, pt_prev, orig_dev); } +#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) +void (*bond_change_dest_hook)(struct sk_buff *skb) __read_mostly; +EXPORT_SYMBOL(bond_change_dest_hook); +#else +#define bond_change_dest_hook(skb) do {} while (0) +#endif + #if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE) /* These hooks defined here for ATM */ struct net_bridge; @@ -2251,10 +2258,12 @@ int netif_receive_skb(struct sk_buff *skb) null_or_orig = NULL; orig_dev = skb->dev; if (orig_dev->master) { - if (skb_bond_should_drop(skb)) + if (skb_bond_should_drop(skb)) { null_or_orig = orig_dev; /* deliver only exact match */ - else + } else { skb->dev = orig_dev->master; + bond_change_dest_hook(skb); + } } __get_cpu_var(netdev_rx_stat).total++;
Jiri Pirko
2009-Mar-25 15:19 UTC
[Bridge] [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try3
(resend, using compare_ether_addr_64bits instead of memcmp) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko at redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..83998f4 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,26 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!compare_ether_addr_64bits(dest, bond_dev->dev_addr)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + if (!compare_ether_addr_64bits(slave->dev->dev_addr, dest)) { + memcpy(dest, bond_dev->dev_addr, ETH_ALEN); + break; + } + } + read_unlock(&bond->lock); +} + void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id) { if (bond->alb_info.current_alb_vlan && diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h index 50968f8..77f36fb 100644 --- a/drivers/net/bonding/bond_alb.h +++ b/drivers/net/bonding/bond_alb.h @@ -127,6 +127,7 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); +void bond_alb_change_dest(struct sk_buff *skb); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); #endif /* __BOND_ALB_H__ */ diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3d76686..b62fdc4 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4294,6 +4294,19 @@ unwind: return res; } +/* + * Called via bond_change_dest_hook. + * note: already called with rcu_read_lock (preempt_disabled) + */ +void bond_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + + if (bond->params.mode == BOND_MODE_ALB) + bond_alb_change_dest(skb); +} + static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); @@ -5243,6 +5256,8 @@ static int __init bonding_init(void) register_inetaddr_notifier(&bond_inetaddr_notifier); bond_register_ipv6_notifier(); + bond_change_dest_hook = bond_change_dest; + goto out; err: list_for_each_entry(bond, &bond_dev_list, bond_list) { @@ -5266,6 +5281,8 @@ static void __exit bonding_exit(void) unregister_inetaddr_notifier(&bond_inetaddr_notifier); bond_unregister_ipv6_notifier(); + bond_change_dest_hook = NULL; + bond_destroy_sysfs(); rtnl_lock(); diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index ca849d2..df92b70 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -375,5 +375,7 @@ static inline void bond_unregister_ipv6_notifier(void) } #endif +extern void (*bond_change_dest_hook)(struct sk_buff *skb); + #endif /* _LINUX_BONDING_H */ diff --git a/net/core/dev.c b/net/core/dev.c index e3fe5c7..abe68d9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2061,6 +2061,13 @@ static inline int deliver_skb(struct sk_buff *skb, return pt_prev->func(skb, skb->dev, pt_prev, orig_dev); } +#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) +void (*bond_change_dest_hook)(struct sk_buff *skb) __read_mostly; +EXPORT_SYMBOL(bond_change_dest_hook); +#else +#define bond_change_dest_hook(skb) do {} while (0) +#endif + #if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE) /* These hooks defined here for ATM */ struct net_bridge; @@ -2251,10 +2258,12 @@ int netif_receive_skb(struct sk_buff *skb) null_or_orig = NULL; orig_dev = skb->dev; if (orig_dev->master) { - if (skb_bond_should_drop(skb)) + if (skb_bond_should_drop(skb)) { null_or_orig = orig_dev; /* deliver only exact match */ - else + } else { skb->dev = orig_dev->master; + bond_change_dest_hook(skb); + } } __get_cpu_var(netdev_rx_stat).total++;
Jiri Pirko
2009-Mar-26 15:52 UTC
[Bridge] [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try4
(resend, updated changelog, hook moved into skb_bond_should_drop, skb_bond_should_drop ifdefed) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko at redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..b973ede 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,25 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb, struct net_device *bond_dev) +{ + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!compare_ether_addr_64bits(dest, bond_dev->dev_addr)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + if (!compare_ether_addr_64bits(slave->dev->dev_addr, dest)) { + memcpy(dest, bond_dev->dev_addr, ETH_ALEN); + break; + } + } + read_unlock(&bond->lock); +} + void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id) { if (bond->alb_info.current_alb_vlan && diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h index 50968f8..4924dd7 100644 --- a/drivers/net/bonding/bond_alb.h +++ b/drivers/net/bonding/bond_alb.h @@ -127,6 +127,7 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); +void bond_alb_change_dest(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); #endif /* __BOND_ALB_H__ */ diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3d76686..7c7cb81 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4294,6 +4294,18 @@ unwind: return res; } +/* + * Called via bond_change_dest_hook. + * note: already called with rcu_read_lock (preempt_disabled) + */ +void bond_change_dest(struct sk_buff *skb, struct net_device *bond_dev) +{ + struct bonding *bond = netdev_priv(bond_dev); + + if (bond->params.mode == BOND_MODE_ALB) + bond_alb_change_dest(skb, bond_dev); +} + static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); @@ -5243,6 +5255,8 @@ static int __init bonding_init(void) register_inetaddr_notifier(&bond_inetaddr_notifier); bond_register_ipv6_notifier(); + bond_change_dest_hook = bond_change_dest; + goto out; err: list_for_each_entry(bond, &bond_dev_list, bond_list) { @@ -5266,6 +5280,8 @@ static void __exit bonding_exit(void) unregister_inetaddr_notifier(&bond_inetaddr_notifier); bond_unregister_ipv6_notifier(); + bond_change_dest_hook = NULL; + bond_destroy_sysfs(); rtnl_lock(); diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index ca849d2..7159483 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -375,5 +375,8 @@ static inline void bond_unregister_ipv6_notifier(void) } #endif +extern void (*bond_change_dest_hook)(struct sk_buff *skb, + struct net_device *master); + #endif /* _LINUX_BONDING_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6593667..7af6857 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1860,6 +1860,10 @@ static inline void netif_set_gso_max_size(struct net_device *dev, dev->gso_max_size = size; } +#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) +extern void (*bond_change_dest_hook)(struct sk_buff *skb, + struct net_device *master); + /* On bonding slaves other than the currently active slave, suppress * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and * ARP on active-backup slaves with arp_validate enabled. @@ -1876,22 +1880,31 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) if (dev->priv_flags & IFF_SLAVE_INACTIVE) { if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && skb->protocol == __constant_htons(ETH_P_ARP)) - return 0; + goto dont_drop; if (master->priv_flags & IFF_MASTER_ALB) { if (skb->pkt_type != PACKET_BROADCAST && skb->pkt_type != PACKET_MULTICAST) - return 0; + goto dont_drop; } if (master->priv_flags & IFF_MASTER_8023AD && skb->protocol == __constant_htons(ETH_P_SLOW)) - return 0; + goto dont_drop; return 1; } +dont_drop: + bond_change_dest_hook(skb, master); } + + return 0; +} +#else +static inline int skb_bond_should_drop(struct sk_buff *skb) +{ return 0; } +#endif extern struct pernet_operations __net_initdata loopback_net_ops; #endif /* __KERNEL__ */ diff --git a/net/core/dev.c b/net/core/dev.c index e3fe5c7..d9b758b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2061,6 +2061,12 @@ static inline int deliver_skb(struct sk_buff *skb, return pt_prev->func(skb, skb->dev, pt_prev, orig_dev); } +#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) +void (*bond_change_dest_hook)(struct sk_buff *skb, + struct net_device *master) __read_mostly; +EXPORT_SYMBOL(bond_change_dest_hook); +#endif + #if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE) /* These hooks defined here for ATM */ struct net_bridge;
Jiri Pirko
2009-Apr-13 08:37 UTC
[Bridge] [PATCH 0/4] bonding: allow bond in mode balance-alb to work properly in bridge -try5
(resend, updated changelog, completely reworked) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patchset solves this issue in the best way it can be possibly solved. By adding all mac addresses of all slave devices to the bridge hash list. To carry these addresses the new list has to be introduced in struct net_device. Jirka
Jiri Pirko
2009-Apr-13 08:38 UTC
[Bridge] [PATCH 1/4] net: introduce dev_mac_address_changed
Introducing function dev_mac_address_changed which can be called from driver which changed his mac address to force notifiers to be called. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/netdevice.h | 1 + net/core/dev.c | 12 ++++++++++++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2e7783f..ff8db51 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1461,6 +1461,7 @@ extern int dev_change_net_namespace(struct net_device *, extern int dev_set_mtu(struct net_device *, int); extern int dev_set_mac_address(struct net_device *, struct sockaddr *); +extern void dev_mac_address_changed(struct net_device *); extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq); diff --git a/net/core/dev.c b/net/core/dev.c index 91d792d..1adc89b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3833,6 +3833,18 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa) return err; } +/** + * dev_mac_address_changed - Notify Media Access Control Address changed + * @dev: device + * + * Notifies the change of the hardware (MAC) address of the device + */ +void dev_mac_address_changed(struct net_device *dev) +{ + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); +} +EXPORT_SYMBOL(dev_mac_address_changed); + /* * Perform the SIOCxIFxxx calls, inside read_lock(dev_base_lock) */ -- 1.6.0.6
Jiri Pirko
2009-Apr-13 08:42 UTC
[Bridge] [PATCH 2/4] net: introduce a list of device addresses dev_addr_list
This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/netdevice.h | 51 +++++++++- net/core/dev.c | 264 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 313 insertions(+), 2 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ff8db51..8cf62f1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -210,6 +210,12 @@ struct dev_addr_list #define dmi_users da_users #define dmi_gusers da_gusers +struct hw_addr { + struct list_head list; + unsigned char addr[MAX_ADDR_LEN]; + int refcount; +}; + struct hh_cache { struct hh_cache *hh_next; /* Next entry */ @@ -776,8 +782,12 @@ struct net_device */ unsigned long last_rx; /* Time of last Rx */ /* Interface address info used in eth_type_trans() */ - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast - because most packets are unicast) */ + unsigned char *dev_addr; /* hw address, (before bcast + because most packets are + unicast) */ + + struct list_head dev_addr_list; /* list of device hw addresses */ + spinlock_t dev_addr_list_lock; unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ @@ -1779,6 +1789,32 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) spin_unlock_bh(&dev->addr_list_lock); } +/* Locking helpers for spinlock guarding dev_addr_list */ + +static inline void netif_dev_addr_lock(struct net_device *dev) +{ + spin_lock(&dev->dev_addr_list_lock); +} + +static inline void netif_dev_addr_lock_bh(struct net_device *dev) +{ + spin_lock_bh(&dev->dev_addr_list_lock); +} + +static inline void netif_dev_addr_unlock(struct net_device *dev) +{ + spin_unlock(&dev->dev_addr_list_lock); +} + +static inline void netif_dev_addr_unlock_bh(struct net_device *dev) +{ + spin_unlock_bh(&dev->dev_addr_list_lock); +} + +/* dev_addr_list walker */ +#define for_each_dev_addr(dev, ha) \ + list_for_each_entry(ha, &dev->dev_addr_list, list) + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ extern void ether_setup(struct net_device *dev); @@ -1791,6 +1827,17 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, alloc_netdev_mq(sizeof_priv, name, setup, 1) extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); + +/* Functions used for device addresses handling */ +extern int dev_addr_add(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_del(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev); +extern int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev); + /* Functions used for secondary unicast and multicast support */ extern void dev_set_rx_mode(struct net_device *dev); extern void __dev_set_rx_mode(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 1adc89b..0b154b3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3437,6 +3437,263 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } +/* hw addresses list handling functions */ + +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct hw_addr *ha; + int i = 0; + + if (addr_len > MAX_ADDR_LEN) + return -EINVAL; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + ha->refcount++; + return 0; + } + } + + ha = kzalloc(sizeof(*ha), GFP_ATOMIC); + if (!ha) + return -ENOMEM; + memcpy(ha->addr, addr, addr_len); + ha->refcount = 1; + list_add_tail(&ha->list, list); + return 0; +} + +static inline int __hw_addr_add(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_add_ii(list, addr, addr_len, -1); +} + +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct hw_addr *ha; + int i = 0; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + if (--ha->refcount) + return 0; + list_del(&ha->list); + kfree(ha); + return 0; + } + } + return -ENOENT; +} + +static inline int __hw_addr_del(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_del_ii(list, addr, addr_len, -1); +} + +static int __hw_addr_add_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + int err; + struct hw_addr *ha, *ha2; + + list_for_each_entry(ha, from_list, list) { + err = __hw_addr_add_ii(to_list, ha->addr, addr_len, 0); + if (err) + goto unroll; + } + return 0; +unroll: + list_for_each_entry(ha2, from_list, list) { + if (ha2 == ha) + break; + __hw_addr_del_ii(to_list, ha2->addr, addr_len, 0); + } + return err; +} + +static inline int __hw_addr_add_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_del_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + struct hw_addr *ha; + + list_for_each_entry(ha, from_list, list) { + __hw_addr_del_ii(to_list, ha->addr, addr_len, 0); + } +} + +static inline void __hw_addr_del_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_flush(struct list_head *list) +{ + struct hw_addr *ha, *tmp; + + list_for_each_entry_safe(ha, tmp, list, list) { + list_del(&ha->list); + } +} + +/* Device addresses handling functions */ + +static void dev_addr_flush(struct net_device *dev) +{ + netif_dev_addr_lock_bh(dev); + __hw_addr_flush(&dev->dev_addr_list); + dev->dev_addr = NULL; + netif_dev_addr_unlock_bh(dev); +} + +static int dev_addr_init(struct net_device *dev) +{ + unsigned char addr[MAX_ADDR_LEN]; + struct hw_addr *ha; + int err; + + spin_lock_init(&dev->dev_addr_list_lock); + INIT_LIST_HEAD(&dev->dev_addr_list); + memset(addr, 0, sizeof(*addr)); + netif_dev_addr_lock_bh(dev); + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr)); + if (!err) { + /* + * Get the first (previously created) address from the list + * and set dev_addr pointer to this location. + */ + ha = list_first_entry(&dev->dev_addr_list, + struct hw_addr, list); + dev->dev_addr = ha->addr; + } + netif_dev_addr_unlock_bh(dev); + return err; +} + +/** + * dev_addr_add - Add a device address + * @dev: device + * @addr: address to add + * + * Add a device address to the device or increase the reference count if + * it already exists. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + netif_dev_addr_lock_bh(dev); + err = __hw_addr_add_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + netif_dev_addr_unlock_bh(dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add); + +/** + * dev_addr_del - Release a device address. + * @dev: device + * @addr: address to delete + * + * Release reference to a device address and remove it from the device + * if the reference count drops to zero. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + netif_dev_addr_lock_bh(dev); + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + netif_dev_addr_unlock_bh(dev); + return err; +} +EXPORT_SYMBOL(dev_addr_del); + +/** + * dev_addr_add_multiple - Add device addresses from another device + * @to_dev: device to which addresses will be added + * @from_dev: device from which addresses will be added + * + * Add device addresses of the one device to another. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + int err; + + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + + netif_dev_addr_lock_bh(from_dev); + netif_dev_addr_lock_bh(to_dev); + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + netif_dev_addr_unlock_bh(to_dev); + netif_dev_addr_unlock_bh(from_dev); + + return err; +} +EXPORT_SYMBOL(dev_addr_add_multiple); + +/** + * dev_addr_del_multiple - Delete device addresses by another device + * @to_dev: device where the addresses will be deleted + * @from_dev: device by which addresses the addresses will be deleted + * + * Deletes addresses in to device by the list of addresses in from device. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + + netif_dev_addr_lock_bh(from_dev); + netif_dev_addr_lock_bh(to_dev); + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + netif_dev_addr_unlock_bh(to_dev); + netif_dev_addr_unlock_bh(from_dev); + + return 0; +} +EXPORT_SYMBOL(dev_addr_del_multiple); + +/* unicast and multicast addresses handling functions */ + int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -4269,6 +4526,9 @@ static void rollback_registered(struct net_device *dev) */ dev_addr_discard(dev); + /* Flush device addresses */ + dev_addr_flush(dev); + if (dev->netdev_ops->ndo_uninit) dev->netdev_ops->ndo_uninit(dev); @@ -4791,6 +5051,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; + dev_addr_init(dev); netdev_init_queues(dev); INIT_LIST_HEAD(&dev->napi_list); @@ -4977,6 +5238,9 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char */ dev_addr_discard(dev); + /* Flush device addresses */ + dev_addr_flush(dev); + netdev_unregister_kobject(dev); /* Actually switch the network namespace */ -- 1.6.0.6
Jiri Pirko
2009-Apr-13 08:44 UTC
[Bridge] [PATCH 3/4] net: bridge: use device address list instead of dev_addr
This patch changes the handling of mac addresses of bridge port devices. Now it uses previously introduced list of device addresses. It allows the bridge to know more then one local mac address per port which is mandatory for the right work in some cases. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- net/bridge/br_fdb.c | 120 +++++++++++++++++++++++++++++++++-------------- net/bridge/br_if.c | 2 +- net/bridge/br_notify.c | 2 +- net/bridge/br_private.h | 4 +- 4 files changed, 89 insertions(+), 39 deletions(-) diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index a48f5ef..6efc556 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -77,10 +77,45 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f) br_fdb_put(f); } -void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) +/* + * Finds out if passed address is one of the addresses assigned to the device. + * Returns 1 on positive result + */ +static inline int is_dev_addr(struct net_device *dev, unsigned char *addr) +{ + struct hw_addr *ha; + int ret = 1; + + netif_dev_addr_lock_bh(dev); + for_each_dev_addr(dev, ha) { + ret = compare_ether_addr(addr, ha->addr); + if (!ret) + break; + } + netif_dev_addr_unlock_bh(dev); + return !ret ? 1 : 0; +} + +static int another_port_has_addr(const struct net_bridge_port *p, + struct net_bridge_fdb_entry *f) +{ + struct net_bridge *br = p->br; + struct net_bridge_port *op; + + list_for_each_entry(op, &br->port_list, list) { + if (op != p && is_dev_addr(op->dev, f->addr.addr)) { + f->dst = op; + return 1; + } + } + return 0; +} + +void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev) { struct net_bridge *br = p->br; int i; + struct hw_addr *ha; spin_lock_bh(&br->hash_lock); @@ -92,26 +127,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) f = hlist_entry(h, struct net_bridge_fdb_entry, hlist); if (f->dst == p && f->is_local) { - /* maybe another port has same hw addr? */ - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto insert; - } - } - - /* delete old one */ - fdb_delete(f); - goto insert; + /* + * maybe another port has same hw addr?, + * if not then delete it + */ + if (!another_port_has_addr(p, f)) + fdb_delete(f); } } } - insert: - /* insert new address, may fail if invalid address or dup. */ - fdb_insert(br, p, newaddr); + + /* insert device addresses, may fail if invalid address. */ + + netif_dev_addr_lock_bh(dev); + for_each_dev_addr(dev, ha) { + fdb_insert(br, p, ha->addr); + } + netif_dev_addr_unlock_bh(dev); spin_unlock_bh(&br->hash_lock); } @@ -189,20 +221,9 @@ void br_fdb_delete_by_port(struct net_bridge *br, * then when one port is deleted, assign * the local entry to other port */ - if (f->is_local) { - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto skip_delete; - } - } - } - - fdb_delete(f); - skip_delete: ; + if (!f->is_local || + !another_port_has_addr(p, f)) + fdb_delete(f); } } spin_unlock_bh(&br->hash_lock); @@ -338,7 +359,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head, } static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + const unsigned char *addr) { struct hlist_head *head = &br->hash[br_mac_hash(addr)]; struct net_bridge_fdb_entry *fdb; @@ -366,13 +387,42 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, return 0; } +static int fdb_insert_dev(struct net_bridge *br, struct net_bridge_port *source, + struct net_device *dev) +{ + struct hw_addr *ha, *ha2; + struct net_bridge_fdb_entry *fdb; + struct hlist_head *head; + int ret = 0; + + netif_dev_addr_lock_bh(dev); + for_each_dev_addr(dev, ha) { + ret = fdb_insert(br, source, ha->addr); + if (ret) + goto unroll; + } + goto unlock; +unroll: + for_each_dev_addr(dev, ha2) { + if (ha2 == ha) + break; + head = &br->hash[br_mac_hash(ha2->addr)]; + fdb = fdb_find(head, ha2->addr); + if (fdb && fdb->is_local) + fdb_delete(fdb); + } +unlock: + netif_dev_addr_unlock_bh(dev); + return ret; +} + int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + struct net_device *dev) { int ret; spin_lock_bh(&br->hash_lock); - ret = fdb_insert(br, source, addr); + ret = fdb_insert_dev(br, source, dev); spin_unlock_bh(&br->hash_lock); return ret; } diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 8a96672..789cb30 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -392,7 +392,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev) if (err) goto err0; - err = br_fdb_insert(br, p, dev->dev_addr); + err = br_fdb_insert(br, p, dev); if (err) goto err1; diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c index 763a3ec..1423541 100644 --- a/net/bridge/br_notify.c +++ b/net/bridge/br_notify.c @@ -48,7 +48,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v case NETDEV_CHANGEADDR: spin_lock_bh(&br->lock); - br_fdb_changeaddr(p, dev->dev_addr); + br_fdb_changeaddr(p, dev); br_stp_recalculate_bridge_id(br); spin_unlock_bh(&br->lock); break; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index b6c3b71..65ffe3d 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -148,7 +148,7 @@ extern int br_fdb_init(void); extern void br_fdb_fini(void); extern void br_fdb_flush(struct net_bridge *br); extern void br_fdb_changeaddr(struct net_bridge_port *p, - const unsigned char *newaddr); + struct net_device *dev); extern void br_fdb_cleanup(unsigned long arg); extern void br_fdb_delete_by_port(struct net_bridge *br, const struct net_bridge_port *p, int do_all); @@ -161,7 +161,7 @@ extern int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count, unsigned long off); extern int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr); + struct net_device *dev); extern void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, const unsigned char *addr); -- 1.6.0.6
Stephen Hemminger
2009-Apr-13 14:49 UTC
[Bridge] [PATCH 2/4] net: introduce a list of device addresses dev_addr_list
On Mon, 13 Apr 2009 10:42:02 +0200 Jiri Pirko <jpirko at redhat.com> wrote:> This patch introduces a new list in struct net_device and brings a set of > functions to handle the work with device address list. The list is a replacement > for the original dev_addr field and because in some situations there is need to > carry several device addresses with the net device. To be backward compatible, > dev_addr is made to point to the first member of the list so original drivers > sees no difference. > > Signed-off-by: Jiri Pirko <jpirko at redhat.com> > --- > include/linux/netdevice.h | 51 +++++++++- > net/core/dev.c | 264 +++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 313 insertions(+), 2 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index ff8db51..8cf62f1 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -210,6 +210,12 @@ struct dev_addr_list > #define dmi_users da_users > #define dmi_gusers da_gusers > > +struct hw_addr { > + struct list_head list; > + unsigned char addr[MAX_ADDR_LEN]; > + int refcount; > +}; > + > struct hh_cache > { > struct hh_cache *hh_next; /* Next entry */ > @@ -776,8 +782,12 @@ struct net_device > */ > unsigned long last_rx; /* Time of last Rx */ > /* Interface address info used in eth_type_trans() */ > - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast > - because most packets are unicast) */ > + unsigned char *dev_addr; /* hw address, (before bcast > + because most packets are > + unicast) */ > + > + struct list_head dev_addr_list; /* list of device hw addresses */ > + spinlock_t dev_addr_list_lock; > > unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ > > @@ -1779,6 +1789,32 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) > spin_unlock_bh(&dev->addr_list_lock); > } > > +/* Locking helpers for spinlock guarding dev_addr_list */ > + > +static inline void netif_dev_addr_lock(struct net_device *dev) > +{ > + spin_lock(&dev->dev_addr_list_lock); > +} > + > +static inline void netif_dev_addr_lock_bh(struct net_device *dev) > +{ > + spin_lock_bh(&dev->dev_addr_list_lock); > +} > + > +static inline void netif_dev_addr_unlock(struct net_device *dev) > +{ > + spin_unlock(&dev->dev_addr_list_lock); > +} > + > +static inline void netif_dev_addr_unlock_bh(struct net_device *dev) > +{ > + spin_unlock_bh(&dev->dev_addr_list_lock); > +} > +This lock is unnecessary, use RCU list for read. Since all changes are under RTNL mutex, there is no chance for conflict on update.
Stephen Hemminger
2009-Apr-13 14:58 UTC
[Bridge] [PATCH 1/4] net: introduce dev_mac_address_changed
On Mon, 13 Apr 2009 10:38:48 +0200 Jiri Pirko <jpirko at redhat.com> wrote:> Introducing function dev_mac_address_changed which can be called from driver > which changed his mac address to force notifiers to be called. > > Signed-off-by: Jiri Pirko <jpirko at redhat.com> > --- > include/linux/netdevice.h | 1 + > net/core/dev.c | 12 ++++++++++++ > 2 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 2e7783f..ff8db51 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1461,6 +1461,7 @@ extern int dev_change_net_namespace(struct net_device *, > extern int dev_set_mtu(struct net_device *, int); > extern int dev_set_mac_address(struct net_device *, > struct sockaddr *); > +extern void dev_mac_address_changed(struct net_device *); > extern int dev_hard_start_xmit(struct sk_buff *skb, > struct net_device *dev, > struct netdev_queue *txq); > diff --git a/net/core/dev.c b/net/core/dev.c > index 91d792d..1adc89b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3833,6 +3833,18 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa) > return err; > } > > +/** > + * dev_mac_address_changed - Notify Media Access Control Address changed > + * @dev: device > + * > + * Notifies the change of the hardware (MAC) address of the device > + */ > +void dev_mac_address_changed(struct net_device *dev) > +{ > + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); > +} > +EXPORT_SYMBOL(dev_mac_address_changed); > + > /* > * Perform the SIOCxIFxxx calls, inside read_lock(dev_base_lock) > */The original version of this that I send, allowed notifiers to return an error to block changing address (error would go back to application). This is how other notifier hooks work (mtu, etc). Why is dev_set_mac_address_changed called out separately, it should be inside dev_set_mac_address.
David Miller
2009-Apr-13 22:54 UTC
[Bridge] [PATCH 3/4] net: bridge: use device address list instead of dev_addr
From: Jiri Pirko <jpirko at redhat.com> Date: Mon, 13 Apr 2009 10:44:08 +0200> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > index a48f5ef..6efc556 100644 > --- a/net/bridge/br_fdb.c > +++ b/net/bridge/br_fdb.c...> +static inline int is_dev_addr(struct net_device *dev, unsigned char *addr) > +{Please drop the inline, let the compiler work it out.
David Miller
2009-Apr-13 22:54 UTC
[Bridge] [PATCH 2/4] net: introduce a list of device addresses dev_addr_list
From: Stephen Hemminger <shemminger at linux-foundation.org> Date: Mon, 13 Apr 2009 07:49:17 -0700> This lock is unnecessary, use RCU list for read. > Since all changes are under RTNL mutex, there is no chance > for conflict on update.Agreed.
Jiri Pirko
2009-Apr-15 08:17 UTC
[Bridge] [PATCH 0/3] bonding: allow bond in mode balance-alb to work properly in bridge -try6
(resend, rcu list locking, cometics) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patchset solves this issue in the best way it can be possibly solved. By adding all mac addresses of all slave devices to the bridge hash list. To carry these addresses the new list has to be introduced in struct net_device. Jirka
Jiri Pirko
2009-Apr-15 08:21 UTC
[Bridge] [PATCH 2/3] net: bridge: use device address list instead of dev_addr
This patch changes the handling of mac addresses of bridge port devices. Now it uses previously introduced list of device addresses. It allows the bridge to know more then one local mac address per port which is mandatory for the right work in some cases. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- net/bridge/br_fdb.c | 101 ++++++++++++++++++++++++++++++---------------- net/bridge/br_if.c | 2 +- net/bridge/br_notify.c | 2 +- net/bridge/br_private.h | 4 +- 4 files changed, 70 insertions(+), 39 deletions(-) diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index a48f5ef..1e63f76 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -77,10 +77,26 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f) br_fdb_put(f); } -void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) +static bool another_port_has_addr(const struct net_bridge_port *p, + struct net_bridge_fdb_entry *f) +{ + struct net_bridge *br = p->br; + struct net_bridge_port *op; + + list_for_each_entry(op, &br->port_list, list) { + if (op != p && is_etherdev_addr(op->dev, f->addr.addr)) { + f->dst = op; + return 1; + } + } + return 0; +} + +void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev) { struct net_bridge *br = p->br; int i; + struct netdev_hw_addr *ha; spin_lock_bh(&br->hash_lock); @@ -92,26 +108,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) f = hlist_entry(h, struct net_bridge_fdb_entry, hlist); if (f->dst == p && f->is_local) { - /* maybe another port has same hw addr? */ - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto insert; - } - } - - /* delete old one */ - fdb_delete(f); - goto insert; + /* + * maybe another port has same hw addr?, + * if not then delete it + */ + if (!another_port_has_addr(p, f)) + fdb_delete(f); } } } - insert: - /* insert new address, may fail if invalid address or dup. */ - fdb_insert(br, p, newaddr); + + /* insert device addresses, may fail if invalid address. */ + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + fdb_insert(br, p, ha->addr); + } + rcu_read_unlock(); spin_unlock_bh(&br->hash_lock); } @@ -189,20 +202,9 @@ void br_fdb_delete_by_port(struct net_bridge *br, * then when one port is deleted, assign * the local entry to other port */ - if (f->is_local) { - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto skip_delete; - } - } - } - - fdb_delete(f); - skip_delete: ; + if (!f->is_local || + !another_port_has_addr(p, f)) + fdb_delete(f); } } spin_unlock_bh(&br->hash_lock); @@ -338,7 +340,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head, } static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + const unsigned char *addr) { struct hlist_head *head = &br->hash[br_mac_hash(addr)]; struct net_bridge_fdb_entry *fdb; @@ -366,13 +368,42 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, return 0; } +static int fdb_insert_dev(struct net_bridge *br, struct net_bridge_port *source, + struct net_device *dev) +{ + struct netdev_hw_addr *ha, *ha2; + struct net_bridge_fdb_entry *fdb; + struct hlist_head *head; + int ret = 0; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + ret = fdb_insert(br, source, ha->addr); + if (ret) + goto unroll; + } + goto unlock; +unroll: + for_each_dev_addr(dev, ha2) { + if (ha2 == ha) + break; + head = &br->hash[br_mac_hash(ha2->addr)]; + fdb = fdb_find(head, ha2->addr); + if (fdb && fdb->is_local) + fdb_delete(fdb); + } +unlock: + rcu_read_unlock(); + return ret; +} + int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + struct net_device *dev) { int ret; spin_lock_bh(&br->hash_lock); - ret = fdb_insert(br, source, addr); + ret = fdb_insert_dev(br, source, dev); spin_unlock_bh(&br->hash_lock); return ret; } diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 8a96672..789cb30 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -392,7 +392,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev) if (err) goto err0; - err = br_fdb_insert(br, p, dev->dev_addr); + err = br_fdb_insert(br, p, dev); if (err) goto err1; diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c index 763a3ec..1423541 100644 --- a/net/bridge/br_notify.c +++ b/net/bridge/br_notify.c @@ -48,7 +48,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v case NETDEV_CHANGEADDR: spin_lock_bh(&br->lock); - br_fdb_changeaddr(p, dev->dev_addr); + br_fdb_changeaddr(p, dev); br_stp_recalculate_bridge_id(br); spin_unlock_bh(&br->lock); break; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index b6c3b71..65ffe3d 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -148,7 +148,7 @@ extern int br_fdb_init(void); extern void br_fdb_fini(void); extern void br_fdb_flush(struct net_bridge *br); extern void br_fdb_changeaddr(struct net_bridge_port *p, - const unsigned char *newaddr); + struct net_device *dev); extern void br_fdb_cleanup(unsigned long arg); extern void br_fdb_delete_by_port(struct net_bridge *br, const struct net_bridge_port *p, int do_all); @@ -161,7 +161,7 @@ extern int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count, unsigned long off); extern int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr); + struct net_device *dev); extern void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, const unsigned char *addr); -- 1.6.0.6
Jiri Pirko
2009-Apr-15 08:29 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
Wed, Apr 15, 2009 at 10:26:04AM CEST, lizf at cn.fujitsu.com wrote:>> +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr, >> + int addr_len, int ignore_index) >> +{ >> + struct netdev_hw_addr *ha; >> + int i = 0; >> + >> + if (addr_len > MAX_ADDR_LEN) >> + return -EINVAL; >> + >> + rcu_read_lock(); >> + list_for_each_entry_rcu(ha, list, list) { >> + if (i++ != ignore_index && >> + !memcmp(ha->addr, addr, addr_len)) { >> + ha->refcount++; >> + return 0; > >missing rcu_read_unlock() ? >Sure! Thanks...>> + } >> + } >> + rcu_read_unlock();
Patrick McHardy
2009-Apr-15 10:13 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
David Miller wrote:> From: Eric Dumazet <dada1 at cosmosbay.com> > Date: Wed, 15 Apr 2009 11:27:50 +0200 > >> Since you obviously need a write lock here to be sure following >> can be done by one cpu only. >> >> You have same problem all over this patch. > > RTNL semaphore is held across all modification operations.If this will also be used for multicast lists, changes can happen (IPv6) without the RTNL.
David Miller
2009-Apr-15 10:15 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
From: Patrick McHardy <kaber at trash.net> Date: Wed, 15 Apr 2009 12:13:57 +0200> David Miller wrote: >> From: Eric Dumazet <dada1 at cosmosbay.com> >> Date: Wed, 15 Apr 2009 11:27:50 +0200 >> >>> Since you obviously need a write lock here to be sure following >>> can be done by one cpu only. >>> >>> You have same problem all over this patch. >> RTNL semaphore is held across all modification operations. > > If this will also be used for multicast lists, changes can happen > (IPv6) without the RTNL.Indeed, that is true :-/
Patrick McHardy
2009-Apr-15 10:41 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
David Miller wrote:> From: Patrick McHardy <kaber at trash.net> > Date: Wed, 15 Apr 2009 12:13:57 +0200 > >> David Miller wrote: >>> From: Eric Dumazet <dada1 at cosmosbay.com> >>> Date: Wed, 15 Apr 2009 11:27:50 +0200 >>> >>>> Since you obviously need a write lock here to be sure following >>>> can be done by one cpu only. >>>> >>>> You have same problem all over this patch. >>> RTNL semaphore is held across all modification operations. >> If this will also be used for multicast lists, changes can happen >> (IPv6) without the RTNL. > > Indeed, that is true :-/Herbert (I think) suggested to make address list updates in softirq context a two-step process, where addresses would first be added to a temporary list and the final change would be done in process context while holding the RTNL. Given the complicated mess we currently have, this would be a very worthwhile change IMO.
David Miller
2009-Apr-15 10:45 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
From: Patrick McHardy <kaber at trash.net> Date: Wed, 15 Apr 2009 12:41:01 +0200> Herbert (I think) suggested to make address list updates in softirq > context a two-step process, where addresses would first be added to > a temporary list and the final change would be done in process context > while holding the RTNL. > > Given the complicated mess we currently have, this would be a very > worthwhile change IMO.This would break the IPV6 TAHI tests if you think we could use such an idea for that. When IPV6 packets arrive that influence multicast and unicast address lists, the effect must be essentially immediate. Such that a subsequent packet will cause the kernel the behave with the necessary side effects, no matter how quickly that next packet arrives.
Patrick McHardy
2009-Apr-15 10:47 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
David Miller wrote:> From: Patrick McHardy <kaber at trash.net> > Date: Wed, 15 Apr 2009 12:41:01 +0200 > >> Herbert (I think) suggested to make address list updates in softirq >> context a two-step process, where addresses would first be added to >> a temporary list and the final change would be done in process context >> while holding the RTNL. >> >> Given the complicated mess we currently have, this would be a very >> worthwhile change IMO. > > This would break the IPV6 TAHI tests if you think we could use > such an idea for that. > > When IPV6 packets arrive that influence multicast and unicast > address lists, the effect must be essentially immediate. Such > that a subsequent packet will cause the kernel the behave > with the necessary side effects, no matter how quickly that > next packet arrives.I see, thanks for the explanation.
Eric Dumazet
2009-Apr-15 18:54 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v2)
Jiri Pirko a ?crit :> changes against last patch version: > -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush > -removed unnecessary rcu_read locking in dev_addr_init > -use compare_ether_addr_64bits instead of compare_ether_addr > -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr > -use call_rcu instead of rcu_synchronize > -moved is_etherdev_addr into __KERNEL__ ifdef > > This patch introduces a new list in struct net_device and brings a set of > functions to handle the work with device address list. The list is a replacement > for the original dev_addr field and because in some situations there is need to > carry several device addresses with the net device. To be backward compatible, > dev_addr is made to point to the first member of the list so original drivers > sees no difference. > > Signed-off-by: Jiri Pirko <jpirko at redhat.com> > --- > include/linux/etherdevice.h | 27 +++++ > include/linux/netdevice.h | 32 +++++- > net/core/dev.c | 271 +++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 328 insertions(+), 2 deletions(-) > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h > index a1f17ab..3d7a668 100644 > --- a/include/linux/etherdevice.h > +++ b/include/linux/etherdevice.h > @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], > return compare_ether_addr(addr1, addr2); > #endif > } > + > +/** > + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. > + * @dev: Pointer to a device structure > + * @addr: Pointer to a six-byte array containing the Ethernet address > + * > + * Compare passed address with all addresses of the device. Return true if the > + * address if one of the device addresses. > + * > + * Note that this function calls compare_ether_addr_64bits() so take care of > + * the right padding. > + */ > +static inline bool is_etherdev_addr(const struct net_device *dev, > + const u8 addr[6 + 2]) > +{ > + struct netdev_hw_addr *ha; > + int res = 1; > + > + rcu_read_lock(); > + for_each_dev_addr(dev, ha) { > + res = compare_ether_addr_64bits(addr, ha->addr); > + if (!res) > + break; > + } > + rcu_read_unlock(); > + return !res; > +} > #endif /* __KERNEL__ */ > > /** > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 2e7783f..89ad6d2 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -210,6 +210,13 @@ struct dev_addr_list > #define dmi_users da_users > #define dmi_gusers da_gusers > > +struct netdev_hw_addr { > + struct list_head list; > + unsigned char addr[MAX_ADDR_LEN]; > + int refcount; > + struct rcu_head rcu_head; > +}; > + > struct hh_cache > { > struct hh_cache *hh_next; /* Next entry */ > @@ -776,8 +783,11 @@ struct net_device > */ > unsigned long last_rx; /* Time of last Rx */ > /* Interface address info used in eth_type_trans() */ > - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast > - because most packets are unicast) */ > + unsigned char *dev_addr; /* hw address, (before bcast > + because most packets are > + unicast) */ > + > + struct list_head dev_addr_list; /* list of device hw addresses */ > > unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ > > @@ -1778,6 +1788,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) > spin_unlock_bh(&dev->addr_list_lock); > } > > +/* > + * dev_addr_list walker. Should be used only for read access. Call with > + * rcu_read_lock held. > + */ > +#define for_each_dev_addr(dev, ha) \ > + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list) > + > /* These functions live elsewhere (drivers/net/net_init.c, but related) */ > > extern void ether_setup(struct net_device *dev); > @@ -1790,6 +1807,17 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, > alloc_netdev_mq(sizeof_priv, name, setup, 1) > extern int register_netdev(struct net_device *dev); > extern void unregister_netdev(struct net_device *dev); > + > +/* Functions used for device addresses handling */ > +extern int dev_addr_add(struct net_device *dev, > + unsigned char *addr); > +extern int dev_addr_del(struct net_device *dev, > + unsigned char *addr); > +extern int dev_addr_add_multiple(struct net_device *to_dev, > + struct net_device *from_dev); > +extern int dev_addr_del_multiple(struct net_device *to_dev, > + struct net_device *from_dev); > + > /* Functions used for secondary unicast and multicast support */ > extern void dev_set_rx_mode(struct net_device *dev); > extern void __dev_set_rx_mode(struct net_device *dev); > diff --git a/net/core/dev.c b/net/core/dev.c > index 91d792d..961be4f 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3437,6 +3437,270 @@ void dev_set_rx_mode(struct net_device *dev) > netif_addr_unlock_bh(dev); > } > > +/* hw addresses list handling functions */ > + > +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr, > + int addr_len, int ignore_index) > +{ > + struct netdev_hw_addr *ha; > + int i = 0; > + > + if (addr_len > MAX_ADDR_LEN) > + return -EINVAL; > +Please put here the ASSERT_RTNL(), not in various callers, since this is the place where we really assume rtnl lock is locked by us. You still use rcu_read_lock()/unlock() and rcu variant here... But caller of this function has RTNL (or other lock) so dont use rcu here, as it seems inconsistent with kzalloc() code that comes next.> + rcu_read_lock(); > + list_for_each_entry_rcu(ha, list, list) { > + if (i++ != ignore_index && > + !memcmp(ha->addr, addr, addr_len)) { > + ha->refcount++; > + rcu_read_unlock(); > + return 0; > + } > + } > + rcu_read_unlock(); > + > + ha = kzalloc(max(sizeof(*ha), L1_CACHE_BYTES), GFP_ATOMIC); > + if (!ha) > + return -ENOMEM; > + memcpy(ha->addr, addr, addr_len); > + ha->refcount = 1; > + list_add_tail_rcu(&ha->list, list); > + return 0; > +} > + > +static int __hw_addr_add(struct list_head *list, unsigned char *addr, > + int addr_len) > +{ > + return __hw_addr_add_ii(list, addr, addr_len, -1); > +} > + > +static void ha_rcu_free(struct rcu_head *head) > +{ > + struct netdev_hw_addr *ha; > + > + ha = container_of(head, struct netdev_hw_addr, rcu_head); > + kfree(ha); > +} > + > +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, > + int addr_len, int ignore_index) > +{ > + struct netdev_hw_addr *ha; > + int i = 0; > +ASSERT_RTNL() here, not in callers.> + list_for_each_entry(ha, list, list) { > + if (i++ != ignore_index && > + !memcmp(ha->addr, addr, addr_len)) { > + if (--ha->refcount) > + return 0; > + list_del_rcu(&ha->list); > + call_rcu(&ha->rcu_head, ha_rcu_free); > + return 0; > + } > + } > + return -ENOENT; > +} > + > +static int __hw_addr_del(struct list_head *list, unsigned char *addr, > + int addr_len) > +{ > + return __hw_addr_del_ii(list, addr, addr_len, -1); > +} > + > +static int __hw_addr_add_multiple_ii(struct list_head *to_list, > + struct list_head *from_list, > + int addr_len, int ignore_index) > +{ > + int err = 0; > + struct netdev_hw_addr *ha, *ha2; > +same here, no need for rcu_read_lock(), since you are going to change list, you have RTNL lock or equivalent.> + rcu_read_lock(); > + list_for_each_entry_rcu(ha, from_list, list) { > + err = __hw_addr_add_ii(to_list, ha->addr, addr_len, 0); > + if (err) > + goto unroll; > + } > + goto unlock; > +unroll: > + list_for_each_entry_rcu(ha2, from_list, list) { > + if (ha2 == ha) > + break; > + __hw_addr_del_ii(to_list, ha2->addr, addr_len, 0); > + } > +unlock: > + rcu_read_unlock(); > + return err; > +} > + > +static int __hw_addr_add_multiple(struct list_head *to_list, > + struct list_head *from_list, > + int addr_len) > +{ > + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, -1); > +} > + > +static void __hw_addr_del_multiple_ii(struct list_head *to_list, > + struct list_head *from_list, > + int addr_len, int ignore_index) > +{ > + struct netdev_hw_addr *ha; > +same here, no rcu_read_lock() needed...> + rcu_read_lock(); > + list_for_each_entry_rcu(ha, from_list, list) { > + __hw_addr_del_ii(to_list, ha->addr, addr_len, 0); > + } > + rcu_read_unlock(); > +} > + > +static void __hw_addr_del_multiple(struct list_head *to_list, > + struct list_head *from_list, > + int addr_len) > +{ > + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, -1); > +} > + > +static void __hw_addr_flush(struct list_head *list) > +{ > + struct netdev_hw_addr *ha, *tmp; > +ASSERT_RTNL();> + list_for_each_entry_safe(ha, tmp, list, list) { > + list_del_rcu(&ha->list); > + call_rcu(&ha->rcu_head, ha_rcu_free); > + } > +} > + > +/* Device addresses handling functions */ > + > +static void dev_addr_flush(struct net_device *dev) > +{ > + ASSERT_RTNL(); > + > + __hw_addr_flush(&dev->dev_addr_list); > + dev->dev_addr = NULL;seems risky here to set this to NULL... You could use a static var to avoid further NULL dereference. static char nulladdr[MAX_ADDR_LEN]; dev->dev_addr = nulladdr;> +} > + > +static int dev_addr_init(struct net_device *dev) > +{ > + unsigned char addr[MAX_ADDR_LEN]; > + struct netdev_hw_addr *ha; > + int err; > + > + ASSERT_RTNL(); > + > + INIT_LIST_HEAD(&dev->dev_addr_list); > + memset(addr, 0, sizeof(*addr)); > + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr)); > + if (!err) { > + /* > + * Get the first (previously created) address from the list > + * and set dev_addr pointer to this location. > + */ > + ha = list_first_entry(&dev->dev_addr_list, > + struct netdev_hw_addr, list); > + dev->dev_addr = ha->addr; > + } > + return err; > +} > + > +/** > + * dev_addr_add - Add a device address > + * @dev: device > + * @addr: address to add > + * > + * Add a device address to the device or increase the reference count if > + * it already exists. > + * > + * The caller must hold the rtnl_mutex. > + */ > +int dev_addr_add(struct net_device *dev, unsigned char *addr) > +{ > + int err; > + > + ASSERT_RTNL(); > + > + err = __hw_addr_add_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); > + if (!err) > + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); > + return err; > +} > +EXPORT_SYMBOL(dev_addr_add); > + > +/** > + * dev_addr_del - Release a device address. > + * @dev: device > + * @addr: address to delete > + * > + * Release reference to a device address and remove it from the device > + * if the reference count drops to zero. > + * > + * The caller must hold the rtnl_mutex. > + */ > +int dev_addr_del(struct net_device *dev, unsigned char *addr) > +{ > + int err; > + > + ASSERT_RTNL(); > + > + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); > + if (!err) > + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); > + return err; > +} > +EXPORT_SYMBOL(dev_addr_del); > + > +/** > + * dev_addr_add_multiple - Add device addresses from another device > + * @to_dev: device to which addresses will be added > + * @from_dev: device from which addresses will be added > + * > + * Add device addresses of the one device to another. > + * > + * The caller must hold the rtnl_mutex. > + */ > +int dev_addr_add_multiple(struct net_device *to_dev, > + struct net_device *from_dev) > +{ > + int err; > + > + ASSERT_RTNL(); > + > + if (from_dev->addr_len != to_dev->addr_len) > + return -EINVAL; > + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, > + &from_dev->dev_addr_list, > + to_dev->addr_len, 0); > + if (!err) > + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); > + return err; > +} > +EXPORT_SYMBOL(dev_addr_add_multiple); > + > +/** > + * dev_addr_del_multiple - Delete device addresses by another device > + * @to_dev: device where the addresses will be deleted > + * @from_dev: device by which addresses the addresses will be deleted > + * > + * Deletes addresses in to device by the list of addresses in from device. > + * > + * The caller must hold the rtnl_mutex. > + */ > +int dev_addr_del_multiple(struct net_device *to_dev, > + struct net_device *from_dev) > +{ > + ASSERT_RTNL(); > + > + if (from_dev->addr_len != to_dev->addr_len) > + return -EINVAL; > + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, > + &from_dev->dev_addr_list, > + to_dev->addr_len, 0); > + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); > + return 0; > +} > +EXPORT_SYMBOL(dev_addr_del_multiple); > + > +/* unicast and multicast addresses handling functions */ > + > int __dev_addr_delete(struct dev_addr_list **list, int *count, > void *addr, int alen, int glbl) > { > @@ -4257,6 +4521,9 @@ static void rollback_registered(struct net_device *dev) > */ > dev_addr_discard(dev); > > + /* Flush device addresses */ > + dev_addr_flush(dev); > +Are you sure that no driver in tree will dereference dev->dev_addr after this point ?> if (dev->netdev_ops->ndo_uninit) > dev->netdev_ops->ndo_uninit(dev); > > @@ -4779,6 +5046,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, > > dev->gso_max_size = GSO_MAX_SIZE; > > + dev_addr_init(dev); > netdev_init_queues(dev); > > INIT_LIST_HEAD(&dev->napi_list); > @@ -4965,6 +5233,9 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char > */ > dev_addr_discard(dev); > > + /* Flush device addresses */ > + dev_addr_flush(dev); > + > netdev_unregister_kobject(dev); > > /* Actually switch the network namespace */
Jiri Pirko
2009-Apr-17 11:57 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v3)
v2 -> v3 (current): -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 32 +++++- net/core/dev.c | 261 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 318 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care of + * the right padding. + */ +static inline bool is_etherdev_addr(const struct net_device *dev, + const u8 addr[6 + 2]) +{ + struct netdev_hw_addr *ha; + int res = 1; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + res = compare_ether_addr_64bits(addr, ha->addr); + if (!res) + break; + } + rcu_read_unlock(); + return !res; +} #endif /* __KERNEL__ */ /** diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2e7783f..89ad6d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -210,6 +210,13 @@ struct dev_addr_list #define dmi_users da_users #define dmi_gusers da_gusers +struct netdev_hw_addr { + struct list_head list; + unsigned char addr[MAX_ADDR_LEN]; + int refcount; + struct rcu_head rcu_head; +}; + struct hh_cache { struct hh_cache *hh_next; /* Next entry */ @@ -776,8 +783,11 @@ struct net_device */ unsigned long last_rx; /* Time of last Rx */ /* Interface address info used in eth_type_trans() */ - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast - because most packets are unicast) */ + unsigned char *dev_addr; /* hw address, (before bcast + because most packets are + unicast) */ + + struct list_head dev_addr_list; /* list of device hw addresses */ unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ @@ -1778,6 +1788,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) spin_unlock_bh(&dev->addr_list_lock); } +/* + * dev_addr_list walker. Should be used only for read access. Call with + * rcu_read_lock held. + */ +#define for_each_dev_addr(dev, ha) \ + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list) + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ extern void ether_setup(struct net_device *dev); @@ -1790,6 +1807,17 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, alloc_netdev_mq(sizeof_priv, name, setup, 1) extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); + +/* Functions used for device addresses handling */ +extern int dev_addr_add(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_del(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev); +extern int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev); + /* Functions used for secondary unicast and multicast support */ extern void dev_set_rx_mode(struct net_device *dev); extern void __dev_set_rx_mode(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 343883f..b4503ac 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3438,6 +3438,263 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } +/* hw addresses list handling functions */ + +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + if (addr_len > MAX_ADDR_LEN) + return -EINVAL; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + ha->refcount++; + return 0; + } + } + + ha = kzalloc(max(sizeof(*ha), L1_CACHE_BYTES), GFP_ATOMIC); + if (!ha) + return -ENOMEM; + memcpy(ha->addr, addr, addr_len); + ha->refcount = 1; + list_add_tail_rcu(&ha->list, list); + return 0; +} + +static int __hw_addr_add(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_add_ii(list, addr, addr_len, -1); +} + +static void ha_rcu_free(struct rcu_head *head) +{ + struct netdev_hw_addr *ha; + + ha = container_of(head, struct netdev_hw_addr, rcu_head); + kfree(ha); +} + +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + if (--ha->refcount) + return 0; + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + return 0; + } + } + return -ENOENT; +} + +static int __hw_addr_del(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_del_ii(list, addr, addr_len, -1); +} + +static int __hw_addr_add_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + int err; + struct netdev_hw_addr *ha, *ha2; + + list_for_each_entry(ha, from_list, list) { + err = __hw_addr_add_ii(to_list, ha->addr, addr_len, 0); + if (err) + goto unroll; + } + return 0; + +unroll: + list_for_each_entry(ha2, from_list, list) { + if (ha2 == ha) + break; + __hw_addr_del_ii(to_list, ha2->addr, addr_len, 0); + } + return err; +} + +static int __hw_addr_add_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_del_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + + list_for_each_entry(ha, from_list, list) { + __hw_addr_del_ii(to_list, ha->addr, addr_len, 0); + } +} + +static void __hw_addr_del_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_flush(struct list_head *list) +{ + struct netdev_hw_addr *ha, *tmp; + + list_for_each_entry_safe(ha, tmp, list, list) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + } +} + +/* Device addresses handling functions */ + +static void dev_addr_flush(struct net_device *dev) +{ + ASSERT_RTNL(); + + __hw_addr_flush(&dev->dev_addr_list); + dev->dev_addr = NULL; +} + +static int dev_addr_init(struct net_device *dev) +{ + unsigned char addr[MAX_ADDR_LEN]; + struct netdev_hw_addr *ha; + int err; + + ASSERT_RTNL(); + + INIT_LIST_HEAD(&dev->dev_addr_list); + memset(addr, 0, sizeof(*addr)); + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr)); + if (!err) { + /* + * Get the first (previously created) address from the list + * and set dev_addr pointer to this location. + */ + ha = list_first_entry(&dev->dev_addr_list, + struct netdev_hw_addr, list); + dev->dev_addr = ha->addr; + } + return err; +} + +/** + * dev_addr_add - Add a device address + * @dev: device + * @addr: address to add + * + * Add a device address to the device or increase the reference count if + * it already exists. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_add_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add); + +/** + * dev_addr_del - Release a device address. + * @dev: device + * @addr: address to delete + * + * Release reference to a device address and remove it from the device + * if the reference count drops to zero. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_del); + +/** + * dev_addr_add_multiple - Add device addresses from another device + * @to_dev: device to which addresses will be added + * @from_dev: device from which addresses will be added + * + * Add device addresses of the one device to another. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + int err; + + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add_multiple); + +/** + * dev_addr_del_multiple - Delete device addresses by another device + * @to_dev: device where the addresses will be deleted + * @from_dev: device by which addresses the addresses will be deleted + * + * Deletes addresses in to device by the list of addresses in from device. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return 0; +} +EXPORT_SYMBOL(dev_addr_del_multiple); + +/* unicast and multicast addresses handling functions */ + int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -4780,6 +5037,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; + dev_addr_init(dev); netdev_init_queues(dev); INIT_LIST_HEAD(&dev->napi_list); @@ -4805,6 +5063,9 @@ void free_netdev(struct net_device *dev) kfree(dev->_tx); + /* Flush device addresses */ + dev_addr_flush(dev); + list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) netif_napi_del(p); -- 1.6.0.6
Jiri Pirko
2009-Apr-18 08:58 UTC
[Bridge] [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v4)
v3 -> v4 (current): -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 32 +++++- net/core/dev.c | 261 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 318 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care of + * the right padding. + */ +static inline bool is_etherdev_addr(const struct net_device *dev, + const u8 addr[6 + 2]) +{ + struct netdev_hw_addr *ha; + int res = 1; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + res = compare_ether_addr_64bits(addr, ha->addr); + if (!res) + break; + } + rcu_read_unlock(); + return !res; +} #endif /* __KERNEL__ */ /** diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2e7783f..89ad6d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -210,6 +210,13 @@ struct dev_addr_list #define dmi_users da_users #define dmi_gusers da_gusers +struct netdev_hw_addr { + struct list_head list; + unsigned char addr[MAX_ADDR_LEN]; + int refcount; + struct rcu_head rcu_head; +}; + struct hh_cache { struct hh_cache *hh_next; /* Next entry */ @@ -776,8 +783,11 @@ struct net_device */ unsigned long last_rx; /* Time of last Rx */ /* Interface address info used in eth_type_trans() */ - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast - because most packets are unicast) */ + unsigned char *dev_addr; /* hw address, (before bcast + because most packets are + unicast) */ + + struct list_head dev_addr_list; /* list of device hw addresses */ unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ @@ -1778,6 +1788,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) spin_unlock_bh(&dev->addr_list_lock); } +/* + * dev_addr_list walker. Should be used only for read access. Call with + * rcu_read_lock held. + */ +#define for_each_dev_addr(dev, ha) \ + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list) + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ extern void ether_setup(struct net_device *dev); @@ -1790,6 +1807,17 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, alloc_netdev_mq(sizeof_priv, name, setup, 1) extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); + +/* Functions used for device addresses handling */ +extern int dev_addr_add(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_del(struct net_device *dev, + unsigned char *addr); +extern int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev); +extern int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev); + /* Functions used for secondary unicast and multicast support */ extern void dev_set_rx_mode(struct net_device *dev); extern void __dev_set_rx_mode(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 343883f..2274294 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3438,6 +3438,263 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } +/* hw addresses list handling functions */ + +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + if (addr_len > MAX_ADDR_LEN) + return -EINVAL; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + ha->refcount++; + return 0; + } + } + + ha = kmalloc(max(sizeof(*ha), L1_CACHE_BYTES), GFP_ATOMIC); + if (!ha) + return -ENOMEM; + memcpy(ha->addr, addr, addr_len); + ha->refcount = 1; + list_add_tail_rcu(&ha->list, list); + return 0; +} + +static int __hw_addr_add(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_add_ii(list, addr, addr_len, -1); +} + +static void ha_rcu_free(struct rcu_head *head) +{ + struct netdev_hw_addr *ha; + + ha = container_of(head, struct netdev_hw_addr, rcu_head); + kfree(ha); +} + +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len)) { + if (--ha->refcount) + return 0; + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + return 0; + } + } + return -ENOENT; +} + +static int __hw_addr_del(struct list_head *list, unsigned char *addr, + int addr_len) +{ + return __hw_addr_del_ii(list, addr, addr_len, -1); +} + +static int __hw_addr_add_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + int err; + struct netdev_hw_addr *ha, *ha2; + + list_for_each_entry(ha, from_list, list) { + err = __hw_addr_add_ii(to_list, ha->addr, addr_len, 0); + if (err) + goto unroll; + } + return 0; + +unroll: + list_for_each_entry(ha2, from_list, list) { + if (ha2 == ha) + break; + __hw_addr_del_ii(to_list, ha2->addr, addr_len, 0); + } + return err; +} + +static int __hw_addr_add_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_del_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, int ignore_index) +{ + struct netdev_hw_addr *ha; + + list_for_each_entry(ha, from_list, list) { + __hw_addr_del_ii(to_list, ha->addr, addr_len, 0); + } +} + +static void __hw_addr_del_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len) +{ + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, -1); +} + +static void __hw_addr_flush(struct list_head *list) +{ + struct netdev_hw_addr *ha, *tmp; + + list_for_each_entry_safe(ha, tmp, list, list) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + } +} + +/* Device addresses handling functions */ + +static void dev_addr_flush(struct net_device *dev) +{ + /* rtnl_mutex must be held here */ + + __hw_addr_flush(&dev->dev_addr_list); + dev->dev_addr = NULL; +} + +static int dev_addr_init(struct net_device *dev) +{ + unsigned char addr[MAX_ADDR_LEN]; + struct netdev_hw_addr *ha; + int err; + + /* rtnl_mutex must be held here */ + + INIT_LIST_HEAD(&dev->dev_addr_list); + memset(addr, 0, sizeof(*addr)); + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr)); + if (!err) { + /* + * Get the first (previously created) address from the list + * and set dev_addr pointer to this location. + */ + ha = list_first_entry(&dev->dev_addr_list, + struct netdev_hw_addr, list); + dev->dev_addr = ha->addr; + } + return err; +} + +/** + * dev_addr_add - Add a device address + * @dev: device + * @addr: address to add + * + * Add a device address to the device or increase the reference count if + * it already exists. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_add_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add); + +/** + * dev_addr_del - Release a device address. + * @dev: device + * @addr: address to delete + * + * Release reference to a device address and remove it from the device + * if the reference count drops to zero. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del(struct net_device *dev, unsigned char *addr) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_del); + +/** + * dev_addr_add_multiple - Add device addresses from another device + * @to_dev: device to which addresses will be added + * @from_dev: device from which addresses will be added + * + * Add device addresses of the one device to another. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + int err; + + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add_multiple); + +/** + * dev_addr_del_multiple - Delete device addresses by another device + * @to_dev: device where the addresses will be deleted + * @from_dev: device by which addresses the addresses will be deleted + * + * Deletes addresses in to device by the list of addresses in from device. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev) +{ + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, 0); + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return 0; +} +EXPORT_SYMBOL(dev_addr_del_multiple); + +/* unicast and multicast addresses handling functions */ + int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -4780,6 +5037,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; + dev_addr_init(dev); netdev_init_queues(dev); INIT_LIST_HEAD(&dev->napi_list); @@ -4805,6 +5063,9 @@ void free_netdev(struct net_device *dev) kfree(dev->_tx); + /* Flush device addresses */ + dev_addr_flush(dev); + list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) netif_napi_del(p); -- 1.6.0.6
Jiri Pirko
2009-May-04 11:14 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v5)
v4 -> v5 (current): -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 37 ++++++- net/core/dev.c | 271 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 333 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care of + * the right padding. + */ +static inline bool is_etherdev_addr(const struct net_device *dev, + const u8 addr[6 + 2]) +{ + struct netdev_hw_addr *ha; + int res = 1; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + res = compare_ether_addr_64bits(addr, ha->addr); + if (!res) + break; + } + rcu_read_unlock(); + return !res; +} #endif /* __KERNEL__ */ /** diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5a96a1a..a95befc 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -210,6 +210,16 @@ struct dev_addr_list #define dmi_users da_users #define dmi_gusers da_gusers +struct netdev_hw_addr { + struct list_head list; + unsigned char addr[MAX_ADDR_LEN]; + unsigned char type; +#define NETDEV_HW_ADDR_T_LAN 1 +#define NETDEV_HW_ADDR_T_SAN 2 +#define NETDEV_HW_ADDR_T_SLAVE 3 + struct rcu_head rcu_head; +}; + struct hh_cache { struct hh_cache *hh_next; /* Next entry */ @@ -776,8 +786,11 @@ struct net_device */ unsigned long last_rx; /* Time of last Rx */ /* Interface address info used in eth_type_trans() */ - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast - because most packets are unicast) */ + unsigned char *dev_addr; /* hw address, (before bcast + because most packets are + unicast) */ + + struct list_head dev_addr_list; /* list of device hw addresses */ unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ @@ -1778,6 +1791,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) spin_unlock_bh(&dev->addr_list_lock); } +/* + * dev_addr_list walker. Should be used only for read access. Call with + * rcu_read_lock held. + */ +#define for_each_dev_addr(dev, ha) \ + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list) + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ extern void ether_setup(struct net_device *dev); @@ -1790,6 +1810,19 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, alloc_netdev_mq(sizeof_priv, name, setup, 1) extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); + +/* Functions used for device addresses handling */ +extern int dev_addr_add(struct net_device *dev, unsigned char *addr, + unsigned char addr_type); +extern int dev_addr_del(struct net_device *dev, unsigned char *addr, + unsigned char addr_type); +extern int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type); +extern int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type); + /* Functions used for secondary unicast and multicast support */ extern void dev_set_rx_mode(struct net_device *dev); extern void __dev_set_rx_mode(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 308a7d0..d5770f9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3443,6 +3443,273 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } +/* hw addresses list handling functions */ + +static int __hw_addr_add(struct list_head *list, unsigned char *addr, + int addr_len, unsigned char addr_type) +{ + struct netdev_hw_addr *ha; + int alloc_size; + + if (addr_len > MAX_ADDR_LEN) + return -EINVAL; + + alloc_size = sizeof(*ha); + if (alloc_size < L1_CACHE_BYTES) + alloc_size = L1_CACHE_BYTES; + ha = kmalloc(alloc_size, GFP_ATOMIC); + if (!ha) + return -ENOMEM; + memcpy(ha->addr, addr, addr_len); + ha->type = addr_type; + list_add_tail_rcu(&ha->list, list); + return 0; +} + +static void ha_rcu_free(struct rcu_head *head) +{ + struct netdev_hw_addr *ha; + + ha = container_of(head, struct netdev_hw_addr, rcu_head); + kfree(ha); +} + +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len) && + (ha->type == addr_type || !addr_type)) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + return 0; + } + } + return -ENOENT; +} + +static int __hw_addr_del(struct list_head *list, unsigned char *addr, + int addr_len, unsigned char addr_type) +{ + return __hw_addr_del_ii(list, addr, addr_len, addr_type, -1); +} + +static int __hw_addr_add_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + int err; + struct netdev_hw_addr *ha, *ha2; + unsigned char type; + + list_for_each_entry(ha, from_list, list) { + type = addr_type ? addr_type : ha->type; + err = __hw_addr_add(to_list, ha->addr, addr_len, type); + if (err) + goto unroll; + } + return 0; + +unroll: + list_for_each_entry(ha2, from_list, list) { + if (ha2 == ha) + break; + type = addr_type ? addr_type : ha2->type; + __hw_addr_del_ii(to_list, ha2->addr, addr_len, type, + ignore_index); + } + return err; +} + +static int __hw_addr_add_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type) +{ + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, + addr_type, -1); +} + +static void __hw_addr_del_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + struct netdev_hw_addr *ha; + unsigned char type; + + list_for_each_entry(ha, from_list, list) { + type = addr_type ? addr_type : ha->type; + __hw_addr_del_ii(to_list, ha->addr, addr_len, addr_type, + ignore_index); + } +} + +static void __hw_addr_del_multiple(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type) +{ + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, addr_type, -1); +} + +static void __hw_addr_flush(struct list_head *list) +{ + struct netdev_hw_addr *ha, *tmp; + + list_for_each_entry_safe(ha, tmp, list, list) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + } +} + +/* Device addresses handling functions */ + +static void dev_addr_flush(struct net_device *dev) +{ + /* rtnl_mutex must be held here */ + + __hw_addr_flush(&dev->dev_addr_list); + dev->dev_addr = NULL; +} + +static int dev_addr_init(struct net_device *dev) +{ + unsigned char addr[MAX_ADDR_LEN]; + struct netdev_hw_addr *ha; + int err; + + /* rtnl_mutex must be held here */ + + INIT_LIST_HEAD(&dev->dev_addr_list); + memset(addr, 0, sizeof(*addr)); + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr), + NETDEV_HW_ADDR_T_LAN); + if (!err) { + /* + * Get the first (previously created) address from the list + * and set dev_addr pointer to this location. + */ + ha = list_first_entry(&dev->dev_addr_list, + struct netdev_hw_addr, list); + dev->dev_addr = ha->addr; + } + return err; +} + +/** + * dev_addr_add - Add a device address + * @dev: device + * @addr: address to add + * @addr_type: address type + * + * Add a device address to the device or increase the reference count if + * it already exists. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add(struct net_device *dev, unsigned char *addr, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_add(&dev->dev_addr_list, addr, dev->addr_len, + addr_type); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add); + +/** + * dev_addr_del - Release a device address. + * @dev: device + * @addr: address to delete + * @addr_type: address type + * + * Release reference to a device address and remove it from the device + * if the reference count drops to zero. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del(struct net_device *dev, unsigned char *addr, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, + addr_type, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_del); + +/** + * dev_addr_add_multiple - Add device addresses from another device + * @to_dev: device to which addresses will be added + * @from_dev: device from which addresses will be added + * @addr_type: address type - 0 means type will be used from from_dev + * + * Add device addresses of the one device to another. + ** + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, addr_type, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add_multiple); + +/** + * dev_addr_del_multiple - Delete device addresses by another device + * @to_dev: device where the addresses will be deleted + * @from_dev: device by which addresses the addresses will be deleted + * @addr_type: address type - 0 means type will used from from_dev + * + * Deletes addresses in to device by the list of addresses in from device. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type) +{ + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, addr_type, 0); + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return 0; +} +EXPORT_SYMBOL(dev_addr_del_multiple); + +/* unicast and multicast addresses handling functions */ + int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -4785,6 +5052,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; + dev_addr_init(dev); netdev_init_queues(dev); INIT_LIST_HEAD(&dev->napi_list); @@ -4810,6 +5078,9 @@ void free_netdev(struct net_device *dev) kfree(dev->_tx); + /* Flush device addresses */ + dev_addr_flush(dev); + list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) netif_napi_del(p);
David Miller
2009-May-05 04:37 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v5)
From: Jiri Pirko <jpirko at redhat.com> Date: Mon, 4 May 2009 13:14:18 +0200> +static void __hw_addr_del_multiple(struct list_head *to_list, > + struct list_head *from_list, > + int addr_len, unsigned char addr_type) > +{ > + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, addr_type, -1); > +}Unused static function, this will create build warnings. Or, it should :-) If you plan to use such a function in subsequent patches, add it in those changes not here. Otherwise I have no fundamental objection to this patch, nice work!
Jiri Pirko
2009-May-05 12:48 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v6)
v5 -> v6 (current): -removed so far unused static functions -corrected dev_addr_del_multiple to call del instead of add v4 -> v5: -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 37 ++++++- net/core/dev.c | 250 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 312 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care of + * the right padding. + */ +static inline bool is_etherdev_addr(const struct net_device *dev, + const u8 addr[6 + 2]) +{ + struct netdev_hw_addr *ha; + int res = 1; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + res = compare_ether_addr_64bits(addr, ha->addr); + if (!res) + break; + } + rcu_read_unlock(); + return !res; +} #endif /* __KERNEL__ */ /** diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5a96a1a..a95befc 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -210,6 +210,16 @@ struct dev_addr_list #define dmi_users da_users #define dmi_gusers da_gusers +struct netdev_hw_addr { + struct list_head list; + unsigned char addr[MAX_ADDR_LEN]; + unsigned char type; +#define NETDEV_HW_ADDR_T_LAN 1 +#define NETDEV_HW_ADDR_T_SAN 2 +#define NETDEV_HW_ADDR_T_SLAVE 3 + struct rcu_head rcu_head; +}; + struct hh_cache { struct hh_cache *hh_next; /* Next entry */ @@ -776,8 +786,11 @@ struct net_device */ unsigned long last_rx; /* Time of last Rx */ /* Interface address info used in eth_type_trans() */ - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast - because most packets are unicast) */ + unsigned char *dev_addr; /* hw address, (before bcast + because most packets are + unicast) */ + + struct list_head dev_addr_list; /* list of device hw addresses */ unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */ @@ -1778,6 +1791,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev) spin_unlock_bh(&dev->addr_list_lock); } +/* + * dev_addr_list walker. Should be used only for read access. Call with + * rcu_read_lock held. + */ +#define for_each_dev_addr(dev, ha) \ + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list) + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ extern void ether_setup(struct net_device *dev); @@ -1790,6 +1810,19 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, alloc_netdev_mq(sizeof_priv, name, setup, 1) extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); + +/* Functions used for device addresses handling */ +extern int dev_addr_add(struct net_device *dev, unsigned char *addr, + unsigned char addr_type); +extern int dev_addr_del(struct net_device *dev, unsigned char *addr, + unsigned char addr_type); +extern int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type); +extern int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type); + /* Functions used for secondary unicast and multicast support */ extern void dev_set_rx_mode(struct net_device *dev); extern void __dev_set_rx_mode(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 308a7d0..b2f752b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3443,6 +3443,252 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } +/* hw addresses list handling functions */ + +static int __hw_addr_add(struct list_head *list, unsigned char *addr, + int addr_len, unsigned char addr_type) +{ + struct netdev_hw_addr *ha; + int alloc_size; + + if (addr_len > MAX_ADDR_LEN) + return -EINVAL; + + alloc_size = sizeof(*ha); + if (alloc_size < L1_CACHE_BYTES) + alloc_size = L1_CACHE_BYTES; + ha = kmalloc(alloc_size, GFP_ATOMIC); + if (!ha) + return -ENOMEM; + memcpy(ha->addr, addr, addr_len); + ha->type = addr_type; + list_add_tail_rcu(&ha->list, list); + return 0; +} + +static void ha_rcu_free(struct rcu_head *head) +{ + struct netdev_hw_addr *ha; + + ha = container_of(head, struct netdev_hw_addr, rcu_head); + kfree(ha); +} + +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + struct netdev_hw_addr *ha; + int i = 0; + + list_for_each_entry(ha, list, list) { + if (i++ != ignore_index && + !memcmp(ha->addr, addr, addr_len) && + (ha->type == addr_type || !addr_type)) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + return 0; + } + } + return -ENOENT; +} + +static int __hw_addr_add_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + int err; + struct netdev_hw_addr *ha, *ha2; + unsigned char type; + + list_for_each_entry(ha, from_list, list) { + type = addr_type ? addr_type : ha->type; + err = __hw_addr_add(to_list, ha->addr, addr_len, type); + if (err) + goto unroll; + } + return 0; + +unroll: + list_for_each_entry(ha2, from_list, list) { + if (ha2 == ha) + break; + type = addr_type ? addr_type : ha2->type; + __hw_addr_del_ii(to_list, ha2->addr, addr_len, type, + ignore_index); + } + return err; +} + +static void __hw_addr_del_multiple_ii(struct list_head *to_list, + struct list_head *from_list, + int addr_len, unsigned char addr_type, + int ignore_index) +{ + struct netdev_hw_addr *ha; + unsigned char type; + + list_for_each_entry(ha, from_list, list) { + type = addr_type ? addr_type : ha->type; + __hw_addr_del_ii(to_list, ha->addr, addr_len, addr_type, + ignore_index); + } +} + +static void __hw_addr_flush(struct list_head *list) +{ + struct netdev_hw_addr *ha, *tmp; + + list_for_each_entry_safe(ha, tmp, list, list) { + list_del_rcu(&ha->list); + call_rcu(&ha->rcu_head, ha_rcu_free); + } +} + +/* Device addresses handling functions */ + +static void dev_addr_flush(struct net_device *dev) +{ + /* rtnl_mutex must be held here */ + + __hw_addr_flush(&dev->dev_addr_list); + dev->dev_addr = NULL; +} + +static int dev_addr_init(struct net_device *dev) +{ + unsigned char addr[MAX_ADDR_LEN]; + struct netdev_hw_addr *ha; + int err; + + /* rtnl_mutex must be held here */ + + INIT_LIST_HEAD(&dev->dev_addr_list); + memset(addr, 0, sizeof(*addr)); + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr), + NETDEV_HW_ADDR_T_LAN); + if (!err) { + /* + * Get the first (previously created) address from the list + * and set dev_addr pointer to this location. + */ + ha = list_first_entry(&dev->dev_addr_list, + struct netdev_hw_addr, list); + dev->dev_addr = ha->addr; + } + return err; +} + +/** + * dev_addr_add - Add a device address + * @dev: device + * @addr: address to add + * @addr_type: address type + * + * Add a device address to the device or increase the reference count if + * it already exists. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add(struct net_device *dev, unsigned char *addr, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_add(&dev->dev_addr_list, addr, dev->addr_len, + addr_type); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add); + +/** + * dev_addr_del - Release a device address. + * @dev: device + * @addr: address to delete + * @addr_type: address type + * + * Release reference to a device address and remove it from the device + * if the reference count drops to zero. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del(struct net_device *dev, unsigned char *addr, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, + addr_type, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + return err; +} +EXPORT_SYMBOL(dev_addr_del); + +/** + * dev_addr_add_multiple - Add device addresses from another device + * @to_dev: device to which addresses will be added + * @from_dev: device from which addresses will be added + * @addr_type: address type - 0 means type will be used from from_dev + * + * Add device addresses of the one device to another. + ** + * The caller must hold the rtnl_mutex. + */ +int dev_addr_add_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type) +{ + int err; + + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, addr_type, 0); + if (!err) + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return err; +} +EXPORT_SYMBOL(dev_addr_add_multiple); + +/** + * dev_addr_del_multiple - Delete device addresses by another device + * @to_dev: device where the addresses will be deleted + * @from_dev: device by which addresses the addresses will be deleted + * @addr_type: address type - 0 means type will used from from_dev + * + * Deletes addresses in to device by the list of addresses in from device. + * + * The caller must hold the rtnl_mutex. + */ +int dev_addr_del_multiple(struct net_device *to_dev, + struct net_device *from_dev, + unsigned char addr_type) +{ + ASSERT_RTNL(); + + if (from_dev->addr_len != to_dev->addr_len) + return -EINVAL; + __hw_addr_del_multiple_ii(&to_dev->dev_addr_list, + &from_dev->dev_addr_list, + to_dev->addr_len, addr_type, 0); + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev); + return 0; +} +EXPORT_SYMBOL(dev_addr_del_multiple); + +/* unicast and multicast addresses handling functions */ + int __dev_addr_delete(struct dev_addr_list **list, int *count, void *addr, int alen, int glbl) { @@ -4785,6 +5031,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; + dev_addr_init(dev); netdev_init_queues(dev); INIT_LIST_HEAD(&dev->napi_list); @@ -4810,6 +5057,9 @@ void free_netdev(struct net_device *dev) kfree(dev->_tx); + /* Flush device addresses */ + dev_addr_flush(dev); + list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) netif_napi_del(p);
David Miller
2009-May-05 19:27 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v6)
From: Jiri Pirko <jpirko at redhat.com> Date: Tue, 5 May 2009 14:48:28 +0200> This patch introduces a new list in struct net_device and brings a set of > functions to handle the work with device address list. The list is a replacement > for the original dev_addr field and because in some situations there is need to > carry several device addresses with the net device. To be backward compatible, > dev_addr is made to point to the first member of the list so original drivers > sees no difference. > > Signed-off-by: Jiri Pirko <jpirko at redhat.com>Applied to net-next-2.6, thanks!
Jiri Pirko
2009-May-06 14:46 UTC
[Bridge] [PATCH net-next] net: bridge: use device address list instead of dev_addr (repost)
(repost, no modifications) This patch changes the handling of mac addresses of bridge port devices. Now it uses previously introduced list of device addresses. It allows the bridge to know more then one local mac address per port which is mandatory for the right work in some cases. Signed-off-by: Jiri Pirko <jpirko at redhat.com> --- net/bridge/br_fdb.c | 101 ++++++++++++++++++++++++++++++---------------- net/bridge/br_if.c | 2 +- net/bridge/br_notify.c | 2 +- net/bridge/br_private.h | 4 +- 4 files changed, 70 insertions(+), 39 deletions(-) diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index a48f5ef..1e63f76 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -77,10 +77,26 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f) br_fdb_put(f); } -void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) +static bool another_port_has_addr(const struct net_bridge_port *p, + struct net_bridge_fdb_entry *f) +{ + struct net_bridge *br = p->br; + struct net_bridge_port *op; + + list_for_each_entry(op, &br->port_list, list) { + if (op != p && is_etherdev_addr(op->dev, f->addr.addr)) { + f->dst = op; + return 1; + } + } + return 0; +} + +void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev) { struct net_bridge *br = p->br; int i; + struct netdev_hw_addr *ha; spin_lock_bh(&br->hash_lock); @@ -92,26 +108,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) f = hlist_entry(h, struct net_bridge_fdb_entry, hlist); if (f->dst == p && f->is_local) { - /* maybe another port has same hw addr? */ - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto insert; - } - } - - /* delete old one */ - fdb_delete(f); - goto insert; + /* + * maybe another port has same hw addr?, + * if not then delete it + */ + if (!another_port_has_addr(p, f)) + fdb_delete(f); } } } - insert: - /* insert new address, may fail if invalid address or dup. */ - fdb_insert(br, p, newaddr); + + /* insert device addresses, may fail if invalid address. */ + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + fdb_insert(br, p, ha->addr); + } + rcu_read_unlock(); spin_unlock_bh(&br->hash_lock); } @@ -189,20 +202,9 @@ void br_fdb_delete_by_port(struct net_bridge *br, * then when one port is deleted, assign * the local entry to other port */ - if (f->is_local) { - struct net_bridge_port *op; - list_for_each_entry(op, &br->port_list, list) { - if (op != p && - !compare_ether_addr(op->dev->dev_addr, - f->addr.addr)) { - f->dst = op; - goto skip_delete; - } - } - } - - fdb_delete(f); - skip_delete: ; + if (!f->is_local || + !another_port_has_addr(p, f)) + fdb_delete(f); } } spin_unlock_bh(&br->hash_lock); @@ -338,7 +340,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head, } static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + const unsigned char *addr) { struct hlist_head *head = &br->hash[br_mac_hash(addr)]; struct net_bridge_fdb_entry *fdb; @@ -366,13 +368,42 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, return 0; } +static int fdb_insert_dev(struct net_bridge *br, struct net_bridge_port *source, + struct net_device *dev) +{ + struct netdev_hw_addr *ha, *ha2; + struct net_bridge_fdb_entry *fdb; + struct hlist_head *head; + int ret = 0; + + rcu_read_lock(); + for_each_dev_addr(dev, ha) { + ret = fdb_insert(br, source, ha->addr); + if (ret) + goto unroll; + } + goto unlock; +unroll: + for_each_dev_addr(dev, ha2) { + if (ha2 == ha) + break; + head = &br->hash[br_mac_hash(ha2->addr)]; + fdb = fdb_find(head, ha2->addr); + if (fdb && fdb->is_local) + fdb_delete(fdb); + } +unlock: + rcu_read_unlock(); + return ret; +} + int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr) + struct net_device *dev) { int ret; spin_lock_bh(&br->hash_lock); - ret = fdb_insert(br, source, addr); + ret = fdb_insert_dev(br, source, dev); spin_unlock_bh(&br->hash_lock); return ret; } diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 8a96672..789cb30 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -392,7 +392,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev) if (err) goto err0; - err = br_fdb_insert(br, p, dev->dev_addr); + err = br_fdb_insert(br, p, dev); if (err) goto err1; diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c index 763a3ec..1423541 100644 --- a/net/bridge/br_notify.c +++ b/net/bridge/br_notify.c @@ -48,7 +48,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v case NETDEV_CHANGEADDR: spin_lock_bh(&br->lock); - br_fdb_changeaddr(p, dev->dev_addr); + br_fdb_changeaddr(p, dev); br_stp_recalculate_bridge_id(br); spin_unlock_bh(&br->lock); break; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index b6c3b71..65ffe3d 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -148,7 +148,7 @@ extern int br_fdb_init(void); extern void br_fdb_fini(void); extern void br_fdb_flush(struct net_bridge *br); extern void br_fdb_changeaddr(struct net_bridge_port *p, - const unsigned char *newaddr); + struct net_device *dev); extern void br_fdb_cleanup(unsigned long arg); extern void br_fdb_delete_by_port(struct net_bridge *br, const struct net_bridge_port *p, int do_all); @@ -161,7 +161,7 @@ extern int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count, unsigned long off); extern int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, - const unsigned char *addr); + struct net_device *dev); extern void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, const unsigned char *addr);
Eric Dumazet
2009-May-06 15:08 UTC
[Bridge] [PATCH net-next] net: bridge: use device address list instead of dev_addr (repost)
Jiri Pirko a ?crit :> (repost, no modifications)Well, some changes are welcome :)> > This patch changes the handling of mac addresses of bridge port devices. Now > it uses previously introduced list of device addresses. It allows the bridge to > know more then one local mac address per port which is mandatory for the right > work in some cases. > > Signed-off-by: Jiri Pirko <jpirko at redhat.com> > --- > net/bridge/br_fdb.c | 101 ++++++++++++++++++++++++++++++---------------- > net/bridge/br_if.c | 2 +- > net/bridge/br_notify.c | 2 +- > net/bridge/br_private.h | 4 +- > 4 files changed, 70 insertions(+), 39 deletions(-) > > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > index a48f5ef..1e63f76 100644 > --- a/net/bridge/br_fdb.c > +++ b/net/bridge/br_fdb.c > @@ -77,10 +77,26 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f) > br_fdb_put(f); > } > > -void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) > +static bool another_port_has_addr(const struct net_bridge_port *p, > + struct net_bridge_fdb_entry *f) > +{ > + struct net_bridge *br = p->br; > + struct net_bridge_port *op; > + > + list_for_each_entry(op, &br->port_list, list) { > + if (op != p && is_etherdev_addr(op->dev, f->addr.addr)) { > + f->dst = op; > + return 1; > + } > + } > + return 0; > +} > + > +void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev) > { > struct net_bridge *br = p->br; > int i; > + struct netdev_hw_addr *ha; > > spin_lock_bh(&br->hash_lock); > > @@ -92,26 +108,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr) > > f = hlist_entry(h, struct net_bridge_fdb_entry, hlist); > if (f->dst == p && f->is_local) { > - /* maybe another port has same hw addr? */ > - struct net_bridge_port *op; > - list_for_each_entry(op, &br->port_list, list) { > - if (op != p && > - !compare_ether_addr(op->dev->dev_addr, > - f->addr.addr)) { > - f->dst = op; > - goto insert; > - } > - } > - > - /* delete old one */ > - fdb_delete(f); > - goto insert; > + /* > + * maybe another port has same hw addr?, > + * if not then delete it > + */ > + if (!another_port_has_addr(p, f)) > + fdb_delete(f); > } > } > } > - insert: > - /* insert new address, may fail if invalid address or dup. */ > - fdb_insert(br, p, newaddr); > + > + /* insert device addresses, may fail if invalid address. */ > + > + rcu_read_lock();Same problem than a previous patch Jiri. You should not use rcu_read_lock()/rcu_read_unlock() at all in this context, since you already own a lock.> + for_each_dev_addr(dev, ha) { > + fdb_insert(br, p, ha->addr); > + } > + rcu_read_unlock(); > > spin_unlock_bh(&br->hash_lock); > } > @@ -189,20 +202,9 @@ void br_fdb_delete_by_port(struct net_bridge *br, > * then when one port is deleted, assign > * the local entry to other port > */ > - if (f->is_local) { > - struct net_bridge_port *op; > - list_for_each_entry(op, &br->port_list, list) { > - if (op != p && > - !compare_ether_addr(op->dev->dev_addr, > - f->addr.addr)) { > - f->dst = op; > - goto skip_delete; > - } > - } > - } > - > - fdb_delete(f); > - skip_delete: ; > + if (!f->is_local || > + !another_port_has_addr(p, f)) > + fdb_delete(f); > } > } > spin_unlock_bh(&br->hash_lock); > @@ -338,7 +340,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head, > } > > static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, > - const unsigned char *addr) > + const unsigned char *addr) > { > struct hlist_head *head = &br->hash[br_mac_hash(addr)]; > struct net_bridge_fdb_entry *fdb; > @@ -366,13 +368,42 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source, > return 0; > } > > +static int fdb_insert_dev(struct net_bridge *br, struct net_bridge_port *source, > + struct net_device *dev) > +{ > + struct netdev_hw_addr *ha, *ha2; > + struct net_bridge_fdb_entry *fdb; > + struct hlist_head *head; > + int ret = 0; > + > + rcu_read_lock();You should not use rcu_read_lock()/rcu_read_unlock() at all in this context> + for_each_dev_addr(dev, ha) { > + ret = fdb_insert(br, source, ha->addr); > + if (ret) > + goto unroll; > + } > + goto unlock; > +unroll: > + for_each_dev_addr(dev, ha2) { > + if (ha2 == ha) > + break; > + head = &br->hash[br_mac_hash(ha2->addr)]; > + fdb = fdb_find(head, ha2->addr); > + if (fdb && fdb->is_local) > + fdb_delete(fdb); > + } > +unlock: > + rcu_read_unlock(); > + return ret; > +} > + > int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, > - const unsigned char *addr) > + struct net_device *dev) > { > int ret; > > spin_lock_bh(&br->hash_lock); > - ret = fdb_insert(br, source, addr); > + ret = fdb_insert_dev(br, source, dev); > spin_unlock_bh(&br->hash_lock); > return ret; > } > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c > index 8a96672..789cb30 100644 > --- a/net/bridge/br_if.c > +++ b/net/bridge/br_if.c > @@ -392,7 +392,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev) > if (err) > goto err0; > > - err = br_fdb_insert(br, p, dev->dev_addr); > + err = br_fdb_insert(br, p, dev); > if (err) > goto err1; > > diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c > index 763a3ec..1423541 100644 > --- a/net/bridge/br_notify.c > +++ b/net/bridge/br_notify.c > @@ -48,7 +48,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v > > case NETDEV_CHANGEADDR: > spin_lock_bh(&br->lock); > - br_fdb_changeaddr(p, dev->dev_addr); > + br_fdb_changeaddr(p, dev); > br_stp_recalculate_bridge_id(br); > spin_unlock_bh(&br->lock); > break; > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index b6c3b71..65ffe3d 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -148,7 +148,7 @@ extern int br_fdb_init(void); > extern void br_fdb_fini(void); > extern void br_fdb_flush(struct net_bridge *br); > extern void br_fdb_changeaddr(struct net_bridge_port *p, > - const unsigned char *newaddr); > + struct net_device *dev); > extern void br_fdb_cleanup(unsigned long arg); > extern void br_fdb_delete_by_port(struct net_bridge *br, > const struct net_bridge_port *p, int do_all); > @@ -161,7 +161,7 @@ extern int br_fdb_fillbuf(struct net_bridge *br, void *buf, > unsigned long count, unsigned long off); > extern int br_fdb_insert(struct net_bridge *br, > struct net_bridge_port *source, > - const unsigned char *addr); > + struct net_device *dev); > extern void br_fdb_update(struct net_bridge *br, > struct net_bridge_port *source, > const unsigned char *addr); > >
Stephen Hemminger
2009-May-08 22:38 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v6)
On Tue, 05 May 2009 12:27:18 -0700 (PDT) David Miller <davem at davemloft.net> wrote:> From: Jiri Pirko <jpirko at redhat.com> > Date: Tue, 5 May 2009 14:48:28 +0200 > > > This patch introduces a new list in struct net_device and brings a set of > > functions to handle the work with device address list. The list is a replacement > > for the original dev_addr field and because in some situations there is need to > > carry several device addresses with the net device. To be backward compatible, > > dev_addr is made to point to the first member of the list so original drivers > > sees no difference. > > > > Signed-off-by: Jiri Pirko <jpirko at redhat.com> > > Applied to net-next-2.6, thanks! > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.htmlNot sure if this is such a good idea since the purpose of this was to fix a bonding/bridging interaction, but it breaks STP on bridging.
David Miller
2009-May-08 23:00 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v6)
From: Stephen Hemminger <shemminger at vyatta.com> Date: Fri, 8 May 2009 15:38:42 -0700> Not sure if this is such a good idea since the purpose of this was to fix > a bonding/bridging interaction, but it breaks STP on bridging.Thanks for not paying attention... :-/ The Intel folks want to have an address list functionality so they can public MAC addresses meant for FCOE and other purposes. So even if the bonding bits bomb, we still need this.
Stephen Hemminger
2009-May-08 23:12 UTC
[Bridge] [PATCH] net: introduce a list of device addresses dev_addr_list (v6)
On Fri, 08 May 2009 16:00:08 -0700 (PDT) David Miller <davem at davemloft.net> wrote:> From: Stephen Hemminger <shemminger at vyatta.com> > Date: Fri, 8 May 2009 15:38:42 -0700 > > > Not sure if this is such a good idea since the purpose of this was to fix > > a bonding/bridging interaction, but it breaks STP on bridging. > > Thanks for not paying attention... :-/ > > The Intel folks want to have an address list functionality so > they can public MAC addresses meant for FCOE and other purposes. > > So even if the bonding bits bomb, we still need this.But the other infrastructure may have same issues (netfilter, etc). Just seems like it would be either to have multiple network devices so that upper layers could disambiguate easier. --