Allan W. Nielsen
2020-Jan-27 10:57 UTC
[Bridge] [RFC net-next v3 09/10] net: bridge: mrp: Integrate MRP into the bridge
On 26.01.2020 18:12, Andrew Lunn wrote:> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On Sun, Jan 26, 2020 at 02:01:11PM +0100, Horatiu Vultur wrote: > > The 01/25/2020 17:16, Andrew Lunn wrote: > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > > > > > > br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb) > > > > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > > > > return RX_HANDLER_CONSUMED; > > > > } > > > > } > > > > +#ifdef CONFIG_BRIDGE_MRP > > > > + /* If there is no MRP instance do normal forwarding */ > > > > + if (!p->mrp_aware) > > > > + goto forward; > > > > + > > > > + if (skb->protocol == htons(ETH_P_MRP)) > > > > + return RX_HANDLER_PASS; > > > > > > What MAC address is used for these MRP frames? It would make sense to > > > use a L2 link local destination address, since i assume they are not > > > supposed to be forwarded by the bridge. If so, you could extend the > > > if (unlikely(is_link_local_ether_addr(dest))) condition. > > > > The MAC addresses used by MRP frames are: > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames. > > > > If we will add support also for MIM/MIC. These requires 2 more MAC > > addresses: > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames. > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect > > frames. > > Hi Horatiu > > I made the wrong guess about how this protocol worked when i said L2 > link local. These MAC addresses are L2 multicast. > > And you are using a raw socket to receive them into userspace when > needed. > > 'Thinking allowed' here. > > +------------------------------------------+ > | | > +-->|H1|<---------->|H2|<---------->|H3|<--+ > eth0 eth1 eth0 eth1 eth0 eth1 > ^ > | > Blocked > > > There are three major classes of user case here: > > 1) Pure software solution > You need the software bridge in the client to forward these frames > from the left side to the right side.As far as I understand it is not the bridge which forward these frames - it is the user-space tool. This was to put as much functionality in user-space and only use the kernel to configure the HW. We can (and should) discuss if this is the right decision.> (Does the standard give these two ports names)?Horatiu?> In the master, the left port is blocked, so the bridge drops them > anyway. You have a RAW socket open on both eth0 and eth1, so you get > to see the frames, even if the bridge drops them.Yes, in the current patch-set such frames are forwarded by the user-space daemon. We would properly have better performance if we do this in kernel-space.> 2) Hardware offload to an MRP unaware switch. > > I'm thinking about a plain switch supported by DSA, Marvell, Broadcom, > etc. It has no special knowledge of MRP.We have implemented this on Ocelot - which is not MRP aware at all. Not sure what facilities Marvell and Broadcom has, but it is not a lot which is needed.> Ideally, you want the switch to forward MRP_Test frames left to right > for a client.Yes. If we have only 1 ring, then we can do that with a MAC table entry. If we have more than 1 ring, then we will need a TCAM rule of some kind. In the what we have today on Ocelot, we do not do this is HW, we do the forwarding in SW. BTW: It is not only from left to right, it is also from right to left. The MRM will inject packets on both ring ports, and monitor both. This is to detect asymmetrical link down or similar. The two ports are treated the same. But you can set a priority (the primary/secondary) to state your preference on what port to use if both are up and the ring is closed.> In a master, i think you have a problem, since the port > is blocked. The hardware is unlikely to recognise these frames as > special, since they are not in the 01-80-C2-XX-XX-XX block, and let > them through. So your raw socket is never going to see them, and you > cannot detect open/closed ring.Again, I do not know how other HW is designed, but all the SOC's we are working with, does allow us to add a TCAM rule which can redirect these frames to the CPU even on a blocked port.> I don't know how realistic it is to support MRP in this case, and i > also don't think you can fall back to a pure software solution, > because the software bridge is going to offload the basic bridge > operation to the hardware. It would be nice if you could detect this, > and return -EOPNOTSUPP.We do want to support this on Ocelot, but you are right, if the current running bridge, cannot block a port, and still get the MRP frames on that port, then it cannot support MRM. And this we need to detect in some way.> 3) Hardware offload to an MRP aware switch. > > For a client, you tell it which port is left, which is right, and > assume it forwards the frames. For a master, you again tell it which > is left, which is right, and ask it send MRP_Test frames out right, > and report changes in open/closed on the right port. You don't need > the CPU to see the MRP_Test frames, so the switch has no need to > forward them to the CPU. > > We should think about the general case of a bridge with many ports, > and many pairs of ports using MRP. This makes the forwarding of these > frames interesting. Given that they are multicast, the default action > of the software bridge is that it will flood them. Does the protocol > handle seeing MRP_Test from some other loop? Do we need to avoid this?Yes, we need to avoid. We cannot "just" do normal flooding.> You could avoid this by adding MDB entries to the bridge. However, > this does not scale to more then one ring.I would prefer a solution where the individual drivers can do what is best on the given HW. - If we have a 2 ported switch, then flooding seems like a perfect valid approach. There will be only 1 ring. - If we have a many ported switch, then we could use MAC-table entry - if the user only configure 1 ring. - When adding more rings, it either needs to return error, or use other HW facilities.> I don't think an MDB is associated to an ingress port. So you cannot > sayAgree.> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port1 egress port2 > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port3 egress port4 > > The best you can say is > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 egress port2, port4 > > I'm sure there are other issues i'm missing, but it is interesting to > think about all this.Yes, the solution Horatiu has chosen, is not to forward MRP frames, received in MRP ring ports at all. This is done by the user-space tool. Again, not sure if this is the right way to do it, but it is what patch v3 does. The alternative to this would be to learn the bridge how to forward MRP frames when it is a MRC. The user-space tool then never needs to do this, it know that the kernel will take care of this part (either in SW or in HW). /Allan
Horatiu Vultur
2020-Jan-27 13:02 UTC
[Bridge] [RFC net-next v3 09/10] net: bridge: mrp: Integrate MRP into the bridge
The 01/27/2020 11:57, Allan W. Nielsen wrote:> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 26.01.2020 18:12, Andrew Lunn wrote: > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > > > On Sun, Jan 26, 2020 at 02:01:11PM +0100, Horatiu Vultur wrote: > > > The 01/25/2020 17:16, Andrew Lunn wrote: > > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > > > > > > > > br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb) > > > > > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) > > > > > return RX_HANDLER_CONSUMED; > > > > > } > > > > > } > > > > > +#ifdef CONFIG_BRIDGE_MRP > > > > > + /* If there is no MRP instance do normal forwarding */ > > > > > + if (!p->mrp_aware) > > > > > + goto forward; > > > > > + > > > > > + if (skb->protocol == htons(ETH_P_MRP)) > > > > > + return RX_HANDLER_PASS; > > > > > > > > What MAC address is used for these MRP frames? It would make sense to > > > > use a L2 link local destination address, since i assume they are not > > > > supposed to be forwarded by the bridge. If so, you could extend the > > > > if (unlikely(is_link_local_ether_addr(dest))) condition. > > > > > > The MAC addresses used by MRP frames are: > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames. > > > > > > If we will add support also for MIM/MIC. These requires 2 more MAC > > > addresses: > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames. > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect > > > frames. > > > > Hi Horatiu > > > > I made the wrong guess about how this protocol worked when i said L2 > > link local. These MAC addresses are L2 multicast. > > > > And you are using a raw socket to receive them into userspace when > > needed. > > > > 'Thinking allowed' here. > > > > +------------------------------------------+ > > | | > > +-->|H1|<---------->|H2|<---------->|H3|<--+ > > eth0 eth1 eth0 eth1 eth0 eth1 > > ^ > > | > > Blocked > > > > > > There are three major classes of user case here: > > > > 1) Pure software solution > > You need the software bridge in the client to forward these frames > > from the left side to the right side. > As far as I understand it is not the bridge which forward these frames - > it is the user-space tool. This was to put as much functionality in > user-space and only use the kernel to configure the HW. We can (and > should) discuss if this is the right decision. > > > (Does the standard give these two ports names)? > Horatiu?They don't have a specific name, the standard names them as "ring ports". And to differentiate between them, they have roles: primary or secondary. These roles are used to know which port needs to be blocked(the secondary) and which needs to forward the frames. One observation, these roles are not fix for the entire time. The ring ports can interchange their roles. For example if eth0 is the primary port and eth1 is the secondary port and then the eth0 link goes down then eth0 will have the secondary role and eth1 will become primary port.> > > In the master, the left port is blocked, so the bridge drops them > > anyway. You have a RAW socket open on both eth0 and eth1, so you get > > to see the frames, even if the bridge drops them. > Yes, in the current patch-set such frames are forwarded by the > user-space daemon. > > We would properly have better performance if we do this in kernel-space. > > > > 2) Hardware offload to an MRP unaware switch. > > > > I'm thinking about a plain switch supported by DSA, Marvell, Broadcom, > > etc. It has no special knowledge of MRP. > We have implemented this on Ocelot - which is not MRP aware at all. Not > sure what facilities Marvell and Broadcom has, but it is not a lot which > is needed.Here is a small confusion. The implementation that we have done on Ocelot doesn't have at all HW offload(I have hacked the network driver to remove this support, so basically is just 4 NICs). Therefor all the non-MRP frames switching were done by the SW bridge and forwarding of MRP frames were done in the userspace.> > > Ideally, you want the switch to forward MRP_Test frames left to right > > for a client. > Yes. If we have only 1 ring, then we can do that with a MAC table entry. > If we have more than 1 ring, then we will need a TCAM rule of some kind. > > In the what we have today on Ocelot, we do not do this is HW, we do the > forwarding in SW. > > BTW: It is not only from left to right, it is also from right to left. > The MRM will inject packets on both ring ports, and monitor both. This > is to detect asymmetrical link down or similar. The two ports are > treated the same. But you can set a priority (the primary/secondary) to > state your preference on what port to use if both are up and the ring is > closed.A small observation, the primary/secondary are defined in the standard as roles and not priority. And yes it uses this role(primary/secondary) to decide which port to block.> > > In a master, i think you have a problem, since the port > > is blocked. The hardware is unlikely to recognise these frames as > > special, since they are not in the 01-80-C2-XX-XX-XX block, and let > > them through. So your raw socket is never going to see them, and you > > cannot detect open/closed ring. > Again, I do not know how other HW is designed, but all the SOC's we are > working with, does allow us to add a TCAM rule which can redirect these > frames to the CPU even on a blocked port. > > > I don't know how realistic it is to support MRP in this case, and i > > also don't think you can fall back to a pure software solution, > > because the software bridge is going to offload the basic bridge > > operation to the hardware. It would be nice if you could detect this, > > and return -EOPNOTSUPP. > We do want to support this on Ocelot, but you are right, if the current > running bridge, cannot block a port, and still get the MRP frames on > that port, then it cannot support MRM. And this we need to detect in > some way. > > > 3) Hardware offload to an MRP aware switch. > > > > For a client, you tell it which port is left, which is right, and > > assume it forwards the frames. For a master, you again tell it which > > is left, which is right, and ask it send MRP_Test frames out right, > > and report changes in open/closed on the right port. You don't need > > the CPU to see the MRP_Test frames, so the switch has no need to > > forward them to the CPU. > > > > We should think about the general case of a bridge with many ports, > > and many pairs of ports using MRP. This makes the forwarding of these > > frames interesting. Given that they are multicast, the default action > > of the software bridge is that it will flood them. Does the protocol > > handle seeing MRP_Test from some other loop? Do we need to avoid this? > Yes, we need to avoid. We cannot "just" do normal flooding. > > > You could avoid this by adding MDB entries to the bridge. However, > > this does not scale to more then one ring. > I would prefer a solution where the individual drivers can do what is > best on the given HW. > > - If we have a 2 ported switch, then flooding seems like a perfect valid > approach. There will be only 1 ring. > - If we have a many ported switch, then we could use MAC-table entry - > if the user only configure 1 ring. > - When adding more rings, it either needs to return error, or use > other HW facilities. > > > I don't think an MDB is associated to an ingress port. So you cannot > > say > Agree. > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port1 egress port2 > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port3 egress port4 > > > > The best you can say is > > > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 egress port2, port4 > > > > I'm sure there are other issues i'm missing, but it is interesting to > > think about all this. > Yes, the solution Horatiu has chosen, is not to forward MRP frames, > received in MRP ring ports at all. This is done by the user-space tool. > > Again, not sure if this is the right way to do it, but it is what patch > v3 does. > > The alternative to this would be to learn the bridge how to forward MRP > frames when it is a MRC. The user-space tool then never needs to do > this, it know that the kernel will take care of this part (either in SW > or in HW). > > /Allan-- /Horatiu
Andrew Lunn
2020-Jan-27 13:40 UTC
[Bridge] [RFC net-next v3 09/10] net: bridge: mrp: Integrate MRP into the bridge
> > 'Thinking allowed' here. > > > > +------------------------------------------+ > > | | > > +-->|H1|<---------->|H2|<---------->|H3|<--+ > > eth0 eth1 eth0 eth1 eth0 eth1 > > ^ > > | > > Blocked > > > > > > There are three major classes of user case here: > > > > 1) Pure software solution > > You need the software bridge in the client to forward these frames > > from the left side to the right side.> As far as I understand it is not the bridge which forward these frames - > it is the user-space tool. This was to put as much functionality in > user-space and only use the kernel to configure the HW. We can (and > should) discuss if this is the right decision.So i need to flip the point around. How does the software switch know not to forward the frames? Are you adding an MDB?> We would properly have better performance if we do this in kernel-space.Yes, that is what i think. And if you can do it without any additional code, using the forwarding tables, so much the better.> BTW: It is not only from left to right, it is also from right to left. > The MRM will inject packets on both ring ports, and monitor both.Using the same MAC address in both directions? I need to think what that implies for MDB entries. It probably just works, since you never flood back out the ingress port.> Again, I do not know how other HW is designed, but all the SOC's we are > working with, does allow us to add a TCAM rule which can redirect these > frames to the CPU even on a blocked port.It is not in scope for what you are doing, but i wonder how we describe this in a generic Linux way? And then how we push it down to the hardware? For the Marvell Switches, it might be possible to do this without the TCAM. You can add forwarding DB entries marked as Management. It is unclear if this overrides the blocked state, but it would be a bit odd if it did not.> > You could avoid this by adding MDB entries to the bridge. However, > > this does not scale to more then one ring. > I would prefer a solution where the individual drivers can do what is > best on the given HW.The nice thing about adding MDB is that it is making use of the software bridge facilities. In general, the software bridge and hardware bridges are pretty similar. If you can solve the problem using generic software bridge features, not additional special cases in code, you have good chance of being able to offload it to a hardware bridge which is not MRP aware. The switchdev API for MRP specific features should then allow you to make use of any additional features the hardware might have.> Yes, the solution Horatiu has chosen, is not to forward MRP frames, > received in MRP ring ports at all. This is done by the user-space tool. > > Again, not sure if this is the right way to do it, but it is what patch > v3 does. > > The alternative to this would be to learn the bridge how to forward MRP > frames when it is a MRC. The user-space tool then never needs to do > this, it know that the kernel will take care of this part (either in SW > or in HW).I think that should be considered. I'm not saying it is the best way, just that some thought should be put into it to figure out what it actually implies. Andrew