Nikolay Aleksandrov
2015-Aug-28 15:26 UTC
[Bridge] [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans
> On Aug 28, 2015, at 5:31 AM, Vlad Yasevich <vyasevic at redhat.com> wrote:
>
> On 08/27/2015 10:17 PM, Nikolay Aleksandrov wrote:
>>
>>> On Aug 27, 2015, at 4:47 PM, Vlad Yasevich <vyasevic at redhat.com> wrote:
>>>
>>> On 08/27/2015 05:02 PM, Nikolay Aleksandrov wrote:
>>>>
>>>>> On Aug 26, 2015, at 9:57 PM, roopa <roopa at cumulusnetworks.com> wrote:
>>>>>
>>>>> On 8/26/15, 4:33 AM, Nikolay Aleksandrov wrote:
>>>>>>> On Aug 25, 2015, at 11:06 PM, David Miller <davem at davemloft.net> wrote:
>>>>>>>
>>>>>>> From: Nikolay Aleksandrov <nikolay at cumulusnetworks.com>
>>>>>>> Date: Tue, 25 Aug 2015 22:28:16 -0700
>>>>>>>
>>>>>>>> Certainly, that should be done and I will look into it, but the
>>>>>>>> essence of this patch is a bit different. The problem here is not
>>>>>>>> the size of the fdb entries, it's more the number of them - having
>>>>>>>> 96000 entries (even if they were 1-byte ones) is just way too much,
>>>>>>>> especially when the fdb hash size is small and static. We could work
>>>>>>>> on making it dynamic, but still, these types of local entries
>>>>>>>> per vlan per port can easily be avoided with this option.
>>>>>>> 96000 bits can be stored in 12k. Get where I'm going with this?
>>>>>>>
>>>>>>> Look at the problem sideways.
>>>>>> Oh okay, I misunderstood your previous comment. I'll look into that.
>>>>>>
>>>>> I just wanted to add the other problems we have had with keeping these
>>>>> macs (mostly from a userspace POV):
>>>>> - add/del netlink notification storms
>>>>> - and large netlink dumps
>>>>>
>>>>> In addition to in-kernel optimizations, it would be nice to have a
>>>>> solution that reduces the burden on userspace. That will need a newer
>>>>> netlink dump format for fdbs. Considering all the changes needed,
>>>>> Nikolay's patch seems less intrusive.
>>>>
>>>> Right, we need to take these into account as well. I'll continue the
>>>> discussion on this (or restart it) because I looked into using a bitmap
>>>> for the local entries only, and while it fixes the scalability issue,
>>>> it presents a few new ones, mostly related to the fact that these
>>>> entries then exist only without a vlan: if a new mac comes along which
>>>> matches one of these but is in a vlan, the entry will get created in
>>>> br_fdb_update() unless we add a second lookup, but that will slow down
>>>> the learning path. Also, this change requires an update of every fdb
>>>> function that uses the vid as a key (every fdb function?!) because now
>>>> we can have the mac in two places instead of one, which is a pretty big
>>>> churn with lots of conditionals all over the place, and I don't like
>>>> it. Adding this complexity for the local addresses only seems like
>>>> overkill, so I think I'll drop this issue for now.
>>>
>>> I seem to recall Roopa and I and maybe a few others discussing this a few
>>> years ago at Plumbers; I can't remember the details any more. All these
>>> local addresses add a ton of confusion. Does anyone (Stephen?) remember
>>> what the original reason was for all these local addresses? I wonder if
>>> we can have a knob to disable all of them (not just per vlan)? That might
>>> be cleaner and easier to swallow.
>>>
>>
>> Right, this would be the easiest way, and if the others agree I'll post a
>> patch for it so we have some way to resolve this today. Even if we fix
>> the scalability issue, this is still a valid case: some people don't want
>> local fdbs installed automatically.
>> Any objections to this?
>>
>>>> This patch (that works around the initial problem) also has these issues.
>>>> Note that one way to take care of this in a more straightforward way
>>>> would be to have each entry carry some sort of a bitmap (like Vlad tried
>>>> earlier) so we can combine the paths and most of these issues disappear,
>>>> but that will not be easy, as was already commented earlier. I've looked
>>>> briefly into doing this with rhashtable so we can keep the memory
>>>> footprint of each entry relatively small, but it still affects
>>>> performance and we can have thousands of resizes happening.
>>>>
>>>
>>> So, one of the earlier approaches that I tried (before rhashtable was
>>> in the kernel) was to have a hash of vlan ids, each with a data structure
>>> pointing to a list of ports for a given vlan as well as a list of fdbs
>>> for a given vlan. As far as scalability goes, that's really the best
>>> approach. It would also allow us to do packet accounting per vlan. The
>>> only concern at the time was the performance of the ingress lookup. I
>>> think rhashtables might help with this, as well as with the ability to
>>> grow the footprint of the vlan hash table dynamically.
>>>
>>> -vlad
>>>
>> I'll look into it, but I'm guessing learning will become a more
>> complicated process with additional allocations and some hash handling.
>
> I don't remember learning being all that complicated. The hash only
> changed under rtnl when vlans were added/removed. The nice thing is that
> we wouldn't need to rebalance, because if the vlan is removed all fdb
> links get removed too. They don't move to another bucket (but that was
> with a static hash. Need to look at rhash in more detail).
>
> If you want, I might still have patches hanging around on my machine that
> had a hash table implementation. I can send them to you.
>
> -vlad
>

:-) Okay, I'm putting the crystal ball away. If you could send me these
patches it'd be great so I don't have to start this from scratch.

Thanks,
Nik

>>
>>>> On the notification side, if we can fix that, we can actually delete
>>>> the 96000 entries without creating a huge notification storm and do a
>>>> user-land workaround of the original issue, so I'll look into that next.
>>>>
>>>> Any comments or ideas are very welcome.
>>>>
>>>> Thank you,
>>>> Nik
Vlad Yasevich
2015-Aug-29 01:11 UTC
[Bridge] [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans
On 08/28/2015 11:26 AM, Nikolay Aleksandrov wrote:
>
>> On Aug 28, 2015, at 5:31 AM, Vlad Yasevich <vyasevic at redhat.com> wrote:
>>
>> On 08/27/2015 10:17 PM, Nikolay Aleksandrov wrote:

<<<snip>>>

>> If you want, I might still have patches hanging around on my machine
>> that had a hash table implementation. I can send them to you.
>>
>> -vlad
>>
>
> :-) Okay, I'm putting the crystal ball away. If you could send me these
> patches it'd be great so I don't have to start this from scratch.
>

So, I forgot that I lost an old disk that had all that code, so I am a bit
bummed about that. I did, however, find the series that got posted:
http://www.spinics.net/lists/netdev/msg219737.html

That was the series where I briefly switched from bitmaps to a hash and
list. However, I see that the fdb code that I was playing with never got
posted... Sorry.

-vlad

> Thanks,
> Nik

<<<snip>>>
Nikolay Aleksandrov
2015-Sep-13 13:22 UTC
[Bridge] [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans
On 08/29/2015 03:11 AM, Vlad Yasevich wrote:
> On 08/28/2015 11:26 AM, Nikolay Aleksandrov wrote:
>>
>>> On Aug 28, 2015, at 5:31 AM, Vlad Yasevich <vyasevic at redhat.com> wrote:
>>>
>>> On 08/27/2015 10:17 PM, Nikolay Aleksandrov wrote:

<<<snip>>>

> So, I forgot that I lost an old disk that had all that code, so I am a
> bit bummed about that. I did, however, find the series that got posted:
> http://www.spinics.net/lists/netdev/msg219737.html
>
> That was the series where I briefly switched from bitmaps to a hash and
> list. However, I see that the fdb code that I was playing with never got
> posted... Sorry.
>
> -vlad
>

So I've been looking into this for some time now and did a basic
implementation of vlan handling using rhashtables. Here are some thoughts
and a slightly different proposition. First, a few scenarios (the memory
footprint counts only the extra memory needed for the vlans; the current
footprint for 48 ports & 2000 vlans is ~ 50k):

1. Bridge with a vlan hash with port bitmaps (similar to Vlad's first set)
 - On input we have a hash lookup + a bitmap lookup.
 - If an (r)hashtable is used, we need an additional list to handle stable
   walks, which are needed all over the place, from error handling to
   compressed vlan dumps. That list actually has to be kept sorted, since
   the already-exposed user interfaces must keep behaving the same, and
   they also allow per-port compressed vlan dumping, which isn't easy to
   handle. The stability issue with rhashtable is mostly with resizing,
   since these entries change only under rtnl, and we need the sorted
   order because of the compressed dump. One alternative is to build the
   sorted list each time a dump is requested, but again that falls under
   the workarounds needed to satisfy current behaviour requirements. If
   this option is chosen, my preference is to also keep the vlans in a
   list which is sorted for the walks, so the compressed request can be
   satisfied more easily.
 - Memory footprint for 2000 vlans with 48 ports ~ 1.5 MB.

2. Bridge with a vlan hash, ports with vlan hashes (this needs a special
   per-port struct because of the tagged/untagged case; we basically need
   per-port per-vlan flags)
 - On input we have only 1 hash lookup, in the port vlan hash, where we
   get a pointer to the bridge's vlan entry, so we get the global vlan
   context as well as the local one.
 - The same rhashtable handling requirements apply, plus more complexity
   and memory due to having to keep multiple (per-port, per-bridge global)
   rhashtables in sync.
 - Memory footprint for 2000 vlans with 48 ports ~ 2.6 MB.

Up until now I've partially done point 1 to see how much churn it would
take, and the basic change is huge. Also, the memory footprint increases
a lot.
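To make point 1 a bit more concrete, the entry in my partial
implementation looks roughly like this (a sketch only; all the names here
are illustrative, not final):

/* One global rhashtable keyed by vid; the port bitmap answers the
 * ingress/egress question, and an extra list node, kept sorted by vid,
 * gives us the stable ordered walks (e.g. compressed vlan dumps) without
 * depending on rhashtable ordering or resizes.
 */
struct br_vlan_entry {
	struct rhash_head	rhnode;		/* global vlan rhashtable */
	u16			vid;		/* hash key */
	unsigned long		port_bitmap[BITS_TO_LONGS(BR_MAX_PORTS)];
	unsigned long		untagged_bitmap[BITS_TO_LONGS(BR_MAX_PORTS)];
	struct list_head	vlist;		/* sorted, for stable walks */
};

static const struct rhashtable_params br_vlan_rht_params = {
	.head_offset = offsetof(struct br_vlan_entry, rhnode),
	.key_offset  = offsetof(struct br_vlan_entry, vid),
	.key_len     = sizeof(u16),
};

	struct br_vlan_entry *v;

	/* ingress fast path: one hash lookup + one bitmap test */
	v = rhashtable_lookup_fast(&br->vlan_hash, &vid, br_vlan_rht_params);
	if (!v || !test_bit(p->port_no, v->port_bitmap))
		goto drop;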
So I'd propose a third option, which you could call a middle ground
between the current implementation (which is very fast and compact) and
points 1 & 2: what do you think about adding an auxiliary per-vlan global
context, using rhashtable, which is not used in the ingress/egress
decision making? We can contain it via either a Kconfig option (so it can
be compiled out) or a dynamic run-time option, so people who would like
more features can enable it on demand if they're willing to trade some
performance and memory. This way we won't have to change most of the
current API and won't have to add workarounds to keep the user-facing
behaviour the same; the syncing is reduced to a refcount and the memory
footprint is kept minimal.

The initial new features I'd like to introduce are per-vlan counters and
per-vlan flags, which at first will be used to enable/disable multicast
on a per-vlan basis. In terms of performance, when this is enabled it is
close to point 1, but without the changes all over the API and, more
importantly, with a much smaller memory footprint. The memory footprint
of this option with 2000 vlans & 48 ports is ~ +70k (without the per-cpu
counters; any additional feature will naturally add to this). This is
because there is no per-port increase for each vlan added - we only keep
the global context.

If it's acceptable to take the performance/memory hit and the huge churn,
then I can continue with 1 or 2, but I'm not a big fan of that idea.
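To make the proposal more concrete, here is a minimal sketch of the
global context I have in mind (again, all names are illustrative, not
final):

/* Per-vlan counters, allocated only if/when the option is enabled. */
struct br_vlan_stats {
	u64			rx_bytes;
	u64			rx_packets;
	u64			tx_bytes;
	u64			tx_packets;
	struct u64_stats_sync	syncp;
};

/* Auxiliary global per-vlan context: one bridge-wide rhashtable entry
 * per vid, refcounted by the ports that have the vlan configured. It is
 * not consulted in the ingress/egress fast path, so the current per-port
 * bitmaps stay exactly as they are.
 */
struct br_vlan_ctx {
	struct rhash_head		rhnode;	/* bridge-global vlan rhashtable */
	u16				vid;	/* hash key */
	u16				flags;	/* e.g. per-vlan mcast on/off */
	atomic_t			refcnt;	/* ports using this vid */
	struct br_vlan_stats __percpu	*stats;	/* counters, if enabled */
};

Adding a vlan to a port would just take a reference on (or create) the
context for that vid, and deleting the vlan would drop it - that is the
refcount-only syncing mentioned above.

Feedback before I go any further on this would be much appreciated.

Thank you,
Nik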