thr3ads.net - LARTC - Layer 3 switching... [Oct 2007]

Grant Taylor

2007-Oct-04 20:56 UTC

Layer 3 switching...

Is it even possible, or even worthwhile, to do layer 3 switching 
(bridging) on a Linux system?

Or would this be considered routing even though everything is done on 
OSI Layer 2?

Which would be faster, Layer 3 switching (bridging) on OSI Layer 2 or 
routing on OSI Layer 3?



Grant. . . .

John Default

2007-Oct-05 10:05 UTC

Re: Layer 3 switching...

Hi

I was told that layer 3 switches are faster because the "routing" there
is done by ASIC hardware. Is there any advantage in having a second copy
of the routing code in the bridging path when everything is done in
software, which is just as slow as normal routing? The only speed gain
would come from keeping that routing code very simple, with limited
functionality, but I think the trend is to add more and more
functionality, which would end up with two equally slow pieces of code
doing the same job in two places.

(I was taught that packets are routed on L3 and frames are switched
(bridged) on L2. An L3 switch does L2 switching plus L3 routing, but in
hardware. Routers are completely a software thing, switches are a
hardware thing, and a bridge is a switch in software.)

Please excuse me if I am missing your idea completely.


___________________________________
S pozdravom / Best regards

John Default

Grant Taylor

2007-Oct-05 14:48 UTC

Re: Layer 3 switching...

On 10/05/07 05:05, John Default wrote:
> I was told that layer 3 switches are faster because "routing" there is
> done by some ASIC hardware. Is there any advantage in having another 
> routing code in bridging when everything is done in software which is 
> same slow as normal routing? The only speed gain would be in keeping the 
> routing code very simple with limited functionality, but i think that 
> the trend is to put there more and more functionality which would end up 
> in having two same slow, same function code in two places.
Ah, therein lies the difference in what you are saying, which as a norm 
is probably correct and something that I do not disagree with.  I guess 
I should say that my introduction to L3 switching was actually on Cisco 
Catalyst 5000 / 5500 L2 switches, where they depend on an external Cisco 
L3 router to assist in the L3 switching.  Rather, that is to say that 
the L2 switch and the L3 router communicate with each other to do L3 
switching together.  As I understand it, the L2 switch will send the 
initial packets to the L3 router along with some meta data.  The L3 
router will route the packets and send them back to the L2 switch with 
updated meta data.  The L2 switch will then have learned, with the help 
of the L3 router, how the packets can be altered on L2 to emulate L3 
routing, but this time in hardware.  Thus the L2 switch depends on the 
L3 router to do the initial routing, and then the L2 switch will 
subsequently step up and L2 switch across L3 boundaries based on what it 
learned from the L3 router.

So, I guess I should say that I'm not wanting to (re)implement the 
routing code in the kernel, it does quite fine for me thank you very 
much.  ;)  I'm looking for a way to alter the source / destination MAC 
addresses of packets on L2 to emulate what happens in routing.  I 
believe that I could SNAT / DNAT the MAC addresses of the packet via 
EBTables on L2 to achieve the effect of an L3 route.  I would do this by 
having the bridging code in the kernel learn from cached (?) results of 
a previous L3 route.

In other words, if the packet is in a NEW connection state, send it on 
up to L3 routing.  If the packet is in an ESTABLISHED state, and we can 
pull information from the system's ARP cache to know the destination MAC 
address for the next subnet as well as the correct source MAC address 
for the interface on the next subnet, then we could just SNAT / DNAT the 
MAC addresses on L2 and send the packet back out on the appropriate 
wire.

I'm wondering if this NATing of the source and destination MAC addresses 
on L2 would be faster than passing the packet up to L3 routing.  It is 
my belief that L3 will do more sanity checks on packets than L2 will. 
These sanity checks take time to perform, which could be avoided if we 
could just NAT the source and destination MAC addresses on L2.  Or at 
least that's what I think.  I could be very wrong about it.

> (i was taught that packets are routed on L3, frames are 
> switched(bridged) on L2. And L3 switch does L2 switching + L3 routing 
> but in hardware. routers are completely a software thing, switches 
> hardware thing, and bridge is switch in software.)

I can agree with that statement.  However I'll spin what you said a 
little bit, and then I think you can see how I'm logically progressing 
on down the line.

Switching is an L2 operation, no matter what that operation is.  Routing 
is an L3 operation, no matter what that operation is.  Thus if we 
perform some sort of L3 type operation on L2, then we are performing 
some sort of switching operation.  If that operation happens to be 
routing, which is normally an L3 operation, then we are doing an L3 like 
operation on L2, thus L3 switching.  So now that I have circularly 
argued that, how about an example.

Let's say that we have two end point hosts on separate subnets with an 
intermediary router.

       +---------+     +-------------------+     +---------+
  IP:  | 4.0.0.9 +-----+ 4.0.0.1 : 5.0.0.1 +-----+ 5.0.0.9 |
 MAC:  | ..00:0f |     | ..11:1e : ..22:2d |     | ..33:3c |
       +---------+     +-------------------+     +---------+

If I want to send an ICMP ping from 4.0.0.9 to 5.0.0.9, the Ethernet 
frames will be sent from ..00:0f to ..11:1e and from ..22:2d to ..33:3c.

Note that the routing code on the intermediary router will see that the 
packet needs to be routed from one subnet to the other and will do so 
just fine without any problems at all.  However this is a layer 3 
operation.

What I'm wanting to do is educate L2 enough so that it can use cached 
results from L3 to perform a similar operation on L2 in the future. 
Thus when the frame from 4.0.0.9 with a MAC address of ..00:0f comes in 
destined to 5.0.0.9 with the router's MAC address of ..11:1e, I'm 
wanting to alter the frame coming in to the switch such that the new 
destination MAC address will be ..33:3c with a new source MAC address of 
..22:2d, based on the contents of the system's ARP cache with a little 
bit of help.
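
As a rough illustration of the rewrite described above, here is a minimal Python sketch that swaps the two MAC addresses at the front of an Ethernet II frame and leaves everything else alone. It is not how EBTables does it; the function names and the full MAC addresses (the diagram only gives the last two octets) are made up for the example, and it assumes an untagged frame without the trailing FCS, which is how software usually sees frames.

```python
def parse_mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return raw


def rewrite_macs(frame: bytes, new_dst: bytes, new_src: bytes) -> bytes:
    """Return a copy of an Ethernet II frame with both MAC addresses replaced.

    Bytes 0-5 hold the destination MAC and bytes 6-11 the source MAC.
    The EtherType and payload (byte 12 onwards) are copied unchanged.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    if len(new_dst) != 6 or len(new_src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return new_dst + new_src + frame[12:]


# Hypothetical full addresses standing in for ..00:0f, ..11:1e, ..22:2d, ..33:3c.
HOST_A = parse_mac("02:00:00:00:00:0f")    # 4.0.0.9
ROUTER_A = parse_mac("02:00:00:00:11:1e")  # router, 4.0.0.1 side
ROUTER_B = parse_mac("02:00:00:00:22:2d")  # router, 5.0.0.1 side
HOST_B = parse_mac("02:00:00:00:33:3c")    # 5.0.0.9

# Frame as it arrives on the 4.0.0.0 side: dst = router, src = host A.
incoming = ROUTER_A + HOST_A + b"\x08\x00" + b"ip-packet-bytes"

# Frame as it should leave on the 5.0.0.0 side: dst = host B, src = router.
outgoing = rewrite_macs(incoming, new_dst=HOST_B, new_src=ROUTER_B)
```

Note that the sketch only touches the L2 header; anything L3 normally does to the packet itself is deliberately left out here.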

It is my belief that this L2 operation of SNATing and DNATing the MAC 
addresses without sending the data up to L3 will be faster than sending 
the data up to L3 and doing its full processing.  At least that is what 
this entire discussion is based on.  At the very least I believe I'm 
going to do some controlled tests to see if this will even work with 
manually entered static configurations.

If this does work, I think it would be possible to come up with a new 
EBTables target that could alter the destination MAC address based on 
the contents of the system's ARP cache (the system just spoke to the 
target, thus the target MAC should be in the ARP cache; if not, the ARP 
code does a fine job at its job and can get us the MAC address).  The 
only hiccup that I don't have an answer for at the moment is picking the 
correct source MAC address.  However, looking at the contents of the ARP 
cache, we see that the interface is listed as well.  So we could do a 
simple translation from interface to source MAC address.  Thus I believe 
we have the basis of a rough, crude algorithm to L3 switch (an L3 
operation on L2) traffic through a Linux system.
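
The lookup described above (ARP entry gives the destination MAC and the interface, the interface gives the source MAC) could be sketched in Python roughly as follows. This is a user-space illustration only, not a proposed EBTables target. It assumes text in the layout of /proc/net/arp; the sample table, the interface names eth0 / eth1, the full MAC addresses and the function names are all invented for the example. On a real system the interface MACs could be read from /sys/class/net/<iface>/address.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Neighbor(NamedTuple):
    mac: str     # next-hop MAC, would become the new destination MAC
    device: str  # interface the neighbor was learned on


def parse_proc_net_arp(text: str) -> Dict[str, Neighbor]:
    """Parse text laid out like /proc/net/arp into {ip: Neighbor}."""
    table: Dict[str, Neighbor] = {}
    for line in text.splitlines()[1:]:   # first line is the column header
        fields = line.split()
        if len(fields) != 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields
        if int(flags, 16) & 0x2:         # 0x2 = ATF_COM, entry is complete
            table[ip] = Neighbor(mac, device)
    return table


def mac_pair_for(dst_ip: str,
                 arp: Dict[str, Neighbor],
                 iface_macs: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (new_dst_mac, new_src_mac), or None to fall back to L3 routing."""
    neighbor = arp.get(dst_ip)
    if neighbor is None:
        return None
    src_mac = iface_macs.get(neighbor.device)
    if src_mac is None:
        return None
    return neighbor.mac, src_mac


# Hypothetical snapshot of the router's ARP cache and interface addresses.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
5.0.0.9          0x1         0x2         02:00:00:00:33:3c     *        eth1
4.0.0.9          0x1         0x2         02:00:00:00:00:0f     *        eth0
5.0.0.7          0x1         0x0         00:00:00:00:00:00     *        eth1
"""
IFACE_MACS = {"eth0": "02:00:00:00:11:1e", "eth1": "02:00:00:00:22:2d"}

arp_table = parse_proc_net_arp(SAMPLE_ARP)
pair = mac_pair_for("5.0.0.9", arp_table, IFACE_MACS)
```

Returning None for an unknown or incomplete entry matches the idea of handing such packets up to normal routing and letting the ARP code resolve the address. The sketch only works for directly attached destinations; a destination behind another router would need the next hop from the routing decision first.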
> Please excuse me if i am missing your idea completely.

Please read and chew on what I've brain farted to the mailing list. 
Poke holes in it and let's discuss this.  If this truly will not work, 
I have only wasted some bandwidth and bytes on drives, nothing else.  
All the while we will have hopefully cleared a few cob webs from our 
collective brains.  ;)  At least for a few minutes while I try to make 
a fool of my self. :}

Grant. . . .

John Default

2007-Oct-06 11:16 UTC

Re: Layer 3 switching...

So, now I get it (after your first mail, it wasn't possible :)).  I 
think the idea is great, but:

What exactly would we actually avoid? For correct operation we will 
have to look at the destination IP anyway, skipping only the IP header 
check (iphdr checksum, version, maybe a length check), which consists of 
functions that are implemented in a very quick way (a sum over 20 bytes 
written in assembly...), probably a few tens of nanoseconds on a 1 GHz 
processor.
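
To make concrete how little work those header checks are, here is a minimal Python sketch of them: version, header length, and the RFC 1071 one's-complement checksum over the header. It is not the kernel's code (which does this in optimised C / assembly); the function names are invented, and the header is built without options or payload purely for the demonstration.

```python
import socket
import struct


def ones_complement_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over an even-length byte string."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                   # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, ttl: int = 64, proto: int = 1) -> bytes:
    """Build a 20-byte IPv4 header (no options, no payload) with a valid checksum."""
    fixed = struct.pack("!BBHHHBB", 0x45, 0, 20, 0, 0, ttl, proto)
    addrs = socket.inet_aton(src) + socket.inet_aton(dst)
    unchecked = fixed + b"\x00\x00" + addrs      # checksum field zeroed
    checksum = ones_complement_checksum(unchecked)
    return unchecked[:10] + struct.pack("!H", checksum) + unchecked[12:]


def header_is_sane(header: bytes) -> bool:
    """Version, length and checksum tests on an IPv4 header."""
    if len(header) < 20:
        return False
    version = header[0] >> 4
    ihl = (header[0] & 0x0F) * 4                 # header length in bytes
    if version != 4 or ihl < 20 or len(header) < ihl:
        return False
    # Summing a valid header including its checksum field yields zero.
    return ones_complement_checksum(header[:ihl]) == 0


good = build_ipv4_header("4.0.0.9", "5.0.0.9")
corrupted = good[:8] + bytes([good[8] ^ 0xFF]) + good[9:]  # flip the TTL bits
```

With these definitions, header_is_sane(good) is true and header_is_sane(corrupted) is false.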

Given the low probability of a damaged packet header, we can probably 
skip the check.  But some security problems can arise from that.

Then we avoid the lookup in the routing table. But routing already has 
a cache (I don't know how effective) for routes, to avoid doing the 
lookup for each packet. Will this be much faster than the route cache?

By bringing it down to a lower, dumber layer we risk somehow messing up 
policy routing, multipath routing and probably some other advanced 
things.

Another thing is that with L3 switching turned on, the router will 
start to behave a little differently than usual, which could confuse 
the administrator ...

What about NAT and other packet-changing things in iptables (and QoS 
marking and the like)?  By stealing the packet before layer 3 
processing we bypass these as well, I think.  Hm, this could really 
become a problem. There could be a mechanism for detecting whether a 
packet would be changed in any way, and then we would not touch it; but 
if the box is meant for changing packets, then we would have to 
implement that too, or fast-path no packets at all ... (you are right, 
who would use an L3 switch for NAT : ) )

... and you should probably decrement and check the TTL too : )
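
For what the TTL step might involve, here is a small Python sketch: decrement the TTL, refuse packets that would expire, and patch the header checksum incrementally (the RFC 1624 method) instead of re-summing the whole header. The function names are made up, and the sample header is just a well-known textbook example with a valid checksum, not traffic from this thread.

```python
import struct
from typing import Optional


def fold16(total: int) -> int:
    """Fold carries of a one's-complement sum back into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum_ok(header: bytes) -> bool:
    """Full verification: the sum over a valid header is 0xFFFF."""
    words = struct.unpack(f"!{len(header) // 2}H", header)
    return fold16(sum(words)) == 0xFFFF


def decrement_ttl(header: bytes) -> Optional[bytes]:
    """Return the header with TTL - 1 and a patched checksum.

    Returns None when the TTL is 0 or 1: such a packet must not be
    forwarded and is better left to the normal L3 path.
    """
    ttl, proto = header[8], header[9]
    if ttl <= 1:
        return None
    old_word = (ttl << 8) | proto            # 16-bit word holding TTL + protocol
    new_word = ((ttl - 1) << 8) | proto
    old_csum = (header[10] << 8) | header[11]
    # RFC 1624: HC' = ~(~HC + ~m + m')
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    new_csum = ~fold16(total) & 0xFFFF
    return (header[:8] + bytes([ttl - 1, proto])
            + struct.pack("!H", new_csum) + header[12:])


# Sample 20-byte header: TTL 0x40, protocol 0x11, checksum 0xb861.
sample = bytes.fromhex("4500007300004000" "4011b861" "c0a80001c0a800c7")
forwarded = decrement_ttl(sample)
```

Only one 16-bit word changes, so the incremental update touches three words rather than all ten; whether that saving is measurable next to everything else in the path is exactly the open question here.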
>> Please excuse me if i am missing your idea completely.
>
> Please read and chew on what I've brain farted to the mailing list.
> Poke holes in it and let's discuss this.  If this truly will not work,
> I have only wasted some bandwidth and bytes on drives, nothing else.  
> All the while we will have hopefully cleared a few cob webs from our 
> collective brains.  ;)  At least for a few minutes while I try to make 
> a fool of my self. :}

I just mentioned a few things that came to my mind that might need to 
be considered. But otherwise I think the idea is very nice. I will try 
to find out more, just need to find time to read the source ; )

(disclaimer: I am just a beginner; with my stupid questions I am just 
trying to help your thinking process)

___________________________________
S pozdravom / Best regards

John Default

Mohan Sundaram

2007-Oct-06 12:27 UTC

Re: Layer 3 switching...

John Default wrote:
> Grant Taylor wrote:
>> On 10/05/07 05:05, John Default wrote:
>>> I was told that layer 3 switches are faster because "routing" there
>>> is done by some ASIC hardware. Is there any advantage in having 
>>> another routing code in bridging when everything is done in software
>>> which is same slow as normal routing? The only speed gain would be in
>>> keeping the routing code very simple with limited functionality, but
>>> i think that the trend is to put there more and more functionality 
>>> which would end up in having two same slow, same function code in two
>>> places.

Cisco CEF works somewhat in this fashion for routing only. I've been 
building network gear for a while now.

I had this idea but no buyers. A route cache is normally keyed on 
destination IP. If the router does stateful filtering, then it tracks 
connections / flows. Once a lookup is done for a flow, based on 
destination or policy routing, the exit interface, along with the new 
packet header values and frame header values, is also made part of the 
route cache. Thus the result of all L3/L2 actions is attached to a flow 
and reused. This would include NAT translations.
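
A toy version of that per-flow cache, in Python, might look like the sketch below. It is only meant to show the shape of the idea: the first packet of a flow takes the slow path, and the resulting action is stored and reused. The flow key layout, the Action fields and the slow_path callback are assumptions made for the example; they do not describe CEF or any real product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical flow key: (src ip, dst ip, protocol, src port, dst port).
FlowKey = Tuple[str, str, int, int, int]


@dataclass(frozen=True)
class Action:
    """Everything the slow path decided for this flow."""
    out_iface: str
    new_src_mac: str
    new_dst_mac: str


class FlowCache:
    def __init__(self, slow_path: Callable[[FlowKey], Action]) -> None:
        self._slow_path = slow_path          # full L3 processing, done once per flow
        self._actions: Dict[FlowKey, Action] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, key: FlowKey) -> Action:
        """Return the cached action, consulting the slow path on a miss."""
        action = self._actions.get(key)
        if action is None:
            self.misses += 1
            action = self._slow_path(key)
            self._actions[key] = action
        else:
            self.hits += 1
        return action

    def flush(self) -> None:
        """Drop every cached decision, e.g. after a configuration change."""
        self._actions.clear()
```

A caller would construct FlowCache with a function that performs the real routing decision and call lookup() once per packet. Any configuration change has to invalidate cached entries somehow; flush() is the bluntest possible way to do that.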

The above idea gives good speed but fails for encapsulations, 
packet-based load balancing, and applying configuration changes inline 
to existing flows. Being a commercial product, unless it is fully baked, 
it does not fly. "The user is responsible" is also an argument that is 
not accepted in such scenarios. Further, this is IP specific and cannot 
do well in multi-protocol routers unless IP encapsulations like GRE are 
used as a standard.

An extension was to tie flows to MPLS labels but this was getting into 
core routing/switching space while focus was on CPE side.

Mohan

Grant Taylor

2007-Oct-08 14:48 UTC

Re: Layer 3 switching...

On 10/06/07 06:16, John Default wrote:
> So, now i get it (after your first mail, it wasn't possible :)).  I
> think the idea is great, but.
> 
> What everything would you we actually avoid ? For correct operation we 
> will have to look at destination IP anyway, skipping only ip header 
> check (iphdr checksum, version, maybe length check), which consists of 
> functions that are implemented in very quick way (sum through 20B 
> written in assembly..) (probably few tens of nanoseconds on 1GHz processor)
True...
> With the probability of damaged packet header we probably can skip 
> checking.  But there are some security problems that can arise from that.
Agreed.
> Then we avoid lookup in routing table. But routing already does have 
> cache (i don't know how effective) for routes to avoid doing the lookup
> for each packet. Will this be much faster than route cache ?
> Bringing it down to lower, dumber layer we risk that we will somehow 
> mess up policy routing,  multipath routing and probably some other 
> advanced things.
> Another thing is that turning the l3 switching on, router will start to 
> behave little bit different as usually, what could confuse the 
> administrator ...
I'm not thinking about making this an all or nothing type of 
application.  I would rather turn on L3 switching as desired and use the 
existing kernel as is for anything else.  The intent is not to mess 
things up, but to optimize when basic routing will be the predominant 
task.
> What about NAT and other packet-changing things in iptables (and QoS 
> marking and the like)?  Stealing packet before layer3 processing we 
> avoid these things as well i think.  Hm this could really become a problem.
> There could be mechanism for detecting if packet is changed anyhow and 
> then we would not touch it, but if box is meant for changing packets, 
> then we would have to implement it too or process no packets at all 
> ...(you are right, who would use l3 switch for NAT : ) )
This, again, is not a scenario for L3 switching, at least not in its 
first incarnation.  However basic NATing would not be difficult to 
implement, just alter the source IP like the source MAC is altered.
> ... and you should probably decrement and check the ttl too : )
Agreed.
> I just mentioned few things that came to my mind that might need to be 
> considered. But otherwise i think the idea is very nice. I will try to 
> find out more, just need to find time to read the source ; )
These are all very good points and deserve to be addressed.  Thank you 
for discussing things, that's exactly what I was wanting.
> (disclaimer: I am just beginner, with my stupid questions i am just 
> trying to help your thinking process)
(See my last statement.)



Grant. . . .

Grant Taylor

2007-Oct-08 15:00 UTC

Re: Layer 3 switching...

On 10/06/07 07:27, Mohan Sundaram wrote:
> CISCO CEF works somewhat in this fashion for routing only. I've been
> building network gear for a while now.
*nod*
> I had this idea but no buyers. Route cache is for destination IPs 
> normally. If the router does stateful filtering, then it has 
> connections / flows. Once a look up is done for a flow based on 
> destination or policy routing, the exit interface with new packet header 
> values and frame header value is also made part of the route cache. Thus 
> the resultant of all L3/L2 actions are attached to a flow and used. This 
> would include NAT translations.
Sounds like the route cache has been well thought out in the Cisco gear.
> The above idea gives good speed but fails for encapsulations, packet 
> based load balancing and effecting inline change in configurations for 
> existing flows. Being a commercial product, unless it is fully baked, it 
> does not fly. User is responsible is also an arguement that is not 
> accepted in such scenarios. Further this is IP specific and cannot do 
> well in multi-protocol routers unless IP encapsulations like GRE are 
> used as a standard.
I don't think that the L3 switching I'm referring to is meant to be 
used in all locations, especially some of the ones that you reference. 
However L3 switching would be good in a core network, between edge and 
core networks (presuming that there is no firewalling / filtering going 
on between the two).  I would never use an L3 switch as the interface to 
WANs and / or the ISPs, at least not today.
> An extension was to tie flows to MPLS labels but this was getting into 
> core routing / switching space while focus was on CPE side.
I think MPLS in and of its own right is a very promising technology, 
albeit somewhat isolated to larger networks with their own complex core. 
Rather, it is my understanding that MPLS is primarily intra-company, 
not inter-company, which is where I think it could have more benefit. 
However I could be wrong about this.  (If a discussion is going to 
ensue, let's start a new thread.)



Grant. . . .
