thr3ads.net - LARTC - [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL [Jun 2006]

Jesper Dangaard Brouer

2006-Jun-14 09:40 UTC

[PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

The Linux traffic control engine inaccurately calculates
transmission times for packets sent over ADSL links.  For
some packet sizes the error rises to over 50%.  This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed-size 53-byte cells.

The following patches to iproute2 and the kernel add an
option to calculate traffic transmission times over all
ATM links, including ADSL, with perfect accuracy.

A longer presentation of the patch, its rationale, what it
does and how to use it can be found here:
   http://www.stuart.id.au/russell/files/tc/tc-atm/

An earlier version of the patch, and a _detailed_ empirical
investigation of its effects can be found here:
   http://www.adsl-optimizer.dk/

The patches are both backwards and forwards compatible.
This means unpatched kernels will work with a patched
version of iproute2, and an unpatched iproute2 will work
on patched kernels.


This is a combined effort of Jesper Brouer and Russell Stuart,
to get these patches into the upstream kernel.

Let the discussion start about what we need to change to get this
upstream.

We see this as a feature enhancement, and thus hope that it can be
queued in davem's net-2.6.18.git tree.

---
Regards,
 Jesper Brouer & Russell Stuart.

jamal

2006-Jun-14 12:06 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

I have taken linux-kernel off the list.

Russell's site is inaccessible to me (I actually think this is related
to some DNS issues i may be having) and your master's thesis is too long to
glean in 2 minutes; so here's a question or two for you:

- Have you tried to do a long-lived session such as a large FTP and
seen how far off the deviation was? That would provide an interesting
data point.
- To be a devil's advocate (and not claim there is no issue), where do
you draw the line with "overhead"?
For example, the smallest ethernet packet is 64 bytes, of which 14 bytes are
ethernet headers ("overhead" for IP) - and this is not counting CRC
etc.
If you were to set an MTU of say 64 bytes and tried to do an http or ftp
transfer, how accurate do you think the calculation would be? I would think not
very different.
Does it matter if it is accurate in the majority of the cases?
- For further reflection: Have you considered the case where the rate
table has already been computed for some link speed in user space and
then somewhere post-config the physical link speed changes? This would
happen in the case where ethernet autonegotiation (AN) is involved and the
partner makes some changes (use ethtool).

I would say the last bullet is a more interesting problem than a corner
case of some link layer technology that has high overhead.
Your work would be more interesting if it was generic for many link
layers instead of just ATM.


cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jesper Dangaard Brouer

2006-Jun-14 12:55 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:
> Russell's site is inaccessible to me (I actually think this is related
> to some DNS issues i may be having)
Strange, I have access to Russell's site.  Maybe it's his redirect
feature that confuses your browser, try:
 http://ace-host.stuart.id.au/russell/files/tc/tc-atm/
> and your masters is too long to
> spend 2 minutes and glean it; so heres a question or two for you:
Yes, it is quite long and very detailed.  But it is worth reading (... says
the author himself ;-))

> - Have you tried to do a long-lived session such as a large FTP and 
> seen how far off the deviation was? That would provide some interesting
> data point.
The deviation can be calculated.  The impact is of course small for large
packets.  But the argument that bulk TCP transfers are not as badly
affected is wrong, because all the TCP ACK packets get the maximum penalty.

On an ADSL link with more than 8 bytes of overhead, a 40-byte TCP ACK will
use more than one ATM cell, causing 2 ATM cells to be sent that
consume 106 bytes, i.e. 62% overhead.  On a small upstream ADSL line
that hurts! (See thesis page 53, table 5.3 "Overhead summary").
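
The arithmetic behind these numbers can be sketched as follows. This is only an illustration, not code from the patch: the helper name `atm_wire_size` and its `overhead` parameter are made up here, and the value 10 simply stands for "more than 8 bytes".

```python
ATM_CELL_SIZE = 53      # bytes on the wire per cell
ATM_CELL_PAYLOAD = 48   # payload bytes carried per cell


def atm_wire_size(ip_len, overhead):
    """Wire bytes for one packet of ip_len bytes plus `overhead` bytes of
    per-packet encapsulation (hypothetical helper, for illustration)."""
    total = ip_len + overhead
    cells = (total + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD  # round up
    return cells * ATM_CELL_SIZE


ack = 40
wire = atm_wire_size(ack, overhead=10)    # 50 bytes no longer fit in one cell
overhead_share = (wire - ack) / wire      # part of the wire bytes that is not IP data
print(wire, round(overhead_share * 100))  # 106 62
```

With exactly 8 bytes of overhead the same ACK still fits in a single cell (53 bytes on the wire), which is why the threshold matters.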

> - To be a devil's advocate (and not claim there is no issue), where do
> you draw the line with "overhead"?
> Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> how accurate do you think the calculation would be? I would think not
> very different.
I do think we handle this situation, but I'm not quite sure that I fully
understand the question (sorry).

> Does it matter if it is accurate on the majority of the cases?
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool). 
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
We only claim to do magic on ATM/ADSL links... nothing else ;-)

> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.
Well, we did consider doing so, but we thought that it would be harder to
get it into the kernel.

Actually that's the reason for the defines:
 #define	ATM_CELL_SIZE		53
 #define	ATM_CELL_PAYLOAD	48

Changing these should make it possible to adapt to any other SAR
(Segmentation And Reassembly) link layer.
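
As a rough sketch of what "changing these" amounts to, the two constants can be treated as parameters of a link description. The class name `SarLink` and the second set of numbers are invented for illustration; only the ATM values come from the defines above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SarLink:
    cell_size: int      # bytes on the wire per cell (ATM_CELL_SIZE above)
    cell_payload: int   # payload bytes per cell (ATM_CELL_PAYLOAD above)

    def wire_size(self, length):
        """Wire bytes needed to carry `length` bytes of payload."""
        cells = -(-length // self.cell_payload)   # ceiling division
        return cells * self.cell_size


ATM = SarLink(cell_size=53, cell_payload=48)
OTHER = SarLink(cell_size=32, cell_payload=30)    # made-up link layer

print(ATM.wire_size(100), OTHER.wire_size(100))   # 159 128
```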

Thanks for your comments :-)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk

Phillip Susi

2006-Jun-14 14:27 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

Jesper Dangaard Brouer wrote:
> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links.  For
> some packet sizes the error rises to over 50%.  This occurs
> because ADSL uses ATM as its link layer transport, and ATM
> transmits packets in fixed sized 53 byte cells.
> 
I could have sworn that DSL uses its own framing protocol that is 
similar to the frame/superframe structure of HDSL ( T1 ) lines and over 
that you can run ATM or ethernet.  Or is it typically ethernet -> ATM -> 
HDSL?

In any case, why does the kernel care about the exact time that the IP 
packet has been received and reassembled on the headend?


Andy Furniss

2006-Jun-14 15:32 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

jamal wrote:
> I have taken linux-kernel off the list.
>
> Russell's site is inaccessible to me (I actually think this is related
> to some DNS issues i may be having) and your masters is too long to
> spend 2 minutes and glean it; so heres a question or two for you:
>
> - Have you tried to do a long-lived session such as a large FTP and
> seen how far off the deviation was? That would provide some interesting
> data point.
> - To be a devil's advocate (and not claim there is no issue), where do
> you draw the line with "overhead"?
Many others and I have run a similar hack for years; there is also a
userspace project still alive which does the same.

The difference is that without it I would need to sacrifice almost half
my 288kbit atm/dsl showtime bandwidth to be sure of control.

With the modification I can run at 286kbit / 288 and know I will never
have jitter worse than the bitrate latency of an MTU packet. The 286
figure was chosen to allow a full buffer to drain, allow for timer
inaccuracy etc. On a p200 with tsc, 2.6.12 it's never gone over for me
- though talking of timers I notice on my desktop 2.6.16 I gain 2
minutes a day now.

Andy.

jamal

2006-Jun-15 12:57 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Wed, 2006-14-06 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:
> > - Have you tried to do a long-lived session such as a large FTP and
> > seen how far off the deviation was? That would provide some interesting
> > data point.
>
> The deviation can be calculated.  The impact is of cause small for large
> packets.
>
> But the argument that bulk TCP transfers is not as badly
> affected, is wrong because all the TCP ACK packets gets maximum penalty.
>
ACKs have always played a prominent role. The last numbers i have seen
for North America (but i think probably valid globally) show in the range
of 40% ACKs in internet traffic
http://netweb.usc.edu/~rsinha/pkt-sizes/
I suspect a lot of these stats are on their way to change with voip, p2p
etc.
But i don't think it is ACKs per se that you or Russell are contending
cause these issues. It's the presence of ATM. And all evidence seems to
point to the fact that ISPs bill you for something other than your
point of view, no?
> On an ADSL link with more than 8 bytes overhead, a 40 bytes TCP ACK will
> use more that one ATM frame, causing 2 ATM frames to be send that
> consumes 106 bytes, eg. 62% overhead.  On a small upstream ADSL line
> that hurts! (See thesis page 53, table 5.3 "Overhead summary").
> 
But how are you connected to the DSLAM? In North America it is typically
ethernet. If i use the current tables i don't see much of a problem with
say cable modems. Are you trying to compensate for the accounting
differences between what your service provider measures (accounting for
their ATM cells) and what you do accounting for your ethernet frames?
I guess i am lost as to where the ATM is in the topology and more
importantly whether we (Linux) mis-account or whether your approach is
trying to compensate for the ISP's mis-accounting.
> 
> > - To be a devil's advocate (and not claim there is no issue), where do
> > you draw the line with "overhead"?
> > Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> > ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> > If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> > how accurate do you think the calculation would be? I would think not
> > very different.
>
> I do think we handle this situation, but I'm not quite sure that I fully
> understand the question (sorry).
> 
Assume the following:
- You had ethernet end to end. Is there still a problem?
- Take it a notch up and assume you had ethernet with MTU of 64B. This
way you will have all your packets being small and having high overhead.
Do you still have a problem?
> 
> > Does it matter if it is accurate on the majority of the cases?
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool). 
> >
> > I would say the last bullet is a more interesting problem than a corner
> > case of some link layer technology that has high overhead.
> 
> We only claim to do magic on ATM/ADSL links... nothing else ;-)
> 
This is well and good given the focus of your thesis. Up/down here we
need something more generic. Your masters-thesis is a good start but
consider doing the phd next and complete this work;->
> 
> > Your work would be more interesting if it was generic for many link
> > layers instead of just ATM.
> 
> Well, we did consider to do so, but we though that it would be harder to
> get it into the kernel.
> 
> Actually thats the reason for the defines:
>  #define	ATM_CELL_SIZE		53
>  #define	ATM_CELL_PAYLOAD	48
> 
> Changing these should should make it possible to adapt to any other SAR
> (Segment And Reasembly) link layer.  
> 
You are still speaking ATM (and the above may still be valid), but:
Could you for example look at the netdevice->type and from that figure
out the link layer overhead and compensate for it?
Obviously this is a lot more useful if such activity is doable in user space
without any knowledge of the kernel; then there is zero change to the
kernel and everything becomes forward and backward compatible.

cheers,
jamal

Jesper Dangaard Brouer

2006-Jun-16 08:26 UTC

[PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

(Resend message bounced to LARTC)

Patrick McHardy

2006-Jun-20 00:54 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

jamal wrote:
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool). 
> 
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.
I've thought about this a couple of times; scaling the virtual clock
rate should be enough for "simple" qdiscs like TBF or HTB, which have
a linear relation between time and bandwidth. I haven't really thought
about the effects on HFSC yet; on a small scale the relation is
non-linear. But this is a different problem from trying to accommodate
link-layer overhead.

Patrick McHardy

2006-Jun-20 01:04 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

jamal wrote:
> You are still speaking ATM (and the above may still be valid), but:
> Could you for example look at the netdevice->type and from that figure
> out the link layer overhead and compensate for it.
> Obviously a lot more useful if such activity is doable in user space
> without any knowledge of the kernel? and therefore zero change to the
> kernel and everything then becomes forward and backward compatible.
It would be nice to have support for HFSC as well, which unfortunately
needs to be done in the kernel since it doesn't use rate tables.
What about qdiscs like SFQ (which uses the packet size in quantum
calculations)? I guess it would make sense to use the wire-length
there as well.

jamal

2006-Jun-20 14:56 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
> jamal wrote:
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool).
> >
[..]
> I've thought about this a couple of times, scaling the virtual clock
> rate should be enough for "simple" qdiscs like TBF or HTB, which have
> a linear relation between time and bandwidth. I haven't really thought
> about the effects on HFSC yet, on a small scale the relation is
> non-linear.
Does HFSC not depend on bandwidth? How is rate control achieved?
> But this is a different problem from trying to accomodate
> for link-layer overhead.
> 
Yes, it is a different issue.

cheers,
jamal

jamal

2006-Jun-20 14:59 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
> jamal wrote:
> > You are still speaking ATM (and the above may still be valid), but:
> > Could you for example look at the netdevice->type and from that figure
> > out the link layer overhead and compensate for it.
> > Obviously a lot more useful if such activity is doable in user space
> > without any knowledge of the kernel? and therefore zero change to the
> > kernel and everything then becomes forward and backward compatible.
>
> It would be nice to have support for HFSC as well, which unfortunately
> needs to be done in the kernel since it doesn't use rate tables.
> What about qdiscs like SFQ (which uses the packet size in quantum
> calculations)? I guess it would make sense to use the wire-length
> there as well.
Didn't even think of that ;->
Is it getting too complicated?

BTW, one thing I forgot to mention on the bandwidth issue: we could also
send netlink events on link speed changes; some listener
somewhere would then do the adjustment.

cheers,
jamal

Patrick McHardy

2006-Jun-20 15:09 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

jamal wrote:
> On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
>
>>jamal wrote:
>>
>>>- For further reflection: Have you considered the case where the rate
>>>table has already been considered on some link speed in user space and
>>>then somewhere post-config the physical link speed changes? This would
>>>happen in the case where ethernet AN is involved and the partner makes
>>>some changes (use ethtool).
>>>
>
> [..]
>
>>I've thought about this a couple of times, scaling the virtual clock
>>rate should be enough for "simple" qdiscs like TBF or HTB, which have
>>a linear relation between time and bandwidth. I haven't really thought
>>about the effects on HFSC yet, on a small scale the relation is
>>non-linear.
>
>
> Does HFSC not depend on bandwidth? How is rate control achieved?
"Depend on bandwidth" is not the right term. All of TBF, HTB and HFSC
provide bandwidth per time, but with TBF and HTB the amount of
bandwidth is linear in the amount of time, while with HFSC
it is only linear on a larger scale since it uses service curves,
which are represented as two linear pieces. So you have bandwidth b1
for time t1, and bandwidth b2 after that until eternity. By scaling the
clock rate you alter after how much time b2 kicks in, which affects
the guaranteed delays. The end result should be that both bandwidth
and delay scale up or down proportionally, but I'm not sure that this
is what HFSC would do in all cases (on a small scale). But it should
be easy to answer with a bit more time for visualizing it.
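
A small model may help with the visualizing. It is a sketch under stated assumptions, not HFSC code: "scaling the clock rate by k" is modelled as the curve seeing real time t as k*t, and all names and numbers (bytes per millisecond, a 100 ms first segment) are made up.

```python
def linear_service(rate, t):
    """Bytes a linear (TBF/HTB-style) rate allows after t milliseconds."""
    return rate * t


def two_piece_service(b1, t1, b2, t):
    """Bytes allowed after t ms by a two-piece linear service curve:
    rate b1 until t1, rate b2 from then on."""
    if t <= t1:
        return b1 * t
    return b1 * t1 + b2 * (t - t1)


def with_clock_scale(service, k):
    """Assumed model of clock scaling: real time t is seen as k * t."""
    def scaled(t):
        return service(k * t)
    return scaled


def curve(t):
    # 2 bytes/ms for the first 100 ms, 1 byte/ms afterwards (made-up numbers)
    return two_piece_service(b1=2, t1=100, b2=1, t=t)


half = with_clock_scale(curve, 0.5)

# The breakpoint moves from 100 ms to 200 ms of real time, and the rate on
# each segment is halved, so the same amounts are reached twice as late.
print(curve(100), half(200))
print(curve(300), half(600))
```

In this model bandwidth and delay do scale together, which matches the expected "end result" above; whether the real implementation behaves the same on a small scale is exactly the open question.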

The thing I'm not sure about is whether this wouldn't be handled better
by userspace; if the link layer speed changes you might not want
proportional scaling but prefer to still give a fixed amount of that
bandwidth to some class, for example VoIP traffic. Do we have netlink
notifications for link speed changes?
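
The two policies being contrasted can be sketched like this. Everything here is hypothetical: the function, the class names and the rates are invented, and how a userspace listener would learn about the speed change is not shown.

```python
def rescale(rates, new_link, fixed=()):
    """Return per-class rates (kbit) for a link that now runs at new_link kbit.

    Classes named in `fixed` keep their old rate; the other classes share
    what is left in proportion to their old rates.
    """
    fixed_total = sum(rates[name] for name in fixed)
    left = new_link - fixed_total
    if left < 0:
        raise ValueError("link too slow for the fixed classes")
    old_rest = sum(r for name, r in rates.items() if name not in fixed)
    return {
        name: r if name in fixed else r * left // old_rest
        for name, r in rates.items()
    }


old = {"voip": 64, "web": 300, "bulk": 636}    # a 1000 kbit link
print(rescale(old, 500))                       # purely proportional scaling
print(rescale(old, 500, fixed=("voip",)))      # VoIP keeps its 64 kbit
```

Purely proportional scaling would cut the VoIP class to 32 kbit here, which is the kind of outcome a userspace policy could avoid.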

Patrick McHardy

2006-Jun-20 15:16 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

jamal wrote:
> On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
>
>>It would be nice to have support for HFSC as well, which unfortunately
>>needs to be done in the kernel since it doesn't use rate tables.
>>What about qdiscs like SFQ (which uses the packet size in quantum
>>calculations)? I guess it would make sense to use the wire-length
>>there as well.
>
>
> Didnt even think of that ;->
> Is it getting too complicated?
The code wouldn't be very complicated, it just adds some overhead. If
you do something like I described in my previous mail the overhead for
people not using it would be an additional pointer test before reading
skb->len. I guess we could also make it a compile time option.
I personally think this is something that really improves our quality
of implementation; after all, it's "wire" resources qdiscs are meant
to manage.
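
The "additional pointer test" can be pictured roughly as below. The mail referred to is not part of this thread, so the shape of the mapping is a guess; `charged_len`, `to_wire` and the 14-byte example are all hypothetical.

```python
def charged_len(skb_len, to_wire=None):
    """Length a qdisc would charge for a packet (hypothetical helper).

    to_wire is an optional callable mapping packet length to wire length.
    When it is None, the only extra cost is this one test.
    """
    if to_wire is None:
        return skb_len
    return to_wire(skb_len)


def add_fixed_header(length):
    # Made-up mapping for illustration: 14 bytes of link-layer header.
    return length + 14


print(charged_len(1500))                    # 1500
print(charged_len(1500, add_fixed_header))  # 1514
```

A cell-based mapping could be plugged in the same way, and any qdisc that reads the packet length (SFQ's quantum accounting was the example given) would then see wire length instead.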
> BTW, I forgot to mention one thing on the bandwidth issue is we could do
> is send netlink events on link speed changes too; some listener
> somewhere would then do the adjustment.
See the mail I just wrote :)

Russell Stuart

2006-Jun-23 12:37 UTC

RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Thu, 2006-06-22 at 14:29 -0400, jamal wrote:
> Russell,
>
> I did look at what you sent me and somewhere in those discussions i
> argue that the changes compensate to make the rate be a goodput
> instead of advertised throughput.
I did see that, but didn't realise you were responding to
me.  A lot of discussion has gone on since, and evidently
quite a bit of it was addressed to me.  I will try to
answer some of the points.  Sorry for the digest-like
reply :(

On Wed, 2006-06-14 at 11:57 +0100, Alan Cox wrote:
> I'm not sure if that matters but for modern processors I'm also sceptical
> that the clever computation is actually any faster than just doing the
> maths, especially if something cache intensive is also running.
Assuming you are referring to the rate tables - I hadn't
thought about it, but I guess I would agree.  However, this
patch wasn't trying to radically re-engineer the traffic
control engine's rate calculation code.  Quite the reverse -
I was trying to change it as little as possible.  The kernel
part of the patch actually only introduced one small change -
the optional addition of a constant to the packet length.
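
To make "a constant added to the packet length" concrete, here is a simplified model of a rate-table lookup. It is an assumption-laden sketch, not the kernel or iproute2 code: the 256-slot table indexed by the length shifted right by `cell_log`, the microsecond units and every name are chosen for illustration. The 36000 bytes/s rate corresponds to the 288kbit line mentioned earlier in the thread.

```python
def build_rtab(rate_bytes_per_sec, cell_log, slots=256):
    """Microseconds needed to send the largest packet that maps to each slot."""
    return [((i + 1) << cell_log) * 1_000_000 // rate_bytes_per_sec
            for i in range(slots)]


def xmit_time(rtab, cell_log, length, overhead=0):
    """Look up the transmission time, optionally adding a constant
    per-packet overhead to the length first."""
    slot = (length + overhead) >> cell_log
    return rtab[min(slot, len(rtab) - 1)]


RATE = 36_000     # bytes per second, i.e. 288 kbit/s
CELL_LOG = 3      # each slot covers 8 bytes of packet length
rtab = build_rtab(RATE, CELL_LOG)

print(xmit_time(rtab, CELL_LOG, 40))               # 1333
print(xmit_time(rtab, CELL_LOG, 40, overhead=10))  # 1555
```

In this model the lookup itself stays a shift and an array read; only the index changes, and the contents of the table remain whatever user space put there.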

On Thu, 2006-06-15 at 08:57 -0400, jamal wrote:
> But i dont think it is ACKs perse that you or Russell are contending
> cause these issues. It's the presence of ATM . And all evidence seems to
> point to the fact that ISPs bill you for something other than your
> point of view, no?
I don't know about anywhere else, but certainly here in
Australia some ISPs are creative in how they advertise their
link speeds.  Again that is not the issue we were trying
to address with the patch.

On Thu, 2006-06-15 at 08:57 -0400, jamal wrote:
> You are still speaking ATM (and the above may still be valid), but:
> Could you for example look at the netdevice->type and from that figure
> out the link layer overhead and compensate for it.
As others have pointed out, this doesn't work for the ADSL
user.  An ADSL modem is connected to the box using either
ethernet, wireless or USB.

On Thu, 2006-06-15 at 09:03 -0400, jamal wrote:
> It is probably doable by just looking at netdevice->type and figuring
> the link layer technology. Totally in user space and building the
> compensated for tables there before telling the kernel (advantage is no
> kernel changes and therefore it would work with older kernels as well).
Others have had this same thought, and have spent time trying
to come up with a user space only solution.  They failed because
it isn't possible.  To understand why, see this thread:

  http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html

Also, the user space patch does improve the performance of 
older kernels (ie unpatched kernels).  Rather than getting 
the rate wrong 99.9% of the time, older kernels only get it 
wrong 14% of the time, on average.

On Tue, 2006-06-20 at 03:04 +0200, Patrick McHardy wrote:
> What about qdiscs like SFQ (which uses the packet size in quantum
> calculations)? I guess it would make sense to use the wire-length
> there as well.
Being pedantic, SFQ automatically assigns traffic to classes
and gives each class an equal share of the available bandwidth.
As I am sure you are aware, SFQ's trick is that it randomly
changes its classification algorithm - every second in the Linux
implementation.  If there are errors in rate calculation this
randomisation will ensure they are distributed equally between
the classes as time goes on.  So no, accurate packet sizes are
not that important to SFQ.

But they are important to many other qdiscs, and I am sure
that was your point.  SFQ just happened to be a bad example.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> What this means is that Linux computes based on ethernet
> headers. Somewhere downstream ATM (refer to above) comes in and that
> causes mismatch in what Linux expects to be the bandwidth and what
> your service provider who doesnt account for the ATM overhead when
> they sell you "1.5Mbps".
> Reminds me of hard disk vendors who define 1K to be 1000 to show
> how large their drives are.
> Yes, Linux cant tell if your service provider is lying to you.
No, it can't.  But you can measure the bandwidth you are
getting from your ISP and plug that into the tc command 
line.  The web page I sent to you describes how to do this
for ADSL lines.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> > On Mon, 2006-19-06 at 21:31 +0200, Jesper Dangaard Brouer wrote:
> > The issue here is, that ATM does not have fixed overhead (due to alignment
> > and padding).  This means that a fixed reduction of the bandwidth is not
> > the solution.  We could reduce the bandwidth to the worst-case overhead,
> > which is 62%, I do not think that is a good solution...
> > 
> 
> I dont see it as wrong to be honest with you. Your mileage may vary.
Jamal, am I reading this correctly?  Did you just say that you
don't see having to reduce your available bandwidth by 62% to
take account of deficiencies in Linux's traffic engine as wrong?
Why on earth would you say that?

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> Dont have time to read your doc and dont get me wrong, there is a
> "quark" practical problem: As practical as the hard disk manufacturer
> who claims that they have 11G drive when it is 10G.
This reads like we don't see the same problem in the same way.
Your disk example is a 10% error that affects less savvy users.
The ATM problem we are trying to address affects a big chunk of
all Linux's traffic control users.  (Big chunk as counted by
boxes, not bytes.)

Something like 60% of all broadband connections use ADSL.  Most 
of the remainder live in the US and use cable. Or at least so 
says this web page:
  http://tinyurl.com/pydnj
Extrapolating from that, I think it is safe to say a fair chunk
of all people using the Linux Traffic Control engine use ADSL,
and thus may benefit from this patch.

Now it is true that right now these people may not see a great
benefit from the patch.  Those that will are divided into two
categories:

1.  Those that saturate their upstream bandwidth.  This isn't
    hard to do on ADSL, due to its first letter.  It affects
    people who run web sites or email lists - which is bugger
    all - and those who play games or run P2P - which is most
    home users.

2.  Those that use VOIP.  Again there aren't many people who do
    this right now, but that will change.  It's not hard to
    envisage a future where real time streaming like this will
    come to dominate Internet traffic.  VOIP affects the other
    major group of users out there - business.

Ergo I believe that in the long term the patch will benefit a
lot of people.  The next argument is how much it will benefit
them.

It turns out that the patch is only useful if you have some
small packets that MUST have priority on the ADSL link.
Jesper's traffic was TCP ACKs (he was addressing problem 1)
and mine was VOIP traffic.  This would seem a trivial problem
to solve with Linux's traffic control engine.  I don't know
what path Jesper took - but I tried using it in the obvious
fashion and it didn't work.  A couple of large emails would
take out an office's phone system.  It took me days of head
scratching to figure out why.

The cause was ADSL using ATM as a carrier.  In my case I was
using approx 110 byte packets.  Do the sums.  It takes 3 ATM
cells to carry a 110 byte packet.  That is 159 bytes.  A
50% error.  That meant the ISP was doing the traffic control,
and he wasn't prioritising VOIP traffic.  Sure, you can
optimise the values you pass to tc for 110 byte packets.  But
then it fails miserably for other packet sizes, such as
a different VOIP codec, or TCP ACKs.  The only solution is to
understate your available bandwidth by at least a third.  I
hope you don't consider that acceptable.
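
The sums can be sketched as follows.  This is a minimal illustration,
assuming each 53 byte cell carries 48 bytes of payload (standard for
ATM) and ignoring the AAL5 trailer and any PPPoE/PPPoA encapsulation,
which only add to the total.  atm_wire_bytes is a name made up for
the example.

```python
ATM_CELL = 53      # bytes sent on the wire per cell
ATM_PAYLOAD = 48   # bytes of packet data carried per cell


def atm_cells(pkt_len):
    """Number of cells needed; the last cell is padded if not full."""
    return (pkt_len + ATM_PAYLOAD - 1) // ATM_PAYLOAD


def atm_wire_bytes(pkt_len):
    """Bytes actually transmitted on the ATM link for one packet."""
    return atm_cells(pkt_len) * ATM_CELL


print(atm_cells(110), atm_wire_bytes(110))  # 3 159
```

A scheduler that charges the packet as 110 bytes therefore
under-counts every such packet by 49 bytes.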

The reason this patch wasn't thought of until now is that
large packets don't see much benefit.  For similar packet
sizes the maximum error is determined by the ATM cell size
(you can be +/- one ATM cell) and that is 53 bytes.  This
means on packets around MTU size the error is 53/1500 = 3.5%.
Hardly worth worrying about.  For traditional Internet usage,
ie the one ADSL was designed for, the upstream channel, ie
the one carrying the TCP ACKs, was rarely saturated.  The
speed was limited by the downstream channel - the one
carrying MTU sized packets.
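
To make the size dependence concrete, here is a small sketch of the
"one cell either way" bound as a fraction of the packet size.  The
function name and the sample sizes are chosen for illustration only.

```python
ATM_CELL = 53  # bytes; the estimate can be off by up to one cell


def max_relative_error(pkt_len):
    """Upper bound on the error, as a fraction of the packet size."""
    return ATM_CELL / pkt_len


for size in (110, 576, 1500):
    print(size, f"{max_relative_error(size):.1%}")
# The 1500 byte line prints 3.5%, matching the figure above.
```

The same one-cell slack that is negligible at MTU size becomes a large
fraction of a small VOIP packet or TCP ACK.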

So in summary - no, Jamal, I see no correspondence between 
your 10/11Gb hard drives example and this patch.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> It needs to be
> resolved - but not in an intrusive way in my opinion.
To be honest, I didn't think the patch was that intrusive.
It adds an optional constant to the skb->len.  Hardly earth
shattering.

On Tue, 2006-06-20 at 16:45 +0200, Patrick McHardy wrote:
> Handling all qdiscs would mean adding a pointer to a mapping table
> to struct net_device and using something like "skb_wire_len(skb, dev)"
> instead of skb->len in the queueing layer. That of course doesn't
> mean that we can't still provide pre-adjusted ratetables for qdiscs
> that use them.
Yes, that would work well, and is probably how it should have
been done when the kernel stuff was originally written.  As
it happens Jesper's original solution was closer to this.  The
reason we chose not to go that way is that it would change the
kernel-userspace API.  The current patch solves the problem
and works as well as possible on all kernel versions - both
patched and unpatched.

Now that I think about it, changing things the way you suggest
here does seem simple enough.  But it probably belongs in a
different patch.  We wrote this patch to fix a specific problem
with ATM links, and it should succeed or fail on the merits
of doing that.  Cleaning up the kernel code to do what you
suggest is a different issue.  Whether it should be done or
not should be decided on its own merits.

On Tue, 2006-06-20 at 11:38 -0400, jamal wrote:
> The issue is really is whether Linux should be interested in the
> throughput it is told about or the goodput (also known as effective
> throughput) the service provider offers. Two different issues by
> definition. <snip>
On Thu, 2006-06-22 at 14:29 -0400, jamal wrote:
> I did look at what you sent me and somewhere in those discussions i
> argue that the changes compensate to make the rate be a goodput
> instead of advertised throughput. Throughput is typically what 
> schedulers work with and is typically to what is on the wire.
> Goodput tends to be end-to-end; so somewhere down the road ATM
> "reduces" the goodput but not the throughput.
> I am actaully just fine with telling the scheduler you have less
> throughput than what your ISP is telling you. I am also
> not against a generic change as long as it is non-intrusive because i
> believe this is a practical issue and Patrick Mchardy says he can
> deliver such a patch.
I have read your throughput versus goodput thing a couple of
times, and I'm sorry - I don't understand.  What is it you
would like us to achieve?

As for the patch being invasive, it changes 37 lines of 
kernel code.  No other suggestion I have seen here will be 
that small.

If making the patch generic, ie allowing it to handle cell
sizes other than ATM's, would help, then let me know and I will
make the change on the weekend.  It is just a user space change.

One final point: if you are happy with an invasive patch that
changes the world, I have a suggestion.  Modularise the rate
calculation function.  We have qdisc modules, filter modules
and whatnot - so add another type.  Rate calculation.  The
current system can become the default rate calculation module
if none is specified.  Patrick can have his system, and Alan
can have his.  And we can add an ATM one.  If you wish, I can
(with Jesper's help, I hope) re-do the patch in that style,
producing the default one and an ATM one.  My personal
preference though would be to put this patch in, and then
let this new idea stand or fall on its own merits.

Russell Stuart

2006-Jun-26 00:45 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Fri, 2006-06-23 at 17:21 +0200, Patrick McHardy wrote:
> Not really. The randomization doesn't happen by default, but it doesn't
> influence this anyway. SFQ allows flows to send up to "quantum" bytes
> at a time before moving on to the next one. A flow that sends 75 * 20
> byte will in the eyes of SFQ use 1500bytes, on the (ethernet) wire it
> needs 4800bytes. A flow that sends 1500byte packets will only need
> 1504 bytes on the wire, but will be treated equally. So it does make
> a difference for SFQ.
I hadn't even thought to check.  My bad.  The S in SFQ stands
for stochastic, so an implementation that does without
randomisation couldn't really be called SFQ - particularly
as it weakens the algorithm considerably.  I hope that most
users do specify a perturb.

Your 20 byte example is hardly realistic.  skb->len includes
the 14 byte ethernet header, so there is a total of 6 data
bytes in a 20 byte packet.  The IP header alone is 20 bytes.
TCP as implemented on Linux adds another 32 bytes (20 + the
12 byte timestamp option used for rtt measurement).  In other
words I agree with Jamal's comments elsewhere - optimising
for MPU sized packets doesn't seem like a win.
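
For reference, Patrick's figures appear to follow from Ethernet's
minimum frame size.  A minimal sketch, assuming skb->len includes the
14 byte header, short frames are padded to 60 bytes, and a 4 byte
frame check sequence is appended (preamble and inter-frame gap are
ignored).  eth_wire_len is a hypothetical helper, not a kernel
function.

```python
ETH_MIN_FRAME = 60  # minimum frame length before the FCS, in bytes
ETH_FCS = 4         # frame check sequence appended on the wire


def eth_wire_len(skb_len):
    """Bytes on the Ethernet wire for a frame of skb_len bytes."""
    return max(skb_len, ETH_MIN_FRAME) + ETH_FCS


small_flow = 75 * eth_wire_len(20)  # 75 tiny packets, one quantum
large_flow = eth_wire_len(1500)     # one full-size packet
print(small_flow, large_flow)       # 4800 1504
```

Both flows look like 1500 bytes to a qdisc that counts skb->len, which
is the imbalance Patrick is describing.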
> Not a problem as long as the new stuff doesn't break anything existing.
> My patch introduces a TCA_STAB (for size table), similar to the _RTAB
> attributes. Old iproute with new kernel and new iproute with old kernel
> both work fine.
OK, good.
> Its not about cleanup, its about providing the same capabilities
> to all qdiscs instead of just a few selected ones and generalizing
> it so it is also usable for non-ATM overhead calculations.
Perhaps I chose my words poorly.

My intent was to contrast the size and goals of the two
proposed patches.  The ATM patch is a 37 line patch.  It 
includes some minor cleanups.  From the pseudo code you 
have posted what you are proposing is a more ambitious and 
much larger patch that moves a chunk of user space code 
into the kernel.  I am a complete newbie when it comes to 
getting code into the kernel, but that strikes me as 
contentious.  I would rather not have the ATM patch 
depend on it.

By the by, here are a couple of observations:

1.  The entries in the current rtab are already very closely 
    related to packet lengths.  They are actually the packet
    length multiplied by a constant that converts the units
    from "bytes" to "jiffies".  The constant is the same for
    all entries in the table.

2.  As such, the current rtab could already be used by SFQ
    and any other qdisc that needs to know the packet length.
    That SFQ doesn't do this is probably because it doesn't
    affect its performance overly.

3.  Be that as it may, the current RTAB isn't in the most
    convenient form for SFQ, and I am guessing it is in a
    very inconvenient form for HFSC.  Adding a new version
    that is identical except that it contains the raw packet
    length would be a simple change.  In that format it
    could be used by all qdiscs.  The users of the existing
    rtab would have to do the multiplication that converts
    the packet length to jiffies in the kernel.  This means
    that, conceptually at least, should the goodput change
    you need to change only this one constant, not the entire
    table.

4.  Much as you seem to dislike having the rate / packet length
    calculations in user space, having them there makes it easy 
    to add new technologies such as ATM.  You just have to 
    change a user space tool - not the kernel.

5.  We still did have to modify the kernel for ATM.  That was
    because of its rather unusual characteristics.  However,
    if you look at the size of modifications made to the kernel
    versus the size made to the user space tool (37 lines
    versus 303 lines), the bulk of the work was done in user
    space.
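
Observations 1 and 3 can be sketched like this.  It is a simplified
model, not how iproute2 actually builds its tables: the 256 slots, the
bucket rounding and the microsecond units are all assumptions made
for the example, as are the function names.

```python
def build_size_table(cell_log, slots=256):
    """Raw packet length (upper edge of the bucket) for each slot."""
    return [(i + 1) << cell_log for i in range(slots)]


def build_rate_table(size_table, usec_per_byte):
    """Transmission time per slot: length times a single constant."""
    return [size * usec_per_byte for size in size_table]


stab = build_size_table(cell_log=3)
rtab_1mbit = build_rate_table(stab, usec_per_byte=8)  # 8 us per byte
rtab_2mbit = build_rate_table(stab, usec_per_byte=4)  # 4 us per byte

# The size table is shared; only the constant differs between rates.
print(stab[:3], rtab_1mbit[:3], rtab_2mbit[:3])
```

Any link-layer adjustment (such as rounding up to whole ATM cells)
would go into the size table once, and every qdisc could then read it.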

Russell Stuart

2006-Jun-26 04:23 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On 25/06/2006 12:13 AM, jamal wrote:
> You can actually stop reading here if you have gathered the view at
> this point that i am not objecting to the simple approach Patrick is
> going with...
Perhaps this is my problem.  I am not sure I understand
what Patrick is proposing.  I can wait for his patch, I
guess.
> Indeed and i referred to it in the exchanges. 
> And yes, I was arguing that the tc scheme you describe would not be so
> bad either if the cost of making a generic change is expensive.
OK.  I take it from this you think there is merit in
the idea of adding code so the kernel can calculate
the ATM link speeds correctly.  The discussion is
really about the best way to go about it?

If so, excellent.  I am not really too fussy about how
it is achieved, I just want my VOIP connections to
work well on stock kernels.
> There are a lot of link layer issues that you may end up knowing of
> (other than the ATM fragmentation overhead) in regards to something
> downstream and you keep adding knobs is just adding more bloat. 
> Example: If that 3rd hop was wireless that happened to be doing CDMA RLP
> with a lot of retransmits, or wireless that varied its throughput from
> 1-3Mbps at any point in time or it was a satellite link that had a lot
> of latency etc etc. You could always have some way to tweak these via
> the kernel. In-fact people have written schedulers specifically for
> these sorts of link layer problems (I think even some of the IEEE 802.11
> or wimax folks have standardized specific schedulers). You basically
> have to draw a line somewhere. My line was "can it be done via user
> space? yes - do it there".
If by adding lots of knobs you mean we need a knob
for 802.11, a knob for ATM, a knob for ethernet and so on,
then we do need lots of knobs.  And you need to know which
of those layers is the bottleneck, so you know what knob to
fit.  But you only ever need one knob on a given link.

I can only think of two ways out of needing lots of knobs.
One is to have a qdisc that doesn't need to know the link
speed in order to shape traffic so that it gets to do the
scheduling, and not someone upstream.  Sounds like black magic
to me, but perhaps HFSC does this - I have not read the papers
yet, but I plan to do so soon.

The second way is to automatically calculate the link speed,
using a daemon perhaps :).  Again it sounds like black
magic.  Note that there is already code in the kernel that
does this, but it lives in the layers above - in TCP and
DCCP.  I am referring to Westwood and friends.  These
algorithms live in the layers above because they need
feedback from the network - which can only come from the other
end of the connection unless ECN is working.

I have not been able to figure out how Patrick intends to
solve these problems from his posts so far, so I am waiting
for his code.  Hopefully it will include a lot of comments.
> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->
Yes, it does.  It will be interesting to see what Patrick
comes up with.

Russell Stuart

2006-Jun-27 06:19 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On 26/06/2006 9:10 PM, Patrick McHardy wrote:
>>5.  We still did have to modify the kernel for ATM.  That was
>>    because of its rather unusual characteristics.  However,
>>    if you look at the size of modifications made to the kernel
>>    versus the size made to the user space tool (37 lines
>>    versus 303 lines), the bulk of the work was done in user
>>    space.
> 
> I'm sorry, but arguing that a limited special case solution is
> better because it needs slightly less code is just not reasonable.
Without seeing your actual proposal it is difficult to
judge whether this is a reasonable trade-off or not.
Hopefully we will see your code soon.  Do you have any
idea when?

Russell Stuart

2006-Jul-06 03:43 UTC

[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Tue, 2006-07-04 at 15:29 +0200, Patrick McHardy wrote:
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.
Cleanups aside, architecturally the bulk of your patch
looks like a no-brainer to me.  The calculation of
packet length should be in one place.  Caching it in
skb->cb was a nice touch.
> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.
Unfortunately you do things in the wrong order for ATM.
See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
for an overview of the problem, and then the attached email for
a detailed description of how the current patch addresses it.
It is a trivial fix.

As I said earlier, RTAB and STAB contain the same numbers,
just scaled differently.  The ATM patch stuffed around with
RTAB.  With your patch in place it will have to do
exactly the same thing with STAB - because RTAB and STAB
carry the same data.  So to me the two patches seem
orthogonal.

One observation is that the size optimisation you applied to
STAB, making it variable length, could also be applied to RTAB.
In fact it should be.  Then they would be identical, apart 
from the scaling.  Even the lookup operation (performed in
qdisc_init_len in your patch) would be identical.

However, now you lot have made me go away and think, I have
another idea on how to attack this.  Perhaps it will be
more palatable to you.  It would replace RTAB and STAB with
a 28 byte structure for most protocol stacks - well all I can
think of off the top of my head, anyway.  RTAB would have to
remain for backwards compatibility, of course.

-------------- next part --------------
An embedded message was scrubbed...
From: Russell Stuart <russell@stuart.id.au>
Subject: Re: Getting ATM patches into the kernel
Date: Fri, 19 May 2006 22:59:34 +1000
Size: 10566
Url: http://mailman.ds9a.nl/pipermail/lartc/attachments/20060706/fff4a390/attachment.mht

Russell Stuart

2006-Jul-10 08:44 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Fri, 2006-07-07 at 10:00 +0200, Patrick McHardy wrote:
> Russell Stuart wrote:
> > Unfortunately you do things in the wrong order for ATM.
> > See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
> > for an overview of the problem, and then the attached email for
> > a detailed description of how the current patch addresses it.
> > It is a trivial fix.
> 
> Actually that was the part I didn't understand, you keep talking
> (also in that comment in tc_core.c) about an "unknown overhead".
> What is that and why would it be unknown? The mail you attached
> is quite long, is there an simple example that shows what you
> mean?
The "unknown overhead" is just the overhead passed to tc
using the "tc ... overhead xxx" option.  It is probably
what you intended to put into your addend attribute.

It is "unknown" because the kernel currently doesn't use
it.  It is passed in the tc_ratespec, but is ignored by
the kernel as are most fields in there.

The easy way to fix the "ATM" problem described in the big
comment is simply to add the "overhead" to the packet
length before doing the RTAB lookup.  (Identical comments
apply to STAB).  If you don't accept this or understand
why, then go read the "long emails" which attempt to
explain it in detail.  Jesper's initial version of the
patch did just that, BTW.

However if you do that then you have to adjust RTAB for
all cases (not just ATM) to reflect that the kernel is
now adding the overhead.  Thus the RTAB tc sends to the
kernel now changes for different kernel versions, making
modern versions of tc incompatible with older kernels,
and vice versa.  I didn't consider that acceptable.

My solution to this is to give the kernel the old format
RTAB (ie the one that assumed the kernel didn't add the
overhead) and a small adjustment.  This small adjustment
is called cell_align in the ATM patch.  You do the same 
thing with cell_align as the previous solution did with 
the overhead - ie add it in just before looking up RTAB.  
This is in effect all the kernel part of the ATM patch
does - make the kernel accept the cell_align option,
and add it to skb->len before looking up RTAB.
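
As a minimal sketch of the step just described, and not the code
from the patch itself, the lookup could be written along these
lines.  The struct layout, the field types and the name
rtab_lookup are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-rate table held by the kernel. */
struct rate_table {
    uint32_t data[256];  /* transmission time for each slot */
    uint8_t  cell_log;   /* each slot covers (1 << cell_log) lengths */
    int8_t   cell_align; /* small adjustment, 0 when there are no cells */
};

/* Look up the transmission time for a packet of len bytes. */
static uint32_t rtab_lookup(const struct rate_table *rt, uint32_t len)
{
    int64_t adjusted;
    uint64_t slot;

    assert(rt != NULL);
    /* Add the adjustment before the table lookup, as described above. */
    adjusted = (int64_t)len + rt->cell_align;
    if (adjusted < 0)
        adjusted = 0;
    slot = (uint64_t)adjusted >> rt->cell_log;
    if (slot > 255)
        slot = 255;      /* clamp instead of running off the table */
    return rt->data[slot];
}
```

With cell_align left at 0 this is the plain table lookup, which is
why kernels that ignore the field keep their existing behaviour.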

The difference between cell_align and overhead is that
cell_align is always 0 when there is no packetisation,
and even when non zero it is small (less than 1<<cell_log,
ie less than 8 for typical MTUs).  So for anything bar
ATM it is zero which means old kernels are completely
unaffected, and even for ATM not sending it produces a 
small error which means older kernels still benefit from
the "ATM" user space patch.   This makes the proposed 
"ATM" version of tc both forward and  backward compatible.

One other point arises here.  The fields in "tc_ratespec"
that "tc" fills and the kernel ignores are there so "tc
show" will work.  The essence of the problem is "tc"
compiles the stuff you give it into a single "RTAB".
That "RTAB" can't be reverse compiled into the original
numbers the user provided.  So if "tc show" is to work,
"tc" has to save that information somewhere.  I don't
think the "tc_ratespec" was the best choice for two
reasons.

Firstly, having the fields show up in tc_ratespec
makes it seem like the kernel can use them.  It can't,
as the "overhead" example above shows.  Secondly, from
tc's point of view it is inflexible.  Over time new
features have been added to "tc", and each time a
new way of encoding it in the existing "tc_ratespec"
has to be invented.  Thus we now have hacks like
storing the "overhead" in the upper bits of the MPU
figure.

A better solution would be to provide a TLV (ie a
TCA_XXX constant) for TC's private use.  From the
kernel's point of view it would be an opaque structure
which it just saves and echoes back when asked.  This
would solve both problems.
> > However, now you lot have made me go away and think, I have
> > another idea on how to attack this.  Perhaps it will be
> > more palatable to you.  It would replace RTAB and STAB with
> > a 28 byte structure for most protocol stacks - well all I can
> > think of off the top of my head, anyway.  RTAB would have to
> > remain for backwards compatibility, of course.
> 
> Can you describe in more detail?
OK, but first I want to make the point that the only
reason I suggest this is to get some sort of ATM
patch into the kernel, as the current patch on the
table is having a rough time.

Alan Cox made the point earlier (if I understood him
correctly) that this table lookup probably isn't
a big win on modern CPUs - we may be better off
moving it all into the kernel.  Thinking about this,
I tried to come up with a way of describing the
mapping between skb->len and the on the wire packet
length for every protocol I know.  This is what I
came up with.

Assume we have a packet length L, which is to be
transported by some protocol.  For now we consider
one protocol only, ie: TCP, PPP, ATM, Ethernet or
whatever.   I will generalise it to multiple protocols
later.  I think a generalised transformation can be 
made using 5 numbers which are applied in this
order:

  Overhead - A fixed overhead that is added to L.

  Mpu      - Minimum packet size.  If the result of
             (Overhead+L) is smaller than this, then
             the new result becomes this size.

  Round    - The result is then rounded up to this
             many bytes.  For protocols that always
             transmit single bytes this figure would be
             1.  If there were some protocol that
             transmitted data as 4 byte chunks then this
             would be 4.  For ATM it is 48.

  CellPay  - If the packet is broken down into smaller
             packets when sent, then this is the amount
             of data that will fit into each chunk.

  CellOver - This is the additional overhead each cell
             carries.

The idea is the kernel would do this calculation on the
fly for each packet.  If you represent this set of
numbers as a comma separated list in the order they were
presented above, then here are some examples:

  IP:       20
  Ethernet: 18,64
  PPP:      2
  ATM:      0,0,48,48,5

It may be that 5 numbers are overkill.  It is for all
protocols I am aware of - for those you could get away
with 4.  But I am no expert.
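
The five steps above can be sketched in C roughly as follows.
This is only an illustration of the description, with struct
link_xform and wire_len being made-up names; it also assumes
that a value of 0 means "not used" for Mpu, Round and CellPay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five numbers, in the order they are applied (names are illustrative). */
struct link_xform {
    uint32_t overhead;  /* fixed overhead added to the length */
    uint32_t mpu;       /* minimum packet size, 0 if none */
    uint32_t round;     /* round up to a multiple of this, 0 or 1 if none */
    uint32_t cell_pay;  /* payload bytes per cell, 0 if not cell based */
    uint32_t cell_over; /* extra overhead carried by each cell */
};

/* Map a packet length onto the length seen on the wire. */
static uint32_t wire_len(const struct link_xform *x, uint32_t len)
{
    assert(x != NULL);
    len += x->overhead;
    if (len < x->mpu)
        len = x->mpu;
    if (x->round > 1)
        len = (len + x->round - 1) / x->round * x->round;
    if (x->cell_pay > 0)
        len += (len + x->cell_pay - 1) / x->cell_pay * x->cell_over;
    return len;
}
```

For the ATM entry (0,0,48,48,5) a 100 byte input is rounded up to
144 bytes, which is 3 cells, so 15 bytes of cell headers are added
to give 159 bytes, ie 3 x 53.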

The next step is to generalise for many protocols.  As
the protocols are stacked, the length output by one
protocol becomes the input length for the downstream
one.  So we just need to apply the same transformation
serially. I will use '+' to indicate the stacking.  For
a typical ATM stack, PPPoE over LLC, we have:

  ppp:2+pppoe:6+ethernet:14,64+llc:8+aal5:4+atm:0,0,48,48,5

If this were implemented naively, then the kernel would
have to apply the above calculation 6 times, like this:

  Protocol   InputLength    OutputLength
  ---------  ------------   ----------------
  ppp        skb->len       skb->len+2
  pppoe      skb->len+2     skb->len+2+6
  ethernet   skb->len+2+6   skb->len+2+6+14
  ... and so on.

But it can be optimised.  In this particular case we can
combine those six operations into 1:

  adsl_pppoe_llc:34,64,48,48,5

The five numbers have the same meaning as before.  It
is not difficult to come up with a generalised rule that
allows you to do this for most cases.  For the remainder
(if they exist - I can't think of any) the kernel would
have to apply the transformation iteratively.
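
A small sketch of the naive stacked form next to the combined
form, using the numbers quoted above.  The names apply_one,
apply_stack, adsl_stack and adsl_combined are hypothetical, and
0 is again assumed to mean "not used".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct link_xform {
    uint32_t overhead, mpu, round, cell_pay, cell_over;
};

/* Apply one protocol's five numbers to a length. */
static uint32_t apply_one(const struct link_xform *x, uint32_t len)
{
    len += x->overhead;
    if (len < x->mpu)
        len = x->mpu;
    if (x->round > 1)
        len = (len + x->round - 1) / x->round * x->round;
    if (x->cell_pay > 0)
        len += (len + x->cell_pay - 1) / x->cell_pay * x->cell_over;
    return len;
}

/* Naive form: the output of each protocol feeds the next one down. */
static uint32_t apply_stack(const struct link_xform *stack, size_t n,
                            uint32_t len)
{
    assert(stack != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        len = apply_one(&stack[i], len);
    return len;
}

/* ppp:2 + pppoe:6 + ethernet:14,64 + llc:8 + aal5:4 + atm:0,0,48,48,5 */
static const struct link_xform adsl_stack[] = {
    {2, 0, 0, 0, 0},  {6, 0, 0, 0, 0}, {14, 64, 0, 0, 0},
    {8, 0, 0, 0, 0},  {4, 0, 0, 0, 0}, {0, 0, 48, 48, 5},
};

/* The single combined entry: adsl_pppoe_llc:34,64,48,48,5 */
static const struct link_xform adsl_combined = {34, 64, 48, 48, 5};
```

For very short packets the two forms differ before the ATM
rounding (the stacked form pads to 64 at the ethernet layer and
then adds 12 more bytes), but for this particular stack both land
in the same number of cells, so the on the wire result agrees.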

Before going on, it is worthwhile comparing this to the
current RTAB solution (and by implication STAB):

  1.  Oddly, the number of steps and hence speed for
      common protocols is probably the same.  Compare:
        RTAB - You have to add an OverHead in the general
               case.
             - You have to scale by cell_log.
             - You have to ensure the overhead+skb->len
               doesn't overflow / underflow the RTAB.
             - You have to do the lookup.
        New  - You have to add overhead.
             - You have to check the MPU.
             - You have to check if you have to apply
               Round,CellPay,CellOver - but you won't
               have to for any protocol except ATM.

  2.  Because of the cell_log, RTAB gives a 100% accurate
      answer 1 time in every (1<<cell_log) packet lengths.
      The new version is always 100% accurate.

  3.  The new version isn't as flexible as RTAB, at least
      from the kernel's point of view.  Conceivably there are
      protocols that could be handled by RTAB that are not
      handled by the new one.  Since RTAB is computed in
      user space, this implies these new protocols might
      be handled by a user space change only.  The new
      version would always require a kernel change.  Note
      however that even RTAB required a kernel change for
      ATM.

So far we have what your STAB would provide us with - a
way to calculate the packet length.  It takes 5 ints for
every protocol stack I can think of.  It probably runs
faster for most protocols but is less robust to the
introduction of new protocols.

But some qdiscs need to know how long it takes to send a
packet.  This is what RTAB provides us with, in fact.
So if we were to do away with RTAB completely, then we need
a way for the kernel to convert packet lengths into the time
it takes to send a packet.  This is what I discuss next.
The comments apply to both STAB and the new algorithm above,
as they both compute packet lengths.

If J is the time it takes to send one byte over a link, 
then we can compute RTAB from STAB like this:

  for (int i = 0; i < array_length(STAB); i += 1)
    RTAB[i] = STAB[i] * J;

This is exactly the operation "tc" performs now in user
space.  It is possibly in user space because J is usually
less than 1, and thus is most conveniently represented as
a floating point number.  Floating point operations in
the kernel are verboten.

I can think of two ways to move this operation into the
kernel.  The straightforward way is to represent J as a
scale and a division.  Ie the kernel does:

  RTAB[i] = (STAB[i] << scale) / divisor;
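
As a hedged sketch of the scale and division form, computed per
packet rather than per table entry, something like the following
would do.  The function name len_to_time is an assumption, and
the 64 bit intermediate is a choice made here to avoid overflow
in the shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time to send len bytes, with J represented as (1 << scale) / divisor.
 * The shift is done in 64 bits so that large lengths do not overflow.
 */
static uint32_t len_to_time(uint32_t len, unsigned scale, uint32_t divisor)
{
    assert(divisor != 0 && scale < 32);
    return (uint32_t)(((uint64_t)len << scale) / divisor);
}
```

In kernel code on 32 bit platforms a 64 bit division like this
would normally need a helper such as do_div() rather than the
plain / operator.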

The second way depends on the fact that most CPUs can
multiply two 32 bit uints to produce a 64 bit result in a
single operation.   I don't know whether this operation is
available to kernel code.  But if it is, J can be
represented as a 32 bit fixed point number, with the
implied decimal point after the most significant 8 bits.
Then this operation would suffice:

  extern long long mul64(unsigned long a, unsigned long b);

  RTAB[i] = (unsigned long)(mul64(STAB[i], J) >> 24);

This method doesn't use division, and is probably faster
on lower end CPUs.  It would handle 100G Ethernet on a
machine with Hz == 1000, and 1200 bits/sec on a machine
with Hz == 10000.
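
The fixed point variant can be sketched like this, assuming an
8.24 layout (8 integer bits, 24 fractional bits) to match the
shift by 24 above.  The names j_to_fixed and len_to_time_fixed
are made up; the floating point conversion would live in user
space (eg in tc), and only the multiply and shift in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define J_FRAC_BITS 24  /* 8 integer bits, 24 fractional bits */

/* User space side: convert a per-byte time to 8.24 fixed point. */
static uint32_t j_to_fixed(double j)
{
    double scaled = j * (double)(1u << J_FRAC_BITS) + 0.5;

    assert(j >= 0.0 && scaled < 4294967296.0);
    return (uint32_t)scaled;
}

/* Kernel side: one 32x32 -> 64 bit multiply and a shift, no division. */
static uint32_t len_to_time_fixed(uint32_t len, uint32_t j_fixed)
{
    return (uint32_t)(((uint64_t)len * j_fixed) >> J_FRAC_BITS);
}
```

For example, with J = 0.5 a 1500 byte packet comes out as 750
time units.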

Russell Stuart

2006-Jul-18 02:06 UTC

RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Sat, 2006-06-24 at 10:13 -0400, jamal wrote:
> And yes, I was arguing that the tc scheme you describe would not be so
> bad either if the cost of making a generic change is expensive.
<snip>
> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->
Things seem to have died down.  Patrick's patch seemed
unrelated to ATM to me.  I did put up another suggestion,
but I don't think anybody was too impressed with the
idea.  So that leaves the current ATM patch as the only
one we have on the table that addresses the ATM issue.

Since you don't think it is "too bad", can we proceed
with it?

Russell Stuart

2006-Jul-20 04:56 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> Please excuse my silence, I was travelling and am still catching up
> with my mails.
Sorry.  Had I realised you were busy I would have
waited.
> > - As it stands, it doesn't help the qdiscs that use
> >   RTAB.  So unless he proposes to remove RTAB entirely
> >   the ATM patch as it will still have to go in.
> 
> Why? The length calculated by my STABs (or something similar)
> is used by _all_ qdiscs. Not only for transmission time calculation,
> but also for statistics and estimators.
Oh.  I didn't see where it is used for the time
calculation in your patch.  Did I miss something,
or is that the unfinished bit?

This is possibly my stumbling block.  If you don't remove
RTAB the ATM patch as it stands will be needed.  Your patch
didn't remove RTAB, and you didn't say it was intended to,
so I presume it wasn't going to.
> If the length calculation
> doesn't fit for ATM, that can be fixed.
Yes of course.  Just to be clear: as far as I am concerned
this never was an issue.
> > - A bit of effort was put into making this current
> >   ATM patch both backwards and forwards compatible.
> >   Patricks patch would work with newer kernels,
> >   obviously.  Older kernels, and in particular the
> >   kernel that Debian is Etch is likely to distribute
> >   would miss out.
> 
> True, but it provides more consistency, and making current
> kernels behave better is more important than old kernels.
I guess provided the new "tc" works with older kernels this
is OK - although a disappointment to me.  Works here being defined
as "works as well as the previous version of tc does".  For
me not working would be OK as well provided "tc" issued a
warning message to the effect that it "needs kernel version
XXX or above", but doing that would probably require it to
look at the kernel version.  Looking at the kernel version
in tc seems to be frowned upon.
> You seem to have misunderstood my patch. It doesn't need to
> touch RTABs, it just calculates the packet length as seen
> on the wire (wherever it is) and uses that throughout the
> entire qdisc layer.
No, you have it in reverse - as I said above.  My problem is
that your patch does not touch RTAB.  Several qdiscs really
don't care about the length of a packet (other than for
keeping track of stats) - they just care about how long
it takes to send.  Off the top of my head these are HTB, CBQ
and TBF.  They use RTAB to make this calculation.  So unless
you replace RTAB with STAB the current ATM patch will still
be needed.
> > One other point - the optimisation Patrick proposes
> > for STAB (over RTAB) was to make the number of entries
> > variable.  This seems like a good idea.  However there
> > is no such thing as a free lunch, and if you did
> > indeed reduce the number of entries to 16 for Ethernet
> > (as I think Patrick suggested), then each entry would
> > cover 1500/16 = 93 different packet lengths.  Ie,
> > entry 0 would cover packet lengths 0..93, entry 1
> > 94..186, and so on.  A single entry can't be right
> > for all those packet lengths, so again we are back
> > to a average 30% error for typical VOIP length
> > packets.
> 
> My patch doesn't uses fixed sized cells, so it can deal
> with anything, worst case is you use one cell per packet
> size. Optimizing size and lookup speed for ethernet makes
> a lot more sense than optimizing for ADSL.
I was just responding to a point you made earlier, when
you said STAB could only use 16 entries as opposed to the
256 used by RTAB.  I suspect nobody would actually do that 
because of the inaccuracy it creates, so the comparison is
perhaps unfair.  I agree the flexibility of making STAB 
variable length is a good idea, and comes at 0 cost in 
the kernel.
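
To put a rough number on the inaccuracy of wide table entries on
an ATM link, the sketch below counts how many cells apart the
shortest and longest packets sharing one entry are.  It ignores
all per-packet overhead, and atm_cells and cells_spread are
made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48

/* Number of ATM cells needed to carry len bytes (overhead ignored). */
static uint32_t atm_cells(uint32_t len)
{
    return (len + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;
}

/*
 * Difference in cell count between the shortest and the longest packet
 * sharing one table entry that covers width consecutive lengths,
 * the first of which is first.
 */
static uint32_t cells_spread(uint32_t first, uint32_t width)
{
    assert(width > 0);
    return atm_cells(first + width - 1) - atm_cells(first);
}
```

An entry covering lengths 94..186 spans packets of 2, 3 and 4
cells, so a single value cannot be right for all of them, whereas
an 8 byte wide entry that starts just after a cell boundary
(eg 97..104) covers a single cell count.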

> Andy Furniss wrote:
> > Russell Stuart wrote:
> >> The kernel will have to do a shift and a division
> >> for each packet, which I assume is permissible.
> > 
> > 
> > I guess that is for others to decide :-) I think Patrick has a point
> > about sfq/htb drr, Like you I guess, I thought that alot of extra per
> > packet calculations would have got an instant NO.
> 
> Its only done once per packet (currently, it might be interesting to
> override the length for specific classes and their childs, for example
> if you do queueing on eth0 and have an DSL router one hop apart).
> The division is gone in my patch btw.
Unlike the packet length the time calculation can't be
cached in the skb.  Most classes in HTB/CBQ use different
packet transmission rates.

Russell Stuart

2006-Jul-20 05:47 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

On Thu, 2006-07-20 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!
So you really do exist?  I thought it was just
rumour.
> Well, if fixed point arithmetics is not a problem.
It shouldn't be.  Any decimal number can be expressed
as a fraction, eg:

  0.00123 = 123/100000

Which can be calculated as a multiply and a divide. With
MTUs up to 2048, it should be possible to do this with
99.9999% accuracy (ie 2048/2^23).

With a bit more work in userspace (ie in tc), it can
be reduced to a multiply and a shift.
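
One possible way to do that reduction, sketched under the
assumption that the multiplier must fit in 32 bits: user space
picks the largest shift for which it does, and the kernel then
only multiplies and shifts.  The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct mult_shift {
    uint32_t mult;
    unsigned shift;
};

/* User space: approximate num/den as mult / 2^shift, mult in 32 bits. */
static struct mult_shift fraction_to_mult_shift(uint32_t num, uint32_t den)
{
    struct mult_shift ms;
    unsigned shift = 31;
    uint64_t m;

    assert(den != 0);
    for (;;) {
        /* Rounded value of num * 2^shift / den. */
        m = (((uint64_t)num << shift) + den / 2) / den;
        if (m <= UINT32_MAX || shift == 0)
            break;
        shift--;
    }
    ms.mult = (uint32_t)m;
    ms.shift = shift;
    return ms;
}

/* Kernel side: x * num / den becomes a multiply and a shift. */
static uint32_t apply_mult_shift(uint32_t x, struct mult_shift ms)
{
    return (uint32_t)(((uint64_t)x * ms.mult) >> ms.shift);
}
```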
> Plus, remember, the function is not R*size, it is at least
> R*size+addend, to account for link overhead. Plus account for padding
> of small packets. Plus, when policing it should deaccount already added
> link headers, QoS counts only network payload.
Yes, it is flexible - and has served us well up until
now.  It doesn't work well for ATM, but with a small
bit of extra calculation in the kernel it could.
However, it turns out that ATM is a special case.  If
ATM's cell payload was 58 bytes instead of 48 bytes
(say), then it would not be possible to produce a RTAB
that had small errors (eg < 10%) for smallish packet
sizes (< 290 bytes).  I seem to have trouble
explaining why in a concise way that people understand,
so I won't try here.

So when Alan Cox said our ATM patch didn't solve the
packetisation problem in general, he was right as our
patch just built upon RTAB.  Patrick's STAB proposal
doesn't solve it in general either for that matter, as it
is just another implementation of RTAB with the same
limitations.  The
only way I can think of to solve it in general is to 
move many more calculations into the kernel - as I 
proposed in a long winded answer to Patrick earlier 
in this thread.

But doing so would get rid of the table implementation 
and the flexibility it has given us to date.  For that 
reason I feel uncomfortable with it.

The engineering decision becomes this - are there any
other protocols like ATM out there that could justify 
such a change?  (In my more cynical moments I think of 
it differently - has/is the world going to make a 
second engineering fuck up on the scale of ATM again?  
How on earth did anyone decide that pushing data 
packets over ATM, as happens in ADSL, was a good 
idea?)  I know of no other such protocols.  But then
I don't have an encyclopedic knowledge of comms
protocols, so that doesn't mean much.  I suspect you
know a good deal more about them than I do.  What say
you?

Russell Stuart

2006-Jul-20 07:50 UTC

[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> Please excuse my silence, I was travelling and am still catching up
> with my mails.
Sorry.  Had I realised you were busy I would have
waited.
> > - As it stands, it doesn't help the qdiscs that use
> >   RTAB.  So unless he proposes to remove RTAB entirely
> >   the ATM patch as it will still have to go in.
> 
> Why? The length calculated by my STABs (or something similar)
> is used by _all_ qdiscs. Not only for transmission time calculation,
> but also for statistics and estimators.
Oh.  I didn't see where it is used for the time
calculation in your patch.  Did I miss something,
or is that the unfinished bit?

This is possibly my stumbling block.  If you don't remove
RTAB the ATM patch as it stands will be needed.  Your patch
didn't remove RTAB, and you didn't say it was intended to,
so I presume it wasn't going to.
> If the length calculation
> doesn't fit for ATM, that can be fixed.
Yes of course.  Just to be clear: as far as I am concerned
this never was an issue.
> > - A bit of effort was put into making this current
> >   ATM patch both backwards and forwards compatible.
> >   Patrick's patch would work with newer kernels,
> >   obviously.  Older kernels, and in particular the
> >   kernel that Debian Etch is likely to distribute,
> >   would miss out.
> 
> True, but it provides more consistency, and making current
> kernels behave better is more important than old kernels.
I guess provided the new "tc" works with older kernels this
is OK - although a disappointment to me.  "Works" here being
defined as "works as well as the previous version of tc does".
For me not working would be OK as well provided "tc" issued a
warning message to the effect that it "needs kernel version
XXX or above", but doing that would probably require it to
look at the kernel version.  Looking at the kernel version
in tc seems to be frowned upon.
> You seem to have misunderstood my patch. It doesn't need to
> touch RTABs, it just calculates the packet length as seen
> on the wire (wherever it is) and uses that throughout the
> entire qdisc layer.
No, you have it in reverse - as I said above.  My problem is
that your patch does not touch RTAB.  Several qdiscs really
don't care about the length of a packet (other than for
keeping track of stats) - they just care about how long
it takes to send.  Off the top of my head these are HTB, CBQ
and TBF.  They use RTAB to make this calculation.  So unless
you replace RTAB with STAB the current ATM patch will still
be needed.
> > One other point - the optimisation Patrick proposes
> > for STAB (over RTAB) was to make the number of entries
> > variable.  This seems like a good idea.  However there
> > is no such thing as a free lunch, and if you did
> > indeed reduce the number of entries to 16 for Ethernet
> > (as I think Patrick suggested), then each entry would
> > cover 1500/16 = 93 different packet lengths.  Ie,
> > entry 0 would cover packet lengths 0..93, entry 1
> > 94..186, and so on.  A single entry can't be right
> > for all those packet lengths, so again we are back
> > to an average 30% error for typical VOIP length
> > packets.
>
> My patch doesn't use fixed sized cells, so it can deal
> with anything, worst case is you use one cell per packet
> size. Optimizing size and lookup speed for ethernet makes
> a lot more sense than optimizing for ADSL.
I was just responding to a point you made earlier, when
you said STAB could only use 16 entries as opposed to the
256 used by RTAB.  I suspect nobody would actually do that 
because of the inaccuracy it creates, so the comparison is
perhaps unfair.  I agree the flexibility of making STAB 
variable length is a good idea, and comes at 0 cost in 
the kernel.
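
A minimal sketch of the 16-entry case discussed above, assuming an
ATM link with 48 byte cell payloads carried in 53 byte cells.  The
function names, the zero default for `overhead` and the choice to
store the size of the largest packet in each slot are assumptions
made for the illustration; they are not taken from either patch.
It reports the worst-case overestimate per slot, not the average
figure quoted above.

```python
import math

CELL_PAYLOAD = 48   # bytes of payload in one ATM cell
CELL_SIZE = 53      # bytes on the wire per cell (payload + 5 byte header)


def atm_wire_bytes(length, overhead=0):
    """Bytes sent on an ATM link for a packet of `length` bytes."""
    cells = max(1, math.ceil((length + overhead) / CELL_PAYLOAD))
    return cells * CELL_SIZE


def worst_slot_errors(max_len=1500, entries=16, overhead=0):
    """Worst relative overestimate per slot when one value serves the whole slot."""
    width = math.ceil(max_len / entries)
    errors = []
    for slot in range(entries):
        lo = max(1, slot * width)
        hi = min(max_len, (slot + 1) * width - 1)
        stored = atm_wire_bytes(hi, overhead)  # assumed: slot holds its largest size
        errors.append(max(
            (stored - atm_wire_bytes(n, overhead)) / atm_wire_bytes(n, overhead)
            for n in range(lo, hi + 1)))
    return errors


for slot, err in enumerate(worst_slot_errors()):
    print(f"slot {slot:2d}: worst overestimate {err:.0%}")
```

Raising `entries` narrows each slot, which is the trade-off
between table size and accuracy being argued over here.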

> Andy Furniss wrote:
> > Russell Stuart wrote:
> >> The kernel will have to do a shift and a division
> >> for each packet, which I assume is permissible.
> >
> >
> > I guess that is for others to decide :-) I think Patrick has a point
> > about sfq/htb drr. Like you I guess, I thought that a lot of extra per
> > packet calculations would have got an instant NO.
>
> It's only done once per packet (currently, it might be interesting to
> override the length for specific classes and their children, for example
> if you do queueing on eth0 and have a DSL router one hop apart).
> The division is gone in my patch btw.
Unlike the packet length the time calculation can't be
cached in the skb.  Most classes in HTB/CBQ use different
packet transmission rates.
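
To make the point about per-class rates concrete, here is a rough
sketch of an RTAB-style lookup.  The 256 slots follow the figure
mentioned in this thread; the function names, the microsecond
unit, the `len >> cell_log` index and the choice to charge each
slot for its upper edge are assumptions made for the example, not
the actual tc or kernel code.

```python
def build_rtab(rate_bytes_per_sec, mtu=1600, slots=256):
    """Return (table, cell_log): table[i] is a send time in microseconds."""
    # Pick the smallest power-of-two slot width that lets `mtu` fit in the table.
    cell_log = 0
    while (mtu >> cell_log) > slots - 1:
        cell_log += 1
    table = []
    for i in range(slots):
        size = (i + 1) << cell_log  # assumed: charge each slot for its upper edge
        table.append(size * 1_000_000 / rate_bytes_per_sec)
    return table, cell_log


def xmit_time_us(table, cell_log, length):
    """Look up the send time for a packet of `length` bytes."""
    slot = min(length >> cell_log, len(table) - 1)
    return table[slot]


# Two classes with different rates need two tables, so the same
# 100 byte packet maps to two different transmission times.
slow, slow_log = build_rtab(125_000)    # 1 Mbit/s
fast, fast_log = build_rtab(1_250_000)  # 10 Mbit/s
print(xmit_time_us(slow, slow_log, 100), xmit_time_us(fast, fast_log, 100))
```

The length is a property of the packet, so it can be computed
once; the time depends on which class's table is consulted.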

Russell Stuart

2006-Jul-20 07:51 UTC

[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)

On Thu, 2006-07-20 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!
So you really do exist?  I thought it was just
rumour.
> Well, if fixed point arithmetic is not a problem.
It shouldn't be.  Any decimal number can be expressed
as a fraction, eg:

  0.00123 = 123/100000

Which can be calculated as a multiply and a divide. With
MTUs up to 2048, it should be possible to do this with
99.9999% accuracy (ie 2048/2^23).

With a bit more work in userspace (ie in tc), it can be
reduced to a multiply and a shift.
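
A small sketch of the multiply-and-shift idea, under stated
assumptions: the rate is turned into a fixed-point
nanoseconds-per-byte multiplier once (the part that would live in
userspace), and the per-packet work is then one multiply and one
shift.  The function names, the nanosecond unit and the shift of 8
are hypothetical choices for the example, not what tc does.

```python
def to_mult(rate_bytes_per_sec, shift):
    """Fixed-point nanoseconds-per-byte, scaled by 2**shift (rounded)."""
    scaled = 1_000_000_000 << shift
    return (scaled + rate_bytes_per_sec // 2) // rate_bytes_per_sec


def xmit_time_ns(length, mult, shift):
    """Per-packet work: one multiply and one shift."""
    return (length * mult) >> shift


SHIFT = 8                        # assumed precision, chosen for the example
mult = to_mult(187_500, SHIFT)   # 1.5 Mbit/s expressed in bytes per second
approx = xmit_time_ns(1500, mult, SHIFT)
exact = 1500 * 1_000_000_000 // 187_500
print(mult, approx, exact)
```

Python integers do not overflow; a C version would have to pick
the shift so that `length * mult` fits the integer width in use.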
> Plus, remember, the function is not R*size, it is at least
> R*size+addend, to account for link overhead. Plus account for padding
> of small packets. Plus, when policing it should deaccount already added
> link headers, QoS counts only network payload.
Yes, it is flexible - and has served us well up until
now.  It doesn't work well for ATM, but with a small
bit of extra calculation in the kernel it could.
However, it turns out that ATM is a special case.  If
ATM's cell payload was 58 bytes instead of 48 bytes
(say), then it would not be possible to produce an RTAB
that had small errors (eg < 10%) for smallish packet
sizes (< 290 bytes).  I seem to have trouble
explaining why in a concise way that people understand,
so I won't try here.
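
One possible reading of the 48 versus 58 byte remark, offered as
an illustration rather than as the author's own explanation: a
table with power-of-two slot widths can line up with 48 byte
payloads (48 is a multiple of 8 and of 16) but not with 58 byte
ones.  The sketch assumes no per-packet overhead and assumes that
slot i covers lengths i*w+1 .. (i+1)*w; both are simplifications,
and the function names are hypothetical.

```python
def cells_needed(length, payload):
    """Number of cells needed to carry `length` bytes (ceiling division)."""
    return -(-length // payload)


def straddling_slots(payload, cell_log, max_len=290):
    """Slots whose lengths do not all need the same number of cells."""
    width = 1 << cell_log
    bad = []
    for first in range(1, max_len + 1, width):
        last = min(first + width - 1, max_len)
        if cells_needed(first, payload) != cells_needed(last, payload):
            bad.append((first, last))
    return bad


print("48 byte payload:", straddling_slots(48, cell_log=3))
print("58 byte payload:", straddling_slots(58, cell_log=3))
```

A slot that straddles a cell boundary has to hold one value for
packets that need different numbers of cells, so some packet in
that slot is mis-costed by a whole cell, which is a large relative
error when the packet only fills one or two cells.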

So when Alan Cox said our ATM patch didn't solve the
packetisation problem in general, he was right, as our
patch just built upon RTAB.  Patrick's STAB proposal
doesn't solve it in general either for that matter, as it
is just another implementation of RTAB with the same
limitations.  The only way I can think of to solve it in
general is to move many more calculations into the
kernel - as I proposed in a long-winded answer to Patrick
earlier in this thread.

But doing so would get rid of the table implementation 
and the flexibility it has given us to date.  For that 
reason I feel uncomfortable with it.

The engineering decision becomes this - are there any
other protocols like ATM out there that could justify
such a change?  (In my more cynical moments I think of
it differently - is the world going to make a second
engineering fuck up on the scale of ATM?  How on earth
did anyone decide that pushing data packets over ATM,
as happens in ADSL, was a good idea?)  I know of no
other such protocols.  But then I don't have an
encyclopedic knowledge of comms protocols, so that
doesn't mean much.  I suspect you know a good deal more
about them than I do.  What say you?

Russell Stuart

2006-Jul-30 23:06 UTC

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

On Thu, 2006-07-20 at 14:56 +1000, Russell Stuart wrote:
> On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> > Please excuse my silence, I was travelling and am still catching up
> > with my mails.
>
> Sorry.  Had I realised you were busy I would have
> waited.
>
> > > - As it stands, it doesn't help the qdiscs that use
> > >   RTAB.  So unless he proposes to remove RTAB entirely
> > >   the ATM patch as it stands will still have to go in.
> >
> > Why? The length calculated by my STABs (or something similar)
> > is used by _all_ qdiscs. Not only for transmission time calculation,
> > but also for statistics and estimators.
>
> Oh.  I didn't see where it is used for the time
> calculation in your patch.  Did I miss something,
> or is that the unfinished bit?
>
> This is possibly my stumbling block.  If you don't remove
> RTAB the ATM patch as it stands will be needed.  Your patch
> didn't remove RTAB, and you didn't say it was intended to,
> so I presume it wasn't going to.
It has gone quiet again.  In my mind the one unresolved issue
is whether Patrick intended to remove RTAB with his patch.
If not, the ATM patch as it stands will have to go in.

Patrick - it would be nice to hear from you.
