Jesper Dangaard Brouer
2006-Jun-14 09:40 UTC
[PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
Linux's traffic control engine inaccurately calculates
transmission times for packets sent over ADSL links. For
some packet sizes the error rises to over 50%. This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed-size 53 byte cells.

The following patches to iproute2 and the kernel add an
option to calculate traffic transmission times over all
ATM links, including ADSL, with perfect accuracy.

A longer presentation of the patch, its rationale, what it
does and how to use it can be found here:
  http://www.stuart.id.au/russell/files/tc/tc-atm/

An earlier version of the patch, and a _detailed_ empirical
investigation of its effects can be found here:
  http://www.adsl-optimizer.dk/

The patches are both backwards and forwards compatible.
This means unpatched kernels will work with a patched
version of iproute2, and an unpatched iproute2 will work
on patched kernels.

This is a combined effort of Jesper Brouer and Russell Stuart
to get these patches into the upstream kernel.

Let the discussion start about what we need to change to get this
upstream.

We see this as a feature enhancement, and thus hope that it can be
queued in davem's net-2.6.18.git tree.

---
Regards,
 Jesper Brouer & Russell Stuart.
I have taken linux-kernel off the list.

Russell's site is inaccessible to me (I actually think this is related
to some DNS issues I may be having) and your master's thesis is too long
to spend 2 minutes and glean it; so here's a question or two for you:

- Have you tried to do a long-lived session such as a large FTP and
seen how far off the deviation was? That would provide an interesting
data point.
- To be a devil's advocate (and not claim there is no issue), where do
you draw the line with "overhead"?
Example: the smallest ethernet packet is 64 bytes, of which 14 bytes are
ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
If you were to set an MTU of say 64 bytes and tried to do an http or ftp
transfer, how accurate do you think the calculation would be? I would
think not very different.
Does it matter if it is accurate in the majority of the cases?
- For further reflection: Have you considered the case where the rate
table has already been considered on some link speed in user space and
then somewhere post-config the physical link speed changes? This would
happen in the case where ethernet AN (autonegotiation) is involved and
the partner makes some changes (use ethtool).

I would say the last bullet is a more interesting problem than a corner
case of some link layer technology that has high overhead.
Your work would be more interesting if it was generic for many link
layers instead of just ATM.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Jesper Dangaard Brouer
2006-Jun-14 12:55 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:
> Russell's site is inaccessible to me (I actually think this is related
> to some DNS issues i may be having)

Strange, I have access to Russell's site. Maybe it's his redirect
feature that confuses your browser; try:
  http://ace-host.stuart.id.au/russell/files/tc/tc-atm/

> and your masters is too long to
> spend 2 minutes and glean it; so heres a question or two for you:

Yes, it is quite long and very detailed. But it is worth reading
(... says the author himself ;-))

> - Have you tried to do a long-lived session such as a large FTP and
> seen how far off the deviation was? That would provide some interesting
> data point.

The deviation can be calculated. The impact is of course small for large
packets.

But the argument that bulk TCP transfers are not as badly affected is
wrong, because all the TCP ACK packets get the maximum penalty.

On an ADSL link with more than 8 bytes of overhead, a 40 byte TCP ACK
will use more than one ATM cell, causing 2 ATM cells to be sent that
consume 106 bytes, i.e. 62% overhead. On a small upstream ADSL line
that hurts! (See thesis page 53, table 5.3 "Overhead summary").

> - To be a devil's advocate (and not claim there is no issue), where do
> you draw the line with "overhead"?
> Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> how accurate do you think the calculation would be? I would think not
> very different.

I do think we handle this situation, but I'm not quite sure that I fully
understand the question (sorry).

> Does it matter if it is accurate on the majority of the cases?
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool).
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.

We only claim to do magic on ATM/ADSL links... nothing else ;-)

> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

Well, we did consider doing so, but we thought that it would be harder
to get it into the kernel.

Actually that's the reason for the defines:
 #define ATM_CELL_SIZE		53
 #define ATM_CELL_PAYLOAD	48

Changing these should make it possible to adapt to any other SAR
(Segmentation And Reassembly) link layer.

Thanks for your comments :-)
--
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
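The ACK arithmetic in the reply above can be checked with a few lines of Python. This is only an illustrative sketch, not code from the patch: the helper name `atm_wire_bytes` and its `overhead` parameter are made up here, and it assumes that the packet plus a fixed per-packet encapsulation overhead is padded up to a whole number of 48 byte cell payloads, each of which is sent as a 53 byte cell.

```python
ATM_CELL_SIZE = 53      # bytes on the wire per ATM cell (value from the defines above)
ATM_CELL_PAYLOAD = 48   # bytes of data carried by each cell (value from the defines above)


def atm_wire_bytes(packet_len, overhead=0):
    """Bytes sent on the wire for one packet (hypothetical helper).

    overhead is an assumed fixed per-packet encapsulation overhead in bytes.
    """
    total = packet_len + overhead
    cells = -(-total // ATM_CELL_PAYLOAD)  # ceiling division: partial cells are padded
    return cells * ATM_CELL_SIZE


ack = 40
wire = atm_wire_bytes(ack, overhead=9)    # any overhead above 8 bytes needs a second cell
print(wire)                               # 106
print(round(100 * (wire - ack) / wire))   # 62 (percent of the wire bytes that are not ACK)
```

With exactly 8 bytes of overhead the ACK still fits in a single cell (53 bytes on the wire), which is why the 8 byte threshold matters.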
Phillip Susi
2006-Jun-14 14:27 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
Jesper Dangaard Brouer wrote:
> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links. For
> some packet sizes the error rises to over 50%. This occurs
> because ADSL uses ATM as its link layer transport, and ATM
> transmits packets in fixed sized 53 byte cells.
>

I could have sworn that DSL uses its own framing protocol that is
similar to the frame/superframe structure of HDSL (T1) lines and over
that you can run ATM or ethernet. Or is it typically ethernet -> ATM ->
HDSL?

In any case, why does the kernel care about the exact time that the IP
packet has been received and reassembled on the headend?
Andy Furniss
2006-Jun-14 15:32 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> I have taken linux-kernel off the list.
>
> Russell's site is inaccessible to me (I actually think this is related
> to some DNS issues i may be having) and your masters is too long to
> spend 2 minutes and glean it; so heres a question or two for you:
>
> - Have you tried to do a long-lived session such as a large FTP and
> seen how far off the deviation was? That would provide some interesting
> data point.
> - To be a devil's advocate (and not claim there is no issue), where do
> you draw the line with "overhead"?

Many others and I have run a similar hack for years; there is also a
userspace project still alive which does the same.

The difference is that without it I would need to sacrifice almost half
my 288kbit atm/dsl showtime bandwidth to be sure of control. With the
modification I can run at 286kbit / 288 and know I will never have
jitter worse than the bitrate latency of an MTU packet.

The 286 figure was chosen to allow a full buffer to drain, allow for
timer inaccuracy etc. On a p200 with tsc, 2.6.12 it's never gone over
for me - though talking of timers I notice on my desktop 2.6.16 I gain 2
minutes a day now.

Andy.
On Wed, 2006-14-06 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 2006-06-14 at 08:06 -0400, jamal wrote:

> > - Have you tried to do a long-lived session such as a large FTP and
> > seen how far off the deviation was? That would provide some interesting
> > data point.
>
> The deviation can be calculated. The impact is of course small for large
> packets.
>
> But the argument that bulk TCP transfers is not as badly
> affected, is wrong because all the TCP ACK packets gets maximum penalty.
>

ACKs have always played a prominent role. The last numbers I have seen
for North America (but I think probably valid globally) show in the
range of 40% ACKs in internet traffic:
http://netweb.usc.edu/~rsinha/pkt-sizes/
I suspect a lot of these stats are on their way to change with VoIP, p2p
etc.
But I don't think it is ACKs per se that you or Russell are contending
cause these issues. It's the presence of ATM. And all evidence seems to
point to the fact that ISPs bill you for something other than your
point of view, no?

> On an ADSL link with more than 8 bytes overhead, a 40 bytes TCP ACK will
> use more than one ATM cell, causing 2 ATM cells to be sent that
> consume 106 bytes, i.e. 62% overhead. On a small upstream ADSL line
> that hurts! (See thesis page 53, table 5.3 "Overhead summary").
>

But how are you connected to the DSLAM? In North America it is typically
ethernet. If I use the current tables I don't see much of a problem
with, say, cable modems.
Are you trying to compensate for the accounting differences between what
your service provider measures (accounting for their ATM cells) and what
you do accounting for your ethernet frames?
I guess I am lost as to where the ATM is in the topology and, more
importantly, whether we (Linux) mis-account or whether your approach is
trying to compensate for the ISP's mis-accounting.

> > - To be a devil's advocate (and not claim there is no issue), where do
> > you draw the line with "overhead"?
> > Example the smallest ethernet packet is 64 bytes of which 14 bytes are
> > ethernet headers ("overhead" for IP) - and this is not counting CRC etc.
> > If you were to set an MTU of say 64 bytes and tried to do a http or ftp,
> > how accurate do you think the calculation would be? I would think not
> > very different.
>
> I do think we handle this situation, but I'm not quite sure that I fully
> understand the question (sorry).
>

Assume the following:
- You had ethernet end to end. Is there still a problem?
- Take it a notch up and assume you had ethernet with an MTU of 64B.
This way you will have all your packets being small and having high
overhead. Do you still have a problem?

> > Does it matter if it is accurate on the majority of the cases?
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool).
> >
> > I would say the last bullet is a more interesting problem than a corner
> > case of some link layer technology that has high overhead.
>
> We only claim to do magic on ATM/ADSL links... nothing else ;-)
>

This is well and good given the focus of your thesis. Up/down here we
need something more generic. Your master's thesis is a good start, but
consider doing the PhD next and complete this work ;->

> > Your work would be more interesting if it was generic for many link
> > layers instead of just ATM.
>
> Well, we did consider to do so, but we thought that it would be harder to
> get it into the kernel.
>
> Actually thats the reason for the defines:
>  #define ATM_CELL_SIZE		53
>  #define ATM_CELL_PAYLOAD	48
>
> Changing these should make it possible to adapt to any other SAR
> (Segmentation And Reassembly) link layer.
>

You are still speaking ATM (and the above may still be valid), but:
Could you for example look at the netdevice->type and from that figure
out the link layer overhead and compensate for it?
It would obviously be a lot more useful if such activity is doable in
user space without any knowledge of the kernel; that means zero change
to the kernel, and everything then becomes forward and backward
compatible.

cheers,
jamal
Jesper Dangaard Brouer
2006-Jun-16 08:26 UTC
[PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
(Resend message bounced to LARTC)

Linux's traffic control engine inaccurately calculates
transmission times for packets sent over ADSL links. For
some packet sizes the error rises to over 50%. This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed-size 53 byte cells.

The following patches to iproute2 and the kernel add an
option to calculate traffic transmission times over all
ATM links, including ADSL, with perfect accuracy.

A longer presentation of the patch, its rationale, what it
does and how to use it can be found here:
  http://www.stuart.id.au/russell/files/tc/tc-atm/

An earlier version of the patch, and a _detailed_ empirical
investigation of its effects can be found here:
  http://www.adsl-optimizer.dk/

The patches are both backwards and forwards compatible.
This means unpatched kernels will work with a patched
version of iproute2, and an unpatched iproute2 will work
on patched kernels.

This is a combined effort of Jesper Brouer and Russell Stuart
to get these patches into the upstream kernel.

Let the discussion start about what we need to change to get this
upstream.

We see this as a feature enhancement, and thus hope that it can be
queued in davem's net-2.6.18.git tree.

---
Regards,
 Jesper Brouer & Russell Stuart.
Patrick McHardy
2006-Jun-20 00:54 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool).
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

I've thought about this a couple of times; scaling the virtual clock
rate should be enough for "simple" qdiscs like TBF or HTB, which have
a linear relation between time and bandwidth. I haven't really thought
about the effects on HFSC yet; on a small scale the relation is
non-linear.

But this is a different problem from trying to accommodate
link-layer overhead.
Patrick McHardy
2006-Jun-20 01:04 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> You are still speaking ATM (and the above may still be valid), but:
> Could you for example look at the netdevice->type and from that figure
> out the link layer overhead and compensate for it.
> Obviously a lot more useful if such activity is doable in user space
> without any knowledge of the kernel? and therefore zero change to the
> kernel and everything then becomes forward and backward compatible.

It would be nice to have support for HFSC as well, which unfortunately
needs to be done in the kernel since it doesn't use rate tables.
What about qdiscs like SFQ (which uses the packet size in quantum
calculations)? I guess it would make sense to use the wire-length
there as well.
On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
> jamal wrote:
> > - For further reflection: Have you considered the case where the rate
> > table has already been considered on some link speed in user space and
> > then somewhere post-config the physical link speed changes? This would
> > happen in the case where ethernet AN is involved and the partner makes
> > some changes (use ethtool).
> >
[..]
> I've thought about this a couple of times, scaling the virtual clock
> rate should be enough for "simple" qdiscs like TBF or HTB, which have
> a linear relation between time and bandwidth. I haven't really thought
> about the effects on HFSC yet, on a small scale the relation is
> non-linear.

Does HFSC not depend on bandwidth? How is rate control achieved?

> But this is a different problem from trying to accomodate
> for link-layer overhead.
>

Yes, it is a different issue.

cheers,
jamal
On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
> jamal wrote:
> > You are still speaking ATM (and the above may still be valid), but:
> > Could you for example look at the netdevice->type and from that figure
> > out the link layer overhead and compensate for it.
> > Obviously a lot more useful if such activity is doable in user space
> > without any knowledge of the kernel? and therefore zero change to the
> > kernel and everything then becomes forward and backward compatible.
>
> It would be nice to have support for HFSC as well, which unfortunately
> needs to be done in the kernel since it doesn't use rate tables.
> What about qdiscs like SFQ (which uses the packet size in quantum
> calculations)? I guess it would make sense to use the wire-length
> there as well.

Didn't even think of that ;->
Is it getting too complicated?

BTW, I forgot to mention: one thing we could do on the bandwidth issue
is send netlink events on link speed changes too; some listener
somewhere would then do the adjustment.

cheers,
jamal
Patrick McHardy
2006-Jun-20 15:09 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> On Tue, 2006-20-06 at 02:54 +0200, Patrick McHardy wrote:
>
>>jamal wrote:
>>
>>>- For further reflection: Have you considered the case where the rate
>>>table has already been considered on some link speed in user space and
>>>then somewhere post-config the physical link speed changes? This would
>>>happen in the case where ethernet AN is involved and the partner makes
>>>some changes (use ethtool).
>>>
>
> [..]
>
>>I've thought about this a couple of times, scaling the virtual clock
>>rate should be enough for "simple" qdiscs like TBF or HTB, which have
>>a linear relation between time and bandwidth. I haven't really thought
>>about the effects on HFSC yet, on a small scale the relation is
>>non-linear.
>
>
> Does HFSC not depend on bandwith? How is rate control achieved?

"Depend on bandwidth" is not the right term. All of TBF, HTB and HFSC
provide bandwidth per time, but with TBF and HTB the amount of
bandwidth is linear in the amount of time; with HFSC it is only linear
on a larger scale, since it uses service curves, which are represented
as two linear pieces. So you have bandwidth b1 for time t1, bandwidth
b2 after that until eternity. By scaling the clock rate you alter after
how much time b2 kicks in, which affects the guaranteed delays. The end
result should be that both bandwidth and delay scale up or down
proportionally, but I'm not sure that this is what HFSC would do in all
cases (on small scale). But it should be easy to answer with a bit more
time for visualizing it.

The thing I'm not sure about is whether this wouldn't be handled better
by userspace; if the link layer speed changes you might not want
proportional scaling but prefer to still give a fixed amount of that
bandwidth to some class, for example VoIP traffic. Do we have netlink
notifications for link speed changes?
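To help visualize the two-piece service curve described above, here is a minimal Python sketch. It is not HFSC code and makes no claim about what the kernel's HFSC actually does: the function names are invented for illustration, and "scaling the clock" is modelled in the simplest possible way, by evaluating the same curve at k times the elapsed time.

```python
def service(t, b1, t1, b2):
    """Service guaranteed by time t on a two-piece linear curve:
    rate b1 for the first t1 time units, rate b2 afterwards."""
    if t <= t1:
        return b1 * t
    return b1 * t1 + b2 * (t - t1)


def service_scaled_clock(t, b1, t1, b2, k):
    """Same curve seen through a clock running k times as fast
    (an assumed, idealized model of clock scaling)."""
    return service(k * t, b1, t1, b2)


# Hypothetical numbers: rate 100 for the first 10 time units, rate 20 afterwards.
b1, t1, b2, k = 100, 10, 20, 2
print(service(10, b1, t1, b2))                  # 1000: the knee of the unscaled curve
print(service_scaled_clock(5, b1, t1, b2, k))   # 1000: the knee is reached at t1 / k
print(service_scaled_clock(6, b1, t1, b2, k))   # 1040: slope after the knee is k * b2
```

In this idealized model both rates are multiplied by k and the switch to b2 happens after t1 / k, which is the proportional behaviour hoped for above; whether HFSC behaves this way on a small scale is exactly the open question.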
Patrick McHardy
2006-Jun-20 15:16 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
jamal wrote:
> On Tue, 2006-20-06 at 03:04 +0200, Patrick McHardy wrote:
>
>>It would be nice to have support for HFSC as well, which unfortunately
>>needs to be done in the kernel since it doesn't use rate tables.
>>What about qdiscs like SFQ (which uses the packet size in quantum
>>calculations)? I guess it would make sense to use the wire-length
>>there as well.
>
>
> Didnt even think of that ;->
> Is it getting too complicated?

The code wouldn't be very complicated, it just adds some overhead. If
you do something like I described in my previous mail, the overhead for
people not using it would be an additional pointer test before reading
skb->len. I guess we could also make it a compile time option.
I personally think this is something that really improves our quality
of implementation; after all, it's "wire" resources qdiscs are meant to
manage.

> BTW, I forgot to mention one thing on the bandwidth issue is we could do
> is send netlink events on link speed changes too; some listener
> somewhere would then do the adjustment.

See the mail I just wrote :)
Russell Stuart
2006-Jun-23 12:37 UTC
RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Thu, 2006-06-22 at 14:29 -0400, jamal wrote:> Russell, > > I did look at what you sent me and somewhere in those discussions i > argue that the changes compensate to make the rate be a goodput > instead of advertised throughput.I did see that, but didn''t realise you were responding to me. A lot of discussion has gone on since and evidently quite a bit of which was addressed to me. I will try to answer the some of the points. Sorry for the digest like reply :( On Wed, 2006-06-14 at 11:57 +0100, Alan Cox wrote:> I''m > not sure if that matters but for modern processors I''m also sceptical > that the clever computation is actually any faster than just doing the > maths, especially if something cache intensive is also running.Assuming you are referring to the rate tables - I hadn''t thought about it, but I guess I would agree. However, this patch wasn''t trying to radically re-engineer the traffic control engines rate calculation code. Quite the reverse I - was was trying to change it as little as possible. The kernel part of the patch actually only introduced one small change - the optional addition of a constant the packet length. On Thu, 2006-06-15 at 08:57 -0400, jamal wrote:> But i dont think it is ACKs perse that you or Russell are contending > cause these issues. It''s the presence of ATM . And all evidence seems to > point to the fact that ISPs bill you for something other than your > point of view, no?I don''t know about anywhere else, but certainly here in Australia some ISP''s creative in how they advertise their link speeds. Again that is not the issue we were trying to address with the patch. On Thu, 2006-06-15 at 08:57 -0400, jamal wrote:> You are still speaking ATM (and the above may still be valid), but: > Could you for example look at the netdevice->type and from that figure > out the link layer overhead and compensate for it.As others have pointed out, this doesn''t work for the ADSL user. 
An ADSL modem is connected to the box using either ethernet, wireless or USB. On Thu, 2006-06-15 at 09:03 -0400, jamal wrote:> It is probably doable by just looking at netdevice->type and figuring > the link layer technology. Totally in user space and building the > compensated for tables there before telling the kernel (advantage is no > kernel changes and therefore it would work with older kernels as well).Others have had this same thought, and have spent time trying to come up with a user space only solution. They failed because it isn''t possible. To understand why see this thread: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html Also, the user space patch does improve the performance of older kernels (ie unpatched kernels). Rather than getting the rate wrong 99.9% of the time, older kernels only get it wrong 14% of the time, on average. On Tue, 2006-06-20 at 03:04 +0200, Patrick McHardy wrote:> What about qdiscs like SFQ (which uses the packet size in quantum > calculations)? I guess it would make sense to use the wire-length > there as well.Being pedantic, SQF automatically assigns traffic to classes and gives each class an equal share of the available bandwidth. As I am sure you are aware SQF''s trick is that it randomly changes its classification algorithm - every second in the Linux implementation. If there are errors in rate calculation this randomisation will ensure they are distributed equally between the classes as time goes on. So no, accurate packets sizes are not that important to SQF. But they are important to many other qdiscs, and I am sure that was your point. SQF just happened to be a bad example. On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:> What this means is that Linux computes based on ethernet > headers. Somewhere downstream ATM (refer to above) comes in and that > causes mismatch in what Linux expects to be the bandwidth and what > your service provider who doesnt account for the ATM overhead when > they sell you "1.5Mbps". 
> Reminds me of hard disk vendors who define 1K to be 1000 to show
> how large their drives are.
> Yes, Linux cant tell if your service provider is lying to you.

No, it can't. But you can measure the bandwidth you are getting from your ISP and plug that into the tc command line. The web page I sent to you describes how to do this for ADSL lines.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> > On Mon, 2006-19-06 at 21:31 +0200, Jesper Dangaard Brouer wrote:
> > The issue here is, that ATM does not have fixed overhead (due to alignment
> > and padding). This means that a fixed reduction of the bandwidth is not
> > the solution. We could reduce the bandwidth to the worst-case overhead,
> > which is 62%, I do not think that is a good solution...
> >
>
> I dont see it as wrong to be honest with you. Your mileage may vary.

Jamal, am I reading this correctly? Did you just say that you don't see having to reduce your available bandwidth by 62%, to take account of deficiencies in Linux's traffic engine, as wrong? Why on earth would you say that?

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> Dont have time to read your doc and dont get me wrong, there is a
> "quark" practical problem: As practical as the hard disk manufacturer
> who claims that they have 11G drive when it is 10G.

This reads like we don't see the same problem in the same way. Your disk example is a 10% error that affects less savvy users. The ATM problem we are trying to address affects a big chunk of all Linux's traffic control users. (Big chunk as counted by boxes, not bytes.)

Something like 60% of all broadband connections use ADSL. Most of the remainder live in the US and use cable. Or at least so says this web page:

http://tinyurl.com/pydnj

Extrapolating from that, I think it is safe to say a fair chunk of all people using the Linux Traffic Control engine use ADSL, and thus may benefit from this patch. Now it is true that right now these people may not see a great benefit from the patch.
Those that will are divided into two categories:

1. Those that saturate their upstream bandwidth. This isn't hard to do on ADSL, due to its first letter. It affects people who run web sites or email lists - which is bugger all - and those who play games or run P2P - which is most home users.

2. Those that use VoIP. Again, there aren't many people who do this right now, but that will change. It's not hard to envisage a future where real time streaming like this will come to dominate Internet traffic. VoIP affects the other major group of users out there - business.

Ergo I believe that in the long term the patch will benefit a lot of people.

The next argument is how much it will benefit them. It turns out that the patch is only useful if you have some small packets that MUST have priority on the ADSL link. Jesper's traffic was TCP ACKs (he was addressing problem 1) and mine was VoIP traffic.

This would seem a trivial problem to solve with Linux's traffic control engine. I don't know what path Jesper took - but I tried using it in the obvious fashion and it didn't work. A couple of large emails would take out an office's phone system. It took me days of head scratching to figure out why.

The cause was ADSL using ATM as a carrier. In my case I was using approx 110 byte packets. Do the sums. It takes 3 ATM cells to carry a 110 byte packet. That is 159 bytes. A 50% error. That meant the ISP was doing the traffic control, and he wasn't prioritising VoIP traffic.

Sure, you can optimise the values you pass to tc for 110 byte packets. But then it fails miserably for other packet sizes, such as a different VoIP codec, or TCP ACKs. The only solution is to understate your available bandwidth by at least a third. I hope you don't consider that acceptable.

The reason this patch wasn't thought of until now is that large packets don't see much benefit.
For similar packet sizes the maximum error is determined by the ATM cell size (you can be +/- one ATM cell) and that is 53 bytes. This means on packets around MTU size the error is 53/1500 = 3.5%. Hardly worth worrying about. For traditional Internet usage, ie the one ADSL was designed for, the upstream channel, ie the one carrying the TCP ACKs, was rarely saturated. The speed was limited by the downstream channel - the one carrying MTU sized packets.

So in summary - no, Jamal, I see no correspondence between your 10/11Gb hard drives example and this patch.

On Tue, 2006-06-20 at 10:06 -0400, jamal wrote:
> It needs to be
> resolved - but not in an intrusive way in my opinion.

To be honest, I didn't think the patch was that intrusive. It adds an optional constant to the skb->len. Hardly earth shattering.

On Tue, 2006-06-20 at 16:45 +0200, Patrick McHardy wrote:
> Handling all qdiscs would mean adding a pointer to a mapping table
> to struct net_device and using something like "skb_wire_len(skb, dev)"
> instead of skb->len in the queueing layer. That of course doesn't
> mean that we can't still provide pre-adjusted ratetables for qdiscs
> that use them.

Yes, that would work well, and is probably how it should have been done when the kernel stuff was originally written. As it happens, Jesper's original solution was closer to this. The reason we chose not to go that way is that it would change the kernel-userspace API. The current patch solves the problem and works as well as possible on all kernel versions - both patched and unpatched.

Now that I think about it, changing things the way you suggest here does seem simple enough. But it probably belongs in a different patch. We wrote this patch to fix a specific problem with ATM links, and it should succeed or fail on the merits of doing that. Cleaning up the kernel code to do what you suggest is a different issue. Whether it should be done or not should be decided on its own merits.
On Tue, 2006-06-20 at 11:38 -0400, jamal wrote:
> The issue is really is whether Linux should be interested in the
> throughput it is told about or the goodput (also known as effective
> throughput) the service provider offers. Two different issues by
> definition.

<snip>

On Thu, 2006-06-22 at 14:29 -0400, jamal wrote:
> I did look at what you sent me and somewhere in those discussions i
> argue that the changes compensate to make the rate be a goodput
> instead of advertised throughput. Throughput is typically what
> schedulers work with and is typically to what is on the wire.
> Goodput tends to be end-to-end; so somewhere down the road ATM
> "reduces" the goodput but not the throughput.
> I am actaully just fine with telling the scheduler you have less
> throughput than what your ISP is telling you. I am also
> not against a generic change as long as it is non-intrusive because i
> believe this is a practical issue and Patrick Mchardy says he can
> deliver such a patch.

I have read your throughput versus goodput thing a couple of times, and I'm sorry - I don't understand. What is it you would like us to achieve?

As for the patch being invasive, it changes 37 lines of kernel code. No other suggestion I have seen here will be that small. If making the patch generic, ie allowing it to handle cell sizes other than ATM's, would help, then let me know and I will make the change on the weekend. It is just a user space change.

One final point: if you are happy with an invasive patch that changes the world, I have a suggestion. Modularise the rate calculation function. We have qdisc modules, filter modules and whatnot - so add another type. Rate calculation. The current system can become the default rate calculation module if none is specified. Patrick can have his system, and Alan can have his. And we can add an ATM one. If you wish, I can (with Jesper's help, I hope) re-do the patch in that style, producing the default one and an ATM one.
My personal preference though would be to put this patch in, and then let this new idea stand or fall on its own merits.
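
The cell arithmetic in the message above (a packet of about 110 bytes needing 3 ATM cells, ie 159 bytes on the wire) can be sketched roughly as follows. This is only an illustration, not code from the patch: it assumes AAL5 framing with an 8 byte trailer, and the function names and the optional `overhead` parameter (for any extra encapsulation bytes) are made up for the example.

```python
import math

ATM_CELL_SIZE = 53      # bytes on the wire per ATM cell (5 header + 48 payload)
ATM_CELL_PAYLOAD = 48   # payload bytes carried by each cell
AAL5_TRAILER = 8        # assumption: AAL5 framing, 8 byte trailer per packet


def atm_cells(packet_len, overhead=0):
    """Number of whole ATM cells needed to carry one packet.

    overhead is a hypothetical knob for extra per-packet encapsulation
    bytes; it is 0 here because the message does not state a value.
    """
    payload = packet_len + overhead + AAL5_TRAILER
    return math.ceil(payload / ATM_CELL_PAYLOAD)


def atm_wire_len(packet_len, overhead=0):
    """Bytes actually sent on the ATM link for one packet."""
    return atm_cells(packet_len, overhead) * ATM_CELL_SIZE


if __name__ == "__main__":
    for size in (40, 110, 1500):
        print(size, atm_cells(size), atm_wire_len(size))
```

Because the result moves in whole cells, one extra byte of payload can cost a further 53 bytes on the wire, which is why a fixed percentage reduction of the configured rate cannot be right for all packet sizes.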
Russell Stuart
2006-Jun-26 00:45 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Fri, 2006-06-23 at 17:21 +0200, Patrick McHardy wrote:
> Not really. The randomization doesn't happen by default, but it doesn't
> influence this anyway. SFQ allows flows to send up to "quantum" bytes
> at a time before moving on to the next one. A flow that sends 75 * 20
> byte will in the eyes of SFQ use 1500bytes, on the (ethernet) wire it
> needs 4800bytes. A flow that sents 1500byte packets will only need
> 1504 bytes on the wire, but will be treated equally. So it does make
> a different for SFQ.

I hadn't even thought to check. My bad. The S in SFQ stands for stochastic, so without randomisation the algorithm as implemented couldn't really be called SFQ - particularly as it weakens the algorithm considerably. I hope that most users do specify a perturb.

Your 20 byte example is hardly realistic. skb->len includes the 14 byte ethernet header, so there is a total of 6 data bytes in a 20 byte packet. The IP header alone is 20 bytes. TCP as implemented on Linux adds another 32 bytes (20 + the rtt option). In other words I agree with Jamal's comments elsewhere - optimising for MPU sized packets doesn't seem like a win.

> Not a problem as long as the new stuff doesn't break anything existing.
> My patch introduces a TCA_STAB (for size table), similar to the _RTAB
> attributes. Old iproute with new kernel and new iproute with old kernel
> both work fine.

OK, good.

> Its not about cleanup, its about providing the same capabilities
> to all qdiscs instead of just a few selected ones and generalizing
> it so it is also usable for non-ATM overhead calculations.

Perhaps I chose my words poorly. My intent was to contrast the size and goals of the two proposed patches. The ATM patch is a 37 line patch. It includes some minor cleanups. From the pseudo code you have posted, what you are proposing is a more ambitious and much larger patch that moves a chunk of user space code into the kernel.
I am a complete newbie when it comes to getting code into the kernel, but that strikes me as contentious. I would rather not have the ATM patch depend on it.

By the by, here are a couple of observations:

1. The entries in the current rtab are already very closely related to packet lengths. They are actually the packet length multiplied by a constant that converts the units from "bytes" to "jiffies". The constant is the same for all entries in the table.

2. As such, the current rtab could already be used by SFQ and any other qdisc that needs to know the packet length. That SFQ doesn't do this is probably because it doesn't affect its performance overly.

3. Be that as it may, the current RTAB isn't in the most convenient form for SFQ, and I am guessing it is in a very inconvenient form for HFSC. Adding a new version that is identical except that it contains the raw packet length would be a simple change. In that format it could be used by all qdiscs. The users of the existing rtab would have to do the multiplication that converts the packet length to jiffies in the kernel. This means that, conceptually at least, should the goodput change you need only change this one constant, not the entire table.

4. Much as you seem to dislike having the rate / packet length calculations in user space, having them there makes it easy to add new technologies such as ATM. You just have to change a user space tool - not the kernel.

5. We still did have to modify the kernel for ATM. That was because of its rather unusual characteristics. However, if you look at the size of the modifications made to the kernel versus the size made to the user space tool (37 lines versus 303 lines), the bulk of the work was done in user space.
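
Observations 1 and 3 above can be illustrated with a small sketch: a table of raw sizes, and a rate table derived from it by multiplying every entry by one constant. This is not the kernel's or iproute2's code. The slot layout (256 entries, `cell_log` of 3), the indexing in `lookup`, the use of microseconds rather than jiffies, and all function names are assumptions made for the example.

```python
def build_size_table(wire_len, cell_log=3, entries=256):
    """Size table: slot i holds the wire length charged for packets of up
    to (i + 1) << cell_log bytes. wire_len maps a packet length to the
    number of bytes it occupies on the link (identity for a plain link)."""
    return [wire_len((i + 1) << cell_log) for i in range(entries)]


def rate_table_from_sizes(size_table, rate_bytes_per_sec):
    """Rate table: every size multiplied by the same constant, here the
    transmission time of one byte in microseconds (an assumed unit)."""
    usec_per_byte = 1_000_000 / rate_bytes_per_sec
    return [size * usec_per_byte for size in size_table]


def lookup(table, packet_len, cell_log=3):
    """Return the table entry covering packet_len (illustrative indexing)."""
    slot = max(packet_len - 1, 0) >> cell_log
    return table[min(slot, len(table) - 1)]


if __name__ == "__main__":
    sizes = build_size_table(lambda n: n)            # no link layer adjustment
    rtab = rate_table_from_sizes(sizes, 125_000)     # 1 Mbit/s = 125000 bytes/s
    print(lookup(sizes, 110), lookup(rtab, 110))
```

In this form a link layer such as ATM only changes the `wire_len` function used to fill the size table, and a change of link rate only changes the single multiplier.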
Russell Stuart
2006-Jun-26 04:23 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On 25/06/2006 12:13 AM, jamal wrote:
> You can actually stop reading here if you have gathered the view at
> this point that i am not objecting to the simple approach Patrick is
> going with...

Perhaps this is my problem. I am not sure I understand what Patrick is proposing. I can wait for his patch, I guess.

> Indeed and i referred to it in the exchanges.
> And yes, I was arguing that the tc scheme you describe would not be so
> bad either if the cost of making a generic change is expensive.

OK. I take it from this that you think there is merit in the idea of adding code so the kernel can calculate the ATM link speeds correctly. The discussion is really about the best way to go about it? If so, excellent. I am not really too fussy about how it is achieved, I just want my VoIP connections to work well on stock kernels.

> There are a lot of link layer issues that you may end up knowing of
> (other than the ATM fragmentation overhead) in regards to something
> downstream and you keep adding knobs is just adding more bloat.
> Example: If that 3rd hop was wireless that happened to be doing CDMA RLP
> with a lot of retransmits, or wireless that varied its throughput from
> 1-3Mbps at any point in time or it was a satellite link that had a lot
> of latency etc etc. You could always have some way to tweak these via
> the kernel. In-fact people have written schedulers specifically for
> these sorts of link layer problems (I think even some of the IEEE 802.11
> or wimax folks have standardized specific schedulers). You basically
> have to draw a line somewhere. My line was "can it be done via user
> space? yes - do it there".

If by adding lots of knobs you mean we need a knob for 802.11, a knob for ATM, a knob for ethernet and so on, then we do need lots of knobs. And you need to know which of those layers is the bottleneck, so you know which knob to fit. But you only ever need one knob on a given link.

I can only think of two ways out of needing lots of knobs.
One is to have a qdisc that doesn't need to know the link speed in order to shape traffic so that it gets to do the scheduling, and not someone upstream. Sounds like black magic to me, but perhaps HFSC does this - I have not read the papers yet, but I plan to do so soon. The second way is to automatically calculate the link speed, using a daemon perhaps :). Again it sounds like black magic. Note that there is already code in the kernel that does this, but it lives in the layers above - in TCP and DCCP. I am referring to Westwood, and friends. These algorithms live in the layers above because they need feedback from the network - which can only come from the other end of the connection unless ECN is working.

I have not been able to figure out how Patrick intends to solve these problems from his posts so far, so I am waiting for his code. Hopefully it will include a lot of comments.

> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->

Yes, it does. It will be interesting to see what Patrick comes up with.
Russell Stuart
2006-Jun-26 06:27 UTC
[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On 25/06/2006 12:13 AM, jamal wrote:> You can actually stop reading here if you have gathered the view at > this point that i am not objecting to the simple approach Patrick is > going with...Perhaps this is my problem. I am not sure I understand what Patrick is proposing. I can wait for his patch, I guess.> Indeed and i referred to it in the exchanges. > And yes, I was arguing that the tc scheme you describe would not be so > bad either if the cost of making a generic change is expensive.OK. I take it from this you think there is merit in the idea of adding code so the kernel can calculate the ATM link speeds correctly. The discussion is really about the best way to go about it? If so, excellent. I am not really too fussy about how it is achieved, I just want my VOIP connections to work well on stock kernels.> There are a lot of link layer issues that you may end up knowing of > (other than the ATM fragmentation overhead) in regards to something > downstream and you keep adding knobs is just adding more bloat. > Example: If that 3rd hop was wireless that happened to be doing CDMA RLP > with a lot of retransmits, or wireless that varied its throughput from > 1-3Mbps at any point in time or it was a satellite link that had a lot > of latency etc etc. You could always have some way to tweak these via > the kernel. In-fact people have written schedulers specifically for > these sorts of link layer problems (I think even some of the IEEE 802.11 > or wimax folks have standardized specific schedulers). You basically > have to draw a line somewhere. My line was "can it be done via user > space? yes - do it there".If you mean by adding lots of knobs, you mean we need a knob for 802.11, a knob for ATM, a knob for ethernet and so on, then we do need lots of knobs. And you need to know which of those layers is the bottle neck, so you know what knob to fit. But you only ever need one knob on a given link. I can only think of two ways out of needing lots of knobs. 
One is to have a qdisc that doesn't need to know the link speed in order to shape traffic so that it gets to do the scheduling, and not someone upstream. Sounds like black magic to me, but perhaps HFSC does this - I have not read the papers yet, but I plan to do so soon.

The second way is to automatically calculate the link speed, using a daemon perhaps :). Again it sounds like black magic. Note that there is already code in the kernel that does this, but it lives in the layers above - in TCP and DCCP. I am referring to Westwood and friends. These algorithms live in the layers above because they need feedback from the network - which can only come from the other end of the connection unless ECN is working.

I have not been able to figure out how Patrick intends to solve these problems from his posts so far, so I am waiting for his code. Hopefully it will include a lot of comments.

> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->

Yes, it does. It will be interesting to see what Patrick comes up with.
Russell Stuart
2006-Jun-27 06:19 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On 26/06/2006 9:10 PM, Patrick McHardy wrote:
>>5. We still did have to modify the kernel for ATM. That was
>> because of its rather unusual characteristics. However,
>> it you look at the size of modifications made to the kernel
>> verses the size made to the user space tool, (37 lines
>> versus 303 lines,) the bulk of the work was does in user
>> space.
>
> I'm sorry, but arguing that a limited special case solution is
> better because it needs slightly less code is just not reasonable.

Without seeing your actual proposal it is difficult to judge whether this is a reasonable trade-off or not. Hopefully we will see your code soon. Do you have any idea when?
Russell Stuart
2006-Jul-06 00:39 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Tue, 2006-07-04 at 15:29 +0200, Patrick McHardy wrote:
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.

Cleanups aside, architecturally the bulk of your patch looks like a no-brainer to me. The calculation of packet length should be in one place. Caching it in skb->cb was a nice touch.

> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.

Unfortunately you do things in the wrong order for ATM. See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html for an overview of the problem, and then the attached email for a detailed description of how the current patch addresses it. It is a trivial fix.

As I said earlier, RTAB and STAB contain the same numbers, just scaled differently. The ATM patch stuffed around with RTAB. With your patch in place it will have to do exactly the same thing with STAB - because RTAB and STAB carry the same data. So to me the two patches seem orthogonal.

One observation is that the size optimisation you applied to STAB, making it variable length, could also be applied to RTAB. In fact it should be. Then they would be identical, apart from the scaling. Even the lookup operation (performed in qdisc_init_len in your patch) would be identical.

However, now you lot have made me go away and think, I have another idea on how to attack this. Perhaps it will be more palatable to you. It would replace RTAB and STAB with a 28 byte structure for most protocol stacks - well, all I can think of off the top of my head, anyway. RTAB would have to remain for backwards compatibility, of course.
-------------- next part --------------
An embedded message was scrubbed...
From: Russell Stuart <russell@stuart.id.au>
Subject: Re: Getting ATM patches into the kernel
Date: Fri, 19 May 2006 22:59:34 +1000
Size: 10566
Url: http://mailman.ds9a.nl/pipermail/lartc/attachments/20060706/fff4a390/attachment.mht
Russell Stuart
2006-Jul-10 08:44 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Fri, 2006-07-07 at 10:00 +0200, Patrick McHardy wrote:
> Russell Stuart wrote:
> > Unfortunately you do things in the wrong order for ATM.
> > See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
> > for an overview of the problem, and then the attached email for
> > a detailed description of how the current patch addresses it.
> > It is a trivial fix.
>
> Actually that was the part I didn't understand, you keep talking
> (also in that comment in tc_core.c) about an "unknown overhead".
> What is that and why would it be unknown? The mail you attached
> is quite long, is there an simple example that shows what you
> mean?

The "unknown overhead" is just the overhead passed to tc using the "tc ... overhead xxx" option. It is probably what you intended to put into your addend attribute. It is "unknown" because the kernel currently doesn't use it. It is passed in the tc_ratespec, but is ignored by the kernel, as are most fields in there.

The easy way to fix the "ATM" problem described in the big comment is simply to add the "overhead" to the packet length before doing the RTAB lookup. (Identical comments apply to STAB.) If you don't accept this or understand why, then go read the "long emails" which attempt to explain it in detail. Jesper's initial version of the patch did just that, BTW.

However, if you do that then you have to adjust RTAB for all cases (not just ATM) to reflect that the kernel is now adding the overhead. Thus the RTAB tc sends to the kernel now changes for different kernel versions, making modern versions of tc incompatible with older kernels, and vice versa. I didn't consider that acceptable.

My solution to this is to give the kernel the old format RTAB (ie the one that assumed the kernel didn't add the overhead) and a small adjustment. This small adjustment is called cell_align in the ATM patch. You do the same thing with cell_align as the previous solution did with the overhead - ie add it in just before looking up RTAB.
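As a rough illustration of the lookup just described, here is a minimal sketch. It is not code from the patch: the struct and function names are hypothetical, and the 256 slot table indexed by the length shifted right by cell_log is an assumption about how RTAB is laid out.

```c
#include <stdint.h>

#define RTAB_SLOTS 256 /* assumed table size */

/* Hypothetical stand-in for the rate table tc hands to the kernel. */
struct rtab_sketch {
    uint32_t data[RTAB_SLOTS]; /* transmission time for each slot */
    unsigned int cell_log;     /* each slot covers 1 << cell_log lengths */
    int cell_align;            /* small adjustment, 0 for anything but ATM */
};

/* Look up the transmission time for a packet of skb_len bytes. */
static uint32_t rtab_lookup_sketch(const struct rtab_sketch *r,
                                   unsigned int skb_len)
{
    /* Add the adjustment first, then scale down to a slot index. */
    long len = (long)skb_len + r->cell_align;
    long slot;

    if (len < 0)
        len = 0;                /* guard against underflowing the table */
    slot = len >> r->cell_log;
    if (slot >= RTAB_SLOTS)
        slot = RTAB_SLOTS - 1;  /* guard against overflowing the table */
    return r->data[slot];
}
```

With cell_align set to 0 the function behaves like a plain table lookup, which is the property that keeps non-ATM setups unaffected.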
This is in effect all the kernel part of the ATM patch does - make the kernel accept the cell_align option, and add it to skb->len before looking up RTAB.

The difference between cell_align and overhead is that cell_align is always 0 when there is no packetisation, and even when non zero it is small (less than 1<<cell_log, ie less than 8 for typical MTUs). So for anything bar ATM it is zero, which means old kernels are completely unaffected, and even for ATM not sending it produces a small error, which means older kernels still benefit from the "ATM" user space patch. This makes the proposed "ATM" version of tc both forward and backward compatible.

One other point arises here. The fields in "tc_ratespec" that "tc" fills and the kernel ignores are there so "tc show" will work. The essence of the problem is that "tc" compiles the stuff you give it into a single "RTAB". That "RTAB" can't be reverse compiled into the original numbers the user provided. So if "tc show" is to work, "tc" has to save that information somewhere.

I don't think the "tc_ratespec" was the best choice, for two reasons. Firstly, having the fields show up in tc_ratespec makes it seem like the kernel can use them. It can't, as the "overhead" example above shows. Secondly, from tc's point of view it is inflexible. Over time new features have been added to "tc", and each time a new way of encoding them in the existing "tc_ratespec" has to be invented. Thus we now have hacks like storing the "overhead" in the upper bits of the MPU figure.

A better solution would be to provide a TLV (ie a TCA_XXX constant) for tc's private use. From the kernel's point of view it would be an opaque structure which it just saves and echoes back when asked. This would solve both problems.

> > However, now you lot have made me go away and think, I have
> > another idea on how to attack this. Perhaps it will be
> > more palatable to you.
> > It would replace RTAB and STAB with
> > a 28 byte structure for most protocol stacks - well all I can
> > think of off the top of my head, anyway. RTAB would have to
> > remain for backwards compatibility, of course.
>
> Can you describe in more detail?

OK, but first I want to make the point that the only reason I suggest this is to get some sort of ATM patch into the kernel, as the current patch on the table is having a rough time.

Alan Cox made the point earlier (if I understood him correctly) that this table lookup probably isn't a big win on modern CPUs - we may be better off moving it all into the kernel. Thinking about this, I tried to come up with a way of describing the mapping between skb->len and the on the wire packet length for every protocol I know. This is what I came up with.

Assume we have a packet length L, which is to be transported by some protocol. For now we consider one protocol only, ie: TCP, PPP, ATM, Ethernet or whatever. I will generalise it to multiple protocols later. I think a generalised transformation can be made using 5 numbers which are applied in this order:

  Overhead - A fixed overhead that is added to L.

  Mpu      - Minimum packet size. If the result of (Overhead+L) is smaller than this, then the new result becomes this size.

  Round    - The result is then rounded up to this many bytes. For protocols that always transmit single bytes this figure would be 1. If there were some protocol that transmitted data as 4 byte chunks then this would be 4. For ATM it is 48.

  CellPay  - If the packet is broken down into smaller packets when sent, then this is the amount of data that will fit into each chunk.

  CellOver - This is the additional overhead each cell carries.

The idea is the kernel would do this calculation on the fly for each packet.
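The five numbers can be sketched in code along the following lines. This is only one reading of the description above, not an implementation from any patch: the struct, its field names and the function name are hypothetical, a value of 0 is assumed to mean "not used", and the per-cell overhead is assumed to be added on top of the rounded length.

```c
#include <stdint.h>

/* Hypothetical container for the five numbers; 0 means "not used". */
struct wire_len_params {
    uint32_t overhead;  /* fixed bytes added to every packet */
    uint32_t mpu;       /* minimum packet size after the overhead */
    uint32_t round;     /* round the result up to a multiple of this */
    uint32_t cell_pay;  /* payload bytes carried by each cell */
    uint32_t cell_over; /* extra bytes each cell carries */
};

/* Apply the five numbers, in the order listed, to a packet length. */
static uint32_t wire_len(const struct wire_len_params *p, uint32_t len)
{
    len += p->overhead;
    if (len < p->mpu)
        len = p->mpu;
    if (p->round > 1)
        len = (len + p->round - 1) / p->round * p->round;
    if (p->cell_pay > 0) {
        /* One cell_over per cell needed to carry the rounded length. */
        uint32_t cells = (len + p->cell_pay - 1) / p->cell_pay;
        len += cells * p->cell_over;
    }
    return len;
}
```

Under these assumptions the values 0,0,48,48,5 turn a 100 byte packet into 159 bytes, which is three 53 byte cells.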
If you represent this set of numbers as a comma separated list in the order they were presented above, then here are some examples:

  IP:       20
  Ethernet: 18,64
  PPP:      2
  ATM:      0,0,48,48,5

It may be that 5 numbers are overkill. It is for all protocols I am aware of - for those you could get away with 4. But I am no expert.

The next step is to generalise for many protocols. As the protocols are stacked, the length output by one protocol becomes the input length for the downstream one. So we just need to apply the same transformation serially. I will use '+' to indicate the stacking. For a typical ATM stack, PPPoE over LLC, we have:

  ppp:2+pppoe:6+ethernet:14,64+llc:8+aal5:4+atm:0,0,48,48,5

If this were implemented naively, then the kernel would have to apply the above calculation 6 times, like this:

  Protocol   InputLength     OutputLength
  ---------  --------------  ----------------
  ppp        skb->len        skb->len+2
  pppoe      skb->len+2      skb->len+2+6
  ethernet   skb->len+2+6    skb->len+2+6+14
  ... and so on.

But it can be optimised. In this particular case we can combine those six operations into 1:

  adsl_pppoe_llc:34,64,48,48,5

The five numbers have the same meaning as before. It is not difficult to come up with a generalised rule that allows you to do this for most cases. For the remainder (if they exist - I can't think of any) the kernel would have to apply the transformation iteratively.

Before going on, it is worthwhile comparing this to the current RTAB solution (and by implication STAB):

1. Oddly, the number of steps and hence speed for common protocols is probably the same. Compare:

   RTAB
   - You have to add an OverHead in the general case.
   - You have to scale by cell_log.
   - You have to ensure the overhead+skb->len doesn't overflow / underflow the RTAB.
   - You have to do the lookup.

   New
   - You have to add overhead.
   - You have to check the MPU.
   - You have to check if you have to apply Round,CellPay,CellOver - but you won't have to for any protocol except ATM.

2. Because of the cell_log, RTAB gives a 100% accurate answer 1 time in every (1<<cell_log) packet lengths. The new version is always 100% accurate.

3. The new version isn't as flexible as RTAB, at least from the kernel's point of view. Conceivably there are protocols that could be handled by RTAB that are not handled by the new one. Since RTAB is computed in user space, this implies these new protocols might be handled by a user space change only. The new version would always require a kernel change. Note however that even RTAB required a kernel change for ATM.

So far we have what your STAB would provide us with - a way to calculate the packet length. It takes 5 ints for every protocol stack I can think of. It probably runs faster for most protocols but is less robust to the introduction of new protocols.

But some qdiscs need to know how long it takes to send a packet. This is what RTAB provides us with, in fact. So if we were to do away with RTAB completely, then we need a way for the kernel to convert packet lengths into the time it takes to send a packet. This is what I discuss next. The comments apply to both STAB and the new algorithm above, as they both compute packet lengths.

If J is the time it takes to send one byte over a link, then we can compute RTAB from STAB like this:

  for (int i = 0; i < array_length(STAB); i += 1)
    RTAB[i] = STAB[i] * J;

This is exactly the operation "tc" performs now in user space. It is possibly done in user space because J is usually less than 1, and thus is most conveniently represented as a floating point number. Floating point operations in the kernel are verboten.

I can think of two ways to move this operation into the kernel. The straightforward way is to represent J as a scale and a division. Ie the kernel does:

  RTAB[i] = (STAB[i] << scale) / divisor;

The second way depends on the fact that most CPUs can multiply two 32 bit uints to produce a 64 bit result in a single operation.
I don't know whether this operation is available to kernel code. But if it is, J can be represented as a 32 bit fixed point number, with the implied decimal point after the most significant 8 bits. Then this operation would suffice:

  extern long long mul64(unsigned long a, unsigned long b);
  RTAB[i] = (unsigned long)(mul64(STAB[i], J) >> 24);

This method doesn't use division, and is probably faster on lower end CPUs. It would handle 100G Ethernet on a machine with Hz == 1000, and 1200 bits/sec on a machine with Hz == 10000.
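The multiply and shift idea can be tried out in plain C with a sketch like the one below. It is an illustration under assumptions, not kernel code: the function names are hypothetical, standard uint64_t arithmetic stands in for the mul64 helper mentioned above, and J is taken to be the number of time units per byte stored with 8 integer bits and 24 fractional bits.

```c
#include <stdint.h>

#define J_FRAC_BITS 24 /* 8 integer bits, 24 fractional bits */

/*
 * Build J (time units per byte) as an 8.24 fixed point number.
 * This part would be done once, for example in user space.
 * The caller must keep units_per_sec / bytes_per_sec below 256
 * and bytes_per_sec non-zero, or the result does not fit.
 */
static uint32_t make_j(uint32_t units_per_sec, uint32_t bytes_per_sec)
{
    return (uint32_t)(((uint64_t)units_per_sec << J_FRAC_BITS) / bytes_per_sec);
}

/* Convert a wire length to a time with one multiply and one shift. */
static uint32_t len_to_time(uint32_t wire_len, uint32_t j)
{
    /* The 32x32 multiply yields a 64 bit product; the shift drops the fraction. */
    return (uint32_t)(((uint64_t)wire_len * j) >> J_FRAC_BITS);
}
```

For example, with 1000000 time units per second and a link carrying 125000 bytes per second, J works out to exactly 8 units per byte, so a 1500 byte packet maps to 12000 units. Rates that do not divide evenly are truncated, so the result can be slightly low.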
Russell Stuart
2006-Jul-18 02:06 UTC
RE: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Sat, 2006-06-24 at 10:13 -0400, jamal wrote:
> And yes, I was arguing that the tc scheme you describe would not be so
> bad either if the cost of making a generic change is expensive.

<snip>

> Patrick seems to have a simple way to compensate generically for link
> layer fragmentation, so i will not argue the practically; hopefully that
> settles it? ;->

Things seem to have died down. Patrick's patch seemed unrelated to ATM to me. I did put up another suggestion, but I don't think anybody was too impressed with the idea. So that leaves the current ATM patch as the only one we have on the table that addresses the ATM issue. Since you don't think it is "too bad", can we proceed with it?
Russell Stuart
2006-Jul-20 04:56 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> Please excuse my silence, I was travelling and am still catching up
> with my mails.

Sorry. Had I realised you were busy I would have waited.

> > - As it stands, it doesn't help the qdiscs that use
> > RTAB. So unless he proposes to remove RTAB entirely
> > the ATM patch as it will still have to go in.
>
> Why? The length calculated by my STABs (or something similar)
> is used by _all_ qdiscs. Not only for transmission time calculation,
> but also for statistics and estimators.

Oh. I didn't see where it is used for the time calculation in your patch. Did I miss something, or is that the unfinished bit?

This is possibly my stumbling block. If you don't remove RTAB the ATM patch as it stands will be needed. Your patch didn't remove RTAB, and you didn't say it was intended to, so I presume it wasn't going to.

> If the length calculation
> doesn't fit for ATM, that can be fixed.

Yes of course. Just to be clear: as far as I am concerned this never was an issue.

> > - A bit of effort was put into making this current
> > ATM patch both backwards and forwards compatible.
> > Patricks patch would work with newer kernels,
> > obviously. Older kernels, and in particular the
> > kernel that Debian is Etch is likely to distribute
> > would miss out.
>
> True, but it provides more consistency, and making current
> kernels behave better is more important than old kernels.

I guess provided the new "tc" works with older kernels this is OK - although a disappointment to me. Works here being defined as "works as well as the previous version of tc does". For me not working would be OK as well, provided "tc" issued a warning message to the effect that it "needs kernel version XXX or above", but doing that would probably require it to look at the kernel version. Looking at the kernel version in tc seems to be frowned upon.

> You seem to have misunderstood my patch.
> It doesn't need to
> touch RTABs, it just calculates the packet length as seen
> on the wire (whereever it is) and uses that thoughout the
> entire qdisc layer.

No, you have it in reverse - as I said above. My problem is that your patch does not touch RTAB. Several qdiscs really don't care about the length of a packet (other than for keeping track of stats) - they just care about how long it takes to send. Off the top of my head these are HTB, CBQ and TBF. They use RTAB to make this calculation. So unless you replace RTAB with STAB the current ATM patch will still be needed.

> > One other point - the optimisation Patrick proposes
> > for STAB (over RTAB) was to make the number of entries
> > variable. This seems like a good idea. However there
> > is no such thing as a free lunch, and if you did
> > indeed reduce the number of entries to 16 for Ethernet
> > (as I think Patrick suggested), then each entry would
> > cover 1500/16 = 93 different packet lengths. Ie,
> > entry 0 would cover packet lengths 0..93, entry 1
> > 94..186, and so on. A single entry can't be right
> > for all those packet lengths, so again we are back
> > to a average 30% error for typical VOIP length
> > packets.
>
> My patch doesn't uses fixed sized cells, so it can deal
> with anything, worst case is you use one cell per packet
> size. Optimizing size and lookup speed for ethernet makes
> a lot more sense than optimizing for ADSL.

I was just responding to a point you made earlier, when you said STAB could only use 16 entries as opposed to the 256 used by RTAB. I suspect nobody would actually do that because of the inaccuracy it creates, so the comparison is perhaps unfair. I agree the flexibility of making STAB variable length is a good idea, and comes at 0 cost in the kernel.

Andy Furniss wrote:
> > Russell Stuart wrote:
> >> The kernel will have to do a shift and a division
> >> for each packet, which I assume is permissible.
> >
> >
> > I guess that is for others to decide :-) I think Patrick has a point
> > about sfq/htb drr, Like you I guess, I thought that alot of extra per
> > packet calculations would have got an instant NO.
>
> Its only done once per packet (currently, it might be interesting to
> override the length for specific classes and their childs, for example
> if you do queueing on eth0 and have an DSL router one hop apart).
> The division is gone in my patch btw.

Unlike the packet length, the time calculation can't be cached in the skb. Most classes in HTB/CBQ use different packet transmission rates.
Russell Stuart
2006-Jul-20 05:47 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)
On Thu, 2006-07-20 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!

So you really do exist? I thought it was just rumour.

> Well, if fixed point arithmetics is not a problem.

It shouldn't be. Any decimal number can be expressed as a fraction, eg:

  0.00123 = 123/100000

which can be calculated as a multiply and a divide. With MTUs up to 2048, it should be possible to do this with 99.9999% accuracy (ie 2048/2^23). With a bit more work in userspace (ie in tc), it can be reduced to a multiply and a shift.

> Plus, remember, the function is not R*size, it is at least
> R*size+addend, to account for link overhead. Plus account for padding
> of small packets. Plus, when policing it should deaccount already added
> link headers, QoS counts only network payload.

Yes, it is flexible - and has served us well up until now. It doesn't work well for ATM, but with a small bit of extra calculation in the kernel it could.

However, it turns out that ATM is a special case. If ATM's cell payload was 58 bytes instead of 48 bytes (say), then it would not be possible to produce a RTAB that had small errors (eg < 10%) for smallish packet sizes (< 290 bytes). I seem to have trouble explaining why in a concise way that people understand, so I won't try here.

So when Alan Cox said our ATM patch didn't solve the packetisation problem in general, he was right, as our patch just built upon RTAB. Patrick's STAB proposal doesn't solve it in general either for that matter, as it is just another implementation of RTAB with the same limitations.

The only way I can think of to solve it in general is to move many more calculations into the kernel - as I proposed in a long winded answer to Patrick earlier in this thread. But doing so would get rid of the table implementation and the flexibility it has given us to date. For that reason I feel uncomfortable with it.

The engineering decision becomes this - are there any other protocols like ATM out there that could justify such a change?
(In my more cynical moments I think of it differently - has/is the world going to make a second engineering fuck up on the scale of ATM again? How on earth did anyone decide that pushing data packets over ATM, as happens in ADSL, was a good idea?)

I know of no other such protocols. But then I don't have an encyclopedic knowledge of comms protocols, so that doesn't mean much. I suspect you know a good deal more about them than I do. What say you?
Russell Stuart
2006-Jul-20 07:50 UTC
[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:> Please excuse my silence, I was travelling and am still catching up > with my mails.Sorry. Had I realised you were busy I would of waited.> > - As it stands, it doesn''t help the qdiscs that use > > RTAB. So unless he proposes to remove RTAB entirely > > the ATM patch as it will still have to go in. > > Why? The length calculated by my STABs (or something similar) > is used by _all_ qdiscs. Not only for transmission time calculation, > but also for statistics and estimators.Oh. I didn''t see where it is used for the time calculation in your patch. Did I miss something, or is that the unfinished bit? This is possibly my stumbling block. If you don''t remove RTAB the ATM patch as stands will be needed. Your patch didn''t remove RTAB, and you didn''t say it was intended to, so I presume it wasn''t going to.> If the length calculation > doesn''t fit for ATM, that can be fixed.Yes of course. Just to be clear: as far as I am concerned this never was an issue.> > - A bit of effort was put into making this current > > ATM patch both backwards and forwards compatible. > > Patricks patch would work with newer kernels, > > obviously. Older kernels, and in particular the > > kernel that Debian is Etch is likely to distribute > > would miss out. > > True, but it provides more consistency, and making current > kernels behave better is more important than old kernels.I guess provided the new "tc" works with older kernels this is OK - although a disappoint to me. Works here being defined as "works as well as a previous the version of tc does". For me not working would be OK as well provided "tc" issued a warning message to the effect that it "needs kernel version XXX or above"", but doing that would probably require it to look at the kernel version. Looking at the kernel version in tc seems to be frowned upon.> You seem to have misunderstood my patch. 
It doesn''t need to > touch RTABs, it just calculates the packet length as seen > on the wire (whereever it is) and uses that thoughout the > entire qdisc layer.No, you have it in reverse - as I said above. My problem is that your patch does not touch RTAB. Several qdiscs really don''t care about the length of a packet (other than for keeping track of stats) - they just care about how long it takes to send. Off the top of my these are HTB, CBQ and TBF. They use RTAB to make this calculation. So unless you replace RTAB with STAB the current ATM patch will still be needed.> > One other point - the optimisation Patrick proposes > > for STAB (over RTAB) was to make the number of entries > > variable. This seems like a good idea. However there > > is no such thing as a free lunch, and if you did > > indeed reduce the number of entries to 16 for Ethernet > > (as I think Patrick suggested), then each entry would > > cover 1500/16 = 93 different packet lengths. Ie, > > entry 0 would cover packet lengths 0..93, entry 1 > > 94..186, and so on. A single entry can''t be right > > for all those packet lengths, so again we are back > > to a average 30% error for typical VOIP length > > packets. > > My patch doesn''t uses fixed sized cells, so it can deal > with anything, worst case is you use one cell per packet > size. Optimizing size and lookup speed for ethernet makes > a lot more sense than optimizing for ADSL.I was just responding to a point you made earlier, when you said STAB could only use 16 entries as opposed to the 256 used by RTAB. I suspect nobody would actually do that because of the inaccuracy it creates, so the comparison is perhaps unfair. I agree the flexibility of making STAB variable length is a good idea, and comes at 0 cost in the kernel. Andy Furniss wrote:> > Russell Stuart wrote: > >> The kernel will have to do a shift and a division > >> for each packet, which I assume is permissible. 
> > > > > > I guess that is for others to decide :-) I think Patrick has a point > > about sfq/htb drr, Like you I guess, I thought that alot of extra per > > packet calculations would have got an instant NO. > > Its only done once per packet (currently, it might be interesting to > override the length for specific classes and their childs, for example > if you do queueing on eth0 and have an DSL router one hop apart). > The division is gone in my patch btw.Unlike the packet length the time calculation can''t be cached in the skb. Most classes in HTB/CBQ use different packet transmission rates.
Russell Stuart
2006-Jul-20 07:51 UTC
[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)
On Thu, 2006-07-20 at 01:00 +0400, Alexey Kuznetsov wrote:> Hello!So you really do exist? I thought it was just rumour.> Well, if fixed point arithmetics is not a problem.It shouldn''t be. Any decimal number can be expressed as a fraction, eg: 0.00123 = 123/100000 Which can be calculated as a multiply and a divide. With MTU''s up to 2048, it should be possible to do this with 99.9999% accuracy (ie 2048/2^23). With a bit more work in userspace (ie in tc), it can be be reduced to a multiply and a shift.> Plus, remember, the function is not R*size, it is at least > R*size+addend, to account for link overhead. Plus account for padding > of small packets. Plus, when policing it should deaccount already added > link headers, QoS counts only network payload.Yes, it is flexible - and has served us well up until now. It doesn''t work well for ATM, but with a small bit of extra calculation in the kernel it could. However, it turns out that ATM is a special case. If ATM''s cell payload was 58 bytes instead of 48 bytes (say), then it would not be possible to produce a RTAB that had small errors (eg < 10%) for smallish packet sizes (< 290 bytes). I seem to have trouble explaining why in a concise way that people understand, so I won''t try here. So when Alan Cox said our ATM patch didn''t solve the packetisation problem in general, he was right as our patch just built upon RTAB. Patrick''s STAB proposal in general either for that matter, as it is just another implementation of RTAB with the same limitations. The only way I can think of to solve it in general is to move many more calculations into the kernel - as I proposed in a long winded answer to Patrick earlier in this thread. But doing so would get rid of the table implementation and the flexibility it has given us to date. For that reason I feel uncomfortable with it. The engineering decision becomes this - are there any other protocols like ATM out there that could justify such a change? 
(In my more cynical moments I think of it differently - has/is the world going to make a second engineering fuck up on the scale of ATM again? How on earth did anyone decide that pushing data packets over ATM, as happens in ADSL, was a good idea?)

I know of no other such protocols. But then I don't have an encyclopedic knowledge of comms protocols, so that doesn't mean much. I suspect you know a good deal more about them than I do. What say you?
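
The message above describes a transmission time of the form R*size+addend and says the rate factor can be applied with a multiply and a shift once userspace has precomputed a multiplier. The following is a minimal Python sketch of that idea, offered only as an illustration. It is not taken from either patch or from the kernel. The function names (`atm_wire_bytes`, `make_rate`, `xmit_time_ns`), the `overhead` parameter and the shift of 23 bits are all assumptions chosen for the example. The only facts it relies on are the ones stated in the thread: ATM sends fixed 53 byte cells, each carrying 48 bytes of payload.

```python
ATM_CELL_SIZE = 53      # bytes on the wire per ATM cell
ATM_CELL_PAYLOAD = 48   # bytes of packet data carried per cell


def atm_wire_bytes(size, overhead=0):
    """Bytes sent on the wire for a packet of `size` bytes.

    `overhead` is a hypothetical fixed per-packet addend (link headers
    and the like); the packet is then padded up to whole cells.
    """
    cells = -(-(size + overhead) // ATM_CELL_PAYLOAD)  # ceiling division
    return cells * ATM_CELL_SIZE


def make_rate(bytes_per_sec, shift=23):
    """'Userspace' step: turn a rate into an integer multiplier.

    The multiplier is nanoseconds-per-byte scaled by 2**shift, so the
    per-packet step needs no division. The shift value is an assumption.
    """
    mult = (10**9 << shift) // bytes_per_sec
    return mult, shift


def xmit_time_ns(wire_bytes, mult, shift):
    """'Kernel' step: one multiply and one shift per packet."""
    return (wire_bytes * mult) >> shift


if __name__ == "__main__":
    mult, shift = make_rate(64000)  # 64000 bytes/s, ie 512 kbit/s
    for size in (40, 48, 49, 1500):
        wire = atm_wire_bytes(size)
        print(size, wire, xmit_time_ns(wire, mult, shift))
```

Keeping the division in `make_rate` and only the multiply and shift in `xmit_time_ns` mirrors the split between tc and the kernel that the message suggests. The cell rounding in `atm_wire_bytes` is the "small bit of extra calculation" that a plain R*size table lookup does not capture.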
Russell Stuart
2006-Jul-30 23:06 UTC
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Thu, 2006-07-20 at 14:56 +1000, Russell Stuart wrote:
> On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> > Please excuse my silence, I was travelling and am still catching up
> > with my mails.
>
> Sorry. Had I realised you were busy I would of
> waited.
>
> > > - As it stands, it doesn't help the qdiscs that use
> > > RTAB. So unless he proposes to remove RTAB entirely
> > > the ATM patch as it will still have to go in.
> >
> > Why? The length calculated by my STABs (or something similar)
> > is used by _all_ qdiscs. Not only for transmission time calculation,
> > but also for statistics and estimators.
>
> Oh. I didn't see where it is used for the time
> calculation in your patch. Did I miss something,
> or is that the unfinished bit?
>
> This is possibly my stumbling block. If you don't remove
> RTAB the ATM patch as stands will be needed. Your patch
> didn't remove RTAB, and you didn't say it was intended to,
> so I presume it wasn't going to.

It has gone quiet again. In my mind the one unresolved issue is whether Patrick intended to remove RTAB with his patch. If not, the ATM patch as it stands will have to go in.

Patrick - it would be nice to hear from you.
Russell Stuart
2006-Jul-31 23:32 UTC
[LARTC] Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Thu, 2006-07-20 at 14:56 +1000, Russell Stuart wrote:
> On Wed, 2006-07-19 at 16:50 +0200, Patrick McHardy wrote:
> > Please excuse my silence, I was travelling and am still catching up
> > with my mails.
>
> Sorry. Had I realised you were busy I would of
> waited.
>
> > > - As it stands, it doesn't help the qdiscs that use
> > > RTAB. So unless he proposes to remove RTAB entirely
> > > the ATM patch as it will still have to go in.
> >
> > Why? The length calculated by my STABs (or something similar)
> > is used by _all_ qdiscs. Not only for transmission time calculation,
> > but also for statistics and estimators.
>
> Oh. I didn't see where it is used for the time
> calculation in your patch. Did I miss something,
> or is that the unfinished bit?
>
> This is possibly my stumbling block. If you don't remove
> RTAB the ATM patch as stands will be needed. Your patch
> didn't remove RTAB, and you didn't say it was intended to,
> so I presume it wasn't going to.

It has gone quiet again. In my mind the one unresolved issue is whether Patrick intended to remove RTAB with his patch. If not, the ATM patch as it stands will have to go in.

Patrick - it would be nice to hear from you.