Hi,

After preparing my talk on CBQ/HTB (http://ds9a.nl/cbq-presentation), I
finally understood how CBQ and filters etc truly work. And I wrote it down.
Check out the Linux Advanced Routing & Shaping HOWTO, it's changed a lot!

Especially this part is very new, please check it for mistakes and
inconsistencies:

http://ds9a.nl/2.4Routing/HOWTO//cvs/2.4routing/output/2.4routing-9.html

I even got 'split' and 'defmap' figured out, which should be a first. There
is not a single other page online that tells you correctly what these do.

One thing - does *anybody* understand how hash tables work in tc filter, and
what they do? Furthermore, I could use some help with the tc filter police
things. So if you do understand how these work, please drop me a line.

Thanks!

--
http://www.PowerDNS.com          Versatile DNS Software & Services
Trilab                           The Technology People
Netherlabs BV / Rent-a-Nerd.nl   - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
bert hubert
2001-Dec-03 02:00 UTC
CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sat, Dec 01, 2001 at 01:33:41AM +0100, bert hubert wrote:

> One thing - does *anybody* understand how hash tables work in tc filter, and
> what they do? Furthermore, I could use some help with the tc filter police
> things.

Thanks to Andreas Steinmetz and David Sauer, tc hash tables are now
documented as well, thanks!

See:

http://ds9a.nl/2.4Routing/HOWTO//cvs/2.4routing/output/2.4routing-12.html

And then 'Hashing filters for very fast massive filtering'.

I also finished documenting all parameters for TBF, CBQ, SFQ, PRIO,
bfifo, pfifo and pfifo_fast. All queues in the Linux kernel are now
described in the Linux Advanced Routing & Shaping HOWTO, which can be found on

http://ds9a.nl/2.4Routing

I want to send this off to the LDP and Freshmeat somewhere next week, I
*would really* like people who are knowledgeable about this subject (this
means you, ANK & Jamal 8) ) to read through this.

This HOWTO is rapidly becoming the perceived authoritative source for
traffic control in linux (google on 'Linux Routing' finds it), it might as
well be right! So if you have any time at all, check the parts you know
about. I expect mistakes.

The parts of the table of contents that document stuff in the kernel not
documented elsewhere:

9. Queueing Disciplines for Bandwidth Management
   9.1 Queues and Queueing Disciplines explained
   9.2 Simple, classless Queueing Disciplines
       9.2.1 pfifo_fast
             9.2.1.1 Parameters & usage
       9.2.2 Token Bucket Filter
             9.2.2.1 Parameters & usage
             9.2.2.2 Sample configuration
       9.2.3 Stochastic Fairness Queueing
             9.2.3.1 Parameters & usage
             9.2.3.2 Sample configuration
   9.3 Advice for when to use which queue
   9.4 Classful Queueing Disciplines
       9.4.1 Flow within classful qdiscs & classes
       9.4.2 The qdisc family: roots, handles, siblings and parents
             9.4.2.1 How filters are used to classify traffic
             9.4.2.2 How packets are dequeued to the hardware
       9.4.3 The PRIO qdisc
             9.4.3.1 PRIO parameters & usage
             9.4.3.2 Sample configuration
       9.4.4 The famous CBQ qdisc
             9.4.4.1 CBQ shaping in detail
             9.4.4.2 CBQ classful behaviour
             9.4.4.3 CBQ parameters that determine link sharing & borrowing
             9.4.4.4 Sample configuration
             9.4.4.5 Other CBQ parameters: split & defmap
       9.4.5 Hierarchical Token Bucket
             9.4.5.1 Sample configuration
   9.5 Classifying packets with filters
       9.5.1 Some simple filtering examples
       9.5.2 All the filtering commands you will normally need

(...)

12. Advanced filters for (re-)classifying packets
    12.1 The "u32" classifier
         12.1.1 U32 selector
         12.1.2 General selectors
         12.1.3 Specific selectors
    12.2 The "route" classifier
    12.3 Policing filters
    12.4 Hashing filters for very fast massive filtering

(...)

14. Advanced & less common queueing disciplines
    14.1 bfifo/pfifo
         14.1.1 Parameters & usage
    14.2 Clark-Shenker-Zhang algorithm (CSZ)
    14.3 DSMARK
         14.3.1 Introduction
         14.3.2 What is Dsmark related to?
         14.3.3 Differentiated Services guidelines
         14.3.4 Working with Dsmark
         14.3.5 How SCH_DSMARK works.
         14.3.6 TC_INDEX Filter
    14.4 Ingress policer qdisc
    14.5 Random Early Drop (RED)
    14.6 VC/ATM emulation
    14.7 Weighted Round Robin (WRR)

The only thing left to document are Policing filters.

Regards,

bert hubert
Jim Fleming
2001-Dec-03 02:26 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
----- Original Message -----
From: "bert hubert" <ahu@ds9a.nl>

> The only thing left to document are Policing filters.

This may help...
http://www.dot-biz.com/IPv4/Tutorial/

Jim Fleming
http://www.IPv8.info
IPv16....One Better !!
Alberto Bertogli
2001-Dec-03 02:45 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Mon, Dec 03, 2001 at 03:32:15AM +0100, bert hubert wrote:
> On Sun, Dec 02, 2001 at 11:16:49PM -0300, Alberto Bertogli wrote:
>
> > The comment, still present today (of course =) is:
> >
> >   Note that the peak rate TBF is much more tough: with MTU 1500
> >   P_crit = 150Kbytes/sec. So, if you need greater peak
> >   rates, use alpha with HZ=1000 :-)
> >
> > Even below in the description, when it describes peakrate parameter, it
> > says "However, due to the default 10ms timer resolution of Unix, with
> > 10.000 bits average packets, we are limited to 1mbit/s of peakrate!"
>
> 150Kbytes/sec -> 1mbit/s! (more or less). Depends a bit on what your average
> packet size is.

Yeah I'm just stupid and read everything as bps. Luckily I forgot to CC to
the list =)

> SFQ doesn't shape, so it's a whole different ballgame.

Yeah, I shape with CBQ anyway.. which reminds me of my next question: is
HTB going to be merged in 2.4? Is it ready for production use?

Thanks,
Alberto
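
The two figures quoted above can be reconciled with a little arithmetic. This is only a sketch, assuming that the limit comes from releasing at most one packet per timer tick at HZ=100 (a 10ms tick), which is how the quoted kernel comment and HOWTO sentence read:

```latex
\begin{align*}
P_{\mathrm{crit}} &= \mathrm{MTU} \times \mathrm{HZ}
  = 1500\ \text{bytes} \times 100\ \text{s}^{-1}
  = 150\,000\ \text{bytes/s}
  = 1.2\ \text{Mbit/s} \\
P_{\mathrm{avg}} &= 10\,000\ \text{bits} \times 100\ \text{s}^{-1}
  = 1\,000\,000\ \text{bit/s}
  = 1\ \text{Mbit/s}
\end{align*}
```

So both statements describe the same ceiling; they differ only in the packet size they assume (a full 1500-byte MTU versus a 10,000-bit average packet).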
bert hubert
2001-Dec-03 02:53 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sun, Dec 02, 2001 at 11:45:57PM -0300, Alberto Bertogli wrote:

> > SFQ doesn't shape, so it's a whole different ballgame.
>
> Yeah, I shape with CBQ anyway.. what reminds me to my next question: is
> HTB going to be merged in 2.4? is it ready for production use?

I think HTB is ready. Martin is on top of things and is rapidly cleaning up
remaining issues. He is now asking the networking maintainers of Linux for
advice on what needs to be done.

Regards,

bert
jamal
2001-Dec-08 19:20 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Mon, 3 Dec 2001, bert hubert wrote:

> On Sat, Dec 01, 2001 at 01:33:41AM +0100, bert hubert wrote:
>
> > One thing - does *anybody* understand how hash tables work in tc filter, and
> > what they do? Furthermore, I could use some help with the tc filter police
> > things.
>
> Thanks to Andreas Steinmetz and David Sauer, tc hash tables are now
> documented as well, thanks!
>
> See:
>
> http://ds9a.nl/2.4Routing/HOWTO//cvs/2.4routing/output/2.4routing-12.html
>
> And then 'Hashing filters for very fast massive filtering'.
>
> I also finished documenting all parameters for TBF, CBQ, SFQ, PRIO,
> bfifo, pfifo and pfifo_fast. All queues in the Linux kernel are now
> described in the Linux Advanced Routing & Shaping HOWTO, which can be found on
>
> http://ds9a.nl/2.4Routing
>
> I want to send this off to the LDP and Freshmeat somewhere next week, I
> *would really* like people who are knowledgeable about this subject (this
> means you, ANK & Jamal 8) ) to read through this.
>
> This HOWTO is rapidly becoming the perceived authoritative source for
> traffic control in linux (google on 'Linux Routing' finds it), it might as
> well be right! So if you have any time at all, check the parts you know
> about. I expect mistakes.
>
> The parts of the table of contents that document stuff in the kernel not
> documented elsewhere:

"not documented elsewhere" comes out rude. Werner and I (and even
Alexey when he was in the mood -- and i have seen some good documentation
by other people as well) have spent numerous hours documenting, presenting
and answering questions on mailing lists at times.

Sample docs that i was personally involved in:
ftp://icaftp.epfl.ch/pub/linux/diffserv/misc/dsid-01.txt.gz

You need to introduce the big picture to the user.
And what is wrong with the definitions used in
http://www.davin.ottawa.on.ca/ols/img10.htm that forced you to introduce
your own?

Actually, the big picture is:
http://www.davin.ottawa.on.ca/ols/img9.htm
Also
http://www.linuxjournal.com/article.php?sid=3369
(was written in 98 but got published in 99)

Now despite all the bitching above, i think your efforts are noble.

[My complaint about your style is that you often try to present facts
by using opinions. For example, despite a lot of effort in the past to
explain the ingress qdisc to you, and pointing you to very good
documentation from CISCO, you still ended up using your opinions on what you
thought it should be ;->]

My scanning of the document shows opinions still posing as misconstrued
facts. It is improving compared to what i saw last when we discussed ingress.
Let me clarify one thing in this email; i'll read what you have later.

Lets start with your description of TC_PRIO and TOS mappings etc:
Your description of these values is insufficient. Consider this a
tutorial and reword it as you wish, but please avoid opinions.

Ok, here's the clarification; this applies to prio, the default 3 band fifo
queueing and CBQ defaultmap classification; it applies to packets being
forwarded as well as locally generated:

First Step:
==========

Define TOS: This is a 4 bit value used as defined in RFC 1349.

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 |                       |     |
|   PRECEDENCE    |          TOS          | MBZ |
|                 |                       |     |
+-----+-----+-----+-----+-----+-----+-----+-----+

Then define the values possible as:

   1000 -- minimize delay
   0100 -- maximize throughput
   0010 -- maximize reliability
   0001 -- minimize monetary cost
   0000 -- normal service

Look at RFC 1349 for typical values used by different applications.
Then of course note that RFC 1349 is obsoleted by RFC 2474 (yes, you can
weep);

Having said all that:
Linux remaps packets incoming with different values to some internal
value; the column "mapped to" shows the internal mapping

8value(hex)   TOS(dec)   mapped to(dec)
----------------------------------------
0x0            0          0
               1          7
               2          0
               3          0
               4          2
               5          2
               6          2
               7          2
0x10           8          6
               9          6
              10          6
              11          6
              12          2
              13          2
              14          2
              15          2

Fill in the "8value(hex)" column gaps using the bitmap from RFC 1349 for
the 8 bits; these are the values you would see with tcpdump -vvv.
I filled in the two easiest ones i could compute in my head.

Second step:

Take the default priority map:
1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
This applies for both default prio and the 3-band FIFO queue.
Note the queue map fitted on the last column

8 bit value   TOS   mapped to   queue map
---------------------------------------------
0x0            0     0           1
               1     7           2
               2     0           2
               3     0           2
               4     2           1
               5     2           2
               6     2           0
               7     2           0
0x10           8     6           1
               9     6           1
              10     6           1
              11     6           1
              12     2           1
              13     2           1
              14     2           1
              15     2           1

Queue 0 gets processed first, then queue 1, then queue 2.
In strict priority processing such as in prio or the default 3 band sched,
queue 0 is processed until no more packets are left, then queue 1 etc.
This could result in starvation. You could avoid starvation by inserting
a TBF in a prio; limit the size of the fifo in a class or use CBQ
configured as WRR.

I hope the above explains why you have to recreate the priomap every time
you change the number of bands. You used the word "probably" which is
wrong. The proper word is "MUST".

What i think would be useful for you to do is describe some of the values
used by some applications (RFC 1349 cut-n-paste job would help).

cheers,
jamal
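
The strict priority processing described above (queue 0 drained first, then queue 1, then queue 2) can be sketched in a few lines of C. This is an illustration only, not the kernel's code: `struct band`, `band_enqueue` and `prio_dequeue` are hypothetical names, and packets are reduced to integer ids.

```c
#include <assert.h>
#include <stddef.h>

#define NBANDS   3
#define BAND_CAP 8

/* Hypothetical fixed-size FIFO of packet ids; not a kernel structure. */
struct band {
    int pkts[BAND_CAP];
    size_t head, len;
};

/* Append a packet id to a band; returns 0 on success, -1 if the band is full. */
static int band_enqueue(struct band *b, int pkt)
{
    assert(b != NULL);
    if (b->len == BAND_CAP)
        return -1;
    b->pkts[(b->head + b->len) % BAND_CAP] = pkt;
    b->len++;
    return 0;
}

/* Strict priority: always serve the lowest-numbered non-empty band.
 * Returns the packet id, or -1 when every band is empty. */
static int prio_dequeue(struct band bands[NBANDS])
{
    assert(bands != NULL);
    for (int i = 0; i < NBANDS; i++) {
        struct band *b = &bands[i];
        if (b->len > 0) {
            int pkt = b->pkts[b->head];
            b->head = (b->head + 1) % BAND_CAP;
            b->len--;
            return pkt;
        }
    }
    return -1;
}
```

Because `prio_dequeue` always restarts at band 0, a steady stream into band 0 keeps bands 1 and 2 waiting indefinitely, which is the starvation mentioned above. The dequeue never holds a packet back when one is available, so the scheme is work-conserving.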
On Sat, Dec 08, 2001 at 02:20:20PM -0500, jamal wrote:

> > The parts of the table of contents that document stuff in the kernel not
> > documented elsewhere:
>
> "not documented elsewhere" comes out rude. Werner and I (and even
> Alexey when he was in the mood -- and i have seen some good documentation
> by other people as well) have spent numerous hours documenting, presenting
> and answering questions on mailing lists at times

True. I should have worded that better but I lost sight of politeness due to
my great enthusiasm at finally understanding everything. Some parts required
literally *hours* of digging through sources and disembodied slides -
presentations lose something without a speaker.

> Sample docs that i was personally involved in:
> ftp://icaftp.epfl.ch/pub/linux/diffserv/misc/dsid-01.txt.gz

These days I understand this document, but I didn't used to. That might be
because I'm thick, though.

> You need to introduce the big picture to the user.
> and what is wrong with the definitions used in
> http://www.davin.ottawa.on.ca/ols/img10.htm that forced you to introduce
> your own?

I've since moved to this terminology. Please also see the manpages I'm
writing at http://ds9a.nl/lartc/manpages

> Actually, the big picture is:
> http://www.davin.ottawa.on.ca/ols/img9.htm
> Also
> http://www.linuxjournal.com/article.php?sid=3369
> (was written in 98 but got published in 99)

Google is surely to be praised - I had found all these links already. But to
summarize: stuff is out there.

> [My complaints about your style is you often are trying to present facts
> by using opinions. For example despite a lot of effort in the past to
> explain ingress qdisc to you in the past and, pointing you to very good
> documentation from CISCO you still ended using your opinions on what you
> thought it should be;->

I really didn't understand how everything worked back then, sadly. I do now,
hopefully.

> My scanning of the document shows opinions still posing as misconstrued
> facts. It is improving compared to what i saw last when we discussed ingress.
> Let me clarify one thing in this email; i'll read what you have later.

Some stuff remains from that time, am working on removing it. My current
effort is writing the manpages and getting them 100% right and devoid of
opinion. Once they are finished & reviewed, I'm 'backporting' the insight to
the HOWTO, which will then lose a lot of content and instead refer to the
manpages.

> Lets start by your description of TC_PRIO and TOS mappings etc:
> Your descriptions of these values is insufficient. Consider this a
> tutorial and reword it as you wish but please avoid opinions.

Will do, it makes sense now.

> Look at RFC 1349 for typical values used by different applications
> Then of course note that RFC 1349 is obsoleted by RFC 2474 (yes, you can
> weep);

That confused me greatly, yes.

> What i think would be useful for you to do is describe some of the values
> used by some applications (RFC 1349 cut-n-paste job would help).

Thanks. I'm working on making the HOWTO more factual and the manpages 100%
factual. I'm always happy with critiques.

Regards,

bert
For starters, i think you need a definitions section. Look at:
http://www.ietf.org/internet-drafts/draft-ietf-diffserv-model-06.txt

(eg what is a shaper etc and how things are placed together). At least
that will ensure that you dont say things like "Prio cant shape".
It is a good model but may be insufficient given Linux TCs
capabilities. Email me when unsure.

Some other things:

- In your comment "Do not confuse this classless simple qdisc with the
  classful PRIO one!". This is misleading:
  the default 3 band FIFO queue is conceptually the same as the
  default prio qdisc (the priomaps are identical). 3 bands; same
  prioritization schemes.

- You really need to fix the ingress section:
  it works for both forwarding and packets coming in to local sockets.
  More importantly, it takes advantage of _all_ filter schemes
  available for TC as well as the policing functionality (which sadly seems
  to have been replicated by someone in netfilter, wrongly if i may add ;->).

- You keep saying "reordering" -- dont know what that means. Reordering is
  generally considered a Bad Thing(tm).

- your description of the "peakrate" (same in TBF as well as policing)
  Well captured. It took ages to get this into peoples heads.
  This also applies to CBQ.

- your description of "MTU"
  Not a very good description:
  This is just what it literally says; maximum transmit unit;
  A packet larger than this will be dropped. Default is 2K. For ethernet,
  MTUs of 1500 bytes, this is fine; however, you should put a cautionary
  statement here in regards to people having MTUs larger than 2K (example
  the lo device); they might find that all their packets greater than 2K
  are being dropped.

More later if i dont get distracted.

cheers,
jamal
On Sat, Dec 08, 2001 at 03:43:05PM -0500, jamal wrote:

> For starters, i think you need a definitions section. Look at:
> http://www.ietf.org/internet-drafts/draft-ietf-diffserv-model-06.txt
>
> (eg what is a shaper etc and how things are placed together). At least
> that will ensure that you dont say things like "Prio cant shape".

I see that now :-) The right wording appears to be that a Prio is a
Work-conserving non-policing shaper.

> It is a good model but may be insufficient given Linux TCs
> capabilities. Email me when unsure.

Will do.

> Some other things:
> - In your comment "Do not confuse this classless simple qdisc with the
> classful PRIO one!". This is misleading:
> the default 3 band FIFO queue is conceptually the same as the
> default prio qdisc (the priomaps are identical). 3 bands; same
> prioritization schemes.

New wording:

  Do not confuse this classless simple qdisc with the classful PRIO one!
  Although they have a lot in common, the PRIO queue can contain different
  classes, whereas pfifo_fast has hardcoded FIFO bands.

> - You really need to fix ingress section:
> it works for both forwarding and packets coming in to local sockets.
> More importantly, It takes advantages of _all_ filter schemes
> available for TC as well as the policing functionality (which sadly seemed
> to have been replicated by someone in netfilter, wrongly if i may add ;->).

Yeah, it's broken, it counts packets, not bytes. It does praise Alexey
though :-)

Ok, new ingress description:

  The ingress qdisc is a strange animal in that it is not used to send
  packets out to the network adaptor. Instead, it allows you to apply tc
  filters to packets coming in over the interface, regardless of whether
  they have a local destination or are to be forwarded.

  As the tc filters contain a full Token Bucket Filter implementation, and
  are also able to match on the kernel flow estimator, there is a lot of
  functionality available. This effectively allows you to police incoming
  traffic, before it even enters the IP stack.

  Parameters & usage

  The ingress qdisc itself does not require any parameters. It differs from
  other qdiscs in that it does not occupy the root of a device. Attach it
  like this:

    # tc qdisc add dev eth0 ingress

  This allows you to have other, sending, qdiscs on your device besides the
  ingress qdisc.

  For a contrived example how the ingress qdisc could be used, see the
  Cookbook.

> - You keep saying "reordering" -- dont know what that means. Reordering is
> generally considered a Bad Thing(tm).

Well. That is what it comes down to - it reorders packets. It does not
reorder them within the same tcp/ip session, or at least, we hope so. In
other words, it delays certain packets while it doesn't delay others. How
would you suggest wording this?

> - your description of "MTU"
> Not very good description:
> This is just what it literally says; maximum transmit unit;
> A packet larger than this will be dropped. Default is 2K. For ethernet,
> MTUs of 1500 bytes, this is fine; however, you should put a cautionary
> statement here in regards to people having MTUs larger than 2K (example
> the lo device); they might find that all their packets greater than 2K
> being dropped.

Sure? From linux/net/sched/sch_tbf.c:

    toks = PSCHED_TDIFF_SAFE(now, q->t_c, q->buffer, 0);

    if (q->P_tab) {
            ptoks = toks + q->ptokens;
            if (ptoks > (long)q->mtu)
                    ptoks = q->mtu;
            ptoks -= L2T_P(q, skb->len);
    }

Sure looks like the mtu, measured in tokens, is the size of the second
bucket?

> More later if dont get distracted.

Thanks.
On Sat, 8 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 03:43:05PM -0500, jamal wrote:
>
> > For starters, i think you need a definitions section. Look at:
> > http://www.ietf.org/internet-drafts/draft-ietf-diffserv-model-06.txt
> >
> > (eg what is a shaper etc and how things are placed together). At least
> > that will ensure that you dont say things like "Prio cant shape".
>
> I see that now :-) The right wording appears to be that a Prio is a
> Work-conserving non-policing shaper.

work conserving is right. non-policing is wrong. Policing is related to
filters. Shaping is related to schedulers. Prio is a scheduler.
Shaping results in non-work conserving schemes. You can attach a TBF, which
does shaping, inside a Prio qdisc. That would add non-workconserving-ness
to it.

> > It is a good model but may be insufficient given Linux TCs
> > capabilities. Email me when unsure.
>
> Will do.

Basically dont take it for the gospel.

> > Some other things:
> > - In your comment "Do not confuse this classless simple qdisc with the
> > classful PRIO one!". This is misleading:
> > the default 3 band FIFO queue is conceptually the same as the
> > default prio qdisc (the priomaps are identical). 3 bands; same
> > prioritization schemes.
>
> New wording:
> Do not confuse this classless simple qdisc with the classful PRIO one!
> Although they have a lot in common, the PRIO queue can contain different
> classes, whereas pfifo_fast has hardcoded FIFO bands.

I am not sure if i like the wording: A class is the result of a
class-ification. Both pfifo_fast and PRIO have builtin class-ifiers.
Essentially if you treated default prio qdisc and pfifo_fast as black
boxes, there is _no_ difference. Dont look at the code, think larger
picture.

> > - You really need to fix ingress section:
> > it works for both forwarding and packets coming in to local sockets.
> > More importantly, It takes advantages of _all_ filter schemes
> > available for TC as well as the policing functionality (which sadly seemed
> > to have been replicated by someone in netfilter, wrongly if i may add ;->).
>
> Yeah, it's broken, it counts packets, not bytes. It does praise Alexey
> though :-)

I think the main problem i found with it is that it used a single token
bucket.

> Ok, new ingress description:
>
> The ingress qdisc is a strange animal in that is not used to send packets
> out to the network adaptor. Instead, it allows you to apply tc filters to
> packets coming in over the interface, regardless of whether they have a
> local destination or are to be forwarded.
>
> As the tc filters contain a full Token Bucket Filter implementation, and are
> also able to match on the kernel flow estimator, there is a lot of
> functionality available. This effectively allows you to police incoming
> traffic, before it even enters the IP stack.

You can have two qdiscs per device, ingress and egress.
Please look at the router model i described earlier. The two hooks are
very clearly described there.
Ingress qdisc is work conserving only by design; whereas egress qdiscs
could be non-work conserving.

> Parameters & usage
>
> The ingress qdisc itself does not require any parameters. It differs from
> other qdiscs in that it does not occupy the root of a device. Attach it like
> this:
>
> # tc qdisc add dev eth0 ingress
>
> This allows you to have other, sending, qdiscs on your device besides the
> ingress qdisc.

Again, look at the definitions of eg/ingress. You need to have this
diagram drawn:

http://www.davin.ottawa.on.ca/ols/img9.htm
in your document.

> For a contrived example how the ingress qdisc could be used, see the
> Cookbook.
>
> > - You keep saying "reordering" -- dont know what that means. Reordering is
> > generally considered a Bad Thing(tm).
>
> Well. That is what it comes down to - it reorders packets. It does not
> reorder them within the same tcp/ip session, or at least, we hope so. In
> other words, it delays certain packets while it doesn't delay others. How
> would you suggest wording this?

Look at the model draft then lets talk again.

> > - your description of "MTU"
> > Not very good description:
> > This is just what it literally says; maximum transmit unit;
> > A packet larger than this will be dropped. Default is 2K. For ethernet,
> > MTUs of 1500 bytes, this is fine; however, you should put a cautionary
> > statement here in regards to people having MTUs larger than 2K (example
> > the lo device); they might find that all their packets greater than 2K
> > being dropped.
>
> Sure?

100% sure. Try a little experiment then look at the code again.

cheers,
jamal
On Sat, Dec 08, 2001 at 04:56:04PM -0500, jamal wrote:

> > I see that now :-) The right wording appears to be that a Prio is a
> > Work-conserving non-policing shaper.
>
> work conserving is right. non-policing is wrong. Policing is related to
> filters. Shaping is related to schedulers. Prio is a scheduler.

Ok, so 'work-conserving' encodes the fact that it will never delay packets?

> Shaping results in non-work conserving schemes. You can attacha TBF which
> shaping inside a Prio qdisc. That would add non-workconserving-ness to it.

Ok, but pfifo_fast for example will always be work conserving.

> > New wording:
> > Do not confuse this classless simple qdisc with the classful PRIO one!
> > Although they have a lot in common, the PRIO queue can contain different
> > classes, whereas pfifo_fast has hardcoded FIFO bands.
>
> I am not sure if i like the wording: A class is the result of a
> class-ification. Both pfifo_fast and PRIO have builtin class-ifiers.
> Essentially if you treated default prio qdisc and pfifo_fast as black
> boxes, there is _no_ difference. Dont look at the code, think larger
> picture.

Well, I aim for the user. For the user, the big picture may be identical but
the use is quite different. I don't want to get email 'I tried to add a
qdisc to pfifo_fast and it didn't work!'.

New wording:

  Do not confuse this classless simple qdisc with the classful PRIO one!
  Although they behave similarly, pfifo_fast is classless and you cannot add
  other qdiscs to it with the tc command.

I think this covers what you mean and what I want.

> You can have two qdiscs per device, ingress and egress
> Please look at the router model i described earlier. The two hooks are
> very clearly described there.
> Ingress qdisc is work conserving only by design; whereas egress qdiscs
> could be non-work conserving.

True. The ingress qdisc isn't really dequeued.

> Again, look at the definitiions of eg/ingress. You need to have this
> diagram drawn:
>
> http://www.davin.ottawa.on.ca/ols/img9.htm
> in your document.

It's good, will need to asciify it however.

> > other words, it delays certain packets while it doesn't delay others. How
> > would you suggest wording this?
>
> Look at the model draft then lets talk again.

Ok. Rewording of the HOWTO will have to wait on a glossary/definition
section anyhow.

> > > the lo device); they might find that all their packets greater than 2K
> > > being dropped.
> >
> > Sure?
>
> 100% sure. Try a little experiment then look at the code again.

I will, but I bet you 5 euros that I'm right :-) We are talking about the
TBF, aren't we?

Regards,

bert
bert hubert
2001-Dec-08 23:23 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sat, Dec 08, 2001 at 02:20:20PM -0500, jamal wrote:

> Linux remaps packets incoming with different values to some internal
> value; the column "mapped to" shows the internal mapping
>
> 8value(hex)   TOS(dec)   mapped to(dec)
> ----------------------------------------
> 0x0            0          0
>                1          7
>                2          0
>                3          0
>                4          2
>                5          2
>                6          2
>                7          2
> 0x10           8          6
>                9          6
>               10          6
>               11          6
>               12          2
>               13          2
>               14          2
>               15          2

I find this tos2prio table in the kernel (2.5.x), which is somewhat
different from your table:

    0   TC_PRIO_BESTEFFORT,            0
    1   TC_PRIO_(FILLER),              1
    2   TC_PRIO_BESTEFFORT,            0
    3   TC_PRIO_(BESTEFFORT),          0
    4   TC_PRIO_BULK,                  2
    5   TC_PRIO_(BULK),                2
    6   TC_PRIO_BULK,                  2
    7   TC_PRIO_(BULK),                2
    8   TC_PRIO_INTERACTIVE,           6
    9   TC_PRIO_(INTERACTIVE),         6
    10  TC_PRIO_INTERACTIVE,           6
    11  TC_PRIO_(INTERACTIVE),         6
    12  TC_PRIO_INTERACTIVE_BULK,      4
    13  TC_PRIO_(INTERACTIVE_BULK),    4
    14  TC_PRIO_INTERACTIVE_BULK,      4
    15  TC_PRIO_(INTERACTIVE_BULK)     4

> Fill in the "8value(hex)" column gaps using the bitmap from RFC1349 for
> the 8 bits; These are the values you would see with tcpdump -vvv
> I filled the two easiest ones i could compute in my head.
>
> Second step:
>
> Take the default priority map:
> 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
> This applies for both default prio and the 3-band FIFO queue.
> Note the queue map fitted on the last column
>
> 8 bit value   TOS   mapped to   queue map
> ---------------------------------------------
> 0x0            0     0           1
>                1     7           2
>                2     0           2
>                3     0           2
>                4     2           1
>                5     2           2
>                6     2           0
>                7     2           0
> 0x10           8     6           1
>                9     6           1
>               10     6           1
>               11     6           1
>               12     2           1
>               13     2           1
>               14     2           1
>               15     2           1

I've changed this table to:

    TOS     Bits  Means                     Linux Priority    Band
    ----------------------------------------------------------------
    0x0     0     Normal Service            0 Best Effort     1
    0x2     1     Minimize Monetary Cost    1 Filler          2
    0x4     2     Maximize Reliability      0 Best Effort     1
    0x6     3     mmc+mr                    0 Best Effort     1
    0x8     4     Maximize Throughput       2 Bulk            2
    0xa     5     mmc+mt                    2 Bulk            2
    0xc     6     mr+mt                     2 Bulk            2
    0xe     7     mmc+mr+mt                 2 Bulk            2
    0x10    8     Minimize Delay            6 Interactive     0
    0x12    9     mmc+md                    6 Interactive     0
    0x14    10    mr+md                     6 Interactive     0
    0x16    11    mmc+mr+md                 6 Interactive     0
    0x18    12    mt+md                     4 Int. Bulk       1
    0x1a    13    mmc+mt+md                 4 Int. Bulk       1
    0x1c    14    mr+mt+md                  4 Int. Bulk       1
    0x1e    15    mmc+mr+mt+md              4 Int. Bulk       1

http://ds9a.nl/lartc/HOWTO/cvs/2.4routing/output/2.4routing-9.html#ss9.2

Your table appears to imply that a Maximum Reliability, Minimum Delay packet,
TOS bits=10, gets mapped to band 1, not 0, which would not make sense.

Laying it out like this, which does appear to be how it works, does mean that
you can specify priorities in the priomap which do not correspond to possible
TOS values.

Is it possible at all to set skb->priority from userspace without going
through the tos2prio mapping? CBQ can use the skb->priority to classify:

    /*
     *  Step 1. If skb->priority points to one of our classes, use it.
     */
    if (TC_H_MAJ(prio^sch->handle) == 0 &&
        (cl = cbq_class_lookup(q, prio)) != NULL)
            return cl;

But to do this, you would need to be able to set skb->priority to a 32bit
number:

    include/linux/pkt_sched.h:#define TC_H_MAJ_MASK (0xFFFF0000U)
    include/linux/pkt_sched.h:#define TC_H_MAJ(h) ((h)&TC_H_MAJ_MASK)

I can't find where you would do this, any clues?

Thanks again for taking the time to help me.

Regards,

bert
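
The two-step lookup in the revised table (TOS byte to Linux priority, then priority to band through the priomap) can be written out as a short C sketch. The array contents are copied from the tos2prio listing and the default priomap quoted in this thread; the function names `tos_to_prio` and `tos_to_band` are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* TOS bits (0..15) -> Linux priority, from the tos2prio listing in this thread. */
static const unsigned char tos2prio[16] = {
    0, 1, 0, 0, 2, 2, 2, 2, 6, 6, 6, 6, 4, 4, 4, 4
};

/* Default priomap for pfifo_fast and PRIO, indexed by Linux priority. */
static const unsigned char priomap[16] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Hypothetical helper: Linux priority for a raw 8-bit IP TOS byte.
 * The four RFC 1349 TOS bits sit under mask 0x1e; shifting right by one
 * yields the 0..15 "Bits" column of the table. */
static unsigned int tos_to_prio(unsigned int tos_byte)
{
    assert(tos_byte <= 0xff);
    return tos2prio[(tos_byte & 0x1e) >> 1];
}

/* Hypothetical helper: band chosen by the default priomap. */
static unsigned int tos_to_band(unsigned int tos_byte)
{
    return priomap[tos_to_prio(tos_byte)];
}
```

The key point is that the priomap is indexed by the Linux priority, not by the TOS bits directly. For example, `tos_to_band(0x10)` (Minimize Delay) goes through priority 6 and lands in band 0, matching the revised table.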
On Sun, 9 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 04:56:04PM -0500, jamal wrote:

> Ok. Rewording of the HOWTO will have to wait on a glossary/definition
> section anyhow.

You need that terminology section.

> > > > the lo device); they might find that all their packets greater than 2K
> > > > being dropped.
> > >
> > > Sure?
> >
> > 100% sure. Try a little experiment then look at the code again.
>
> I will, but I bet you 5 euros that I'm right :-) We are talking about the
> TBF, aren't we?

We are talking about a dual token bucket in relation to filter policing.
It would probably cost you more than 5 Euros in charges to send 5 Euros to
me in .ca ;-> So just keep it. How about i raise my guarantee to 200%? ;->

cheers,
jamal
On Sun, 9 Dec 2001, bert hubert wrote:> > I will, but I bet you 5 euros that I''m right :-) We are talking about the > TBF, aren''t we? >The trick is that TBF delays packets and the policer drops packet when exceeding their profile. cheers, jamal
Henrik Nordstrom
2001-Dec-08 23:45 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sunday 09 December 2001 00.08, bert hubert wrote:

> True. The ingress qdisc isn't really dequeued.

One confusing thing is that the tc command allows one to add another qdisc to ingress. Have not yet figured out if this actually can be used for anything.

    tc qdisc add dev eth0 ingress tbf ....
    (or any other qdisc)

is happily accepted but afaict does not seem to have any function..

Regards
Henrik Nordström

On Sat, Dec 08, 2001 at 06:30:47PM -0500, jamal wrote:

> On Sun, 9 Dec 2001, bert hubert wrote:
>
> > I will, but I bet you 5 euros that I'm right :-) We are talking about the
> > TBF, aren't we?
>
> The trick is that TBF delays packets and the policer drops packet when
> exceeding their profile.

Well, we're both right to some extent. The policer drops packets which are larger than the configured MTU, but it also dimensions the second bucket to be of size mtu!

From net/sched/police.c (trimmed a bit, added some comments):

    int tcf_police(struct sk_buff *skb, struct tcf_police *p)
    {
        psched_time_t now;
        long toks, ptoks = 0;

        spin_lock(&p->lock);

        p->stats.bytes += skb->len;
        p->stats.packets++;

        if (skb->len <= p->mtu) {
            if (p->R_tab == NULL) {
                spin_unlock(&p->lock);
                return p->result;
            }

            PSCHED_GET_TIME(now);

            toks = PSCHED_TDIFF_SAFE(now, p->t_c, p->burst, 0);
            // tokens that arrived since last invocation

            if (p->P_tab) {
                ptoks = toks + p->ptoks;
                if (ptoks > (long)L2T_P(p, p->mtu))
                    ptoks = (long)L2T_P(p, p->mtu);  // cap total available ptokens to mtu
                ptoks -= L2T_P(p, skb->len);         // deduct packet size
            }
            toks += p->toks;
            if (toks > (long)p->burst)
                toks = p->burst;                     // cap regular number of tokens
            toks -= L2T(p, skb->len);                // deduct packet size

            if ((toks|ptoks) >= 0) {                 // send if both are positive
                p->t_c = now;
                p->toks = toks;                      // do accounting
                p->ptoks = ptoks;
                spin_unlock(&p->lock);
                return p->result;
            }
        }

        p->stats.overlimits++;
        spin_unlock(&p->lock);
        return p->action;
    }

It's a draw? :-)

Regards,

bert
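The dual-bucket behaviour discussed above can be tried out in userland with a small model. This is only a sketch under stated assumptions: the names `struct policer`, `policer_init` and `policer_conforms` are hypothetical, and tokens are counted in bytes for readability, whereas the kernel code quoted in the thread counts them in scheduler time units via L2T/L2T_P.

```c
#include <stdbool.h>

/* Hypothetical userland model of a dual token bucket policer.
 * Assumption: tokens are bytes, time is seconds (the kernel uses time units). */
struct policer {
    double rate;      /* refill rate of the main bucket, bytes per second */
    double peakrate;  /* refill rate of the peak bucket, 0 means "no peak bucket" */
    double burst;     /* capacity of the main bucket, bytes */
    double mtu;       /* largest accepted packet AND capacity of the peak bucket */
    double toks;      /* tokens currently in the main bucket */
    double ptoks;     /* tokens currently in the peak bucket */
    double t_c;       /* time of the last conforming packet */
};

static void policer_init(struct policer *p, double rate, double peakrate,
                         double burst, double mtu)
{
    p->rate = rate;
    p->peakrate = peakrate;
    p->burst = burst;
    p->mtu = mtu;
    p->toks = burst;  /* both buckets start full */
    p->ptoks = mtu;
    p->t_c = 0.0;
}

/* Returns true when the packet conforms. A non-conforming packet is simply
 * rejected; unlike a shaper such as TBF, nothing is ever delayed. */
static bool policer_conforms(struct policer *p, double len, double now)
{
    if (len > p->mtu)
        return false;                 /* larger than mtu: always over limit */

    double elapsed = now - p->t_c;

    double toks = p->toks + elapsed * p->rate;
    if (toks > p->burst)
        toks = p->burst;              /* cap the main bucket at burst */
    toks -= len;

    double ptoks = 0.0;
    if (p->peakrate > 0.0) {
        ptoks = p->ptoks + elapsed * p->peakrate;
        if (ptoks > p->mtu)
            ptoks = p->mtu;           /* the second bucket is dimensioned to mtu */
        ptoks -= len;
    }

    if (toks >= 0.0 && ptoks >= 0.0) {
        p->t_c = now;                 /* account only for conforming packets */
        p->toks = toks;
        p->ptoks = ptoks;
        return true;
    }
    return false;
}
```

With a 2000 byte mtu, a 2500 byte packet is rejected no matter how full the buckets are, which is the "packets greater than 2K being dropped" effect; with a peak rate configured, two back-to-back packets that together exceed mtu fail on the second bucket even when the main bucket still has room.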
jamal
2001-Dec-09 01:14 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sun, 9 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 02:20:20PM -0500, jamal wrote:
>
> > Linux remaps packets incoming with different values to some internal
> > value; the column "mapped to" shows the internal mapping
> >
> > 8-bit value(hex)   TOS(dec)   mapped to(dec)
> > ----------------------------------
> > 0x0                 0          0
> >                     1          7
> >                     2          0
> >                     3          0
> >                     4          2
> >                     5          2
> >                     6          2
> >                     7          2
> > 0x10                8          6
> >                     9          6
> >                    10          6
> >                    11          6
> >                    12          2
> >                    13          2
> >                    14          2
> >                    15          2
>
> I find this tos2prio table in the kernel (2.5.x), which is somewhat
> different than your table:
>
>  0   TC_PRIO_BESTEFFORT,           0
>  1   TC_PRIO_(FILLER),             1
>  2   TC_PRIO_BESTEFFORT,           0
>  3   TC_PRIO_(BESTEFFORT),         0
>  4   TC_PRIO_BULK,                 2
>  5   TC_PRIO_(BULK),               2
>  6   TC_PRIO_BULK,                 2
>  7   TC_PRIO_(BULK),               2
>  8   TC_PRIO_INTERACTIVE,          6
>  9   TC_PRIO_(INTERACTIVE),        6
> 10   TC_PRIO_INTERACTIVE,          6
> 11   TC_PRIO_(INTERACTIVE),        6
> 12   TC_PRIO_INTERACTIVE_BULK,     4
> 13   TC_PRIO_(INTERACTIVE_BULK),   4
> 14   TC_PRIO_INTERACTIVE_BULK,     4
> 15   TC_PRIO_(INTERACTIVE_BULK)    4
>
> > Fill in the "8-bit value(hex)" column gaps using the bitmap from RFC1349 for
> > the 8 bits; These are the values you would see with tcpdump -vvv
> > I filled the two easiest ones i could compute in my head.
> >
> > Second step:
> >
> > Take the default priority map:
> > 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
> > This applies for both default prio and the 3-band FIFO queue.
> > Note the queue map fitted on the last column
> >
> > 8 bit value   TOS   mapped to   queue map
> > ---------------------------------------------
> > 0x0            0       0           1
> >                1       7           2
> >                2       0           2
> >                3       0           2
> >                4       2           1
> >                5       2           2
> >                6       2           0
> >                7       2           0
> > 0x10           8       6           1
> >                9       6           1
> >               10       6           1
> >               11       6           1
> >               12       2           1
> >               13       2           1
> >               14       2           1
> >               15       2           1
>
> I've changed this table to:
>
> TOS     Bits  Means                    Linux Priority    Band
> ------------------------------------------------------------
> 0x0     0     Normal Service           0 Best Effort     1
> 0x2     1     Minimize Monetary Cost   1 Filler          2
> 0x4     2     Maximize Reliability     0 Best Effort     1
> 0x6     3     mmc+mr                   0 Best Effort     1
> 0x8     4     Maximize Throughput      2 Bulk            2
> 0xa     5     mmc+mt                   2 Bulk            2
> 0xc     6     mr+mt                    2 Bulk            2
> 0xe     7     mmc+mr+mt                2 Bulk            2
> 0x10    8     Minimize Delay           6 Interactive     0
> 0x12    9     mmc+md                   6 Interactive     0
> 0x14    10    mr+md                    6 Interactive     0
> 0x16    11    mmc+mr+md                6 Interactive     0
> 0x18    12    mt+md                    4 Int. Bulk       1
> 0x1a    13    mmc+mt+md                4 Int. Bulk       1
> 0x1c    14    mr+mt+md                 4 Int. Bulk       1
> 0x1e    15    mmc+mr+mt+md             4 Int. Bulk       1

Yes, sorry the last 4 are int_bulk (value 4) and not just bulk (2). good eye. You are still abusing the word TOS. That's only 4 bits not 8; Use the terminology from RFC1349 at least.

> http://ds9a.nl/lartc/HOWTO/cvs/2.4routing/output/2.4routing-9.html#ss9.2
>
> Your table appears to imply that a Maximum Reliability, Minimum Delay
> packet, TOS bits=9, gets mapped to band 1, not 0, which would not make
> sense.

This is the priomap: 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
It says 1 is the right value

> Laying it out like this, which does appear how it works, does mean that you
> can specify priorities in the priomap which do not correspond to possible
> TOS values.

You can't remap the 3 band scheduler trivially, but you should be able to replace it with a default prio qdisc, get exactly the same behavior and use whatever map you want (eg your 0 to 1 substitution for TOS 1001)

> Is it possible at all to set skb->priority from userspace without going
> through the tos2prio mapping?

SO_PRIORITY socket option is doable; you have to be root.

> CBQ can use the skb->priority to classify:

so do prio and pfifo_fast (as i am sure you are aware)

>     /*
>      *  Step 1. If skb->priority points to one of our classes, use it.
>      */
>     if (TC_H_MAJ(prio^sch->handle) == 0 &&
>         (cl = cbq_class_lookup(q, prio)) != NULL)
>             return cl;
>
> But to do this, you would need to be able to set skb->priority to a 32bit
> number:

Can't think of a straight way to do this .... Alexey would know,

cheers,
jamal
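To see why an ordinary priority of 0..15 does not trip CBQ's "Step 1" test when the qdisc has a non-zero handle, here is a small userland sketch of the major-number comparison. The macros mirror the include/linux/pkt_sched.h lines quoted in the thread; `make_handle` and `priority_has_qdisc_major` are hypothetical helper names, and the real code additionally requires `cbq_class_lookup()` to find an existing class.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the pkt_sched.h definitions quoted in the thread. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MAJ(h)   ((h) & TC_H_MAJ_MASK)

/* Hypothetical helper: pack "major:minor" (as written on the tc command
 * line, e.g. 1:3) into the 32-bit form the kernel stores. */
static uint32_t make_handle(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

/* Hypothetical helper: the first half of CBQ's Step 1. True when the
 * priority value carries the same major number as the qdisc handle, so it
 * could be interpreted as a classid of that qdisc. */
static bool priority_has_qdisc_major(uint32_t prio, uint32_t qdisc_handle)
{
    return TC_H_MAJ(prio ^ qdisc_handle) == 0;
}
```

For a qdisc with handle 1:0, a priority of 0x00010003 (classid 1:3) passes the check, while a plain priority such as 6 does not, because its upper 16 bits are zero rather than 1.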
On Sun, 9 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 06:30:47PM -0500, jamal wrote:
>
> > On Sun, 9 Dec 2001, bert hubert wrote:
> >
> > > I will, but I bet you 5 euros that I'm right :-) We are talking about the
> > > TBF, aren't we?
> >
> > The trick is that TBF delays packets and the policer drops packet when
> > exceeding their profile.
>
> Well, we're both right to some extent. The policer drops packets which are
> larger than the configured MTU, but it also dimensions the second bucket to
> be of size mtu!

[..]

> It's a draw? :-)

Actually you get the 5 euro. My comments were in regards to the _policer_ but they were made to comment on the _shaper_ in that specific section of the document.
So i suppose when you explain the policer my comments would apply.
[The rest of the parameters apply fine]

cheers,
jamal
bert hubert
2001-Dec-09 01:30 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sat, Dec 08, 2001 at 08:14:10PM -0500, jamal wrote:

> Yes, sorry the last 4 are int_bulk (value 4) and not just bulk (2). good
> eye. You are still abusing the word TOS. Thats only 4 bits not 8;
> Use the terminology from RFC1349 at least.

Will do.

> > http://ds9a.nl/lartc/HOWTO/cvs/2.4routing/output/2.4routing-9.html#ss9.2
> >
> > Your table appears to imply that a Maximum Reliability, Minimum Delay
> > packet, TOS bits=9, gets mapped to band 1, not 0, which would not make
> > sense.
>
> This is the priomap: 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1
> It says 1 is the right value

AFAICT, the priomap maps skb->priority to band. So the translation is as follows:

Type of Service octet, which is fed to:

    skb->priority = rt_tos2priority(iph->tos);

To extract the four TOS bits, and to translate to prio:

    static inline char rt_tos2priority(u8 tos)
    {
        return ip_tos2prio[IPTOS_TOS(tos)>>1];
    }

----

    __u8 ip_tos2prio[16] = {
        TC_PRIO_BESTEFFORT,
        ECN_OR_COST(FILLER),
        TC_PRIO_BESTEFFORT,
        ECN_OR_COST(BESTEFFORT),
        TC_PRIO_BULK,
        ECN_OR_COST(BULK),
        TC_PRIO_BULK,
        ECN_OR_COST(BULK),
        TC_PRIO_INTERACTIVE,
        ECN_OR_COST(INTERACTIVE),
        TC_PRIO_INTERACTIVE,
        ECN_OR_COST(INTERACTIVE),
        TC_PRIO_INTERACTIVE_BULK,
        ECN_OR_COST(INTERACTIVE_BULK),
        TC_PRIO_INTERACTIVE_BULK,
        ECN_OR_COST(INTERACTIVE_BULK)
    };

---

    #define TC_PRIO_BESTEFFORT          0
    #define TC_PRIO_FILLER              1
    #define TC_PRIO_BULK                2
    #define TC_PRIO_INTERACTIVE_BULK    4
    #define TC_PRIO_INTERACTIVE         6
    #define TC_PRIO_CONTROL             7

    #define TC_PRIO_MAX                 15

net/sched/sch_generic.c:

    static const u8 prio2band[TC_PRIO_MAX+1] =
    { 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1 };

    list = ((struct sk_buff_head*)qdisc->data) +
        prio2band[skb->priority&TC_PRIO_MAX];

> > CBQ can use the skb->priority to classify:
>
> so do prio and pfifo_fast (as i am sure you are aware)

Of course, but only CBQ (& HTB, by the way) can extract a classid directly from it, without a priomap. Devik is planning to learn HTB to extract a classid directly from the fwmark, to skip a layer of indirection.

Regards,

bert hubert
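The two-step translation described in this message (TOS octet to priority, then priority to band) can be checked in userland. A minimal sketch, assuming the table values quoted in the thread and that ECN_OR_COST(x) resolves to TC_PRIO_x as in the kernel table bert posted; the function names `tos_to_priority`, `priority_to_band` and `tos_to_band` are hypothetical.

```c
#include <stdint.h>

/* Values of ip_tos2prio[] as quoted in the thread, indexed by the four TOS
 * bits (bits 1..4 of the TOS octet, shifted right by one). */
static const uint8_t tos2prio[16] = {
    0, 1, 0, 0,   /* best effort, filler, best effort, best effort */
    2, 2, 2, 2,   /* bulk */
    6, 6, 6, 6,   /* interactive */
    4, 4, 4, 4    /* interactive bulk */
};

/* Default priomap of pfifo_fast and prio, indexed by priority & 15. */
static const uint8_t prio2band[16] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Step 1: TOS octet -> priority. 0x1E masks the four TOS bits, which is
 * what IPTOS_TOS() does in the quoted kernel code. */
static uint8_t tos_to_priority(uint8_t tos_octet)
{
    return tos2prio[(tos_octet & 0x1E) >> 1];
}

/* Step 2: priority -> band, via the priomap. */
static uint8_t priority_to_band(uint32_t priority)
{
    return prio2band[priority & 15];
}

/* Both steps chained: what happens to a locally unmodified packet. */
static uint8_t tos_to_band(uint8_t tos_octet)
{
    return priority_to_band(tos_to_priority(tos_octet));
}
```

The point of contention in the thread falls out directly: the priomap is indexed by the priority, not by the TOS bits, so a packet with TOS bits 9 (octet 0x12) gets priority 6 and lands in band 0, not in band 1 as reading priomap[9] would suggest.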
bert hubert
2001-Dec-09 01:35 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sat, Dec 08, 2001 at 08:19:07PM -0500, jamal wrote:

> > It's a draw? :-)
>
> Actually you get the 5 euro. My comments were in regards to the
> _policer_ but they were made to comment on the _shaper_ in that specific
> section of the document.

I hope to see you at OLS :-)

Regards,

bert
jamal
2001-Dec-09 02:10 UTC
Re: CBQ and all other qdiscs now REALLY completely documented (almost!)
On Sun, 9 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 08:14:10PM -0500, jamal wrote:
>
> AFAICT, the priomap maps skb->priority to band. So the translation is as
> follows:

yes ;->

> > so do prio and pfifo_fast (as i am sure you are aware)
>
> Of course, but only CBQ (& HTB, by the way) can extract a classid directly
> from it, without a priomap. Devik is planning to learn HTB to extract a
> classid directly from the fwmark, to skip a layer of indirection.

I am not sure if this is such a nice hack. What's wrong with using the fwmark classifier to select classes?

cheers,
jamal

On Sun, 9 Dec 2001, bert hubert wrote:

> On Sat, Dec 08, 2001 at 08:19:07PM -0500, jamal wrote:
>
> > > It's a draw? :-)
> >
> > Actually you get the 5 euro. My comments were in regards to the
> > _policer_ but they were made to comment on the _shaper_ in that specific
> > section of the document.
>
> I hope to see you at OLS :-)

Ok, then you shall get your drink of choice worth 5 euros ;->

cheers,
jamal

On Sun, 9 Dec 2001, Henrik Nordstrom wrote:

> to ingress. Have not yet figured out if this actually can be used for
> anything.
>
> tc qdisc add dev eth0 ingress tbf ....
> (or any other qdisc)
>
> is happily accepted but afaict does not seem to have any function..

I can't seem to get it to work;

-------
[root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress tbf
RTNETLINK answers: Invalid argument
[root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress
[root@jzny tc]# tc -s qdisc
qdisc ingress ffff: dev lo
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
--------

What is the full command?
Can you try tc -s qdisc after you add it?

cheers,
jamal
Henrik Nordstrom
2001-Dec-09 11:38 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
You need to specify the correct arguments to the qdisc you are adding.. I.e. the same command as adding that qdisc as the root qdisc, but replace root by ingress.

    tc qdisc add dev eth0 ingress tbf rate 220kbit latency 50ms burst 1540

    qdisc tbf ffff: dev eth0 rate 220Kbit burst 1407b lat 2147.5s
     Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

But as I said it does not seem to ever be used.

Regards
Henrik

On Sunday 09 December 2001 03.30, jamal wrote:

> On Sun, 9 Dec 2001, Henrik Nordstrom wrote:
> > to ingress. Have not yet figured out if this actually can be used for
> > anything.
> >
> > tc qdisc add dev eth0 ingress tbf ....
> > (or any other qdisc)
> >
> > is happily accepted but afaict does not seem to have any function..
>
> I cant seem to get it to work;
> -------
> [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress tbf
> RTNETLINK answers: Invalid argument
> [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress
> [root@jzny tc]# tc -s qdisc
> qdisc ingress ffff: dev lo
>  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
> --------
>
> What is the full command?
> can you try tc -s qdisc after you add it ?
>
> cheers,
> jamal
bert hubert
2001-Dec-09 14:40 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sun, Dec 09, 2001 at 12:38:21PM +0100, Henrik Nordstrom wrote:

> You need to specify the correct arguments to the qdisc you are adding.. I.e.
> the same command as adding that qdisc as the root qdisc, but replace root by
> ingress.
>
> tc qdisc add dev eth0 ingress tbf rate 220kbit latency 50ms burst 1540
>
> qdisc tbf ffff: dev eth0 rate 220Kbit burst 1407b lat 2147.5s
>  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
>
> But as I said it does not seem to ever be used.

This appears to be a bug - which version of tc are you using? I think in fact tc just added an egress tbf - you can't specify *any* qdisc as an ingress qdisc except for the bare one, like Jamal does below: (jamal *wrote* the ingress qdisc, he should know)

> > I cant seem to get it to work;
> > -------
> > [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress tbf
> > RTNETLINK answers: Invalid argument
> > [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress
> > [root@jzny tc]# tc -s qdisc
> > qdisc ingress ffff: dev lo
> >  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
> > --------
> >
> > What is the full command?
> > can you try tc -s qdisc after you add it ?

Regards,

bert

Hi,

This is definitely a bug; Is this with diffserv turned on in iproute2/Config?
What version of iproute2 (not that it makes much of a difference)

cheers,
jamal

On Sun, 9 Dec 2001, Henrik Nordstrom wrote:

> You need to specify the correct arguments to the qdisc you are adding.. I.e.
> the same command as adding that qdisc as the root qdisc, but replace root by
> ingress.
>
> tc qdisc add dev eth0 ingress tbf rate 220kbit latency 50ms burst 1540
>
> qdisc tbf ffff: dev eth0 rate 220Kbit burst 1407b lat 2147.5s
>  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
>
> But as I said it does not seem to ever be used.
>
> Regards
> Henrik
>
> On Sunday 09 December 2001 03.30, jamal wrote:
> > On Sun, 9 Dec 2001, Henrik Nordstrom wrote:
> > > to ingress. Have not yet figured out if this actually can be used for
> > > anything.
> > >
> > > tc qdisc add dev eth0 ingress tbf ....
> > > (or any other qdisc)
> > >
> > > is happily accepted but afaict does not seem to have any function..
> >
> > I cant seem to get it to work;
> > -------
> > [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress tbf
> > RTNETLINK answers: Invalid argument
> > [root@jzny tc]# ./tc qdisc add dev lo handle ffff: ingress
> > [root@jzny tc]# tc -s qdisc
> > qdisc ingress ffff: dev lo
> >  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
> > --------
> >
> > What is the full command?
> > can you try tc -s qdisc after you add it ?
> >
> > cheers,
> > jamal

Henrik,

Can you please try the attached patch against iproute2-2.4.7-now-ss010824?

cheers,
jamal
Henrik Nordstrom
2001-Dec-09 15:49 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
Something like 2.2.4 with diffserv enabled (would not allow ingress at all otherwise). To be precise the current RedHat rawhide package, slightly changed to enable diffserv support.

Regards
Henrik

On Sunday 09 December 2001 15.49, jamal wrote:

> Hi,
> This is definitely a bug; Is this with diffserv turned on in
> iproute2/Config?
> What version of iproute2 (not that it makes much of a difference)
>
> cheers,
> jamal
Henrik Nordstrom
2001-Dec-09 16:45 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sunday 09 December 2001 16.01, jamal wrote:

> Henrik,
> Can you please try the attached patch against iproute2-2.4.7-now-ss010824?

Seems to make the tc userspace program properly reject the arguments.

It is a bit sad that one cannot queue packets in ingress. Would be quite useful to make ingress shaping behave more sane than what can be achieved with the queueless filter police mechanism.

netfilter supports queueing/delaying of packets and then resume processing them at a later time using nf_reinject, so I think it should be possible to implement an ingress queue without too much effort.. but then the netfilter queueing seems to be very simplistic only supporting one queue per protocol family and this queueing interface is already used for queueing packets to userspace, so perhaps not as easy as I thought..

Queueing in netfilter works by

1. The queueing mechanism registers its handler by calling nf_register_queue_handler. Only one queue handler per protocol family is supported.
2. On packets needed to be queued, return NF_QUEUE.
3. When the queue handler is done with the packet, it calls nf_reinject with a new verdict.
4. If the packet was not dropped/stolen, netfilter processing continues at the next hook (not priority).

The queue handler gets the following information: skb, protocol family, nf hook number, and in/out devices.

Regards
Henrik
kuznet@ms2.inr.ac.ru
2001-Dec-09 18:14 UTC
Re: CBQ and all other qdiscs now REALLY completely documented
Hello!

> > But to do this, you would need to be able to set skb->priority to a 32bit
> > number:
>
> Cant think of a straight way to do this .... Alexey would know,

SO_PRIORITY. Or I did not follow you?

Alexey
bert hubert
2001-Dec-09 18:18 UTC
Re: CBQ and all other qdiscs now REALLY completely documented
On Sun, Dec 09, 2001 at 09:14:46PM +0300, kuznet@ms2.inr.ac.ru wrote:

> > > But to do this, you would need to be able to set skb->priority to a 32bit
> > > number:
> > Cant think of a straight way to do this .... Alexey would know,
>
> SO_PRIORITY. Or I did not follow you?

Ah yes, thanks, that sets sk->priority which later sets skb->priority.

Regards,

bert

On Sun, 9 Dec 2001, Henrik Nordstrom wrote:

> On Sunday 09 December 2001 16.01, jamal wrote:
> > Henrik,
> > Can you please try the attached patch against iproute2-2.4.7-now-ss010824?
>
> Seems to make the tc userspace program to properly reject the arguments.

That was the intent; allowing it was harmless but gives a bad impression of functionality. So i guess we could say the patch works.

> It is a bit sad that one cannot queue packets in ingress. Would be quite
> useful to make ingress shaping behave more sane than what can be achieved
> with the queueless filter police mechanism.

Look at the definition of work vs non-work conserving; This is design intent. If you look at the datapath, it is totally meaningless to put queues at ingress, for routing when they are being queued on ingress as well.

> netfilter supports queueing/delaying of packets and then resume processing
> them at a later time using nf_reinject, so I think it should be possible to
> implement an ingress queue without too much effort..

The implementation/extension is trivial. There is no need for it; I went to great lengths with Martin/devik on this. Maybe he can help me here ;->

> but then the netfilter
> queueing seems to be very simplistic only supporting one queue per protocol
> family and this queueing interface is already used for queueing packets to
> userspace, so perhaps not as easy as I thought..

[..]

No, implementation is a non-issue. There is no need for it.

For 2.5 we might be able to have the ipqueue code use the power of TC. it already talks netlink; i'll talk to some of the netfilter people. ipqueue has some special need to grab packets; we provide much more sophisticated mechanisms than Netfilter; so maybe there's a marriage possibility.

cheers,
jamal

On Sun, 9 Dec 2001 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > > But to do this, you would need to be able to set skb->priority to a 32bit
> > > number:
> >
> > Cant think of a straight way to do this .... Alexey would know,
>
> SO_PRIORITY. Or I did not follow you?

SO_PRIORITY limits the size of skb->priority to be from 0..6; this won't work with that check in cbq.

cheers,
jamal
Michael T. Babcock
2001-Dec-09 22:33 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sat, Dec 08, 2001 at 03:43:05PM -0500, jamal wrote:

> - You keep saying "reordering" -- dont know what that means. Reordering is
> generally considered a Bad Thing(tm).

Reordering happens on a mass scale (packets often go out in a different order than they were received / generated) but not on a per-qdisc scale (packets go out 'in order' within an SFQ queue or within a CBQ queue).

It's quite obvious that fairness causes overall reordering of the available packets because you sometimes wish to pass along (for example) an SSH packet before the 10 waiting FTP packets even though the latter got there first.

--
Michael T. Babcock
CTO, FibreSpeed Ltd. (Hosting, Security, Consultation, Database, etc)
http://www.fibrespeed.net/~mbabcock/
Michael T. Babcock
2001-Dec-09 22:36 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sat, Dec 08, 2001 at 10:30:55PM +0100, bert hubert wrote:

> The ingress qdisc is a strange animal in that is not used to send packets
> out to the network adaptor. Instead, it allows you to apply tc filters to
> packets coming in over the interface, regardless of whether they have a
> local destination or are to be forwarded.

Opinions, opinions ... just the facts please. My suggested paragraph:

The ingress qdisc allows the application of tc filters to the inbound packets on an interface instead of the outgoing ones. This filtering is done to all incoming packets, whether destined for the local host or to be forwarded.

--
Michael T. Babcock
bert hubert
2001-Dec-10 00:41 UTC
CBQ MANPAGE: I hear the theme of ''2001, A Space Odyssey''
... to the sound of 'Also sprach Zarathustra':

After weeks of social deprivation and much digging through heaps of code, I bring you

    tc-cbq.8

The CBQ manpage. Nearly 2500 words, 8 printed pages, of nearly unintelligible gobbledygook, explaining mostly how CBQ works.

It is part of the Linux Advanced Routing & Traffic Control documentation project which contains a HOWTO, a mailing list, an IRC channel and now manpages:

    http://ds9a.nl/lartc

I want to thank Jamal for stubbornly straightening me out when I use messy language and explaining how things work. The errors are mine though.

I *implore* ANK and others to read through this. I'm about exhausted and running out of time (need to get on with work), and have a hard time figuring out the exact details of the CBQ link sharing algorithm. I need help, so to speak. The manpage indicates where.

Thanks for your attention. Please find tc-cbq.8 attached.

Regards,

bert hubert
jamal
2001-Dec-10 01:04 UTC
Re: CBQ MANPAGE: I hear the theme of ''2001, A Space Odyssey''
Sorry, didn't read it; did the 30 sec scan ..

If this is meant to be for users, why are you talking about skb->priority? Isn't it sufficient to just call it priority?

Also, if you think that Alexey's imp. is based on Floyd only, you are highly mistaken;

Going back to high latency response mode ...

cheers,
jamal

On Mon, 10 Dec 2001, bert hubert wrote:

> ... to the sound of 'Also sprach Zarathustra':
>
> After weeks of social deprivation and much digging through heaps of code, I
> bring you
>
> tc-cbq.8
>
> The CBQ manpage. Nearly 2500 words, 8 printed pages, of nearly
> unintelligible gobbledygook, explaining mostly how CBQ works.
>
> It is part of the Linux Advanced Routing & Traffic Control documentation
> project which contains a HOWTO, a mailing list, an IRC channel and now
> manpages:
>
> http://ds9a.nl/lartc
>
> I want to thank Jamal for stubbornly straightening me out when I use messy
> language and explaining how things work. The errors are mine though.
>
> I *implore* ANK and others to read through this. I'm about exhausted and
> running out of time (need to get on with work), and have a hard time
> figuring out the exact details of the CBQ link sharing algorithm. I need
> help, so to speak. The manpage indicates where.
>
> Thanks for your attention. Please find tc-cbq.8 attached.
>
> Regards,
>
> bert hubert
bert hubert
2001-Dec-10 01:12 UTC
Re: CBQ MANPAGE: I hear the theme of ''2001, A Space Odyssey''
On Sun, Dec 09, 2001 at 08:04:42PM -0500, jamal wrote:

> Sorry didnt read it; did the 30 sec scan ..
> If this is meant to be for users, why are you talking about skb->priority?
> Isnt it sufficient to just call it priority?

It's not done yet and may need some readability tuning. Note however that skb->priority is a bit overloaded. It can contain a priority, but also a 32bit encoded classid. These are different things, so they deserve separate mention.

> Also, if you think that Alexeys imp. is based on Floyd only, you are
> highly mistaken;

I just copied the attribution from the kernel, am glad to rectify things.

> Going back to high latency response mode ...

Thanks for reviewing.

Regards,

bert
Cédric Rivard
2001-Dec-10 01:12 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sunday 9 December 2001 22:41, jamal wrote:

> > It is a bit sad that one cannot queue packets in ingress. Would be quite
> > useful to make ingress shaping behave more sane than what can be achieved
> > with the queueless filter police mechanism.
>
> Look at the definition of work vs non-work conserving; This is design
> intent. If you look at the datapath, it is totally meaningless to put
> queues at ingress, for routing when they are being queued on ingress as
> well.

Wouldn't it make sense to set a non-work conserving interface on ingress and a work conserving interface on egress? That would be handy to share bandwidth between outgoing packets through different interfaces.

Cedric
Martin Devera
2001-Dec-10 08:38 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> > It is a bit sad that one cannot queue packets in ingress. Would be quite
> > useful to make ingress shaping behave more sane than what can be achieved
> > with the queueless filter police mechanism.
>
> Look at the definition of work vs non-work conserving; This is design
> intent. If you look at the datapath, it is totally meaningless to put
> queues at ingress, for routing when they are being queued on ingress as
> well.

hehe, jamal, do you remember the long discussion we had about this (on the diffserv list)? :-)

> > netfilter supports queueing/delaying of packets and then resume processing
> > them at a later time using nf_reinject, so I think it should be possible to
> > implement an ingress queue without too much effort..
>
> The implementation/extension is trivial. There is no need for it; I went
> at great lengths with Martin/devik on this Maybe he can help me here ;->

Yup, I had not read the whole message first :) So you do remember. The conclusion was that the only reason for a queue at ingress might be that an existing queue stays there as an indicator of a flow's activity.

Definitely it would be helpful to create a work conserving model of CBQ (HTB :-)) which would drop packets instead of dequeuing them.

IMHO ingress queuing could be used as a poor man's way to reshape (or prioritize) traffic which can't be shaped at the egress side (usually because of administrative boundaries). This need would vanish in the presence of such a classful work conserving CBQ.

Note that you can do some similar things with policers but you can't do the same things as with CBQ - you can't set up a prioritized borrowing hierarchy.

> For 2.5 we might be able to have the ipqueue code use the power of TC. it
> already talks netlink; i'll talk to some of the netfilter people. ipqueue
> has some special need to grab packets; we provide much more sophisticated
> mechanisms than Netfilter; so maybe there's a marriage possibility.

ipqueue!? What is it? Sounds good :)

regards,
devik
Henrik Nordstrom
2001-Dec-10 08:41 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Sunday 09 December 2001 22.41, jamal wrote:

> Look at the definition of work vs non-work conserving; This is design
> intent. If you look at the datapath, it is totally meaningless to put
> queues at ingress, for routing when they are being queued on ingress as
> well.

(on egress as well I assume...)

True, but not all applications of shaping have the luxury of egress. For example, consider the not too uncommon example of a computer connected via 100Mbps networking to a DSL modem, and you want to tune the use of the link without needing to introduce a router inbetween.

> The implementation/extension is trivial. There is no need for it; I went
> at great lengths with Martin/devik on this Maybe he can help me here ;->

So do you have any argument why one should not be able to shape incoming local traffic to a station in a good manner without having a router do the shaping?

Regards
Henrik
Henrik Nordstrom
2001-Dec-10 09:59 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Monday 10 December 2001 09.38, Martin Devera wrote:

> Definitely it would be helpful to create a work conserving model
> of CBQ (HTB :-)) which would drop packets instead of dequeuing
> them.

Won't help me I think. I don't have the need of hierarchies or fancy prioritizations between different traffics or whatever, only to be able to slow down (with a delay) some specific traffic destined for the local TCP.

The filter police allows me to drop packets, but does not allow me to introduce delays.

TCP is generally too smart to be delayed properly by "randomly" dropped packets without any signs in RTT. Especially when the RTT is small.

> IMHO ingress queuing could be used as a poor man's way to reshape
> (or prioritize) traffic which can't be shaped at the egress side (usually
> because of administrative boundaries). This need would vanish in
> the presence of such a classful work conserving CBQ.

And such an administrative boundary is the one I am playing on. The boundary between a small customer and his ISP. The ISP obviously has the luxury of egress, but the customer does not on traffic received by him.

Exactly how would this need vanish?

Regards
Henrik
Martin Devera
2001-Dec-10 11:35 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> > Definitely it would be helpful to create work conserving model
> > of CBQ (HTB :-)) which would drop packets instead of dequeuing
> > them.
>
> Won't help me I think. I don't have the need of hierarchies or fancy
> prioritizations between different traffics or whatever, only to be able to
> slow down (with a delay) some specific traffic destined for the local TCP.
>
> The filter police allows me to drop packets, but do not allow me to introduce
> delays.
>
> TCP is generally too smart to be delayed proper by "randomly" dropped packets
> without any signs in RTT. Especially when the RTT is small.

Are you sure !? TCP slows down by half on every dropped packet per congestion window AFAIK. On the other hand, it is often hard to slow TCP down by packet delay, as TCP will try to accommodate it by making MSS larger. Am I right jamal ?

> > IMHO ingress queuing could be used as a poor man's way to reshape
> > (or prioritize) traffic which can't be shaped at egress side (usually
> > because of administrative boundaries). This need would vanish in
> > presence of such classful work conserving CBQ.
>
> And such a administrative boundary is the one I am playing on. The boundary
> between a small customer and his ISP. The ISP obviously have the luxury of
> egress, but the customer does not on traffic received by him.
>
> Exactly how would this need vanish?

As I said above, packet dropping works well (at least for me in the same ISP scenario). When you are queuing, delay only helps you to postpone a burst of reply data to some less used time. When bulk traffic persists, packets are dropped and TCP falls back.

devik
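The "slows down by half on every dropped packet" behaviour can be illustrated with a minimal sketch of additive-increase/multiplicative-decrease. This is a toy model under stated assumptions, not the kernel's TCP: the function name `aimd_cwnd`, the one-segment-per-round-trip increase, and the choice of which round sees a drop are all made up for illustration, and slow start, timeouts and the receiver's window are ignored.

```python
def aimd_cwnd(rounds, loss_rounds, start=1.0, increase=1.0):
    """Return the congestion window (in segments) after each round trip.

    Toy model (assumption, not real TCP): the window grows by `increase`
    segments per loss-free round trip and is halved, with a floor of one
    segment, in any round trip listed in `loss_rounds`.
    """
    cwnd = start
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cwnd = max(1.0, cwnd / 2.0)   # multiplicative decrease on a drop
        else:
            cwnd += increase              # additive increase otherwise
        history.append(cwnd)
    return history


# Ten round trips with a single drop in round 5 (hypothetical scenario).
trace = aimd_cwnd(rounds=10, loss_rounds={5})
print(trace)
```

In this model a single drop costs the sender half of its window, and it then takes several loss-free round trips to climb back, which is why a policer that only drops can still hold a flow down.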
Martin Devera
2001-Dec-10 11:59 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> True, but not all applications of shaping have the luxury of egress. For
> example, consider the not too uncommon example of a computer connected via
> 100Mbps networking to a DSL modem, and you want to tune the use of the link
> without needing to introduce a router in between.
>
> > The implementation/extension is trivial. There is no need for it; I went
> > at great lengths with Martin/devik on this Maybe he can help me here ;->
>
> So do you have any argument why one should not be able to shape incoming
> local traffic to a station in a good manner without having a router do the
> shaping?

jamal would probably say here that it is nonsense to delay/queue a packet which has already arrived at your box :)

I still think that it COULD be helpful and much clearer to be able to attach any qdisc at ingress. The current "ingress" qdisc could remain and work in the same way, but additionally, when the qdisc returns OK (I eat the packet), net_bh would try to dequeue it. Then there would be no "special" qdisc like ingress; it would be a subset of a regular qdisc.

The only real (implementation) problem I see is related to netif_restart: we'd rather need a callback sch_restart(sch), supplied by the tc subsystem, which would do a different thing for ingress and egress. All other parts should be easy to implement.

On another note: I created the so-called IMQ device, which solves a different (but similar) problem. It should be easy to extend it to queue all incoming packets there too. For now only the outgoing ones go there. IMQ is a pretty simple device and is already used in production by two people.

devik
Henrik Nordstrom
2001-Dec-10 12:00 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Monday 10 December 2001 12.35, Martin Devera wrote:
> Are you sure !? TCP slows down by half on every dropped
> packet per congestion window AFAIK.
> On other side it is often hard to slow TCP down by packet
> delay as TCP will try to accommodate it by making MSS larger.
> Am I right jamal ?

You basically need both, or your packet drops will constantly be fighting retransmits as TCP is trying to recover. There will eventually be a balance, but not a very nice one.

> as I said above, packet dropping works well (at least for me
> in the same ISP scenario).
> When you are queuing then delay only helps you to postpone
> burst of reply data to some less used time.
> When bulk traffic persists, packets are dropped and TCP falls
> back.

Sure, just dropping packets will work to some extent, but it is not nearly as efficient in transmitted data as a limited queue with drop on overflow (preferably something smarter than FIFO+overflow if you need to support more than one concurrent TCP session).

Regards
Henrik
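The difference between "just dropping" and "a limited queue with drop on overflow" can be made concrete with a small counting sketch. Everything here is an assumption made for illustration: time is cut into ticks, the policer has no burst allowance at all, the function names `police` and `bounded_fifo` are invented, and the senders do not react to loss (so no TCP feedback is modelled).

```python
def police(arrivals, rate):
    """Count drops for a policer without buffering.

    Anything above `rate` packets in a tick is dropped immediately.
    """
    return sum(max(0, n - rate) for n in arrivals)


def bounded_fifo(arrivals, rate, limit):
    """Count drops for a FIFO holding at most `limit` packets.

    Each tick, arriving packets are queued while there is room, the
    overflow is dropped, and then up to `rate` packets are dequeued.
    """
    backlog = 0
    dropped = 0
    for n in arrivals:
        accepted = min(n, limit - backlog)
        dropped += n - accepted           # overflow is lost
        backlog += accepted
        backlog -= min(backlog, rate)     # drain at the shaped rate
    return dropped


# Two bursts of four packets with idle ticks between them (made-up pattern).
arrivals = [4, 0, 0, 0, 4, 0, 0, 0]
print(police(arrivals, rate=1))
print(bounded_fifo(arrivals, rate=1, limit=4))
print(bounded_fifo(arrivals, rate=1, limit=2))
```

With the same average rate, the queue that is deep enough to absorb a burst loses nothing and pays in delay instead, while the policer and the too-short queue turn the burst into drops that would have to be retransmitted.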
bert hubert
2001-Dec-10 12:14 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Mon, Dec 10, 2001 at 10:59:38AM +0100, Henrik Nordstrom wrote:
> TCP is generally too smart to be delayed proper by "randomly" dropped packets
> without any signs in RTT. Especially when the RTT is small.

Richard Stevens disagrees with you.

> And such a administrative boundary is the one I am playing on. The boundary
> between a small customer and his ISP. The ISP obviously have the luxury of
> egress, but the customer does not on traffic received by him.
>
> Exactly how would this need vanish?

You can turn ingress into egress by inserting another machine of course. Ingress shaping, well, is weird if you have no concept of an 'ingress queue'.

Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
Trilab The Technology People
Netherlabs BV / Rent-a-Nerd.nl - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet
Henrik Nordstrom
2001-Dec-10 12:25 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Monday 10 December 2001 12.59, Martin Devera wrote:
> jamal would probably say here that it is nonsense to delay/queue packet
> which already arrived to your box :)

In a station trying to shape the traffic sent to it, it does make sense, by limiting the waste of retransmits. Egress queues do not help there, as there is no egress where the packet could be queued.

To argue that it is nonsense to have an ingress queue for your own received packets is the same as to argue that it is nonsense to have an egress queue for routed packets. The packet dynamics are the same, only the application is slightly different.

To summarize: ingress queues make sense when you want to shape the traffic sent to you.

This is most obviously true when looking at a station, but also true to some extent in routers. To get the minds working on the router case consider the following router:

ppp0  256 Kbps  link to ISP
eth0  100 Mbps  lan 1
eth1  100 Mbps  lan 2
eth3  100 Mbps  lan 3

And you want, without needing to involve your ISP, to reasonably shape incoming traffic received on ppp0 so that rsync uses no more than 64kbps in total.

Doing such a setup using ingress shaping is "trivial". Doing it using egress shaping on each of the ethernet interfaces is not.

Sure, the queues are only of use if they can be large enough to hold significant portions of your active TCP windows, but when they are, they can significantly increase link efficiency at the cost of per-packet latency.

Regards
Henrik
Martin Devera
2001-Dec-10 12:52 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> This is most obviously true when looking at a station, but also true to some
> extent in routers. To get the minds working on the router case consider the
> following router:
>
> ppp0  256 Kbps  link to ISP
> eth0  100 Mbps  lan 1
> eth1  100 Mbps  lan 2
> eth3  100 Mbps  lan 3
>
> And you want without needing to involve your ISP reasonably shape incoming
> traffic received on ppp0 to not use more than 64kbps for rsync in total.
>
> Doing such a setup using ingress shaping is "trivial". Doing it using egress
> shaping on each of the ethernet interfaces is not.

for this special case I designed IMQ :)
On Mon, 10 Dec 2001, Henrik Nordstrom wrote:
> On Monday 10 December 2001 12.59, Martin Devera wrote:
>
> > jamal would probably say here that it is nonsense to delay/queue packet
> > which already arrived to your box :)
>
> In a station trying to shape the traffic sent to him it does by limiting the
> waste of retransmits. egress queues does not help then as there is no egress
> where to queue the packet.
>
> To argue that it is nonsense to have a ingress queue for your own received
> packets is the same as to argue that it is nonsense to have a egress queue
> for routed packets. The packet dynamics are the same, only the application is
> slightly different.

No, I would strongly suggest you run tests with dropped vs delayed TCP packets. What you'll see is that even when you delay TCP packets, retransmits will happen. So this is a weak reason.

At least Martin and I agreed that the only reason you'd need ingress is to maintain the same TC semantics across ingress and egress.

As for the ipqueue folks, there is a certain limitation with netlink at the moment (hence the per-protocol family issue); so we might have to help them queue packets in the kernel, pass only the headers to user space, let them make a decision on the fate of the packet, and have some qdisc act on that decision. Now that is the hard way of doing things; the easy way is to fix netlink.

Going back to hiding under paying work

cheers,
jamal
Martin Devera
2001-Dec-10 13:40 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> > Are you sure !? TCP slows down by half on every dropped
> > packet per congestion window AFAIK.
> > On other side it is often hard to slow TCP down by packet
> > delay as TCP will try to accommodate it by making MSS larger.
> > Am I right jamal ?
>
> You basically need both, or your packet drops will constantly be fighting
> retransmits as TCP is trying to recover. There will eventually be a balance,
> but not a too nice one.

I'd paste part of an old email from jamal here:

----
Recall Mathis equation:
TCP b/width = C*MSS/(RTT*sqrt(p))
where p is the drop probability and C is a constant which changes under different conditions
----

Additionally, TCP tries to adapt the window to RTT so that min(MSS,cwnd)/RTT changes very slowly. Thus the remaining part is C/sqrt(p). As you see, it is hard to slow TCP down without dropping...

If you set up an infinite queue and delay by 1s, TCP will keep increasing its window until a drop occurs or the receiver's advertised window is hit.

I'd suggest you read RFC 2001.

devik
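The quoted Mathis equation is easy to evaluate numerically. The sketch below is only an illustration of that formula: the function name `mathis_bandwidth`, the example MSS, RTT and drop probabilities, and the placeholder value `c=1.0` are assumptions, since the quoted mail only says that C changes under different conditions.

```python
import math


def mathis_bandwidth(mss_bytes, rtt_seconds, drop_probability, c=1.0):
    """Estimate TCP bandwidth in bytes/second as C*MSS/(RTT*sqrt(p)).

    c=1.0 is a placeholder (assumption); the real constant depends on
    the conditions, as noted in the quoted mail.
    """
    if not 0.0 < drop_probability <= 1.0:
        raise ValueError("drop_probability must be in (0, 1]")
    if rtt_seconds <= 0.0:
        raise ValueError("rtt_seconds must be positive")
    return c * mss_bytes / (rtt_seconds * math.sqrt(drop_probability))


# Hypothetical link: 1460-byte MSS, 100 ms RTT, 1% drops.
base = mathis_bandwidth(1460, 0.1, 0.01)
more_loss = mathis_bandwidth(1460, 0.1, 0.04)    # four times the drops
more_delay = mathis_bandwidth(1460, 0.2, 0.01)   # twice the RTT
print(base, base / more_loss, base / more_delay)
```

Taken on its own, the formula says that quadrupling the drop probability or doubling the RTT each roughly halves the estimate; devik's argument above is that in practice TCP adapts its window to the RTT, which leaves the drop probability as the effective lever.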
Jim Fleming
2001-Dec-10 13:42 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
----- Original Message -----
From: "bert hubert" <ahu@ds9a.nl>
To: "Henrik Nordstrom" <hno@marasystems.com>
Cc: "Martin Devera" <devik@cdi.cz>; "jamal" <hadi@cyberus.ca>; <lartc@mailman.ds9a.nl>
Sent: Monday, December 10, 2001 6:14 AM
Subject: Re: [LARTC] Re: further CBQ/tc documentation ds9a.nl/lartc/manpages

> On Mon, Dec 10, 2001 at 10:59:38AM +0100, Henrik Nordstrom wrote:
>
> > TCP is generally too smart to be delayed proper by "randomly" dropped packets
> > without any signs in RTT. Especially when the RTT is small.
>
> Richard Stevens disagrees with you.
>

I assume you mean the late Richard Stevens, who passed away at an early age after helping to document the BSD Internet Protocol software in great detail.

BTW, while on the subject of TCP, Qs, etc. It seems rather odd that people engrossed in protocols, would not consider the long history of making sure that layers are preserved or at least considered. You may have heard of Layer 2 (Network) and Layer 3 (Transport). It would seem that much of the work on Queues and the shaping of flows would most properly be located in Layer 3. The additions we are making to Layer 2, focus on extending the addressing and other functionality of Layer 2. It seems that Layer 2 will not have enough bits to accommodate all of the ideas that people are pouring into it.

This may help...
http://www.dot-biz.com/IPv4/Tutorial/

Jim Fleming
http://www.IPv8.info
IPv16....One Better !!
Michael T. Babcock
2001-Dec-10 13:52 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Mon, Dec 10, 2001 at 09:38:33AM +0100, Martin Devera wrote:
> IMHO ingress queuing could be used as a poor man's way to reshape
> (or prioritize) traffic which can't be shaped at egress side (usually
> because of administrative boundaries). This need would vanish in
> presence of such classful work conserving CBQ.

Please fill me in -- how could it ever not be possible to shape egress traffic? Or are you referring to the egress traffic of the upstream to the machine whose ingress side you wish to modify?

--
Michael T. Babcock
CTO, FibreSpeed Ltd. (Hosting, Security, Consultation, Database, etc)
http://www.fibrespeed.net/~mbabcock/
Michael T. Babcock
2001-Dec-10 13:54 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Mon, Dec 10, 2001 at 09:41:59AM +0100, Henrik Nordstrom wrote:
> example, consider the not too uncommon example of a computer connected via
> 100Mbps networking to a DSL modem, and you want to tune the use of the link
> without needing to introduce a router in between.

Assuming there is only one computer (and therefore no need for the router), why would you want something other than a work-conserving ingress policy? Drop certain packets, allow everything else ...

I can almost see your point if we're discussing very slow computers, but in that case the qdiscs would slow it down more -- could you please fill me in on your assumptions here?

--
Michael T. Babcock
CTO, FibreSpeed Ltd. (Hosting, Security, Consultation, Database, etc)
http://www.fibrespeed.net/~mbabcock/
Gerry Creager N5JXS
2001-Dec-10 13:56 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
bert hubert wrote:
>
> On Mon, Dec 10, 2001 at 10:59:38AM +0100, Henrik Nordstrom wrote:
>
> > TCP is generally too smart to be delayed proper by "randomly" dropped packets
> > without any signs in RTT. Especially when the RTT is small.
>
> Richard Stevens disagrees with you.
>
> > And such a administrative boundary is the one I am playing on. The boundary
> > between a small customer and his ISP. The ISP obviously have the luxury of
> > egress, but the customer does not on traffic received by him.
> >
> > Exactly how would this need vanish?
>
> You can turn ingress into egress by inserting another machine of course.
> Ingress shaping, well, is weird if you have no concept of an 'ingress
> queue'.

Once you start working with tagging for DiffServ, you find that an ingress queue is a valuable idea. From our perspective here it is a differentiator in looking at some of the "big iron" from the likes of Juniper, Anritsu, Marconi, Cisco and Alcatel.

Specifically, we're looking at priority queueing for management of various services: VoIP, streaming video (unicast and multicast), H.323, etc. Ingress queueing provides us an opportunity to tag and shape coming into the router rather than simply shaping on egress. Our campus requires (geographic considerations) 7 internal routers before we come to the edge. We have to shape on ingress at the first one, then maintain the marking and policies throughout the network.

--
Gerry Creager -- gerry@cs.tamu.edu
Network Engineering
Academy for Advanced Telecommunications and Learning Technologies
Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax)
Michael T. Babcock
2001-Dec-10 13:58 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
On Mon, Dec 10, 2001 at 01:00:41PM +0100, Henrik Nordstrom wrote:
> Sure, just dropping packets will work to some extent, but not by far as
> efficient in transmitted data as limited queue with drop on overflow
> (preferably smarter than FIFO+overflow if you need support more than one
> concurrent TCP session).

This is why RED was created -- some of the research papers on RED show how it will balance out TCP traffic very well if tuned properly, but it isn't always perfectly straightforward.

--
Michael T. Babcock
CTO, FibreSpeed Ltd. (Hosting, Security, Consultation, Database, etc)
http://www.fibrespeed.net/~mbabcock/
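The core idea of RED, dropping with a probability that rises as the average queue grows, can be sketched in a few lines. This is a simplified illustration of the classic scheme and not the code of any particular qdisc: the function name `red_drop_probability` and the parameter names are chosen for this example, the averaging of the queue length is left out, and so is the correction RED applies for the number of packets since the last drop.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Return the base RED drop probability for an average queue length.

    Simplified sketch (assumption): no drops below `min_th`, a linear
    ramp up to `max_p` between the thresholds, and every packet dropped
    once the average reaches `max_th`.
    """
    if min_th >= max_th:
        raise ValueError("min_th must be below max_th")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical tuning: thresholds of 10 and 30 packets, 10% at the top of the ramp.
for avg in (5, 10, 20, 29, 30):
    print(avg, red_drop_probability(avg, min_th=10, max_th=30, max_p=0.1))
```

Because the early drops are spread randomly over the arriving packets, flows sending more are hit more often, which is what lets a well-tuned RED queue balance several TCP sessions; picking the thresholds and `max_p` is the part that is not always straightforward.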
Martin Devera
2001-Dec-10 14:02 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
> > IMHO ingress queuing could be used as a poor man's way to reshape
> > (or prioritize) traffic which can't be shaped at egress side (usually
> > because of administrative boundaries). This need would vanish in
> > presence of such classful work conserving CBQ.
>
> Please fill me in -- how could it ever not be possible to shape
> egress traffic? Or are you referring to the egress traffic of the
> upstream to the machine whose ingress side you wish to modify?

exactly.
Jim Fleming
2001-Dec-10 15:04 UTC
Re: Re: further CBQ/tc documentation ds9a.nl/lartc/manpages
----- Original Message -----
From: "Gerry Creager N5JXS" <gerry@cs.tamu.edu>
To: "bert hubert" <ahu@ds9a.nl>
Cc: "Henrik Nordstrom" <hno@marasystems.com>; "Martin Devera" <devik@cdi.cz>; "jamal" <hadi@cyberus.ca>; <lartc@mailman.ds9a.nl>
Sent: Monday, December 10, 2001 7:56 AM
Subject: Re: [LARTC] Re: further CBQ/tc documentation ds9a.nl/lartc/manpages

> bert hubert wrote:
> >
> > On Mon, Dec 10, 2001 at 10:59:38AM +0100, Henrik Nordstrom wrote:
> >
> > > TCP is generally too smart to be delayed proper by "randomly" dropped packets
> > > without any signs in RTT. Especially when the RTT is small.
> >
> > Richard Stevens disagrees with you.
> >
> > > And such a administrative boundary is the one I am playing on. The boundary
> > > between a small customer and his ISP. The ISP obviously have the luxury of
> > > egress, but the customer does not on traffic received by him.
> > >
> > > Exactly how would this need vanish?
> >
> > You can turn ingress into egress by inserting another machine of course.
> > Ingress shaping, well, is weird if you have no concept of an 'ingress
> > queue'.
>
> Once you start working with tagging for DiffServ, you find that an
> ingress queue is a valuable idea. From our perspective here it is a
> differentiator in looking at some of the "big iron" from the likes of
> Juniper, Anritsu, Marconi, Cisco and Alcatel.
>
> Specifically, we're looking at priority queueing for management of
> various services: VoIP, streaming video (unicast and multicast), H.323,
> etc. Ingress queueing provides us an opportunity to tag and shape
> coming into the router rather than simply shaping on egress. Our campus
> requires (geographic considerations) 7 internal routers before we come
> to the edge. We have to shape on ingress at the first one, then
> maintain the marking and policies throughout the network.
> --
> Gerry Creager -- gerry@cs.tamu.edu
> Network Engineering
> Academy for Advanced Telecommunications and Learning Technologies
> Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax)
>

"We have to shape on ingress at the first one, then maintain the marking and policies throughout the network."

It sounds like you need RIFRAF Routing.
RIFRAF - Remote Identification Field Random Action Filter

Jim Fleming
http://www.IPv8.info
IPv16....One Better !!
kuznet@ms2.inr.ac.ru
2001-Dec-10 17:04 UTC
Re: CBQ and all other qdiscs now REALLY completely documented
Hello!

> So priority limits the size of skb->priority to be from 0..6; this won't
> work with that check in cbq.

No, it does not. Values other than the "low prio" defaults (0..6) are not allowed to unprivileged users, for evident reasons. A user with the corresponding capability may direct traffic to any class.

Alexey