I am trying to do ingress flow control with htb + imq, and as could be expected it isn't working well. It works a lot better when I keep the htb ceiling well below what the link can actually carry - I guess because htb gets to throttle the TCP fast start before it causes packets to be dropped. The only problem is that wasting all that bandwidth hurts.

It occurred to me that the bandwidth needn't be wasted, if only I can convince HTB to reserve some bandwidth for a class. For example, let's say we have a 1000kbit link, and two classes sharing that link:

- Voip - ie high prio real time, and
- Web  - background traffic.

Right now, with htb or cbq or whatever, I can do this:

              Guaranteed Rate   Ceiling   Prio
    Link      700kbit           700kbit
    |--Voip   200kbit           700kbit   1
    \--Web    300kbit           700kbit   2

This works, in that Voip won't be hit by new connections overloading the link before htb can bring them under control. But it wastes 300kbit of bandwidth in doing so.

An observation. Let's say the link is carrying its rated capacity. Ie, there is 400kbit of Voip traffic, and 300kbit of web traffic. In this scenario, there is really no harm in letting the Voip use the remaining 300kbit of spare capacity. The Voip traffic is already being shaped, so delays are being introduced by the HTB filter anyway. If we let it use the remaining 300kbit, that shaping may disappear. Yes, it may now be hit by new incoming TCP traffic - but we may get lucky and it may not, whereas before it was always being shaped by HTB.

To be more precise, I want to create some "headroom" that VOIP can use, but Web traffic can't. Here are some examples. In each case the link is maxed out.

    Packets arriving at filter    Packets sent by filter
    Voip      Web                 Voip      Web
    0         BIG                 0         700kbit
    200kbit   BIG                 200kbit   500kbit
    400kbit   BIG                 400kbit   300kbit
    600kbit   BIG                 600kbit   300kbit
    BIG       BIG                 700kbit   300kbit

As far as I can tell, neither HTB nor any other qdisc can be configured to do this. Am I correct?
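For concreteness, the hierarchy in the first table above might look roughly like this as tc commands (a sketch only - the imq0 device name, the classids and the default class are assumptions for illustration, not part of the original setup):

    tc qdisc add dev imq0 root handle 1: htb default 12
    tc class add dev imq0 parent 1:  classid 1:1  htb rate 700kbit ceil 700kbit
    # Voip - high prio real time
    tc class add dev imq0 parent 1:1 classid 1:11 htb rate 200kbit ceil 700kbit prio 1
    # Web - background traffic, also the default class
    tc class add dev imq0 parent 1:1 classid 1:12 htb rate 300kbit ceil 700kbit prio 2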
On Thu, Feb 23, 2006 at 06:38:09PM +1000, Russell Stuart wrote:
> For example, let's say we have a 1000kbit link, and two
> classes sharing that link:
>
> - Voip - ie high prio real time, and
> - Web - background traffic.

Have you measured this link, i.e. when there is no activity and you start some Voip sessions, do they get a constant downstream of 1000kbit?

It may very well be that you have to measure the real throughput and then go a little lower (since you have to be the bottleneck), however having to throw 30% of bandwidth away sounds a bit too harsh to me.

>               Guaranteed Rate   Ceiling   Prio
>     Link      700kbit           700kbit
>     |--Voip   200kbit           700kbit   1
>     \--Web    300kbit           700kbit   2

Are there other classes as well? The sum of the Voip + Web rates is just 500kbit, where the parent class offers 700kbit. You should make sure that the sum of child class rates equals the parent class rate. HTB results get more predictable that way.

> To be more precise, I want to create some "headroom" that
> VOIP can use, but Web traffic can't.

Usually, this "headroom" is the rate. In your example, Voip has 200kbit of bandwidth guaranteed. Web traffic can't use it, unless of course there is no Voip traffic at all.

Another way of indirect headroom would be to hard limit the Web class, i.e. give the Web class a lower ceil than the other classes. This way, there is bandwidth that the Web class can't use no matter what, even if the link is completely empty.

Regards
Andreas Klauer
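The "lower ceil for Web" idea might look roughly like this (a sketch only - the imq0 device, the classids and the exact kbit figures are made up for illustration, with the child rates summing to the parent rate as suggested above):

    tc qdisc add dev imq0 root handle 1: htb default 12
    tc class add dev imq0 parent 1:  classid 1:1  htb rate 1000kbit ceil 1000kbit
    # Voip may borrow up to the full link rate
    tc class add dev imq0 parent 1:1 classid 1:11 htb rate 400kbit  ceil 1000kbit prio 1
    # Web is hard-capped below the link rate, leaving permanent headroom for Voip
    tc class add dev imq0 parent 1:1 classid 1:12 htb rate 600kbit  ceil 700kbit  prio 2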
On Thu, 2006-02-23 at 10:23 +0100, Andreas Klauer wrote:
> On Thu, Feb 23, 2006 at 06:38:09PM +1000, Russell Stuart wrote:
> > For example, let's say we have a 1000kbit link, and two
> > classes sharing that link:
> >
> > - Voip - ie high prio real time, and
> > - Web - background traffic.
>
> Have you measured this link, i.e. when there is no activity
> and you start some Voip sessions, do they get a constant
> downstream of 1000kbit?
>
> It may very well be that you have to measure the real throughput
> and then go a little lower (since you have to be the bottleneck),
> however having to throw 30% of bandwidth away sounds a bit too
> harsh to me.

The setup I gave was purely hypothetical. 300kbit headroom sounds way too high to me as well - any advice others may have on this would be appreciated.

> Another way of indirect headroom would be to hard limit the Web class,
> i.e. give the Web class a lower ceil than the other classes. This way,
> there is bandwidth that the Web class can't use no matter what, even
> if the link is completely empty.

That is the right answer - it would achieve what I want. In hindsight it seems so obvious I don't know why I didn't think of it myself.

Thanks for taking the time to answer my query.
Russell Stuart wrote:
>
> On Thu, 2006-02-23 at 10:23 +0100, Andreas Klauer wrote:
> > On Thu, Feb 23, 2006 at 06:38:09PM +1000, Russell Stuart wrote:
> > > For example, let's say we have a 1000kbit link, and two
> > > classes sharing that link:
> > >
> > > - Voip - ie high prio real time, and
> > > - Web - background traffic.
> >
> > Have you measured this link, i.e. when there is no activity
> > and you start some Voip sessions, do they get a constant
> > downstream of 1000kbit?
> >
> > It may very well be that you have to measure the real throughput
> > and then go a little lower (since you have to be the bottleneck),
> > however having to throw 30% of bandwidth away sounds a bit too
> > harsh to me.
>
> The setup I gave was purely hypothetical. 300kbit
> headroom sounds way too high to me as well - any
> advice others may have on this would be appreciated.
>
> > Another way of indirect headroom would be to hard limit the Web class,
> > i.e. give the Web class a lower ceil than the other classes. This way,
> > there is bandwidth that the Web class can't use no matter what, even
> > if the link is completely empty.
>
> That is the right answer - it would achieve what I want.
> In hindsight it seems so obvious I don't know why I
> didn't think of it myself.
>
> Thanks for taking the time to answer my query.

Two more things. HTTP is a bursty protocol, so you need to think about the burst and cburst parameters you give it. If you want to squash TCP fast start, use a low burst which will backlog and eventually drop the excessive packets. On the other hand, my experience is that a slow started connection never increases its flow rate much even though the spec says it should.

And you can get better precision from HTB by setting HYSTERYSIS (did I just misspell that?), thus dequeueing a single packet rather than a pair. I don't recommend that, but you should know about it. On many ATM links it is a godsend.

In terms of headroom, I find that 85% of real capacity always works, so I start with that and push up until something breaks. YMMV.
--
gypsy
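Setting burst and cburst explicitly on a class looks roughly like this, here added to the Web class from the first sketch (a sketch only - the 1600 byte figure is just an illustrative "about one MTU plus a little" value, not a recommendation from the post above):

    # small burst/cburst so TCP fast start backlogs and drops quickly
    tc class change dev imq0 parent 1:1 classid 1:12 htb rate 300kbit ceil 700kbit \
        burst 1600b cburst 1600b prio 2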
On Thu, 2006-02-23 at 19:49 -0800, gypsy wrote:
> Two more things. HTTP is a bursty protocol, so you need to think about
> the burst and cburst parameters you give it.

I had already figured out that I had to send burst as small as possible. I recall reading both value is the

> If you want to squash TCP
> fast start, use a low burst which will backlog and eventually drop the
> excessive packets. On the other hand, my experience is that a slow
> started connection never increases its flow rate much even though the
> spec says it should. And you can get better precision from HTB by
> setting HYSTERYSIS (did I just misspell that?), thus dequeueing a single
> packet rather than a pair. I don't recommend that, but you should know
> about it. On many ATM links it is a godsend.

I had already figured out that I had to send burst as small as possible, but the HTB User Guide says "Latest tc tool will compute and set the smallest possible burst when it is not specified", so I had left it alone. In fact it defaults to 1919 bytes in my case. Looking at the TC source, this is calculated as:

    (rate_in_bytes_per_second / HZ) + mtu

and then rounded up to the next entry in the rate table. Perhaps:

    max(rate_in_bytes_per_second / HZ, mtu)

would have been a better choice. In my case that will evaluate to the mtu, so I will try that.

> In terms of headroom, I find that 85% of real capacity always works, so
> I start with that and push up until something breaks. YMMV.

Excellent! Thank you.
Sorry for the mess posted before. I hit send by mistake.

On Thu, 2006-02-23 at 19:49 -0800, gypsy wrote:
> Two more things. HTTP is a bursty protocol, so you need to think about
> the burst and cburst parameters you give it.

I had already figured out that I had to send burst as small as possible, but the HTB User Guide says "Latest tc tool will compute and set the smallest possible burst when it is not specified", so I had left it alone. In fact it defaults to 1919 bytes in my case. Looking at the TC source, this is calculated as:

    (rate_in_bytes_per_second / HZ) + mtu

and then rounded up to the next entry in the rate table. Perhaps:

    max(rate_in_bytes_per_second / HZ, mtu)

would have been a better choice. In my case that will evaluate to the mtu, so I will try that.

> If you want to squash TCP
> fast start, use a low burst which will backlog and eventually drop the
> excessive packets. On the other hand, my experience is that a slow
> started connection never increases its flow rate much even though the
> spec says it should. And you can get better precision from HTB by
> setting HYSTERYSIS (did I just misspell that?), thus dequeueing a single
> packet rather than a pair. I don't recommend that, but you should know
> about it. On many ATM links it is a godsend.

Looking at the kernel code, HTB_HYSTERESIS is set in kernel.org kernels as shipped. You have to unset it if you have large (>100K byte) bursts, apparently.

> In terms of headroom, I find that 85% of real capacity always works, so
> I start with that and push up until something breaks. YMMV.

Excellent! Thank you.
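Pinning burst down to one MTU rather than letting tc pick its default could look like this (a sketch - the 1500 byte MTU and the rate/ceil figures are assumptions for illustration, applied to the Voip class from the first sketch):

    # tc's default would be roughly rate_in_bytes_per_second/HZ + mtu;
    # here burst is forced down to a single MTU instead
    tc class change dev imq0 parent 1:1 classid 1:11 htb rate 200kbit ceil 700kbit \
        burst 1500b cburst 1500b prio 1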
On Fri, 2006-02-24 at 07:27 +1000, Russell Stuart wrote:
> On Thu, 2006-02-23 at 10:23 +0100, Andreas Klauer wrote:
> > Another way of indirect headroom would be to hard limit the Web class,
> > i.e. give the Web class a lower ceil than the other classes. This way,
> > there is bandwidth that the Web class can't use no matter what, even
> > if the link is completely empty.
>
> That is the right answer - it would achieve what I want.
> In hindsight it seems so obvious I don't know why I
> didn't think of it myself.

Turns out it didn't do what I wanted. But there is a way to force HTB to do what I wanted, and I think it is unusual enough to document by posting it here.

To recap: what I wanted to do was reserve approx 20% of the bandwidth as headroom for VOIP. The headroom was to prevent the TCP faststart of new incoming connections from hitting the VOIP traffic. The headroom bandwidth must remain unused, unless VOIP itself was forced to eat into it. Another way of saying the same thing is I wanted to allocate some bandwidth to VOIP that it would not lend to other classes. Example:

    VOIP        - Assured Rate 30%, 20% of which can only be used by VOIP.          Lowest latency.
    INTERACTIVE - Assured Rate 20%, all of which may be borrowed by other classes.  Middle latency.
    BULK        - Assured Rate 50%, all of which may be borrowed by other classes.  Highest latency.

HTB class structure that implements this:

    htb class parent 1:   classid 1:10 rate 80%  ceil 100%
    htb class parent 1:10 classid 1:11 rate 30%  ceil 100%
    htb class parent 1:11 classid 1:19 rate 30%  ceil 100% prio 0  [VOIP leaf]
    htb class parent 1:10 classid 1:20 rate 70%  ceil 80%
    htb class parent 1:20 classid 1:21 rate 20%  ceil 80%  prio 1  [interactive leaf]
    htb class parent 1:20 classid 1:22 rate 50%  ceil 80%  prio 2  [other leaf]

This is the smallest class tree I can think of that does it.
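Spelling the percentages out as real tc commands for a hypothetical 1000kbit link might look roughly like this (a sketch - the imq0 device, the default class and the translation of the percentages into kbit are assumptions, not from the original post):

    tc qdisc add dev imq0 root handle 1: htb default 22
    tc class add dev imq0 parent 1:   classid 1:10 htb rate 800kbit ceil 1000kbit
    tc class add dev imq0 parent 1:10 classid 1:11 htb rate 300kbit ceil 1000kbit
    tc class add dev imq0 parent 1:11 classid 1:19 htb rate 300kbit ceil 1000kbit prio 0   # VOIP leaf
    tc class add dev imq0 parent 1:10 classid 1:20 htb rate 700kbit ceil 800kbit
    tc class add dev imq0 parent 1:20 classid 1:21 htb rate 200kbit ceil 800kbit  prio 1   # interactive leaf
    tc class add dev imq0 parent 1:20 classid 1:22 htb rate 500kbit ceil 800kbit  prio 2   # other leaf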
On Sun, Mar 05, 2006 at 01:43:11PM +1000, Russell Stuart wrote:
> HTB class structure that implements this:
>
> htb class parent 1: classid 1:10 rate 80% ceil 100%

To my understanding, a root class that has a higher ceil than rate can always use bandwidth up to its ceil. Thus it would be more correct to set the rate to 100% here (whatever that is) as well.

> htb class parent 1:10 classid 1:11 rate 30% ceil 100%
> htb class parent 1:11 classid 1:19 rate 30% ceil 100% prio 0 [VOIP leaf]

If class 1:19 is the only child of class 1:11, it can always use whatever bandwidth the parent class can get. So the parent class does not have any limiting factor here and you can just skip it and attach 1:19 to 1:10 directly.

> htb class parent 1:10 classid 1:20 rate 70% ceil 80%
> htb class parent 1:20 classid 1:21 rate 20% ceil 80% prio 1 [interactive leaf]
> htb class parent 1:20 classid 1:22 rate 50% ceil 80% prio 2 [other leaf]
>
> This is the smallest class tree I can think of that does it.

The only difference to the tree before is that you grouped all non-voip traffic under a separate parent class which is capped at 80%, so voip always has 20% of available bandwidth which can not be taken away from it, right? If that was your intention, then it's fine.

Regards
Andreas Klauer
On Sun, 2006-03-05 at 10:16 +0100, Andreas Klauer wrote:
> > htb class parent 1: classid 1:10 rate 80% ceil 100%
>
> To my understanding, a root class that has a higher ceil than rate
> can always use bandwidth up to its ceil. Thus it would be more
> correct to set the rate to 100% here (whatever that is) as well.

I can verify it doesn't. I have implemented this in real life, and the class is limited to the "rate".

> > htb class parent 1:10 classid 1:11 rate 30% ceil 100%
> > htb class parent 1:11 classid 1:19 rate 30% ceil 100% prio 0 [VOIP leaf]
>
> If class 1:19 is the only child of class 1:11, it can always use
> whatever bandwidth the parent class can get. So the parent class
> does not have any limiting factor here and you can just skip it
> and attach 1:19 to 1:10 directly.

It looks like you are right, although perhaps not for the reasons you think. The structure was intended to prevent a priority inversion that occurs in HTB when you use deep trees. Let's say we have this example:

          root
          1:10          1:30 prio 0
         /    \         1:31 prio 1
      1:30    1:21      1:32 prio 2
             /    \
          1:31    1:32

Let's assume 1:30 and 1:32 are over limit. 1:31 is not sending anything, and thus 1:21 is under limit. What I wanted to happen is that 1:30 and 1:32 are serviced based on their priority - ie 1:30 sees a lower latency as it has the lowest prio value. But that won't happen in this case. Looking at:

    http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm

you see that since 1:30 is over limit, its request for bandwidth will get forwarded to 1:10. 1:32's request for bandwidth will get forwarded to 1:21. Regardless of priority, HTB always services the lowest node in the tree (measured as the distance to the furthest leaf) first. Priority is only considered when that distance is the same for the competing nodes. Since 1:21 is lower in the tree than 1:10, it will get serviced first, regardless of the fact that 1:30's packets have a high priority. Thus the "priority inversion".

By restructuring the tree like this:

          root
          1:10          1:30 prio 0
         /    \         1:31 prio 1
      1:20    1:21      1:32 prio 2
       /     /    \
    1:30  1:31    1:32

I had hoped to fix the problem by ensuring that 1:30's and 1:32's ancestors are at the same height in the tree. It didn't work because I made the rate for 1:20 wrong. It should have been 100%. With that fix the priority inversion should go away. The revised class structure is now:

    htb class parent 1:   classid 1:10 rate 80%  ceil 100%
    htb class parent 1:10 classid 1:11 rate 100% ceil 100%
    htb class parent 1:11 classid 1:19 rate 30%  ceil 100% prio 0  [VOIP leaf]
    htb class parent 1:10 classid 1:20 rate 70%  ceil 100%
    htb class parent 1:20 classid 1:21 rate 20%  ceil 100% prio 1  [interactive leaf]
    htb class parent 1:20 classid 1:22 rate 50%  ceil 100% prio 2  [other leaf]

> > htb class parent 1:10 classid 1:20 rate 70% ceil 80%
> > htb class parent 1:20 classid 1:21 rate 20% ceil 80% prio 1 [interactive leaf]
> > htb class parent 1:20 classid 1:22 rate 50% ceil 80% prio 2 [other leaf]
> >
> > This is the smallest class tree I can think of that does it.
>
> The only difference to the tree before is that you grouped all non-voip
> traffic under a separate parent class which is capped at 80%, so voip
> always has 20% of available bandwidth which can not be taken away from
> it, right? If that was your intention, then it's fine.

Yes, that was my intention.
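Relative to the earlier tc sketch, only the rates and ceils of four classes change; applying that with tc might look roughly like this (a sketch - the imq0 device and the kbit translation of the percentages are the same assumptions as before):

    tc class change dev imq0 parent 1:10 classid 1:11 htb rate 1000kbit ceil 1000kbit
    tc class change dev imq0 parent 1:10 classid 1:20 htb rate 700kbit  ceil 1000kbit
    tc class change dev imq0 parent 1:20 classid 1:21 htb rate 200kbit  ceil 1000kbit prio 1
    tc class change dev imq0 parent 1:20 classid 1:22 htb rate 500kbit  ceil 1000kbit prio 2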
As an example, if 1:22 (other) in the above hierarchy was wanting to use say 120%, and 1:19 (VOIP) was wanting to use 10%, then HTB would allocate the available bandwidth as follows:

    1:19 (VOIP)  - 10%
    1:22 (other) - 70%

It may be the "only difference", but it is subtle, and it surprised me to see that HTB could reserve bandwidth like this. It is a very flexible qdisc.
On Mon, Mar 06, 2006 at 10:53:13AM +1000, Russell Stuart wrote:
> I can verify it doesn't. I have implemented this in real
> life, and the class is limited to the "rate".

Thanks for pointing it out.

> The revised class structure is now:
>
> htb class parent 1:   classid 1:10 rate 80%  ceil 100%
> htb class parent 1:10 classid 1:11 rate 100% ceil 100%
> htb class parent 1:11 classid 1:19 rate 30%  ceil 100% prio 0 [VOIP leaf]
> htb class parent 1:10 classid 1:20 rate 70%  ceil 100%
> htb class parent 1:20 classid 1:21 rate 20%  ceil 100% prio 1 [interactive leaf]
> htb class parent 1:20 classid 1:22 rate 50%  ceil 100% prio 2 [other leaf]

Interesting analysis, although it kind of defies my HTB logic (which is just an inaccurate model). If the 1:10 class is limited to the rate as you said above (which would be 80%), how can a child class have a rate of 100%?

I still don't understand what to make of a root class with different rate / ceil settings. It's either limited to rate, or to ceil all the time; if it isn't, under which circumstances does it decide to jump over its rate?

Regards
Andreas Klauer
On Mon, 2006-03-06 at 02:19 +0100, Andreas Klauer wrote:
> > The revised class structure is now:
> >
> > htb class parent 1:   classid 1:10 rate 80%  ceil 100%
> > htb class parent 1:10 classid 1:11 rate 100% ceil 100%
> > htb class parent 1:11 classid 1:19 rate 30%  ceil 100% prio 0 [VOIP leaf]
> > htb class parent 1:10 classid 1:20 rate 70%  ceil 100%
> > htb class parent 1:20 classid 1:21 rate 20%  ceil 100% prio 1 [interactive leaf]
> > htb class parent 1:20 classid 1:22 rate 50%  ceil 100% prio 2 [other leaf]
>
> Interesting analysis, although it kind of defies my HTB logic
> (which is just an inaccurate model). If the 1:10 class is
> limited to the rate as you said above (which would be 80%),
> how can a child class have a rate of 100%?

I got the idea from the FAQ, section titled "What if sum of child rates is greater than parent rate?":

    http://luxik.cdi.cz/~devik/qos/htb/htbfaq.htm

What he says happens in there does agree with the "Theory of Operation" document - but you would have to be smarter than I apparently am to infer it from just reading the "Theory of Operation".

Anyway, after reading "Theory of Operation", you can restate how HTB works in fairly simple terms. Let's say you have a node with an assured rate (ie "rate") of X%, and it owns a packet that is queued to go. Then one of three things happens:

a. If the current traffic flow through this node and its children is over the node's ceiling, the packet is not sent.

b. If the current traffic flow through this node is less than the node's X% assured rate, the packet is transmitted. This happens unconditionally - regardless of what the parents' rates might be. Thus let's say you set up a hierarchy with rates like this:

        100%
       /    \
     60%    60%

   If both leaf nodes are sending as fast as they can go, then they will use 120% of the link capacity, regardless of the fact that the parent is at 100%. This presumably would be an error.

c. If the current traffic flow through this node is over the X% assured rate, then this node will flatly refuse to send the packet. Instead it passes it on to its parent to deal with. The parent now "owns" the packet, and thus goes through these exact same three steps. So if the parent is also over its assured rate, it will give the packet to its parent. This is how the packet works its way up the hierarchy.

So the packet passes up the tree, from node to node, until it can find a node that will send it. As time goes on a packet's ownership may go back down the tree, towards the leaf that generated it. This happens because as time passes, nodes will go back under their assured rate.

If there are lots of packets in the queue, this means there could be several nodes in the tree all holding up their hands saying "I have a packet ready to send". HTB has to choose between them. There are three cases:

a. Packets that are owned by nodes at the bottom of the tree get first priority. Thus the leaves get to send their packets first. Only if no leaves have packets to send does the next layer of nodes even get considered. Thus packets owned by the root have the lowest priority of all.

b. Even after (a), we could have several nodes at the same level all with packets ready to send. HTB chooses between them based on the prio field. The node with the lowest prio field wins, and gets to send before everybody else.

c. But .. there could be several nodes with the same prio, at the same level in the tree, all with packets to send. The nodes are then processed in a round robin fashion - ie HTB cycles around them, servicing each in turn.
Somewhere in all this the quantum raises its ugly head. I avoided having to think about that by making all my quantums the same. The quantum affects how a parent node allocates bandwidth to the over-rate children that want to borrow from someone else. The parent allocates the excess bandwidth to the children according to their quantum. The higher a child's quantum, the more of the excess bandwidth the parent allocates to it. Thus if all children have the same quantum, they each get the same share of the excess bandwidth. If there were three over-rate children whose quantums were A:3000, B:1500, C:1500, then A would get 50% of the excess bandwidth:

    50% = 3000 / (3000 + 1500 + 1500)

and B & C would get 25% each:

    25% = 1500 / (3000 + 1500 + 1500)

Notice the quantum only affects nodes that are trying to send over their assured rate, and thus have to borrow from their parent. If they are not borrowing they are not affected.

> I still don't understand what to make of a root class with
> different rate / ceil settings. It's either limited to rate,
> or to ceil all the time; if it isn't, under which circumstances
> does it decide to jump over its rate?

A node will only send a packet if the collective flow of the node and its children is less than the node's assured rate (ie the "rate" parameter). If the flow is above that figure the node will never send the packet, but its ancestors may. The root doesn't have any ancestors, so the packet won't be sent if its assured rate is exceeded. Making the root's ceil equal to its assured rate would make what is happening clearer, but doesn't alter the behaviour in any way.

As it happens the classes I actually use are a bit more complicated than I have shown here. The scripts that create them are automatically generated. The code that did the generating was easier to write when all ceils were 100% - which is why I presented it that way.
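Taking quantum out of the equation by making it explicit and equal on every leaf might look roughly like this (a sketch - the 1500 byte value and the rate/ceil figures are illustrative assumptions carried over from the earlier sketches, not from the original post):

    # equal quantums -> excess bandwidth is shared equally between borrowing leaves
    tc class change dev imq0 parent 1:11 classid 1:19 htb rate 300kbit ceil 1000kbit prio 0 quantum 1500
    tc class change dev imq0 parent 1:20 classid 1:21 htb rate 200kbit ceil 1000kbit prio 1 quantum 1500
    tc class change dev imq0 parent 1:20 classid 1:22 htb rate 500kbit ceil 1000kbit prio 2 quantum 1500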