Jonathan Lynch
2005-Jul-27 10:28 UTC
HTB and PRIO qdiscs introducing extra latency when output interface is saturated
I'm using a Linux machine with standard PC hardware and 3 separate PCI network interfaces to operate as a DiffServ core router using Linux traffic control. The machine is a P4 2.8GHz with 512MB RAM running Fedora Core 3 with the 2.6.12.3 kernel. All links and network interfaces are full duplex fast Ethernet. IP forwarding is enabled in the kernel. All hosts on the network have their time synchronised using a stratum 1 server on the same VLAN. Below is an ASCII diagram of the network.

(network A) edge router ------> core router ----> edge router (network C)
                                     ^
                                     |
                                     |
                            edge router (network B)

Core Router Configuration:
---------------------------
The core router implements the expedited forwarding (EF) PHB. I have tried 2 different configurations.

1. HTB qdisc with two htb classes. One services VoIP traffic (marked with the EF codepoint); VoIP traffic is guaranteed to be serviced at a minimum rate of 1500 kbit. This htb class is serviced by a fifo queue with a limit of 5 packets. The 2nd htb class guarantees all other traffic to be serviced at a minimum rate of 5 Mbit. A RED qdisc services this htb class.

2. PRIO qdisc with a token bucket filter to service VoIP traffic (marked with the EF codepoint) at a guaranteed minimum rate of 1500 kbit. A RED qdisc services all other traffic.

Test 1.
---------------------------
VoIP traffic originates from network A and is destined to network C. The throughput of the VoIP traffic is 350 kbit. No other traffic passes through the core router during this time. These VoIP packets are marked with the EF codepoint. Using either of the above mentioned configurations for the core router, the delay of the VoIP traffic in travelling from network A to network C through the core router is 0.25 milliseconds.

Test 2.
---------------------------
Again VoIP traffic originates from network A and is destined to network C with a throughput of 350 kbit. TCP traffic also originates from another host in network A and is destined for another host in network C. More TCP traffic originates from network B and is destined to network C. This TCP traffic comes from transferring large files over HTTP. As a result a bottleneck is created at the outgoing interface of the core router to network C. The combined TCP traffic from these sources is nearly 100 Mbit. Using either of the above mentioned configurations for the core router, the delay of the VoIP traffic in travelling from network A to network C through the core router is 30 milliseconds with 0% loss. A considerable number of TCP packets are dropped.

Could anyone tell me why the delay is so high (30ms) for VoIP packets which are treated with the EF PHB when the outgoing interface of the core router to network C is saturated? Is it due to operating system factors? Has anyone else had similar experiences?

Also I would appreciate it if anyone could give me performance metrics as to approximately how many packets per second a router running Linux on standard PC hardware can forward, or even mention any factors that would affect this performance. I assume the system interrupt frequency HZ will affect performance in some way.

Jonathan Lynch

Note: I already posted the same question to the list a few weeks back but got no reply. I have reworded my question so it is clearer.

-----------------------------------------------------------------------------------------------
The config I used for each setup is included below. These are slight modifications of the examples supplied with the iproute2 source code.
Config 1 using htb
-------------------
tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev $1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2

Main htb qdisc & class
tc qdisc add dev $1 parent 1:0 handle 2:0 htb
tc class add dev $1 parent 2:0 classid 2:1 htb rate 100Mbit ceil 100Mbit

EF Class (2:10)
tc class add dev $1 parent 2:1 classid 2:10 htb rate 1500Kbit ceil 100Mbit
tc qdisc add dev $1 parent 2:10 pfifo limit 5
tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:10 pass_on

BE Class (2:20)
tc class add dev $1 parent 2:1 classid 2:20 htb rate 5Mbit ceil 100Mbit
tc qdisc add dev $1 parent 2:20 red limit 60KB min 15KB max 45KB burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:20 pass_on

Config 2 using PRIO
-------------------
Main dsmark & classifier
tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev $1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2

Main prio queue
tc qdisc add dev $1 parent 1:0 handle 2:0 prio

EF class (2:1)
tc qdisc add dev $1 parent 2:1 tbf rate 1.5Mbit burst 1.5kB limit 1.6kB
tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on

BE class (2:2)
tc qdisc add dev $1 parent 2:2 red limit 60KB min 15KB max 45KB burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:2 pass_on
Andy Furniss
2005-Jul-27 13:25 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Jonathan Lynch wrote:
> Could anyone tell me why the delay is so high (30ms) for VoIP packets
> which are treated with the EF PHB when the outgoing interface of the core
> router to network C is saturated?

I have never used dsmark so am not sure about the classification parts of your rules. You need to check where the packets are going with

tc -s qdisc ls dev ...

The other parts have some issues, see below.

> The config I used for each setup is included below. These are slight
> modifications of the examples supplied with the iproute2 source code.
>
> Config 1 using htb
> -------------------
> tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
> tc filter add dev $1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2

flowid/classid here maybe, to get packets to 2:0, though it may work - check.

> Main htb qdisc & class
> tc qdisc add dev $1 parent 1:0 handle 2:0 htb
> tc class add dev $1 parent 2:0 classid 2:1 htb rate 100Mbit ceil 100Mbit

100mbit will be too high if it's a 100mbit nic.

> EF Class (2:10)
> tc class add dev $1 parent 2:1 classid 2:10 htb rate 1500Kbit ceil 100Mbit
> tc qdisc add dev $1 parent 2:10 pfifo limit 5
> tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:10 pass_on

Don't know what pass_on will mean here.

> BE Class (2:20)
> tc class add dev $1 parent 2:1 classid 2:20 htb rate 5Mbit ceil 100Mbit
> tc qdisc add dev $1 parent 2:20 red limit 60KB min 15KB max 45KB burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
> tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:20 pass_on
>
> Config 2 using PRIO
> -------------------
> Main dsmark & classifier
> tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
> tc filter add dev $1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2
>
> Main prio queue
> tc qdisc add dev $1 parent 1:0 handle 2:0 prio
> tc qdisc add dev $1 parent 2:1 tbf rate 1.5Mbit burst 1.5kB limit 1.6kB

Won't hurt if the packets are small voip, but TBF has a nasty habit of taking 1 from the burst/mtu you specify, so your burst setting may result in packets >1499B getting dropped - tc -s -d qdisc ls dev ... should show what it's using.

> tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on
>
> BE class (2:2)
> tc qdisc add dev $1 parent 2:2 red limit 60KB min 15KB max 45KB burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
> tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:2 pass_on

Without wrapping it with something like htb, red won't shape traffic.

Andy.
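In practice the check amounts to something like the following (a sketch - eth0 stands for whichever interface carries the shaping; watch the per-class counters while the test traffic runs):

tc -s qdisc ls dev eth0
tc -s class ls dev eth0
tc filter show dev eth0 parent 2:0

If the counters on the 2:10 pfifo and the 2:20 red qdisc increase as expected, the classification is working.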
Jonathan Lynch
2005-Jul-27 15:37 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy, many thanks for your reply. Below is some output from the queueing disciplines to show that the filters are working correctly and the packets are going to the right classes.

NOTE: The root qdisc of each interface is deleted before I run the tests. This resets the statistics for the qdisc. The following is the output after the tests.

Output of tc -s qdisc show on the core router for the 3 network interfaces:

qdisc dsmark 1: dev eth0 indices 0x0040 set_tc_index
 Sent 2183574289 bytes 1496372 pkts (dropped 60982, overlimits 0 requeues 0)
qdisc htb 2: dev eth0 parent 1: r2q 10 default 0 direct_packets_stat 22
 Sent 2183574289 bytes 1496372 pkts (dropped 60982, overlimits 140759 requeues 0)
qdisc pfifo 8007: dev eth0 parent 2:10 limit 5p
 Sent 7265998 bytes 51169 pkts (dropped 0, overlimits 0 requeues 0)
qdisc red 8008: dev eth0 parent 2:20 limit 60Kb min 15Kb max 45Kb
 Sent 2176307367 bytes 1445181 pkts (dropped 60982, overlimits 60982 requeues 0)
 marked 0 early 60982 pdrop 0 other 0
qdisc pfifo 8009: dev eth1 limit 1000p
 Sent 33334496 bytes 477176 pkts (dropped 0, overlimits 0 requeues 0)
qdisc pfifo 800a: dev eth2 limit 1000p
 Sent 40637134 bytes 585931 pkts (dropped 0, overlimits 0 requeues 0)

Again here is the ASCII diagram:

(network A) --> (eth1) core router (eth0) --> (network C)
                        (eth2)
                          ^
                          |
                          |
                     (network B)

From network A to C (from 2 PCs used for the purpose of traffic generation):

TCP traffic - pc 1
 Sent 994762580 bytes 658704 pkts (dropped 0, overlimits 0 requeues 0)
VoIP traffic - pc 2
 Sent 7286487 bytes 51298 pkts (dropped 0, overlimits 0 requeues 0)

From network B to C:

TCP traffic
 Sent 1271745729 bytes 841217 pkts (dropped 27, overlimits 0 requeues 0)

So the total number of packets transmitted to the incoming interfaces of the core router is 658704 + 51298 + (841217 - 27) = 1,551,192 packets. The total sent by the dsmark and htb qdiscs on the core router is 1,496,372 packets, and 60,982 are dropped, so the total received is 1,557,354. There is also some more traffic received from other nodes in network A, but this is minimal, plus traffic from the core router itself. This should account for the difference.

VoIP traffic sent from a machine in network A = 51,298 packets. That is practically the same as the number of packets that pass through the pfifo (51,169), which is attached to class 2:10.

TCP traffic that should be passing through class 2:20, the BE class, is 658,704 packets from network A plus 841,217 packets from network B, which totals 1,499,921. Traffic sent from the BE class is 1,445,181 packets plus the 60,982 packets which were dropped, so 1,506,163 packets were received by the BE class 2:20.

The traffic sent from the output interfaces eth1 and eth2 is mainly acks back to network A and network B respectively.

> 100mbit will be too high if it's a 100mbit nic.

What value would you recommend to set as the ceil for a 100 Mbit NIC?

> Don't know what pass_on will mean here.

pass_on means if no class id equal to the result of the filter is found then try the next filter, which is the BE class in this case.

So back to the main question: could anyone tell me why the delay is so high (30ms) for VoIP packets which are treated with the EF PHB when the outgoing interface of the core router to network C is saturated?

Jonathan

> Won't hurt if the packets are small voip, but TBF has a nasty habit of
> taking 1 from the burst/mtu you specify, so your burst setting may result
> in packets >1499B getting dropped - tc -s -d qdisc ls dev ... should
> show what it's using.

> Without wrapping it with something like htb, red won't shape traffic.

I am not too concerned about the PRIO + TBF setup. My priority is the htb setup, but I will look into this and see if I notice that.
Andy Furniss
2005-Jul-27 21:53 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Jonathan Lynch wrote:
> Andy, many thanks for your reply. Below is some output from the queueing
> disciplines to show that the filters are working correctly and the
> packets are going to the right classes.

OK, classification looks good then.

> pass_on means if no class id equal to the result of the filter is found
> then try the next filter, which is the BE class in this case.

Ahh, I'll have to play with this dsmark stuff one day :-)

> So back to the main question: could anyone tell me why the delay is so
> high (30ms) for VoIP packets which are treated with the EF PHB when the
> outgoing interface of the core router to network C is saturated?

I would test next with htb set up like this (assuming you are on HZ=1000 - you will be under rate if not) -

...

tc class add dev $1 parent 2:0 classid 2:1 htb rate 90Mbit ceil 90Mbit quantum 1500 burst 12k cburst 12k

tc class add dev $1 parent 2:1 classid 2:10 htb rate 1500kbit ceil 90Mbit quantum 1500 burst 12k cburst 12k

...

tc class add dev $1 parent 2:1 classid 2:20 htb rate 5Mbit ceil 90Mbit quantum 1500 burst 12k cburst 12k

...

If that doesn't make things any better then you could try giving the 2:10 class a rate a lot higher than it needs and see if that helps.

Andy.
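As a rough check of the 12k figure, assuming HZ=1000 and the 90Mbit ceil above: with a 1ms timer tick a class has to be able to send a whole tick's worth of data in a single burst, i.e.

  90,000,000 bit/s / 8 / 1000 ticks/s = 11,250 bytes per tick

so burst/cburst of 12k is about the smallest value that can sustain the configured rate at HZ=1000; anything much smaller will cap the class below its ceil.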
Jonathan Lynch
2005-Jul-28 16:37 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy, thanks again for your help. Yes, HZ is still 1000 in 2.6.12. I tried your suggestions and here are the results.

ASCII diagram

(network A) --> (eth1) core router (eth0) --> (network C)
                        (eth2)
                          ^
                          |
                          |
                     (network B)

Looking at the following graphs:

http://140.203.56.30/~jlynch/htb/core_router.png
http://140.203.56.30/~jlynch/htb/voip_stream_23691.png

voip_stream_23691.png is a graph of the delay of the voice stream travelling from network A to network C in test 2. Notice from the core router graph that there is only VoIP traffic passing through the core router until time 07:55, and the delay of the VoIP stream is 0.25 ms until then. After this time TCP traffic is introduced, saturating the outgoing interface of the core router (eth0). The delay increases to a maximum of 2.75 ms, which is a considerable improvement on the 30ms I was seeing with a ceil of 100Mbit. But there is a lot of jitter.

With the ceil at 90Mbit, the outgoing bit rate of eth0 has gone from 98Mbit to approx 90Mbit, as can be seen from the core router graph for eth0 bytes out. Note that the TCP traffic is all HTTP downloads, so most Ethernet frames will be of maximum size, 1518 bytes, so 98Mbit is the maximum throughput possible on a 100Mbit card, taking into account the overheads of Ethernet such as the interframe gap, preamble and start frame delimiter.

I'm not sure how to configure some of the htb parameters. The following is my understanding of them, and a few questions I have as well.

How exactly does the HZ value have a bearing on the ceil value? How can I calculate a maximum for the ceil value?

12kB is the minimum burst size for a 100 Mbit NIC with a timer resolution of 1ms (1000Hz), and tc calculates the smallest possible burst when it is not specified, right?

cburst is the number of bytes that can be burst as fast as the interface can transmit them. It is smaller than burst and is ideally one packet size, right?

quantum determines the ratio at which the classes share their parent's bandwidth. Each class is given quantum bytes before serving the next class, right?

Is there any way I can limit the jitter of the VoIP traffic passing through the htb class?

Jonathan
Andy Furniss
2005-Jul-28 21:49 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Jonathan Lynch wrote:
> Andy, thanks again for your help. Yes, HZ is still 1000 in 2.6.12. I
> tried your suggestions and here are the results.
>
> Looking at the following graphs:
>
> http://140.203.56.30/~jlynch/htb/core_router.png
> http://140.203.56.30/~jlynch/htb/voip_stream_23691.png
>
> voip_stream_23691.png is a graph of the delay of the voice stream
> travelling from network A to network C in test 2. Notice from the core
> router graph that there is only VoIP traffic passing through the core
> router until time 07:55, and the delay of the VoIP stream is 0.25 ms until
> then. After this time TCP traffic is introduced, saturating the outgoing
> interface of the core router (eth0). The delay increases to a maximum of
> 2.75 ms, which is a considerable improvement on the 30ms I was seeing with
> a ceil of 100Mbit. But there is a lot of jitter.

I suppose you could hope for a bit less jitter - 12k burst is about 1ms at 100mbit.

There is a tweak you can do for htb which may help - in net/sched/sch_htb.c there is a #define HYSTERESIS 1 - changing it to 0 and recompiling the kernel/the module makes things more accurate.

> With the ceil at 90Mbit, the outgoing bit rate of eth0 has gone from
> 98Mbit to approx 90Mbit, as can be seen from the core router graph for
> eth0 bytes out. Note that the TCP traffic is all HTTP downloads, so
> most Ethernet frames will be of maximum size, 1518 bytes, so 98Mbit is
> the maximum throughput possible on a 100Mbit card, taking into account
> the overheads of Ethernet such as the interframe gap, preamble and start
> frame delimiter.
>
> I'm not sure how to configure some of the htb parameters. The following
> is my understanding of them, and a few questions I have as well.
>
> How exactly does the HZ value have a bearing on the ceil value? How can
> I calculate a maximum for the ceil value?

It's more to do with burst/cburst than ceil.

> 12kB is the minimum burst size for a 100 Mbit NIC with a timer
> resolution of 1ms (1000Hz), and tc calculates the smallest possible burst
> when it is not specified, right?

It seems not - I think hysteresis may be involved again here (but then one of my tcs is hacked about a bit).

You can see what htb is using as defaults by doing tc -s -d class ls ..

If I do that on similar kernels, one with hysteresis 0 and one with 1, I see quite different values.

I chose 12k as big enough for the 90mbit test - 12000*8*1000 = 96mbit at ip level - and it seemed like a nice multiple of the 1500 mtu :-)

> cburst is the number of bytes that can be burst as fast as the interface
> can transmit them. It is smaller than burst and is ideally one packet
> size, right?

Ideally 1 packet, but not achievable with htb at lan speed and HZ=1000; also AIUI the way htb does drr means that with mixed packet sizes things aren't packet perfect even at low rates.

Saying that, I use htb at low rates and can apparently get packet perfect with my traffic mix.

I think hfsc can do it perfectly on both counts.

> quantum determines the ratio at which the classes share their parent's
> bandwidth. Each class is given quantum bytes before serving
> the next class, right?

Yea, setting 1500 probably makes no difference for this test.

> Is there any way I can limit the jitter of the VoIP traffic passing
> through the htb class?

Try the hysteresis and/or setting the rate for the interactive class way higher than its traffic rate.
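In practice the hysteresis change amounts to something like this (a sketch - check the exact name of the define with grep first, and rebuild the module or the whole kernel depending on how HTB is configured):

cd /usr/src/linux
grep -n HYSTERESIS net/sched/sch_htb.c    # find the '#define ... HYSTERESIS 1' line
# edit that line so the value is 0, then rebuild, e.g. for a modular HTB:
make net/sched/sch_htb.ko
# remove the qdiscs using htb, then reload the module (or reboot if built in)
rmmod sch_htb && insmod net/sched/sch_htb.ko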
I did a quick test to see how things were for me at 100mbit. Because my other pcs are slow I needed to use two as receivers for netperf.

I noticed something I didn't expect with red and the settings you use - one of the pcs is slower and has less memory, thus a smaller tcp socket size. Using 4 streams, two to each, unshaped they get about the same, though with txqueuelen = 1000 there are no drops (with window scaling off there is a difference). With red and wscale on, the red really favoured the fast pc - I tried a 40k bfifo so that I got drops, expecting to see the same, but it was still far more even than the red.

I couldn't really simulate the voip traffic - in theory I should be able to use ping with -i < 1 sec, but using the latest inetutils you get a flood ping if you do that. I reported this about 18 months ago and it's supposedly fixed in the cvs (though I don't know if fixed means it just says invalid argument rather than actually doing what's asked, because I have failed to build it so far).

So if anyone reading this has an i386 ping that -i 0.5 works on, please mail me the binary :-)

Andy.
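For anyone wanting to approximate the VoIP stream once a sub-second -i works, the idea is simply this (a sketch - the interval, payload size and target address are placeholders, and intervals below 0.2s normally need root):

# ~50 packets per second of ~172-byte payloads, roughly a G.711-sized stream
ping -q -i 0.02 -s 172 10.0.3.1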
Jonathan Lynch
2005-Aug-02 20:59 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
I did the same tests that I outlined earlier, but this time with hysteresis set to 0. The config for the core router is included at the bottom. The graphs for the delay of the voip stream and the traffic going through the core router can be found at the following addresses:

http://140.203.56.30/~jlynch/htb/core_router_hysteresis.png
http://140.203.56.30/~jlynch/htb/voip_stream_24761_hysteresis.png

The max delay of the stream has dropped to 1.8ms. Again the jitter seems to be around 1ms. There seems to be a pattern whereby the delay reaches about 1.6ms then drops back to 0.4ms, jumps back to 1.6ms and then back to 0.4ms repeatedly, and then it rises from 0.5ms gradually and repeats this behaviour. Is there any explanation for this pattern?

Would it have anything to do with burst being 1ms?

When the ceil is specified as 90Mbit, is this at IP level? What does this correspond to when a Mbit = 1,000,000 bits? I'm a bit confused with the way tc interprets this rate.

If the ceil is based at IP level then the max ceil is going to be a value between 54 Mbit and 97 Mbit (not the tc values) for a 100 Mbit interface, depending on the size of the packets passing through, right?

Minimum Ethernet frame:
148,809 * (46 * 8) = 148,809 * 368 = 54,761,712 bit/s (about 54.8 Mbit)

Maximum Ethernet frame:
8,127 * (1500 * 8) = 8,127 * 12,000 = 97,524,000 bit/s (about 97.5 Mbit)

About the red settings, I don't understand properly how to configure them. I was using the configuration that came with the examples.

Jonathan

Main dsmark & classifier
tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev $1 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2

Main htb qdisc & class
tc qdisc add dev $1 parent 1:0 handle 2:0 htb
tc class add dev $1 parent 2:0 classid 2:1 htb rate 90Mbit ceil 90Mbit burst 12k cburst 12k

EF Class (2:10)
tc class add dev $1 parent 2:1 classid 2:10 htb rate 5Mbit ceil 90Mbit burst 12k cburst 12k prio 1
tc qdisc add dev $1 parent 2:10 pfifo limit 5
tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:10 pass_on

BE Class (2:20)
tc class add dev $1 parent 2:1 classid 2:20 htb rate 10Mbit ceil 90Mbit burst 12k cburst 12k prio 2
tc qdisc add dev $1 parent 2:20 red limit 60KB min 15KB max 45KB burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 tcindex mask 0 classid 2:20 pass_on
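On the red settings, the rule of thumb usually quoted in the tc-red documentation is roughly the following (a sketch, not values tested on this setup):

  min   = worst acceptable queueing delay * bandwidth
          e.g. 5ms at 100Mbit: 0.005 * 12,500,000 bytes/s = ~62KB
  max   = at least 2 * min                  (~125KB here)
  burst = (2*min + max) / (3 * avpkt)       ((62500+62500+125000)/3000 = ~83)
  limit = several times max                 (the hard drop point)

with avpkt set to a typical packet size (1000 in the examples). For comparison, the 15KB min in the example config corresponds to roughly 1.2ms of queueing at 100Mbit.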
Andy Furniss
2005-Aug-03 14:04 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Jonathan Lynch wrote:
> I did the same tests that I outlined earlier, but this time with hysteresis
> set to 0. The config for the core router is included at the bottom. The
> graphs for the delay of the voip stream and the traffic going through the
> core router can be found at the following addresses:
>
> http://140.203.56.30/~jlynch/htb/core_router_hysteresis.png
> http://140.203.56.30/~jlynch/htb/voip_stream_24761_hysteresis.png
>
> The max delay of the stream has dropped to 1.8ms. Again the jitter seems
> to be around 1ms. There seems to be a pattern whereby the delay reaches
> about 1.6ms then drops back to 0.4ms, jumps back to 1.6ms and then back to
> 0.4ms repeatedly, and then it rises from 0.5ms gradually and repeats this
> behaviour. Is there any explanation for this pattern?
>
> Would it have anything to do with burst being 1ms?

Yes, I suppose; if you could sample truly randomly you would get a proper distribution - I guess the pattern arises because your timers are synchronised for the test.

> When the ceil is specified as 90Mbit, is this at IP level?
> What does this correspond to when a Mbit = 1,000,000 bits? I'm a bit
> confused with the way tc interprets this rate.

Yes, htb uses the ip-level length (but you can specify overhead & min size); the rate calculations use a lookup table which is likely to have a granularity of 8 bytes (you can see this with tc -s -d class ls .. - look for /8 after the burst/cburst).

There is a choice in 2.6 configs about using CPU/jiffies/gettimeofday - I use CPU, and now I've got a ping that does < 1 sec intervals I get the same results as you.

> If the ceil is based at IP level then the max ceil is going to be a
> value between 54 Mbit and 97 Mbit (not the tc values) for a 100 Mbit
> interface, depending on the size of the packets passing through, right?
>
> Minimum Ethernet frame:
> 148,809 * (46 * 8) = 148,809 * 368 = 54,761,712 bit/s
>
> Maximum Ethernet frame:
> 8,127 * (1500 * 8) = 8,127 * 12,000 = 97,524,000 bit/s

If you use the overhead option I think you will be able to overcome this limitation and push the rates closer to 100mbit.

> About the red settings, I don't understand properly how to configure them.
> I was using the configuration that came with the examples.

I don't use red - it was just something I noticed. Maybe making it longer would help, maybe my test wasn't representative.

FWIW I had a play around with HFSC (not that I know what I am doing really) and at 92mbit managed to get -

rtt min/avg/max/mdev = 0.330/0.414/0.493/0.051 ms loaded
from
rtt min/avg/max/mdev = 0.114/0.133/0.187/0.028 ms idle

and that was through a really cheap switch.

Andy.
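The HFSC config used for that test isn't shown; a minimal sketch of the general shape (class ids, the 92Mbit upper limit and the 2Mbit/1ms realtime curve are illustrative assumptions, not the actual settings, and the tcindex filters still have to be attached as before) might look like:

tc qdisc add dev eth0 root handle 1: hfsc default 20
tc class add dev eth0 parent 1: classid 1:1 hfsc sc m2 92Mbit ul m2 92Mbit
tc class add dev eth0 parent 1:1 classid 1:10 hfsc rt umax 1500 dmax 1ms rate 2Mbit
tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls m2 90Mbit ul m2 92Mbit

The rt curve gives the EF class a delay bound (dmax) independent of the link-sharing weights, which is what makes hfsc attractive for this kind of test.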
Andy Furniss
2005-Aug-03 19:32 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy Furniss wrote:
> Jonathan Lynch wrote:
>> If the ceil is based at IP level then the max ceil is going to be a
>> value between 54 Mbit and 97 Mbit (not the tc values) for a 100 Mbit
>> interface, depending on the size of the packets passing through, right?
>>
>> Minimum Ethernet frame:
>> 148,809 * (46 * 8) = 148,809 * 368 = 54,761,712 bit/s
>>
>> Maximum Ethernet frame:
>> 8,127 * (1500 * 8) = 8,127 * 12,000 = 97,524,000 bit/s
>
> If you use the overhead option I think you will be able to overcome this
> limitation and push the rates closer to 100mbit.

I looked up ethernet overheads and found the figure of 38 bytes per frame (the 46 is the min eth payload size?), and looking at the way mpu is handled by the tc rate table generator I think you would need to use 46 + 38 as the mpu.

So on every htb line that has a rate, put ..... overhead 38 mpu 84

I haven't checked those figures or tested close to limits though; the 12k burst would need increasing a bit as well or that will slightly over-limit the rate at HZ=1000.

Andy.
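Applied to one of the earlier class lines that would look something like this (illustrative only - the 98Mbit figure and the slightly larger burst are untested guesses along the lines described above):

tc class add dev $1 parent 2:0 classid 2:1 htb rate 98Mbit ceil 98Mbit quantum 1500 burst 14k cburst 14k overhead 38 mpu 84

with the same overhead 38 mpu 84 added to the 2:10 and 2:20 class lines.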
Andy Furniss
2005-Aug-04 18:06 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy Furniss wrote:
> I haven't checked those figures or tested close to limits though; the
> 12k burst would need increasing a bit as well or that will slightly over-limit
> the rate at HZ=1000.

It seems that htb still uses the ip-level length for burst, so 12k is enough.

With the overhead at 38 I can ceil at 99mbit OK.

Andy.
Jonathan Lynch
2005-Aug-10 18:37 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy, thanks for all the feedback. I was away on holidays for the last week and am only back today. I have a few more questions, which are listed below.

On Wed, 2005-08-03 at 15:04 +0100, Andy Furniss wrote:
> Yes, I suppose; if you could sample truly randomly you would get a proper
> distribution - I guess the pattern arises because your timers are
> synchronised for the test.

I don't understand what you mean when you say "if you could sample truly randomly you would get a proper distribution".

Also, having the timers synchronised will allow for more accurate measurements of the delay. I can't see how this would have an impact on the pattern.

> Yes, htb uses the ip-level length (but you can specify overhead & min size);
> the rate calculations use a lookup table which is likely to have a
> granularity of 8 bytes (you can see this with tc -s -d class ls .. - look
> for /8 after the burst/cburst).
>
> There is a choice in 2.6 configs about using CPU/jiffies/gettimeofday -
> I use CPU, and now I've got a ping that does < 1 sec intervals I get the
> same results as you.

I have the default setting, which is jiffies. There is a comment in the kernel config for "Packet scheduler clock source" that says of jiffies "its resolution is too low for accurate shaping except at very low speed". I will recompile the kernel and try the CPU option tomorrow to see if there is any change.

> I looked up ethernet overheads and found the figure of 38 bytes per
> frame (the 46 is the min eth payload size?), and looking at the way mpu is
> handled by the tc rate table generator I think you would need to use
> 46 + 38 as the mpu.
>
> So on every htb line that has a rate, put ..... overhead 38 mpu 84
>
> It seems that htb still uses the ip-level length for burst, so 12k is
> enough.
>
> With the overhead at 38 I can ceil at 99mbit OK.

I didn't realise such options existed for htb (mpu + overhead). These parameters are not mentioned in the man pages or in the htb manual. I presume I have to patch tc to get these features?

Yep, 46 is the minimum eth payload size and 38 is the minimum overhead per Ethernet frame:

  interframe gap   96 bits   12 bytes
  + preamble       56 bits    7 bytes
  + sfd             8 bits    1 byte
  + eth header                14 bytes
  + crc                        4 bytes
                             ---------
                              38 bytes overhead per ethernet frame

Jonathan
Andy Furniss
2005-Aug-11 16:36 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Jonathan Lynch wrote:
> I don't understand what you mean when you say "if you could sample truly
> randomly you would get a proper distribution".
>
> Also, having the timers synchronised will allow for more accurate
> measurements of the delay. I can't see how this would have an impact on
> the pattern.

I mean it's possibly just to do with the test - if a 0ms - 1ms delay is expected then you could see patterns arising depending on how you measure delay/clock drift or something.

Now I have two pings that do intervals < 1 sec - the inetutils GNU ping guys implemented it for me :-) - and I also have the iputils one, so I can simulate a stream better.

While doing this I noticed that iputils ping actually gives lower latency readings when sending many pps. Using tcpdump deltas I can see the network latency is the same however many pps I do - it's just that when measuring <1ms delays and doing many pps it seems that some code gets cached (guess) and the reported delay changes as a result.

I mention that just to illustrate that measuring small delays can be misleading and influenced by the exact nature of your setup.

> I have the default setting, which is jiffies. There is a comment in the
> kernel config for "Packet scheduler clock source" that says of jiffies
> "its resolution is too low for accurate shaping except at very low speed".
> I will recompile the kernel and try the CPU option tomorrow to see if
> there is any change.

Maybe not in the case of htb - I use CPU and see similar results. The comment about accurate shaping was probably written when HZ=100, but I suppose it will be better for something :-)

> I didn't realise such options existed for htb (mpu + overhead). These
> parameters are not mentioned in the man pages or in the htb manual.
> I presume I have to patch tc to get these features?

There is mention on the htb page - it was added as a patch so was not designed in, which explains why burst doesn't use it.

You don't need to patch recent iproute2 - it's already in there.

Andy.
Jonathan Lynch
2005-Aug-20 20:51 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
I did a number of tests and there doesn't appear to be any noticeable difference between using CPU and JIFFIES (HZ=1000) as the packet scheduler clock source.

I didn't mention that the outgoing interface on the core router has 2 IP addresses. One is VLAN tagged for the test network I'm running and the other is not tagged, so the vast majority of outgoing traffic will be tagged on the outgoing interface.

Question 1.
When sending a packet on a tagged interface, the 4 byte VLAN header is added between the source address and the ethertype field. As far as I know the tag is added and removed in the device driver and so it does not show up when using ethereal or tcpdump. The driver in use is the e1000, which supports VLANs. As a result the minimum frame size for tagged packets is 68 bytes and the maximum is 1522 bytes. When using mpu and overhead in htb, does that mean that the overhead should be increased to 42 (38 + 4 for the vlan header) and the mpu to 88 (42 + 46)? There are both tagged and untagged frames on this interface.

I have run tests using an mpu of 84 and overhead of 38, and an mpu of 88 and overhead of 42, increasing the ceil to 90, 95, 97 and 99Mbit. There are spikes in the delay, and the higher I increase the ceil, the higher the spikes are.

I have graphs of the tests I did. The filename of each graph should explain the settings used in the test: hysteresis set to 0, CPU as the packet scheduler clock source, mpu and overhead (84 and 38 respectively) or vlan mpu and overhead (88 and 42 respectively), and the ceil at either 90, 95 or 99Mbit.

21957_hysteresis_CPU_mpu_overhead_95_ceil.png
22762_hysteresis_CPU_vlan_mpu_overhead_90_ceil.png
22875_hysteresis_CPU_vlan_mpu_overhead_99_ceil.png
24135_hysteresis_CPU_mpu_overhead_90_ceil.png
24143_hysteresis_CPU_vlan_mpu_overhead_95_ceil.png
24262_hysteresis_CPU_mpu_overhead_99_ceil.png

They can be found at http://www.compsoc.nuigalway.ie/~jlynch/htb/

mpu and overhead seem to be used mainly in places where the size of frames is fixed. Does it make a difference using them with Ethernet, where the frame size is variable?

When you said you could ceil at 99Mbit OK, did you look at the max delay? Did you notice spikes like the ones in my graphs? Do you have any idea what could be causing these spikes?

Other observations:

I was using values from /proc/net/dev to measure the throughput going through the core router, and I noticed that different network drivers increment the byte counters differently. For example, for a packet with the maximum Ethernet payload of 1500 bytes, e1000 increments the byte counter by 1518 bytes, which includes the eth header and trailer; 3c59x on the other hand increments it by 1514 bytes, which does not include the eth trailer. For VLAN tagged packets e1000 increments the byte counter by 1522 bytes, 3c59x by 1518 bytes. Have you come across this before?

Also, according to the source of tc, specifically tc_util.c, it refers to this page http://physics.nist.gov/cuu/Units/binary.html as to how rates are specified in tc. So I presume tc uses this to specify rate, ceil etc. This doesn't seem to be mentioned anywhere.

static const struct rate_suffix {
	const char *name;
	double scale;
} suffixes[] = {
	{ "bit",	1. },
	{ "Kibit",	1024. },
	{ "kbit",	1000. },
	{ "mibit",	1024.*1024. },
	{ "mbit",	1000000. },
	{ "gibit",	1024.*1024.*1024. },
	{ "gbit",	1000000000. },
	{ "tibit",	1024.*1024.*1024.*1024. },
	{ "tbit",	1000000000000. },
	{ "Bps",	8. },
	{ "KiBps",	8.*1024. },
	{ "KBps",	8000. },
	{ "MiBps",	8.*1024*1024. },
	{ "MBps",	8000000. },
	{ "GiBps",	8.*1024.*1024.*1024. },
	{ "GBps",	8000000000. },
	{ "TiBps",	8.*1024.*1024.*1024.*1024. },
	{ "TBps",	8000000000000. },
	{ NULL }
};

Jonathan
Andy Furniss
2005-Oct-19 10:59 UTC
Re: HTB and PRIO qdiscs introducing extra latency when output interface is saturated
Andy Furniss wrote:
> Andy Furniss wrote:
>
>> I haven't checked those figures or tested close to limits though; the
>> 12k burst would need increasing a bit as well or that will slightly
>> over-limit the rate at HZ=1000.
>
> It seems that htb still uses the ip-level length for burst, so 12k is
> enough.
>
> With the overhead at 38 I can ceil at 99mbit OK.

Jonathan spotted that on eth skb->len is the ip len + 14, so the overhead should be 24, not 38.

Andy.
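So the earlier suggestion would become something like this (a sketch only - whether the mpu of 84 should also drop to 60 depends on whether the rate table applies mpu before or after adding the overhead, which is worth checking with tc -s -d class ls dev ...):

tc class add dev $1 parent 2:0 classid 2:1 htb rate 98Mbit ceil 98Mbit quantum 1500 burst 12k cburst 12k overhead 24 mpu 84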