Hi again,

I keep posting about my problem with HTB ->
http://mailman.ds9a.nl/pipermail/lartc/2005q3/016611.html

With a bit of searching I recently found the exact same problem I have in
the 2004 archives, with some graphs that explain it far better than I did ->
http://mailman.ds9a.nl/pipermail/lartc/2004q4/014519.html
http://mailman.ds9a.nl/pipermail/lartc/2004q4/014568.html

Unluckily there was no solution, or at least I didn't find one in the
archives, so if anyone has a clue...

I upgraded my box to the 2.6.12.2 kernel with the latest iproute2 but
nothing changed; I still have my shaping problem.

Thanks.

Gael.
Gael Mauleon wrote:
> Hi again,
>
> I keep posting about my problem with HTB ->
> http://mailman.ds9a.nl/pipermail/lartc/2005q3/016611.html

I had a go with what you posted there over LAN, and with 2 TCP streams it
behaves as expected (see below for the exact test).

Can you reproduce the failure shaping over a LAN rather than your internet
connection?

If your upstream bandwidth is sold as 2meg, then ceil 2000kbit is likely to
be too high.

You could also try specifying quantum 1500 on all the leaves, as you get it
auto-calculated from rates otherwise (you can see the values with
tc -s -d class ls ...). It didn't affect my test, though.

If you are looking at HTB's rate counters, remember that they use a really
long average (about 100 sec), so they can mislead.

I tested below with two netperf TCP send tests to IPs I added to another PC
on my LAN.

# /usr/local/netperf/netperf -H 192.168.0.60 -f k -l 60 &
# /usr/local/netperf/netperf -f k -H 192.168.0.102 -l 60 &

which gave -

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

43689  16384   16384    60.09    1884.66

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

43689  16384   16384    60.22    51.58

The script -

QOSIN=eth0

tc qdisc del dev $QOSIN root &>/dev/null
tc qdisc add dev $QOSIN root handle 1:0 htb
tc class add dev $QOSIN parent 1:0 classid 1:1 htb rate 2000kbit

### SUBCLASS1
tc class add dev $QOSIN parent 1:1 classid 1:10 htb rate 750kbit ceil 2000kbit prio 1
tc class add dev $QOSIN parent 1:10 classid 1:101 htb rate 250kbit ceil 2000kbit prio 1
tc qdisc add dev $QOSIN parent 1:101 handle 101: pfifo limit 10
tc class add dev $QOSIN parent 1:10 classid 1:102 htb rate 250kbit ceil 2000kbit prio 1
tc qdisc add dev $QOSIN parent 1:102 handle 102: pfifo limit 10
tc class add dev $QOSIN parent 1:10 classid 1:103 htb rate 250kbit ceil 2000kbit prio 1
tc qdisc add dev $QOSIN parent 1:103 handle 103: pfifo limit 10
tc filter add dev $QOSIN parent 1:0 protocol ip u32 match ip dst 192.168.0.102 flowid 1:102

### HIGH PRIO ###
tc class add dev $QOSIN parent 1:1 classid 1:50 htb rate 50kbit ceil 2000kbit prio 0 quantum 1500
tc qdisc add dev $QOSIN parent 1:50 handle 50: pfifo limit 10

### LOW PRIO ###
tc class add dev $QOSIN parent 1:1 classid 1:60 htb rate 50kbit ceil 2000kbit prio 5 quantum 1500
tc qdisc add dev $QOSIN parent 1:60 handle 60: pfifo limit 10
tc filter add dev $QOSIN parent 1:0 protocol ip u32 match ip dst 192.168.0.60 flowid 1:60

Andy.
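To see what quantums the subclass leaves got auto-calculated, and to pin
them to 1500 as suggested, something like the following should work (a
sketch only, assuming the script above is loaded on eth0 and using its
class ids):

# show auto-calculated quantums (printed with -d)
tc -s -d class ls dev eth0

# pin quantum 1500 on the three leaves that don't set it explicitly
tc class change dev eth0 parent 1:10 classid 1:101 htb rate 250kbit ceil 2000kbit prio 1 quantum 1500
tc class change dev eth0 parent 1:10 classid 1:102 htb rate 250kbit ceil 2000kbit prio 1 quantum 1500
tc class change dev eth0 parent 1:10 classid 1:103 htb rate 250kbit ceil 2000kbit prio 1 quantum 1500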
Andy Furniss wrote:
> I had a go with what you posted there over lan and with 2 tcp streams it
> behaves as expected (see below for exact test).
> [...]

I did the exact same test and it's working (10kbit for the low prio was the
only diff)!!

That's with pfifo ->

TCP STREAM TEST to 10.0.1.228
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     63.00    35.37

TCP STREAM TEST to 10.0.1.227
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     60.00    1897.27

That's with sfq ->

TCP STREAM TEST to 10.0.1.227
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     60.00    1918.02

TCP STREAM TEST to 10.0.1.228
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     60.00    28.40

10.0.1.228 was the low-prio IP. So everything worked fine for the test...!!!

After that I tested with my original script; I just changed the netfilter
rules to classify packets with the same rules I used for the netperf tests,
and I had the same good results... So it must be something with my 2M line
or with the traffic type I shape.

I don't quite understand the concept of setting the rate of the line lower
than its true value; can you explain this, and is the excess bandwidth
lost? What is a good value for a 2M (SDSL) line?

I'll try tomorrow to get a host outside with netperf so I can test the line
itself.

Again, thanks for your time and help Andy.
OK, I tested the shaping on the SDSL line with netperf and a host outside.

Same script as before; I classify the packets into qdiscs based on the
source address in netfilter, and here are the results, with sfq. I'm
positive the right traffic is going to the right class.

TCP STREAM TEST to 81.57.243.113 (NORMAL)
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     60.00    282.90

TCP STREAM TEST to 81.57.243.113 (LOWPRIO)
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^3bits/sec

87380  8192    8192     61.00    390.00

In fact the rates are just shared: sometimes it's LOWPRIO that gets more,
sometimes it's the NORMAL traffic...

As a note, there was other traffic on the line at the same moment that I
didn't classify, but I was expecting the ratio to be kept (LOWPRIO is
prio 5, 50kbit; NORMAL is prio 1, 200kbit).

Well, I need to investigate more; I don't understand why it doesn't work on
the SDSL line...

My installation is quite simple:

Cisco Router <-> Linux Box (QOS) <-> LAN

I was testing egress traffic from the LAN to the internet, but it's the
same for ingress.

Can this happen if the rates I set are bigger than the actual line? And
again, what is a good rate/ceil for a line sold as 2mbit SDSL?

Gaël.
Just tested with 1800kbit as the rate/ceil of the line, with all the class
rates adjusted to match the total, but I have the same result: bandwidth
seems to be shared just as if there were no QoS in place...

I'll do a full round of tests today; it must be hidden somewhere :)

> -----Original Message-----
> From: Gael Mauleon
> Sent: Wednesday 13 July 2005 12:26
> To: lartc@mailman.ds9a.nl
> Subject: RE: [LARTC] HTB Rate and Prio (continued)
>
> Ok I tested the shaping on the SDSL line with netperf and
> an host outside.
> [...]
Gael Mauleon wrote:
> Ok I tested the shaping on the SDSL line with netperf and
> an host outside.
>
> Same script than before, I classify the packets into qdisc based on the
> source address in netfilter and here are the result, that's with sfq.
> I'm positive on the right traffic going to the right class.
> [...]
> TCP STREAM TEST to 81.57.243.113 (NORMAL)   -> 282.90
> TCP STREAM TEST to 81.57.243.113 (LOWPRIO)  -> 390.00

Hmm, I can't really think why this is happening. Is it the same box that
you did the LAN test from?

I think what I would do as the next test is turn off window scaling -

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

and add 70k bfifos to the classes - you shouldn't drop any packets then.

Repeat the test and tcpdump it. If you see packet loss, then this could be
the explanation.

Andy.
Andy Furniss wrote:
> add 70k bfifos to the classes - you shouldn't drop any packets then.

Maybe 100k just to be safe - 70k may be a bit close once you take the
headers into account.

Andy.
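Concretely, the test could look like this (a sketch using the class ids
from the earlier script and assuming eth0; bfifo limits are in bytes, so
100kb holds roughly 65 full-size packets):

# disable TCP window scaling for the duration of the test
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

# swap the small pfifos for roomy byte-limited fifos on the two test classes
tc qdisc replace dev eth0 parent 1:50 handle 50: bfifo limit 100kb
tc qdisc replace dev eth0 parent 1:60 handle 60: bfifo limit 100kb

# watch for retransmits/drops while repeating the netperf runs
tcpdump -ni eth0 tcp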
Gael Mauleon wrote:
> I don't quite understand the concept of setting the rate of the line
> lower than its true value; can you explain this, and is the excess
> bandwidth lost? What is a good value for a 2M (SDSL) line?

It depends on what rate you are really synced at and what extra
overheads/encapsulation your SDSL line has.

It may be a bit different for SDSL - I only know ADSL - but as an example,
for me an empty ack, which HTB will see as 40 bytes (ignoring
timestamps/sacks), will actually use 2 ATM cells = 106 bytes on my line,
and a 1500-byte IP packet will use 32 cells = 1696 bytes.

Which means that if I tweaked my rate by experimenting with bulk traffic
and arrived at some figure which seemed OK, things could still go
overlimit when the traffic consists of a lot of small packets.

There are patches to work things out per packet, and a thesis giving
various overheads, at

http://www.adsl-optimizer.dk/

but for now just back off - 1800 is probably OK for netperf tests.

I am talking about egress here - if you are going to shape ingress as well
you need to back off even more, as you are trying to shape from the wrong
end of the bottleneck (which can't be done perfectly anyway) to build up
queues to shape with.

Andy.
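To make those cell numbers concrete, the per-packet arithmetic goes like
this (a sketch: the 10-byte encapsulation overhead is an assumption picked
to reproduce the figures above; the real value depends on your
encapsulation - see the thesis):

# wire bytes = ceil((ip_len + encap + 8-byte AAL5 trailer) / 48) cells * 53
atm_wire_bytes() {
    local ip_len=$1 encap=${2:-10}          # encap=10 assumed (LLC-style overhead)
    local payload=$((ip_len + encap + 8))   # AAL5 trailer adds 8 bytes
    local cells=$(( (payload + 47) / 48 ))  # round up to whole 48-byte cell payloads
    echo $(( cells * 53 ))                  # each ATM cell is 53 bytes on the wire
}

atm_wire_bytes 40     # -> 106  (2 cells, the empty-ack example)
atm_wire_bytes 1500   # -> 1696 (32 cells, a full-size packet)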
> It depends on what rate you are really synced at and what extra
> overheads/encapsulation your SDSL line has.
> [...]
> but for now just back off - 1800 is probably OK for netperf tests.

Thanks for all these tips. Before trying everything you said, I wanted to
test my line at a 500kbit rate/ceil just to take one variable out of the
equation, and to my surprise the shaping works well at 500kbit. I did more
tests and it works well up to somewhere between 1000 and 1200 kbit: at
1000 it works, at 1200 the shaping is gone and the problem is back...

What I don't understand here is that I can go up to 1800 or higher with
iptraf and some netperf tests...

So I'm really lost now. What can I do?? I can't shape only 1000 and lose
800 kbit; I really need some advice here on what can be done, I'm going
mad :)

Thanks

Gael.
Actually, doing some more tests with all traffic classified, I can reach
1700 kbit as the rate/ceil; at this rate I must set the prios to get good
results.

Doing more tests - I didn't know HTB was so sensitive to the max
rate/ceil...

I'll post more later on.