Thomas Heinz
2002-Feb-09 19:07 UTC
htb causes roaring penguin's pppoe daemon to terminate connection
Hi

I know, the subject sounds quite strange. Well, here is the situation.

I'm using 2.4.17 with the htb patch and the latest h323 patch from the
netfilter cvs. I've been using htb and h323 on 2.4.16 and never experienced
any problems but with 2.4.17 the pppoe daemon from roaring penguin that
I'm using sometimes terminates with the error message:

pppoe[pid]: send (sendPacket): No buffer space available

This situation arises when my upstream is full. When I flush the htb rules
everything's fine.

Maybe someone experienced the same problem or can give me a hint what
I should try to eliminate the problem.

Thank you for your interest.

Thomas

Here is my htb script (based on the howto):

#!/bin/bash

DOWNLINK=765
UPLINK=125
DEV=ppp0
PHYS=eth1 # underlying physical device

ifconfig $DEV txqueuelen 200
ifconfig $PHYS txqueuelen 2

tc qdisc del dev $DEV root > /dev/null 2>&1
tc qdisc del dev $DEV ingress > /dev/null 2>&1

# some shortcuts
ADDQDISC="tc qdisc add dev $DEV"
ADDCLASS="tc class add dev $DEV"
ADDFILTER="tc filter add dev $DEV"

########## uplink #############

$ADDQDISC root handle 1: htb default 900

$ADDCLASS parent 1: classid 1:1 htb rate ${UPLINK}kbit burst 1540

$ADDCLASS parent 1:1 classid 1:10 htb rate 40kbit ceil 100kbit burst 1540 \
    prio 1
$ADDCLASS parent 1:10 classid 1:100 htb rate 20kbit ceil 100kbit \
    burst 1540 prio 1
$ADDCLASS parent 1:10 classid 1:200 htb rate 20kbit ceil 100kbit \
    burst 1540 prio 2

$ADDCLASS parent 1:1 classid 1:30 htb rate 15kbit ceil ${UPLINK}kbit \
    burst 1540 prio 2

$ADDCLASS parent 1:1 classid 1:40 htb rate 70kbit ceil ${UPLINK}kbit \
    burst 1540 prio 3
$ADDCLASS parent 1:40 classid 1:800 htb rate 40kbit ceil ${UPLINK}kbit \
    burst 1540 prio 1
$ADDCLASS parent 1:40 classid 1:900 htb rate 30kbit ceil ${UPLINK}kbit \
    burst 1540 prio 2

# all get Stochastic Fairness except icmp/tos traffic
$ADDQDISC parent 1:100 handle 100: sfq perturb 10
$ADDQDISC parent 1:200 handle 200: sfq perturb 10
$ADDQDISC parent 1:30 handle 30: pfifo limit 5
$ADDQDISC parent 1:800 handle 800: sfq perturb 10
$ADDQDISC parent 1:900 handle 900: sfq perturb 10

$ADDFILTER parent 1:0 protocol ip prio 1 handle 1 fw flowid 1:100
$ADDFILTER parent 1:0 protocol ip prio 2 handle 2 fw flowid 1:200

$ADDFILTER parent 1:0 protocol ip prio 3 u32 match ip tos 0x10 0xff flowid 1:30
$ADDFILTER parent 1:0 protocol ip prio 3 u32 match ip protocol 1 0xff \
    flowid 1:30
$ADDFILTER parent 1:0 protocol ip prio 3 u32 \
    match ip protocol 6 0xff \
    match u8 0x05 0x0f at 0 \
    match u16 0x0000 0xffc0 at 2 \
    match u8 0x10 0xff at 33 \
    flowid 1:30

$ADDFILTER parent 1:0 protocol ip prio 4 handle 3 fw flowid 1:800

########## downlink #############

$ADDQDISC handle ffff: ingress

$ADDFILTER parent ffff: protocol ip prio 50 u32 match ip src \
    0.0.0.0/0 police rate ${DOWNLINK}kbit burst 5k drop flowid :1
Martin Devera
2002-Feb-09 21:05 UTC
Re: htb causes roaring penguin's pppoe daemon to terminate connection
> I'm using 2.4.17 with the htb patch and the latest h323 patch from the
> netfilter cvs. I've been using htb and h323 on 2.4.16 and never experienced
> any problems but with 2.4.17 the pppoe daemon from roaring penguin that

Are you using the same htb patch? 2.4.17 brought rather big
changes. This is not the first incompatibility I've seen.

> I'm using sometimes terminates with the error message:
> pppoe[pid]: send (sendPacket): No buffer space available
>
> This situation arises when my upstream is full. When I flush the htb rules
> everything's fine.

You could try cbq instead. But I think that the problem is that
htb accumulates too many packets and pppoe checks for it.
I've had similar problems during htb testing from userspace.

> Maybe someone experienced the same problem or can give me a hint what
> I should try to eliminate the problem.

Use tc -s class show dev ethXX and tc -s qdisc and send me the results. I'm
interested in backlog sizes.

devik
Thomas Heinz
2002-Feb-09 22:24 UTC
Re: htb causes roaring penguin's pppoe daemon to terminate connection
Hi Martin

You wrote:
> Are you using the same htb patch ? 2.4.17 has done rather big
> changes .. This is not the first incompatibility I've seen.

No, for 2.4.17 I'm using
http://luxik.cdi.cz/~devik/qos/htb/v2/htb2_2.4.17.diff

> you could try cbq instead.

I like htb ;-)

> But I think that the problem is that
> htb will accumulate too many packets and pppoe checks for it.
> I've had similar problems during htb testing from userspace.

Do you think it would make sense to reduce the transmit queue on ppp0
from 200 to ... let's say 50 or 20?

> use tc -s class show ethXX and tc -s qdisc and send me the results. I'm
> interested in backlog sizes.

Ok, here is what I got after a breakdown (see below).

Thomas

# tc -s class show dev ppp0
class htb 1:1 root prio 0 rate 125Kbit ceil 125Kbit burst 1539b cburst 1759b
 Sent 364823 bytes 2290 pkts (dropped 0, overlimits 0)
 rate 2099bps 8pps
 lended: 49 borrowed: 0 giants: 0 injects: 0
 tokens: 70247 ctokens: 81511

class htb 1:10 parent 1:1 prio 1 rate 40Kbit ceil 100Kbit burst 1539b cburst 1727b
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 lended: 0 borrowed: 0 giants: 0 injects: 0
 tokens: 246399 ctokens: 110592

class htb 1:100 parent 1:10 leaf 100: prio 1 rate 20Kbit ceil 100Kbit burst 1539b cburst 1727b
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 lended: 0 borrowed: 0 giants: 0 injects: 0
 tokens: 492799 ctokens: 110592

class htb 1:200 parent 1:10 leaf 200: prio 2 rate 20Kbit ceil 100Kbit burst 1539b cburst 1727b
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 lended: 0 borrowed: 0 giants: 0 injects: 0
 tokens: 492799 ctokens: 110592

class htb 1:30 parent 1:1 leaf 30: prio 2 rate 15Kbit ceil 125Kbit burst 1539b cburst 1759b
 Sent 302593 bytes 1927 pkts (dropped 0, overlimits 0)
 rate 1056bps 6pps
 lended: 1887 borrowed: 40 giants: 0 injects: 0
 tokens: 499370 ctokens: 81511

class htb 1:40 parent 1:1 prio 3 rate 70Kbit ceil 125Kbit burst 1539b cburst 1759b
 Sent 62230 bytes 363 pkts (dropped 0, overlimits 0)
 rate 1052bps 1pps
 lended: 7 borrowed: 9 giants: 0 injects: 0
 tokens: 115200 ctokens: 75776

class htb 1:800 parent 1:40 leaf 800: prio 1 rate 40Kbit ceil 125Kbit burst 1539b cburst 1759b
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 lended: 0 borrowed: 0 giants: 0 injects: 0
 tokens: 246399 ctokens: 90112

class htb 1:900 parent 1:40 leaf 900: prio 2 rate 30Kbit ceil 125Kbit burst 1539b cburst 1759b
 Sent 62230 bytes 363 pkts (dropped 0, overlimits 0)
 rate 31bps
 lended: 347 borrowed: 16 giants: 0 injects: 0
 tokens: 268800 ctokens: 75776

# tc -s qdisc
qdisc ingress ffff: dev ppp0
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

qdisc sfq 900: dev ppp0 quantum 1492b perturb 10sec
 Sent 62230 bytes 363 pkts (dropped 0, overlimits 0)

qdisc sfq 800: dev ppp0 quantum 1492b perturb 10sec
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

qdisc pfifo 30: dev ppp0 limit 5p
 Sent 303453 bytes 1932 pkts (dropped 0, overlimits 0)

qdisc sfq 200: dev ppp0 quantum 1492b perturb 10sec
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

qdisc sfq 100: dev ppp0 quantum 1492b perturb 10sec
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

qdisc htb 1: dev ppp0 r2q 10 default 900 dcache 0 deq_util 1/1000000 deq_rate 10 trials_per_deq 0 dcache_hits 0 direct_packets 0
 Sent 365683 bytes 2295 pkts (dropped 0, overlimits 50)
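
The statistics above carry no explicit backlog field, so the only counters to compare between runs are the "Sent ... (dropped ..., overlimits ...)" lines. As a rough sketch, assuming the output keeps the layout shown in this message (a "qdisc" header line followed by a "Sent" line), the per-qdisc counters could be pulled out like this. The function name summarize_qdisc_stats is made up for illustration and is not part of tc.

```shell
#!/bin/sh
# Hypothetical helper: condense `tc -s qdisc` output (read on stdin)
# into one line per qdisc. Assumes the "qdisc ..." / "Sent ..." layout
# shown in this thread; other tc versions may print extra fields.
summarize_qdisc_stats() {
    awk '
        # Header line, e.g. "qdisc pfifo 30: dev ppp0 limit 5p"
        $1 == "qdisc" { kind = $2; handle = $3 }
        # Counter line, e.g. "Sent 303453 bytes 1932 pkts (dropped 0, overlimits 0)"
        $1 == "Sent" {
            dropped = $7; over = $9
            gsub(/[^0-9]/, "", dropped)   # strip the trailing comma
            gsub(/[^0-9]/, "", over)      # strip the closing parenthesis
            printf "%s %s sent_bytes=%s pkts=%s dropped=%s overlimits=%s\n",
                   kind, handle, $2, $4, dropped, over
        }
    '
}
```

It would be used as: tc -s qdisc show dev ppp0 | summarize_qdisc_stats. Running it once while the uplink is saturated and once after the pppoe failure makes the two snapshots easier to compare.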
Martin Devera
2002-Feb-09 22:41 UTC
Re: htb causes roaring penguin's pppoe daemon to terminate connection
> > Are you using the same htb patch ? 2.4.17 has done rather big
> > changes .. This is not the first incompatibility I've seen.
>
> No, for 2.4.17 I'm using
> http://luxik.cdi.cz/~devik/qos/htb/v2/htb2_2.4.17.diff

And which one did you use for 2.4.16?

> > you could try cbq instead.
>
> I like htb ;-)

:) I only need to determine whether the bug is related to a
change in htb, pppoe or the kernel. If 2.4.16 worked then we need
to narrow down the code that contains the bug. Now it seems that
you used at least a different kernel and a different htb. Not sure
about the pppoe patch (or is it in the kernel?).
Try to use htb2_2.4.17.diff with 2.4.16 - it could tell us whether
the bug is in htb (the diff should work with 2.4.16).

> Do you think it would make sense to reduce the transmit queue on ppp0
> from 200 to ... let's say 50 or 20?

Probably not. The value is not used if you attached your own
qdisc. It is only used as the default size for the default qdisc.
What you might try is to use pfifo instead of SFQ and
set pfifo's limit to, say, 20 packets. It is not a long-term solution
but could help us to find the problem.

The results you sent show no backlog - probably it is too late
or the problem is elsewhere. I don't know pppoe - it could be
helpful to grep the source and look at where the error message
is printed. It could help us to see what pppoe sees as an error.

devik
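
For anyone wanting to try the pfifo suggestion against the script from the first message, a minimal sketch follows. It only prints the tc commands instead of running them. The leaf numbers 100, 200, 800 and 900 are the SFQ leaves from that script, the limit of 20 is the value suggested above, and the function name pfifo_swap_cmds is invented for this example.

```shell
#!/bin/sh
# Dry run: print the tc commands that would replace each SFQ leaf
# from the original script with a short pfifo. Nothing is executed.
DEV=${DEV:-ppp0}      # device from the original script
LIMIT=${LIMIT:-20}    # packet limit suggested in the thread

# pfifo_swap_cmds is a hypothetical helper, not part of tc.
pfifo_swap_cmds() {
    for leaf in "$@"; do
        echo "tc qdisc del dev $DEV parent 1:$leaf handle $leaf:"
        echo "tc qdisc add dev $DEV parent 1:$leaf handle $leaf: pfifo limit $LIMIT"
    done
}

pfifo_swap_cmds 100 200 800 900
```

Once the printed commands look right they can be piped to sh as root. Class 1:30 is left out because it already uses pfifo limit 5.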
Thomas Heinz
2002-Feb-10 14:47 UTC
Re: htb causes roaring penguin's pppoe daemon to terminate connection
You wrote:
> And what one did you use for 2.4.16 ?

There was a patch for 2.4.16 on http://luxik.cdi.cz/~devik/qos/htb/
It is no longer there but I can send you a diff against the 2.4.17 patch.

>>> you could try cbq instead.
>> ..
> :) I only need to determine whether the bug is related to
> change in htb, pppoe or kernel. If 2.4.16 worked then we need
> to minimize code size with the bug.

OK, I'll make some tests.

> Try to use htb2_2.4.17.diff with 2.4.16 - it could say whether
> the bug is in htb (the diff should work with 2.4.16).

Meanwhile I'm no longer so sure that the problem is caused by htb.
I used the same htb script and made some tests using iperf. I sent data
over 4 parallel connections (2 tcp, 2 udp), so that my upstream was full.
This time pppoe did not die. The times before when pppoe died there was
always a h323 connection involved - at least I think so. So maybe the
problem is caused by the h323 patch I'm using.

>> Do you think it would make sense to reduce the transmit queue on ppp0
>> from 200 to ... let's say 50 or 20?
>
> probably not. The value is not used if you attached you own
> qdisc. It is only used as default size for default qdisc.

Ah, interesting. So my
ifconfig $SOME_IF txqueuelen $SOME_QLEN
lines in the script can be thrown away.

Thank you for your tips. I'll analyze the situation in more detail
and report the results. I'm currently a bit short of time but I
should be able to find it out over the next few days.

Thomas
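
The exact iperf invocations are not given in the message. A load of that shape (2 tcp plus 2 udp streams) might be generated roughly as below, assuming the classic iperf client options -c, -t, -P, -u and -b. The host name, the duration and the 60k UDP rate are placeholders chosen for illustration, and the helper iperf_load_cmds only prints the commands.

```shell
#!/bin/sh
# Dry run: print iperf client commands for 2 TCP and 2 UDP streams.
# iperf_load_cmds is a hypothetical helper; host, duration and the
# UDP rate are illustrative values, not taken from the thread.
iperf_load_cmds() {
    host=$1
    secs=${2:-60}
    # Two parallel TCP streams.
    echo "iperf -c $host -t $secs -P 2"
    # Two parallel UDP streams at a fixed target rate.
    echo "iperf -c $host -t $secs -u -b 60k -P 2"
}

iperf_load_cmds server.example.net 60
```

Both commands would need to run at the same time, against a host on the far side of the ppp0 link that runs iperf in server mode, to keep the uplink saturated.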