Hi,

A couple of months ago I started having a strange problem with HTB. My setup is Fedora Core 2 on a Pentium II 233 with 128 MB of RAM, serving as a router. Ever since I moved to kernel 2.6, my HTB QoS stalls for a couple of seconds every few minutes. The heavier the connection load, the more frequent and longer the stalls. I isolated the problem to HTB (CBQ works fine). The script I use now worked for a year without any problems; the problems started when Fedora switched to 2.6. The problem occurs with both the stock kernel and a self-compiled one.

-- 
Have a nice day,
Krystian Antoni
_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
> Me too. I think the problem started after I moved to kernel 2.6.8.1. If I
> remember correctly, I didn't have this problem with 2.6.7. The new thing in
> 2.6.8.1 was the QoS clock source (or something similar). I use the CPU cycle
> counter here because I have a fast uplink (1 Gbps). I think this may be the
> source of the problem.

I have used gettimeofday, but it didn't help.

> Or HTB has had a bug since 2.6.8.1. If anybody has an idea, please write here.
>
> Arpad Kunszt

-- 
Have a nice day,
Krystian Antoni
> Hi

Hi!

> A couple of months ago I started having a strange problem with HTB.
> My setup is Fedora Core 2 + Pentium II 233 + 128 MB of RAM, serving as
> a router.
> Ever since moving to kernel 2.6, my HTB QoS stalls for a couple of
> seconds every few minutes. The heavier the connection load, the more
> frequent and longer the stalls.

I have experienced this too. I use 2.6.11 with the qnet patch. I have a bigger machine with lots of users.

> I isolated the problem to HTB (CBQ works fine).

I use HTB too. I can't use CBQ for various reasons, so I can't test it. My HTB setup has about 2500 classes and a total of approximately 15,000 tc objects.

> The script I use now worked for a year without any problems, but the
> problems started when Fedora changed to 2.6.

Me too. I think the problem started after I moved to kernel 2.6.8.1. If I remember correctly, I didn't have this problem with 2.6.7. The new thing in 2.6.8.1 was the QoS clock source (or something similar). I use the CPU cycle counter here because I have a fast uplink (1 Gbps). I think this may be the source of the problem. Or HTB has had a bug since 2.6.8.1. If anybody has an idea, please write here.

Arpad Kunszt

PS: Sorry for my terribly bad English :-(
Krystian Antoni wrote:
> Ever since moving to kernel 2.6, my HTB QoS stalls for a couple of
> seconds every few minutes. The heavier the connection load, the more
> frequent and longer the stalls.
> I isolated the problem to HTB (CBQ works fine).

What sort of load, how many rules/classes, and what bandwidth do you have?

Andy.
The connection is 256kbit/2mbit, but the down interface is also used for SMB, so it's shaped from 100 Mbit. Both interfaces are Ethernet. There are around 300 iptables rules, and I have a new kernel (from kernel.org) and new iptables. There are 7-8 qdiscs/classes per interface. Leaf classes are SFQ.

I tried:
- running only one part of the QoS (down or up)
- putting only a minimal set of rules in iptables
- getting kernels from Fedora and from kernel.org
- reinstalling Fedora
- changing the clock from standard to gettimeofday
- changing the leaves from SFQ to PRIO
- changing the QoS speeds
- switching my outside interface from half to full duplex and back

and I still had these stalls on the interface that has the HTB QoS setup. The load on my internet connection is around 10%. You're probably thinking that I set my connection parameters too high in the QoS, but that is not the case (I think ;-) since I did some testing and found the setting that allowed me to keep LAN pings through my internet connection responsive even under heavy load.

CBQ works without any problems. The computer is a 233 MMX with 128 MB of RAM (36 MB used by the system); the NICs are 3Com 905B and 905C. I don't remember if I tried changing them :-) And I really don't know what it is that I screwed up :-) Wanna see my QoS scripts? :D

On 4/19/05, Andy Furniss <andy.furniss@dsl.pipex.com> wrote:
> What sort of load, how many rules/classes, and what bandwidth do you have?
>
> Andy.

-- 
Have a nice day,
Krystian Antoni
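The setup described here (a handful of HTB classes per interface, SFQ on the leaves, shaped to the line rate) corresponds to a tree like the one below. This is a hypothetical Python sketch that only prints tc commands: the device name, class IDs, rates and the `perturb 10` value are assumptions for illustration, not the actual script from this thread. Class and handle minor numbers are passed as hex integers because tc parses them as hex.

```python
"""Print tc commands for a small HTB tree with SFQ leaves.

Hypothetical example: device, class IDs and rates are invented for
illustration; they are not taken from the script discussed in this thread.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    minor: int      # class minor as a hex int, e.g. 0x10 -> classid 1:10
    rate_kbit: int  # guaranteed rate
    ceil_kbit: int  # upper bound when borrowing from the parent
    prio: int       # lower value gets spare bandwidth first


def htb_commands(dev: str, total_kbit: int, leaves: List[Leaf],
                 default_minor: int) -> List[str]:
    cmds = [
        # Root HTB qdisc; unclassified packets go to class 1:<default_minor>.
        f"tc qdisc add dev {dev} root handle 1: htb default {default_minor:x}",
        # One parent class carrying the whole shaped rate.
        f"tc class add dev {dev} parent 1: classid 1:1 htb "
        f"rate {total_kbit}kbit ceil {total_kbit}kbit",
    ]
    for leaf in leaves:
        cmds.append(
            f"tc class add dev {dev} parent 1:1 classid 1:{leaf.minor:x} htb "
            f"rate {leaf.rate_kbit}kbit ceil {leaf.ceil_kbit}kbit prio {leaf.prio}"
        )
        # Attach SFQ to the leaf instead of the default FIFO.
        cmds.append(
            f"tc qdisc add dev {dev} parent 1:{leaf.minor:x} "
            f"handle {leaf.minor:x}: sfq perturb 10"
        )
    return cmds


if __name__ == "__main__":
    leaves = [Leaf(0x10, 512, 2000, 0), Leaf(0x20, 1000, 2000, 1),
              Leaf(0x30, 488, 2000, 2)]
    for cmd in htb_commands("eth1", 2000, leaves, default_minor=0x30):
        print(cmd)
```

Printing instead of executing keeps the sketch safe to run anywhere; review the output before feeding it to a root shell. The `default` class receives every packet that no filter matches, so its rate governs all unclassified traffic.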
Krystian Antoni wrote:
> The connection is 256kbit/2mbit, but the down interface is also used for
> SMB, so it's shaped from 100 Mbit. Both interfaces are Ethernet.
> The computer is a 233 MMX with 128 MB of RAM (36 MB used by the system);
> the NICs are 3Com 905B and 905C.

I use a P200 as a gateway, but it has only one Ethernet card, so I can't really recreate your setup properly. I assume the SMB traffic is forwarded rather than local. I can't even do 100 Mbit on mine (an rtl8139) when it is actually running anything, because I run out of CPU; that doesn't seem to affect internet shaping/throughput, though.

I'll have a go at shaping on Ethernet and see if I can recreate it when I have time.

When you say "stalls", what exactly happens: no throughput at all, dropped packets, or just high latency? Is swap being used, etc.?

Andy.
Hi all,

I am also facing the same problem as Mr Antoni. I have done many kinds of experiments, but I could not resolve it. Is this a bug in FC3? When I use FC1, it works fine.

The differences I found between FC1 and FC3 are:

FC1: iptables 1.2.8, HTB 3.12
FC3: iptables 1.2.11, HTB 3.16

Can some experts comment on this?

hare

----- Original Message -----
From: Krystian Antoni
To: Andy Furniss
Cc: lartc@mailman.ds9a.nl
Sent: Wednesday, April 20, 2005 11:43 AM
Subject: Re: [LARTC] HTB stalling

The connection is 256kbit/2mbit, but the down interface is also used for SMB, so it's shaped from 100 Mbit.
> Hi all

Hi!

> I am also facing the same problem as Mr Antoni.
> I have done many kinds of experiments, but I could not resolve it.
> Is this a bug in FC3? When I use FC1, it works fine.
>
> The differences I found between FC1 and FC3 are:
>
> FC1: iptables 1.2.8, HTB 3.12
> FC3: iptables 1.2.11, HTB 3.16

I experience the problem with iptables 1.3.1 patched with the qnet patch. Are you using a 2.6-series kernel with FC1? IMHO the default kernel in FC1 was 2.4. I think the problem is somewhere in the kernel.

Bye,

Arpad Kunszt
hareram wrote:
>
> Hi all
>
> I am also facing the same problem as Mr Antoni.
> I have done many kinds of experiments, but I could not resolve it.
> Is this a bug in FC3? When I use FC1, it works fine.
>
> The differences I found between FC1 and FC3 are:
>
> FC1: iptables 1.2.8, HTB 3.12
> FC3: iptables 1.2.11, HTB 3.16
>
> Can some experts comment on this?
> hare

Although this stalling issue may still be a timer problem rather than an HTB one, there are bugs in HTB versions below 3.17, so you MUST patch the kernel so that HTB is at least 3.17. I don't know about iptables.

Do remember that FC3 contains a lot of beta software; you really should be using something less "bleeding edge" for a server.
-- 
gypsy
The problem started with the arrival of the 2.6 kernels.

On 4/22/05, Kunszt Arpad <arpad.kunszt@syrius-software.hu> wrote:
> Are you using a 2.6-series kernel with FC1? IMHO the default
> kernel in FC1 was 2.4.
> I think the problem is somewhere in the kernel.

-- 
Have a nice day,
Krystian Antoni
How do I check the HTB version? I have the standard one that came with kernel 2.6.11.7, and my tc says "TC HTB version 3.3". Is this my HTB version? :-)

I'm not using the FC3 kernel but a vanilla one, so there isn't much beta in it :) Besides, I cut most of the stuff out just for experimentation, and still no luck :)

On 4/22/05, gypsy <gypsy@iswest.com> wrote:
> Although this stalling issue may still be a timer problem rather than
> an HTB one, there are bugs in HTB versions below 3.17, so you MUST patch
> the kernel so that HTB is at least 3.17. I don't know about iptables.
>
> Do remember that FC3 contains a lot of beta software; you really should
> be using something less "bleeding edge" for a server.

-- 
Have a nice day,
Krystian Antoni
Krystian Antoni wrote:
>
> How do I check the HTB version?
> I have the standard one that came with kernel 2.6.11.7, and my tc says
> "TC HTB version 3.3". Is this my HTB version? :-)

No, it isn't. 3.3 is the tc version.

The only ways I know to find out the HTB version are:
1) If HTB is loaded as a module, it records its version in your logs.
2) Read the source. 'grep HTB_VER /usr/src/linux/net/sched/sch_htb.c' should return a line containing something like
0x30011
The "11" at the end is hex, which is 17 in decimal, so the example above is version 3.17.

> I'm not using the FC3 kernel but a vanilla one, so there isn't much beta
> in it :) Besides, I cut most of the stuff out just for experimentation,
> and still no luck :)

But it is still a 2.6 kernel, not a 2.4 kernel? In 2.4.x, linux/include/net/pkt_sched.h specifies the timer. (I can't help you with 2.6. Until it appears as standard in Slackware-current, it is not stable enough for my production servers.)
-- 
gypsy
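The second method boils down to splitting one constant. A minimal Python sketch, assuming the constant keeps the major version in the upper 16 bits and the minor in the lower 16 (which matches reading 0x30011 as 3.17), and assuming the source line has the form `#define HTB_VER 0x30011`. It also parses the module log message, which has the form "HTB init, kernel part version 3.17".

```python
import re
from typing import Tuple

Version = Tuple[int, int]


def decode_htb_ver(value: int) -> Version:
    """Split an HTB_VER constant into (major, minor): 0x30011 -> (3, 17)."""
    return value >> 16, value & 0xFFFF


def version_from_source_line(line: str) -> Version:
    """Parse a line such as '#define HTB_VER 0x30011' (format assumed)."""
    m = re.search(r"HTB_VER\s+(0x[0-9A-Fa-f]+)", line)
    if m is None:
        raise ValueError(f"no HTB_VER constant in {line!r}")
    return decode_htb_ver(int(m.group(1), 16))


def version_from_log_line(line: str) -> Version:
    """Parse the module message 'HTB init, kernel part version 3.17'."""
    m = re.search(r"HTB init, kernel part version (\d+)\.(\d+)", line)
    if m is None:
        raise ValueError(f"no HTB init message in {line!r}")
    return int(m.group(1)), int(m.group(2))


def is_at_least(version: Version, minimum: Version = (3, 17)) -> bool:
    """True if the version is at least the 3.17 recommended in this thread."""
    return version >= minimum


if __name__ == "__main__":
    v = version_from_source_line("#define HTB_VER 0x30011")
    print(f"{v[0]}.{v[1]}", is_at_least(v))
```

Comparing `(major, minor)` tuples avoids the trap of comparing "3.9" and "3.17" as decimal numbers.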
So I checked it your way: "HTB init, kernel part version 3.17".

On 4/23/05, gypsy <gypsy@iswest.com> wrote:
> The only ways I know to find out the HTB version are:
> 1) If HTB is loaded as a module, it records its version in your logs.
> 2) Read the source. 'grep HTB_VER /usr/src/linux/net/sched/sch_htb.c'
> should return a line containing something like
> 0x30011
> The "11" at the end is hex, which is 17 in decimal, so the example
> above is version 3.17.

-- 
Have a nice day,
Krystian Antoni
Andy Furniss wrote:
> I'll have a go at shaping on eth and see if I can recreate it when I
> have time.

I tried and can't recreate it on my P200 with the CPU as timer.

The test probably could have been better, though: I just ran 7 netperfs, each shaped to its own class, and tweaked the rates until I was using 98% CPU. I played DoD with the netgraph all the way across the screen and noticed nothing strange happening.

Is anyone who gets the stalls using the CPU as timer?

The class my internet traffic went to had far more rate than it needed, and I had pfifos on the leaves.

Do the stalls literally last 2 seconds? Should I be able to see them easily just by pinging or polling packet counts every second?

Andy
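Polling packet counts once a second is enough to catch multi-second stalls. A minimal Python sketch, assuming the shaped interface exposes the sysfs counter `/sys/class/net/<dev>/statistics/tx_packets` (present on 2.6 kernels with sysfs mounted); the `eth0` default, the 300-second run and the 2-second threshold are illustrative choices. A run of seconds in which the transmit counter does not move while traffic is known to be queued is a candidate stall; on an idle link the same pattern is harmless.

```python
import sys
import time
from pathlib import Path
from typing import List, Tuple


def read_tx_packets(dev: str) -> int:
    """Cumulative packets handed to the driver for transmission on dev."""
    return int(Path(f"/sys/class/net/{dev}/statistics/tx_packets").read_text())


def find_stalls(samples: List[int], min_seconds: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs where the counter stayed flat.

    samples[i] is the counter value at second i; a run from start to end
    spans end - start seconds and is reported if that is >= min_seconds.
    """
    stalls = []
    start = None
    for i in range(1, len(samples)):
        if samples[i] == samples[i - 1]:
            if start is None:
                start = i - 1  # counter stopped moving after sample i-1
        else:
            if start is not None and (i - 1) - start >= min_seconds:
                stalls.append((start, i - 1))
            start = None
    # A flat run may still be open at the end of the recording.
    if start is not None and (len(samples) - 1) - start >= min_seconds:
        stalls.append((start, len(samples) - 1))
    return stalls


def monitor(dev: str, seconds: int) -> List[Tuple[int, int]]:
    samples = []
    for _ in range(seconds):
        samples.append(read_tx_packets(dev))
        time.sleep(1)
    return find_stalls(samples)


if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    for start, end in monitor(dev, 300):
        print(f"{dev}: no packets sent from t={start}s to t={end}s")
```

Sampling with `time.sleep(1)` drifts slightly, which does not matter for spotting gaps of several seconds. Keeping detection in a pure function over the samples makes it easy to reuse on counters collected some other way.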
Andy Furniss wrote:
> Is anyone who gets the stalls using the CPU as timer?

I see Arpad is. Another possible difference: do you have SMP enabled as a kernel option but only one processor? I don't know if it has anything to do with this, but the last time I left it enabled I got a panic, so it's always off for me now.

Andy.
Andy Furniss wrote:
> Andy Furniss wrote:
>
>> Is anyone who gets the stalls using the CPU as timer?
>
> I see Arpad is. Another possible difference: do you have SMP enabled as a
> kernel option but only one processor?

I use the CPU as timer, and I have a dual Xeon box, so I do use SMP, but I physically have 2 CPUs. I don't use Hyper-Threading.

When I ran into this problem, CPU usage was not very high: the load was low and I had 25-50% idle CPU time (according to top). If I put some load on the machine, the ping times grow and nothing else changes. I checked that ksoftirqd is not eating my CPU.

In the next 2-3 weeks I'll try a setup with HFSC instead of HTB; maybe it will help (I can't use CBQ on this machine). But there are about 20,000-25,000 tc objects on this machine, so it's not a 5-minute job :-( I don't have any other ideas yet :-(

Arpad Kunszt
Kunszt Arpad wrote:
> I use the CPU as timer, and I have a dual Xeon box, so I do use SMP,
> but I physically have 2 CPUs. I don't use Hyper-Threading.

The TSCs (the per-CPU cycle counters read by the CPU timer) on an SMP box may drift apart. Can you reproduce the problem with gettimeofday as the time source?

Regards
Patrick
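Comparing clock sources first requires knowing which packet-scheduler clock the running kernel was built with. A minimal Python sketch that reads a kernel .config and reports the selected clock and whether SMP is enabled. It assumes the 2.6-era option names `CONFIG_NET_SCH_CLK_JIFFIES`, `CONFIG_NET_SCH_CLK_GETTIMEOFDAY` and `CONFIG_NET_SCH_CLK_CPU`, and the Fedora convention of installing the config as `/boot/config-<release>`; check both against your own tree.

```python
import platform
import sys
from pathlib import Path
from typing import Optional, Set

# 2.6-era "Packet scheduler clock source" choices (names assumed; verify
# against net/sched/Kconfig in your kernel tree).
CLOCK_OPTIONS = {
    "CONFIG_NET_SCH_CLK_JIFFIES": "jiffies (timer interrupt)",
    "CONFIG_NET_SCH_CLK_GETTIMEOFDAY": "gettimeofday",
    "CONFIG_NET_SCH_CLK_CPU": "CPU cycle counter",
}


def enabled_options(config_text: str) -> Set[str]:
    """Names of options set to y in a .config file."""
    opts = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.endswith("=y") and not line.startswith("#"):
            opts.add(line[:-2])
    return opts


def sched_clock_source(config_text: str) -> Optional[str]:
    opts = enabled_options(config_text)
    for name, label in CLOCK_OPTIONS.items():
        if name in opts:
            return label
    return None


def smp_enabled(config_text: str) -> bool:
    return "CONFIG_SMP" in enabled_options(config_text)


if __name__ == "__main__":
    default = Path(f"/boot/config-{platform.release()}")
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else default
    text = path.read_text()
    print("packet scheduler clock:", sched_clock_source(text) or "not found")
    print("SMP enabled:", smp_enabled(text))
```

Running this on both the stalling and the non-stalling kernels gives a quick side-by-side of the two settings being discussed (timer source and SMP) before rebuilding with gettimeofday.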
I have a normal kernel plus a vanilla one, with SMP turned off.

On 4/24/05, Andy Furniss <andy.furniss@dsl.pipex.com> wrote:
> I see Arpad is. Another possible difference: do you have SMP enabled as
> a kernel option but only one processor?

-- 
Have a nice day,
Krystian Antoni
The stalls can last as long as 5 seconds, and when they happen everything, including a web browser, stops working.

I'll be back Tuesday evening, and then I'll try to look at my system.

On 4/24/05, Andy Furniss <andy.furniss@dsl.pipex.com> wrote:
> Do the stalls literally last 2 seconds? Should I be able to see them
> easily just by pinging or polling packet counts every second?

-- 
Have a nice day,
Krystian Antoni
> Kunszt Arpad wrote:
> > I use the CPU as timer, and I have a dual Xeon box, so I do use SMP,
> > but I physically have 2 CPUs. I don't use Hyper-Threading.
>
> The TSCs on an SMP box may drift apart. Can you reproduce the problem
> with gettimeofday as the time source?

I'll give it a try in the next 2-3 days.

Arpad Kunszt
Krystian Antoni wrote:
> The stalls can last as long as 5 seconds, and when they happen
> everything, including a web browser, stops working.
>
> I'll be back Tuesday evening, and then I'll try to look at my system.

Maybe I should try with a script more like the one you are using, if you don't mind posting it.

One thought I have is that HTB seems to have far longer default queues than it used to. If you have set a default class other than 0 with a low rate, your ARP packets may be getting stuck in a very long queue.

Andy.
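The ARP hypothesis is easy to size with back-of-the-envelope arithmetic: a packet that joins the tail of a full FIFO waits roughly as long as it takes to send everything ahead of it at the class rate. A minimal Python sketch, assuming a leaf queue limit of 1000 packets (a common Ethernet txqueuelen default), full 1500-byte packets, and a class that cannot borrow above its rate. The numbers are illustrative, not measurements from this thread.

```python
def drain_time_seconds(queue_packets: int, packet_bytes: int,
                       rate_bit_s: float) -> float:
    """Time to transmit a full queue at a fixed rate.

    bits queued = packets * bytes * 8; time = bits / rate.
    """
    return queue_packets * packet_bytes * 8 / rate_bit_s


if __name__ == "__main__":
    for rate in (2_000_000, 256_000, 64_000):
        wait = drain_time_seconds(1000, 1500, rate)
        print(f"{rate / 1000:g} kbit/s: a packet at the tail waits ~{wait:.1f} s")
```

Even at 2 Mbit/s, a full 1000-packet queue takes 6 seconds to drain, which is of the same order as the reported stalls; at a low default-class rate, the wait grows to minutes.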