thr3ads.net - LARTC - Re: 2.4.20 htb3 oops [Mar 2003]

Göran Runfeldt

2003-Mar-03 13:22 UTC

Re: 2.4.20 htb3 oops

Hi everyone,

I have been having problems with "oopses" since I introduced HTB on my
company's PC-based routers. It seems that only routers with high
network load are affected. The average load on the two most
problematic routers is 10Mbps in/out and 2.5Mbps in/out, respectively.
The other machines, with less than 1Mbps of average traffic, seem unaffected.

We have been getting oopses on these machines 1-3 times per week.

We have tried replacing the hardware on both machines without any
improvement. We run the same combination of hardware and kernel on other
machines in the same physical location without any problems, so we assume
that hardware, kernel and heat are not the problem here.
Machines with high network load that do not have any HTB rules loaded
do not suffer from this problem.

Hardware info:
  Router 1 (10Mbps avg in/out):
    1 x Intel(R) Celeron(R) CPU 1.80GHz
    256MB RAM
    eth0: Intel Corp. 82801BD PRO/100 VE (CNR)
    eth1: RealTek RTL8139

  Router 2 (2.5Mbps avg in/out):
    1 x Intel(R) Celeron(R) CPU 1.70GHz
    128MB RAM
    eth0: RealTek RTL8139
    eth1: RealTek RTL8139

Both use Linux kernel 2.4.20 with patches for FreeS/WAN and connection
tracking of GRE/PPTP connections. Both are single-processor machines, and
both shape traffic to and from a VLAN interface. The kernel is compiled
for CPU type "Pentium-III/Celeron", but the machines are running on
Pentium-IV/Celeron processors, if that matters. Router 1 was using a P3 CPU
before we replaced the hardware, and we had the same problem back then.

Unfortunately, I have not been able to capture any console output from the
crashed machines.

Here is the ruleset script:
#!/bin/sh
for DEV in eth0.123 eth1
do
        tc qdisc del dev $DEV root
        tc qdisc add dev $DEV root handle 1: htb
        # Total
        tc class add dev $DEV parent 1:0 classid 1:1 htb rate 12Mbit
        # Default class
        tc class add dev $DEV parent 1:1 classid 1:2 htb rate 11Mbit
        # Filesharing traffic
        tc class add dev $DEV parent 1:1 classid 1:3 htb rate 512Kbit
        # ICMP (Highest priority - on customer's request, not ours)
        tc class add dev $DEV parent 1:1 classid 1:4 htb rate 512Kbit \
                prio 0
        tc qdisc add dev $DEV parent 1:2 handle 2: sfq
        tc qdisc add dev $DEV parent 1:3 handle 3: sfq
        tc qdisc add dev $DEV parent 1:4 handle 4: sfq
        for PORT in 411 412 413 4661 4662 8081 19114 6340 6341 6342 \
                6343 6344 6345 6346 6347 6348 6349 1214 1215 6699 6257 7668
        do
                # Send to "crap-class"
                tc filter add dev $DEV protocol ip parent 1: prio 1 u32 \
                        match ip sport $PORT 0xffff flowid 1:3
                tc filter add dev $DEV protocol ip parent 1: prio 1 u32 \
                        match ip dport $PORT 0xffff flowid 1:3
        done
        tc filter add dev $DEV protocol ip parent 1: prio 1 u32 match ip \
                protocol 1 0xff flowid 1:4 # ICMP
        tc filter add dev $DEV protocol ip parent 1: prio 2 u32 match ip \
                protocol 0 0x00 flowid 1:2 # Everything else
done
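
When a crash report includes a ruleset like this, it can help to see exactly which filters the port loop expands to. A minimal dry-run sketch: the hypothetical print_filters function below only prints the filter commands for one device (same port list, priorities and flowids as the script above) instead of running tc.

```shell
# Port list copied from the ruleset script; the backslash-newline inside
# the double quotes is a line continuation, so this is one string.
PORTS="411 412 413 4661 4662 8081 19114 6340 6341 6342 \
6343 6344 6345 6346 6347 6348 6349 1214 1215 6699 6257 7668"

# print_filters DEV (hypothetical helper): print, but do not run, the
# filter commands the ruleset would install on DEV.
print_filters() {
    dev=$1
    for port in $PORTS; do
        # One u32 filter per direction, both sending to class 1:3
        echo "tc filter add dev $dev protocol ip parent 1: prio 1 u32 match ip sport $port 0xffff flowid 1:3"
        echo "tc filter add dev $dev protocol ip parent 1: prio 1 u32 match ip dport $port 0xffff flowid 1:3"
    done
    # ICMP goes to 1:4; mask 0x00 makes the last filter match any protocol
    echo "tc filter add dev $dev protocol ip parent 1: prio 1 u32 match ip protocol 1 0xff flowid 1:4"
    echo "tc filter add dev $dev protocol ip parent 1: prio 2 u32 match ip protocol 0 0x00 flowid 1:2"
}
```

Per device this comes to 46 u32 filters, 44 of them for the file-sharing ports. Piping the output of print_filters eth1 to sh would install the same filters as the original loop.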

I have not tried to apply the HTB patches from the latest prepatch
version of the Linux kernel or the "htb_3.7_delay_bug" patch
(I think they do the same thing?). Maybe I should try that?

I can get more information (like kernel config etc.) if anyone needs it,
but this thing is really hard to debug since it only happens sporadically.

Thanks,
Göran
>
> In my SMP system (2xP3) I also had oopses (2.4.19 and 2.4.20), but
> on single-processor systems everything is OK.
>_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

Abraham van der Merwe

2003-Mar-03 13:39 UTC

Re: 2.4.20 htb3 oops

Hi Göran!

Oopses or kernel panics? Could you please post the oops dumps (with decoded
symbols, of course)?
-- 

Regards
 Abraham

Genius may have its limitations, but stupidity is not thus handicapped.
		-- Elbert Hubbard

___________________________________________________
 Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
 P.O. Box 3472, Matieland, Stellenbosch, 7602
 Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
 Email: abz@frogfoot.net

Göran Runfeldt

2003-Mar-03 14:53 UTC

Re: 2.4.20 htb3 oops

Hi Abraham,

I'm sorry for mixing up the terms; I thought "oops" and "kernel panic"
were the same thing. This is the text that our technician wrote down
from the screen after the first crash:

"...unable to handling kernel null pointer dereference at virtual address
00000
Kernel panic: aiee killing interrupt handling - in interrupt handler not
syncing."

He also says that the keyboard LEDs were "blinking". We have not
been able to retrieve any data from the other crashes, because by the time
the technician arrived, the machines were "stone dead" with the
keyboard LEDs blinking. This is all the data I have at the moment,
since the machines are located far from here.

I realize that this might not be enough information to make anything useful of;
it is merely an attempt to document the problem.

Göran

Abraham van der Merwe

2003-Mar-03 16:29 UTC

Re: 2.4.20 htb3 oops

Hi Göran!

Unfortunately that is not going to help much, since you can't tell from
that information where in the code it crashed. Next time, please copy
the entire kernel panic (including the stack trace) and run it through ksymoops
or look up the symbols in vmlinux.
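
Looking up the symbols by hand boils down to finding, for each address in the oops, the closest kernel symbol at or below it. A minimal sketch of that lookup against a System.map file: the function name lookup_symbol is made up for illustration, it assumes the map file belongs to the exact kernel that crashed, and ksymoops remains the more thorough tool.

```shell
# lookup_symbol ADDR MAPFILE (hypothetical helper)
# Print "symbol+0xOFFSET" for the nearest symbol at or below ADDR in a
# System.map-style file (one "address type name" entry per line).
# Assumes every address has the same number of hex digits (8 on i386),
# so plain string comparison orders them the same way as numbers.
lookup_symbol() {
    # Accept "c0112345", "0xc0112345" or "C0112345".
    addr=$(printf '%s\n' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    awk -v addr="$addr" '
        # Convert a lowercase hex string to a number.
        function hex(s,    i, n) {
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        # Concatenating "" forces string (not numeric) comparison.
        NF >= 3 && ($1 "") <= (addr "") && ($1 "") >= best {
            best = $1 ""; sym = $3
        }
        END {
            if (sym == "") { print "no symbol at or below " addr; exit 1 }
            printf "%s+0x%x\n", sym, hex(addr) - hex(best)
        }
    ' "$2"
}
```

For example, with the address from the EIP line of an oops, lookup_symbol c0112345 /boot/System.map would print the containing function and the offset into it; ksymoops does the same for every address in the trace at once.
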
-- 

Regards
 Abraham

Kirk to Enterprise -- beam down yeoman Rand and a six-pack.

Göran Runfeldt

2003-Mar-04 11:49 UTC

SV: 2.4.20 htb3 oops

I have arranged a serial console with logging to a terminal client, so next
time it happens I will have the output from the panic. 
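
For the serial console to catch a panic, the router's kernel has to send its console messages to the serial port (in 2.4 this needs serial console support, CONFIG_SERIAL_CONSOLE, compiled in and a boot parameter such as console=tty0 console=ttyS0,9600n8). On the logging side, timestamping each captured line makes it easier to line the output up against graphs such as MRTG's. A minimal sketch, assuming a Linux logging host; stamp_lines, the serial device and the log path are illustrative only.

```shell
# stamp_lines (hypothetical helper): copy stdin to stdout, prefixing each
# line with the local date and time at which it was read.
stamp_lines() {
    # The "|| [ -n "$line" ]" part keeps a final line that has no trailing
    # newline, which is likely if the machine dies mid-message.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Illustrative use on the logging host (port, speed and path are assumptions):
#   stty -F /dev/ttyS0 9600
#   stamp_lines < /dev/ttyS0 >> /var/log/router1-console.log
```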

For the record: 
One of the machines crashed again last night at 11.30 p.m. (local time).
I noticed that the MRTG graph looks a bit odd:
http://hem.wasadata.net/goran/mrtg.png

The total limit is set to 12Mbit, as you can see in the ruleset in my
first post. Also, about 2-4 hours before the crash, the graph shows two
traffic "spikes": the first peaks at about 22Mbps of outgoing traffic on
the interface, and the second at 28Mbps of incoming traffic.
Could this have anything to do with the crash?
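
To tell whether spikes like these coincide with the crashes, it may help to record the per-class statistics that tc -s class show dev DEV prints, at regular intervals alongside MRTG. A minimal sketch: snapshot_htb and the log path are hypothetical, and the TC variable exists only so the function can be exercised without a real tc.

```shell
# snapshot_htb DEV... (hypothetical helper): print a timestamped header
# followed by the HTB class statistics for each device given.
# TC defaults to "tc"; overriding it is only meant for testing.
snapshot_htb() {
    for dev in "$@"; do
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $dev"
        ${TC:-tc} -s class show dev "$dev"
    done
}

# Illustrative use, e.g. from a script run by cron once a minute:
#   snapshot_htb eth0.123 eth1 >> /var/log/htb-stats.log
```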

Many Thanks,
Göran

Thomas Kirk

2003-Mar-10 20:20 UTC

Re: SV: 2.4.20 htb3 oops

Hi

Is there any solution to the problems described above? I'm currently looking into
building a new 2.4.20 kernel with HTB compiled as a module for a
production environment (2.5Mbps average, 6-10Mbps peak). I won't use
it if it's broken, though.

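For a kernel built this way, one quick check before deployment is how HTB ended up in the kernel configuration: the option is CONFIG_NET_SCH_HTB, set to m for the sch_htb module or y for built-in. A minimal sketch; htb_config_status is a made-up helper name, and the .config path depends on your build tree.

```shell
# htb_config_status CONFIG (hypothetical helper): report how the HTB
# scheduler is configured in a kernel .config file:
# "module", "built-in" or "not set".
htb_config_status() {
    # "|| true" keeps a missing entry from counting as a failure.
    case $(grep '^CONFIG_NET_SCH_HTB=' "$1" || true) in
        CONFIG_NET_SCH_HTB=m) echo module ;;
        CONFIG_NET_SCH_HTB=y) echo built-in ;;
        *) echo "not set" ;;
    esac
}

# Illustrative use:
#   htb_config_status /usr/src/linux/.config   # expect "module"
#   modprobe sch_htb                           # load it once installed
```
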
-- 
Venlig hilsen/Kind regards
Thomas Kirk
ARKENA
tlf/phone +04570233456
thomas(at)arkena(dot)com
Http://www.arkena.com


Oh, give me a home,
Where the buffalo roam,
And I'll show you a house with a really messy kitchen.