thr3ads.net - LARTC - tc/htb still hangs system. [Sep 2002]


Tomasz Wrona

2002-Sep-18 22:52 UTC

tc/htb still hangs system.

Hello,

I would like to ask for help, in despair...

I am running a complex HTB setup to manage two leased lines [1+2 Mbit] for
500 users. The setup has been in use for a few months, but in that time I
have had awful stability problems. Nowadays no week passes without the
system hanging two or three times. I have tried dozens of setups, patches,
recompilations of the kernel, iproute and iptables, and various tricks,
still with almost no improvement.

I tried to find the reason, but without success. Below I list some
observed facts. Maybe somebody could advise or solve the problem...


When I started with HTB [2.0] [about 200 users on a 1 Mbit link; previously
it ran on a CBQ setup] there were probably [AFAIK] no stability problems
[many days between manual reboots]. From time to time I had some sudden
hangs, which later became more frequent. I changed the kernel to 2.4.18,
but the reason I found [or one of the reasons...] was hardware related
[damaged CPU and/or mainboard].

I replaced all the hardware and changed the kernel to 2.4.18 patched
with HTB v2, but the problem didn't disappear: the system still froze
within several days. Most often the hangs came after 1 or 2 days of stable
work, but in some cases it was 4 or even 7 days.

Then I tried 2.4.18 patched with HTB v3, but the situation became hopeless.
I found that changing HTB class parameters [tc class change...] led to the
system freezing. Sometimes one change, sometimes several changes of
parameters caused the crash. HTB 3 was useless for me and HTB 2 was/is
also unstable.
I also tried HTB 2 on a completely different machine [but with the same
Linux distro - PLD], but the system crashed almost immediately.

Now I am testing 2.4.20-pre7 [with HTB 3 included]. The system doesn't hang
immediately after changing HTB class params, but it now runs for only
several hours to one day before hanging.


Unfortunately I didn't have any error messages in the logs or on the console.

ERRATA: Today, for the first time, I saw some output on the console
[kernel 2.4.20-pre7]; there is something about an Oops,
"Process swapper (pid: 0, stackpage=c0211000)", and a lot of digits.
I put a screenshot at: http://eter.tym.pl/bug2.gif


Digging in the LARTC archive I found a possibly similar problem in the post
by Dimitris Zilaskos, "[LARTC] tc reliably hangs my system"
[http://mailman.ds9a.nl/pipermail/lartc/2002q3/004316.html]

but there is no solution there for me and, in despair, I don't know the
reason or what to do about it.

BTW, if this bug report [the http://eter.tym.pl/bug2.gif file] is related
more to the kernel than to tc/htb, please tell me where to send it.


Regards
tw
-- 

----------------
 ck.eter.tym.pl

"Never let shooling disturb Your education"

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

Werner Almesberger

2002-Sep-19 00:35 UTC

Re: tc/htb still hangs system.

Tomasz Wrona wrote:
> I tried to find reason but without success. Below I mention some
> observed facts. Maybe somebody could advise or solve problem...

In case you want to try systematic debugging of HTB, you may find
tcsim useful. tcsim can be run under Electric Fence and under
Valgrind (http://developer.kde.org/~sewardj/).

It won't help you find race conditions and such, but spotting
odd side-effects of parameter changes may be well within its
capabilities.

Of course, a decent set of regression tests should also be
useful for future HTB development ...

Concerning the Oops you got: you should run it through ksymoops
(see Documentation/oops-tracing.txt in your kernel source tree).

If you don't want to type in the whole Oops text, you can also
get the location of individual symbols with

gdb your/kernel/dir/vmlinux
(gdb) info line *0xd093caa4
etc.

The most useful data is in the EIP and the call trace.
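
As an illustration of pulling those two pieces out of a retyped Oops, here is a minimal Python sketch. It assumes the 2.4-style i386 layout in which addresses appear as `[<xxxxxxxx>]`, and the function name `parse_oops` is made up for this example. It is not a replacement for ksymoops, only a way to collect the addresses to feed to gdb.

```python
import re

# Addresses in a 2.4 i386 Oops are printed as [<8 hex digits>].
ADDR = re.compile(r"\[<([0-9a-f]{8})>\]")

def parse_oops(text):
    """Return (eip, call_trace) as integers from raw Oops text.

    eip is None if no EIP line is found. Continuation lines of the
    call trace are recognised by starting with '[<'.
    """
    eip = None
    trace = []
    in_trace = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("EIP:"):
            match = ADDR.search(stripped)
            eip = int(match.group(1), 16) if match else None
            in_trace = False
        elif stripped.startswith("Call Trace:"):
            in_trace = True
            trace += [int(a, 16) for a in ADDR.findall(stripped)]
        elif in_trace and stripped.startswith("[<"):
            trace += [int(a, 16) for a in ADDR.findall(stripped)]
        else:
            in_trace = False
    return eip, trace
```

Each returned address can then be passed to `info line *0x...` in gdb as shown above. Addresses that belong to a module will not resolve against vmlinux alone.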

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

Tomasz Wrona

2002-Sep-19 09:52 UTC

Re: tc/htb still hangs system - ksymoops traced.

Werner, thanks for the useful info!

On Wed, 18 Sep 2002, Werner Almesberger wrote:
> Concerning the Oops you got: you should run it through ksymoops
> (see Documentation/oops-tracing.txt in your kernel source tree).
OK, I retyped the screenshot, fed it to ksymoops, and it said:
[Will this be enough info to debug? What else can I do?]


### BEGIN ###
ksymoops 2.4.6 on i686 2.4.20-pre7.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre7/ (default)
     -m /boot/System.map-2.4.20-pre7 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oops: 0000
CPU:    0
EIP:    0010:[<d093c56f>]  Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00005198   ebx: 00000030     ecx: cd227400       edx: cf6ecc84
esi: cf6ecc84   edi: 00000030     ebp: 00000000       esp: c0211e3c
ds: 0018        es: 0018       ss: 0018
Process swapper (pid: 0, stackpage=c0211000)
Stack:  00000000 cf6ecc00 00000000 cd227400 d093caa4 cd363c5c cf6ecc84 cd227400
        00000003 cd227400 00000001 cd363c5c cd363c5c d093ce55 cd363c5c cd227400
        03938700 00000000 cd227400 d093d71f cd363c5c cd227400 c0211eb4 cd363924
Call Trace:     [<d093caa4>] [<d093ce55>] [<d093d71f>] [<d093dc1b>] [<d093d913>]
    [<d093dc8c>] [<c0199843>] [<c019384d>] [<c01168aa>] [<c0109962>] [<c0106ba0>]
    [<c0106ba0>] [<c010bb18>] [<c0106ba0>] [<c0106ba0>] [<c0106bc3>] [<c0106c29>]
    [<c0105000>] [<c0105027>]
Code: 81 38 f1 fe fa fe 74 12 68 84 01 00 00 68 00 f3 93 d0 c8 82

>>EIP; d093c56f <[sch_htb]htb_add_to_id_tree+93/130>   <====
>>ecx; cd227400 <_end+cfb8e4c/10596a4c>
>>edx; cf6ecc84 <_end+f47e6d0/10596a4c>
>>esi; cf6ecc84 <_end+f47e6d0/10596a4c>
>>esp; c0211e3c <init_task_union+1e3c/2000>
Trace; d093caa4 <[sch_htb]htb_activate_prios+a4/13c>
Trace; d093ce55 <[sch_htb]htb_change_class_mode+89/a0>
Trace; d093d71f <[sch_htb]htb_do_events+1bb/210>
Trace; d093dc1b <[sch_htb]htb_dequeue+10b/21c>
Trace; d093d913 <[sch_htb]htb_dequeue_tree+a7/218>
Trace; d093dc8c <[sch_htb]htb_dequeue+17c/21c>
Trace; c0199843 <qdisc_restart+13/d8>
Trace; c019384d <net_tx_action+99/a8>
Trace; c01168aa <do_softirq+5a/a4>
Trace; c0109962 <do_IRQ+96/a8>
Trace; c0106ba0 <default_idle+0/28>
Trace; c0106ba0 <default_idle+0/28>
Trace; c010bb18 <call_do_IRQ+5/d>
Trace; c0106ba0 <default_idle+0/28>
Trace; c0106ba0 <default_idle+0/28>
Trace; c0106bc3 <default_idle+23/28>
Trace; c0106c29 <cpu_idle+41/54>
Trace; c0105000 <_stext+0/0>
Trace; c0105027 <rest_init+27/28>

Code;  d093c56f <[sch_htb]htb_add_to_id_tree+93/130>
00000000 <_EIP>:
Code;  d093c56f <[sch_htb]htb_add_to_id_tree+93/130>   <====
   0:   81 38 f1 fe fa fe         cmpl   $0xfefafef1,(%eax)   <====
Code;  d093c575 <[sch_htb]htb_add_to_id_tree+99/130>
   6:   74 12                     je     1a <_EIP+0x1a> d093c589 <[sch_htb]htb_add_to_id_tree+ad/130>
Code;  d093c577 <[sch_htb]htb_add_to_id_tree+9b/130>
   8:   68 84 01 00 00            push   $0x184
Code;  d093c57c <[sch_htb]htb_add_to_id_tree+a0/130>
   d:   68 00 f3 93 d0            push   $0xd093f300
Code;  d093c581 <[sch_htb]htb_add_to_id_tree+a5/130>
  12:   c8 82 00 00               enter  $0x82,$0x0

<0>Kernel panic: Aiee, killing interrupt handler!

1 warning issued.  Results may not be reliable.

### END ###
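
For readers unfamiliar with the `symbol+offset/size` notation in the output above, the following Python sketch shows how such a label can be derived from an address. It is not how ksymoops is implemented. The table is an illustrative subset whose start addresses were computed from the trace itself (address minus offset), and `symbolize` is a name chosen for this example.

```python
import bisect

# (start, size, name). Start = traced address - offset, size = the number
# after the slash, both taken from the ksymoops output above.
# Illustrative subset only, not a real symbol table.
SYMBOLS = sorted([
    (0xd093c4dc, 0x130, "htb_add_to_id_tree"),
    (0xd093ca00, 0x13c, "htb_activate_prios"),
    (0xd093cdcc, 0x0a0, "htb_change_class_mode"),
])
STARTS = [start for start, _size, _name in SYMBOLS]

def symbolize(addr):
    """Return 'name+offset/size' for addr, or None if no symbol covers it."""
    # Find the last symbol whose start is <= addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    offset = addr - start
    if offset >= size:
        return None  # addr falls in a gap after this symbol
    return f"{name}+{offset:x}/{size:x}"
```

For example, `symbolize(0xd093c56f)` yields the same `htb_add_to_id_tree+93/130` label that ksymoops printed for the EIP.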


Regards
tw
-- 

----------------
 ck.eter.tym.pl



Werner Almesberger

2002-Sep-21 03:35 UTC

Re: tc/htb still hangs system - ksymoops traced.

Tomasz Wrona wrote:
> OK, I retyped screenshot and put it to ksymoops  and it said:
> [Will it be enough info to debug, what can I do also ?]

I guess Martin could figure it out from this. I'm too lazy ;-)

But it would be interesting to see whether this problem also
shows up in tcsim (then, it should be easy to diagnose it
completely). You seem to have a sequence of configuration
commands that reliably cause this crash, right ? If yes, it
would be good if you could send them.

Also, do you know what happens at the time of the crash ?
Is this simply the first packet that hits HTB after some
change, or is this a packet with special characteristics ?
(Specific flow, etc.)
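
On the question of what happens at the moment of the crash, the dump posted earlier in the thread gives a couple of hints that can be decoded mechanically. The Python sketch below assumes the Oops was a page fault. The retyped text lacks the "Unable to handle kernel ..." line that would confirm this, so treat the result as a guess. The helper names are invented for this example.

```python
import struct

def decode_page_fault_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }

def imm32_le(hex_bytes):
    """Interpret four hex bytes from a 'Code:' line as a little-endian u32."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    (value,) = struct.unpack("<I", raw)
    return value
```

Under that assumption, "Oops: 0000" decodes to a kernel-mode read of a page that was not present, and the bytes `f1 fe fa fe` after the `81 38` opcode give the immediate 0xfefafef1 that ksymoops shows in `cmpl $0xfefafef1,(%eax)`. With eax holding 00005198 in the register dump, that comparison would be reading from a very low address, which is consistent with a bad pointer. The constant looks like a magic-number sanity check, but that too is an inference from the disassembly, not something the thread confirms.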

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
