Hi,

I have a machine with 20 TB of local storage (internal SATA drives). It has three 6.7 TB arrays, each of which is configured as an OST and part of a larger LOV. I'll call this "machine B". Machine B has 16 GB of memory. Not only does it act as an OSS, but it also has this single large LOV mounted, and I/O-intensive jobs run on this system reading and writing heavily from this single LOV. The LOV is also mounted on another system that we'll call "machine A". Machine A does not act as an OSS and is not serving out any disk via Lustre.

Last night I noticed that any of my jobs running on machine B that were reading/writing from this LOV were running at about 60% of CPU capacity, while the remaining 40% was being used by the "system". I got those numbers from iostat. Note that the EXACT same jobs run on machine A were running at 100% CPU. I couldn't figure out what system calls were hogging 40 percent of the total CPU.

I stumbled across an article describing how, on large-memory systems, if the page size isn't set right the system can end up spending more time swapping out pages than actually doing user work, which would account for this 40% system usage. (I don't understand this.) After reading another article about virtual memory that I found on Red Hat's site, I issued the following command:

    echo "10" > cat /proc/sys/vm/max_queue_depth

Four hours later the system crashed, and I'm not sure why. I'm not asking you guys to debug my crash except as it relates to Lustre. I have netdump running and was able to get a bit of information about the crash. The only reason I'm posting here is that the information I received about the dump shows quite a few calls to various Lustre functions right before the crash/panic. I've attached the rest of it as a text file.

I have read about there being some type of memory leak (or bug?) in Lustre which can show up when you run a machine as both a Lustre OSS and a Lustre filesystem client. Could this be the bug that I'm seeing?
Many thanks in advance,

-Aaron

PS: please correct me if my acronyms are off :)

-------------- next part --------------
Kernel BUG at panic:75
invalid operand: 0000 [1] SMP
CPU 6
Modules linked in: ipt_REJECT(U) ipt_state(U) ip_conntrack(U) ipt_multiport(U) iptable_filter(U) ip_tables(U) llite(U) mdc(U) lov(U) osc(U) obdfilter(U) fsfilt_ldiskfs(U) ldiskfs(U) ost(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) sg(U) st(U) loop(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) netconsole(U) netdump(U) autofs4(U) i2c_dev(U) i2c_core(U) nfs(U) lockd(U) nfs_acl(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U) dm_mod(U) button(U) battery(U) ac(U) ohci_hcd(U) ehci_hcd(U) e1000(U) floppy(U) ext3(U) jbd(U) sata_nv(U) libata(U) 3w_9xxx(U) aic79xx(U) sd_mod(U) scsi_mod(U)
Pid: 120, comm: kswapd3 Tainted: G   M  2.6.9-42.0.2.EL_lustre.1.4.7.1smp
RIP: 0010:[<ffffffff80136d8e>] <ffffffff80136d8e>{panic+211}
RSP: 0018:00000101fc72bd18  EFLAGS: 00010086
RAX: 000000000000002d RBX: ffffffff8031df9b RCX: 0000000000000046
RDX: 0000000000012e6d RSI: 0000000000000046 RDI: ffffffff80373dc0
RBP: 0000000000000900 R08: 00000000ffffffff R09: ffffffff8031df9b
R10: 0000000000000061 R11: 000000000000002a R12: 00000000ffffffff
R13: ffffffff8036ac80 R14: 00137b5ff61df5cb R15: ffffffff8031df9b
FS:  0000002a95573380(0000) GS:ffffffff80477e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018
CR0: 000000008005003b CR2: 0000002aa6dee000 CR3: 00000003030d8000 CR4: 00000000000006e0
Process kswapd3 (pid: 120, threadinfo 0000010100146000, task 00000100080c3800)
Stack: 0000003000000008 00000101fc72bdf8 00000101fc72bd38 000000031770fa68
       0000000000000046 0000000000000046 0000000000012e40 0000000000000046
       00000000ffffffff 00000101fc72be78
Call Trace:
       <ffffffff80117930>{print_mce+159} <ffffffff801179f1>{mce_available+0}
       <ffffffff80117ce6>{do_machine_check+731} <ffffffff8011135b>{machine_check+127}
       <ffffffffa0521447>{:llite:ll_releasepage+0} <ffffffffa05159fc>{:llite:llap_from_page+546}
       <EOE> <ffffffffa046d91a>{:lov:lov_teardown_async_page+711}
       <ffffffffa0516411>{:llite:ll_removepage+455} <ffffffffa0521455>{:llite:ll_releasepage+14}
       <ffffffff80163353>{shrink_zone+3363} <ffffffff80131555>{recalc_task_prio+337}
       <ffffffff80163b5f>{balance_pgdat+506} <ffffffff80163da9>{kswapd+252}
       <ffffffff80134b66>{autoremove_wake_function+0} <ffffffff80134b66>{autoremove_wake_function+0}
       <ffffffff80110e23>{child_rip+8} <ffffffff80163cad>{kswapd+0}
       <ffffffff80110e1b>{child_rip+0}

Code: 0f 0b 40 d4 31 80 ff ff ff ff 4b 00 31 ff e8 47 c1 fe ff e8
RIP <ffffffff80136d8e>{panic+211} RSP <00000101fc72bd18>

CPU 6: Machine Check Exception: 4 Bank 4: f603200100000813
TSC 137b5ff61e0204 ADDR 31770fa68
Kernel panic - not syncing: Machine check
----------- [cut here ] --------- [please bite here ] ---------
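(A quick aside on the /proc write quoted above: because of how the shell parses the redirection, the command as typed sends echo's output into a file literally named "cat" in the current directory rather than into the /proc entry. A minimal sketch of the usual ways to write a vm tunable follows; the path is copied verbatim from the post and has not been verified to exist on a 2.6.9 kernel, so substitute a real tunable on your system.)

    # Direct write into the proc entry (no "cat" in between):
    echo 10 > /proc/sys/vm/max_queue_depth

    # Equivalent, and easier to audit, via sysctl:
    sysctl -w vm.max_queue_depth=10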
On Tue, 21 Nov 2006, Aaron Knister wrote:

> I have a machine with 20 TB of local storage (internal SATA drives). It
> has three 6.7 TB arrays, each of which is configured as an OST and part
> of a larger LOV. I'll call this "machine B". Machine B has 16 GB of
> memory. Not only does it act as an OSS, but it also has this single
> large LOV mounted, and I/O-intensive jobs run on this system reading
> and writing heavily from this single LOV.

If machine B acts both as client and server for the same Lustre filesystem, you may run into recovery problems in case of a crash. CFS recommends against using such configurations. I'm not sure how dangerous this is, however.

> The LOV is also mounted on another system that we'll call "machine A".
> Machine A does not act as an OSS and is not serving out any disk via
> Lustre. Last night I noticed that any of my jobs running on machine B
> that were reading/writing from this LOV were running at about 60% of
> CPU capacity, while the remaining 40% was being used by the "system".
> I got those numbers from iostat. Note that the EXACT same jobs run on
> machine A were running at 100% CPU.

Can you describe the load your job generates on Lustre? Metadata-intensive programs can generate a lot of activity on Lustre servers.

Also check that Lustre debugging is set to zero (I suspect many innocent users get caught by this setting).

> I couldn't figure out what system calls were hogging 40 percent of the
> total CPU. I stumbled across an article describing how, on large-memory
> systems, if the page size isn't set right the system can end up
> spending more time swapping out pages than actually doing user work,
> which would account for this 40% system usage. (I don't understand
> this.)

I guess this was not really about swapping, but rather about page fault handling. :) You could probably get a rough idea of what's going on with a kernel profiling tool such as oprofile.

[snip]

> I've attached the rest of it as a text file.

The panic message states it's an MCE (Machine Check Exception); usually this is a hardware problem (memory, CPU, etc.).

-- 
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net
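(A minimal sketch of the two checks suggested above, assuming a Lustre 1.4.x node where the debug mask lives under /proc/sys/lnet (older releases used /proc/sys/portals) and assuming oprofile is installed with a matching uncompressed vmlinux; the vmlinux path below is only an example and may differ on your distribution.)

    # Check the current Lustre debug mask; a long list of flags means debug
    # logging is enabled and can cost noticeable CPU on busy servers.
    cat /proc/sys/lnet/debug

    # Turn Lustre debugging off entirely.
    echo 0 > /proc/sys/lnet/debug

    # Rough kernel profile with oprofile: start, run the I/O job, then report.
    opcontrol --vmlinux=/boot/vmlinux-`uname -r`   # example path; adjust to your system
    opcontrol --start
    #  ... run the I/O-intensive job for a while ...
    opcontrol --stop
    opreport --symbols | head -30                  # top kernel/module symbols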
Aaron Knister wrote:

...snipped...

> Code: 0f 0b 40 d4 31 80 ff ff ff ff 4b 00 31 ff e8 47 c1 fe ff e8
> RIP <ffffffff80136d8e>{panic+211} RSP <00000101fc72bd18>
>
> CPU 6: Machine Check Exception: 4 Bank 4: f603200100000813
> TSC 137b5ff61e0204 ADDR 31770fa68

I do believe that MCEs are typically a sign of some bit of hardware gone bad.

http://en.wikipedia.org/wiki/Machine_Check_Exception

Nic
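(If you want to decode the bank/status bits from a record like that, the mcelog tool can parse console-format MCE text; a sketch follows, assuming the mcelog package is installed and the MCE lines have been saved to a file, here hypothetically named mce.txt.)

    # Paste the "Machine Check Exception" lines from the console/netdump
    # output into mce.txt, then let mcelog translate the raw fields.
    mcelog --ascii < mce.txt

    # On a running machine, mcelog with no arguments reads /dev/mcelog
    # and reports any machine check events logged since the last run.
    mcelog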
Thank you all for your help. The machine had panicked several times since, but the issue went away when I took the client away from the server. A round of "I told you so" was in order for my higher-ups ;-) because I recommended against this setup in the first place... but I did what I was told.

-Aaron

Jean-Marc Saffroy wrote:

> If machine B acts both as client and server for the same Lustre
> filesystem, you may run into recovery problems in case of a crash. CFS
> recommends against using such configurations. I'm not sure how
> dangerous this is, however.

[snip]