thr3ads.net - Lustre discuss - [Lustre-discuss] Kernel Panic on MDS [Jan 2010]

Wojciech Turek

2010-Jan-18 11:59 UTC

[Lustre-discuss] Kernel Panic on MDS

RHEL4 Lustre-1.6.6

Does the kernel panic below ring a bell with anyone?

general protection fault: 0000 [1] SMP
CPU 1
Modules linked in: mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) lustre(U)
lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U)
lnet(U) lvfs(U) libcfs(U) ldiskfs(U) drbd(U) ipmi_si(U)
ipmi_devintf(U) ipmi_msghandler(U) autofs4(U)
i2c_nforce2(U) i2c_amd756(U) i2c_isa(U) i2c_amd8111(U) i2c_i801(U)
i2c_core(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U)
cpufreq_powersave(U) ib_ipoib(U) md5(U) ipv6(U) ib_usa(U) mlx4_ib(U)
mlx4_core(U) ib_mthca(U) mptctl(U)
dm_mirror(U) dm_round_robin(U) dm_multipath(U) dm_mod(U) sr_mod(U)
usb_storage(U) joydev(U) button(U) battery(U) ac(U) uhci_hcd(U)
ehci_hcd(U) hw_random(U) ib_ipath(U) ib_umad(U) ib_ucm(U) ib_uverbs(U)
ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U)
 ata_piix(U) libata(U) sg(U) ext3(U) jbd(U) xfs(U) tg3(U) s2io(U)
qla2xxx_conf(U) qla2xxx(U) nfs(U) nfs_acl(U) lockd(U) sunrpc(U)
megaraid_sas(U) mptsas(U) mptscsi(U) mptbase(U) e1000(U) bnx2(U)
sd_mod(U) scsi_mod(U)
Pid: 13546, comm: collectl Tainted: GF     2.6.9-67.0.22.EL_lustre.1.6.6smp
RIP: 0010:[<ffffffff801af8f0>] <ffffffff801af8f0>{proc_pid_status+534}
RSP: 0018:0000010416fc9e48  EFLAGS: 00010203
RAX: 00047d5350f64e05 RBX: 0000010063767080 RCX: 0000000000000002
RDX: 0000000000000001 RSI: ffffffff80321d58 RDI: 000001006074b092
RBP: 000001030ae0b030 R08: 00000000fffffff9 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000000 R12: 000001006074b092
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000006e1f
FS:  0000002a96310e80(0000) GS:ffffffff804fcb00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000552abd58a8 CR3: 0000000008258000 CR4: 00000000000006e0
Process collectl (pid: 13546, threadinfo 0000010416fc8000, task 0000010411385030)
Stack: 000001000001da80 ffffffff8032c17f 000001006074b000 0000000000000202
       0000000000000001 0000000011385030 0000001000000001 0000000000000202
       395f74646d5f6c6c 0072650065720037
Call Trace:<ffffffff801acfd3>{proc_info_read+85}
       <ffffffff80178dac>{vfs_read+207}
       <ffffffff80179008>{sys_read+69}
       <ffffffff80110236>{system_call+126}

Code: 8b 14 90 31 c0 e8 9c d8 03 00 48 98 49 01 c4 8b 13 b8 20 00
RIP <ffffffff801af8f0>{proc_pid_status+534} RSP <0000010416fc9e48>
 <0>Kernel panic - not syncing: Oops


-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
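
For readers triaging a dump like the one above, the useful parts are usually the process name, the faulting symbol on the RIP line, and the call trace. A minimal Python sketch of pulling those out follows. The sample text is copied from this message; the function name `summarize` and the two regular expressions are illustrative assumptions written for this particular oops layout, not part of any kernel or Lustre tool.

```python
import re

# Lines copied from the oops in this thread (RIP shown on a single line).
OOPS = """\
Pid: 13546, comm: collectl Tainted: GF     2.6.9-67.0.22.EL_lustre.1.6.6smp
RIP: 0010:[<ffffffff801af8f0>] <ffffffff801af8f0>{proc_pid_status+534}
Call Trace:<ffffffff801acfd3>{proc_info_read+85}
       <ffffffff80178dac>{vfs_read+207}
       <ffffffff80179008>{sys_read+69}
       <ffffffff80110236>{system_call+126}
RIP <ffffffff801af8f0>{proc_pid_status+534} RSP <0000010416fc9e48>
"""

# A frame looks like <address>{symbol+offset}; the bracketed RIP address
# "[<...>]" is followed by "]" rather than "{" and is therefore skipped.
FRAME = re.compile(r"<[0-9a-f]{16}>\{(\w+\+\w+)\}")
COMM = re.compile(r"Pid: (\d+), comm: (\S+)")


def summarize(oops_text):
    """Return (pid, comm, frames); frames[0] is the faulting location."""
    match = COMM.search(oops_text)
    pid, comm = (int(match.group(1)), match.group(2)) if match else (None, None)
    frames = []
    for frame in FRAME.findall(oops_text):
        # The RIP frame is printed more than once, so keep first sightings
        # only. This would also hide genuine recursion, which is acceptable
        # for a quick summary but not for a full analysis.
        if frame not in frames:
            frames.append(frame)
    return pid, comm, frames


if __name__ == "__main__":
    pid, comm, frames = summarize(OOPS)
    print("process: %s (pid %s)" % (comm, pid))
    for depth, frame in enumerate(frames):
        print("  #%d %s" % (depth, frame))
```

Here the summary shows a userspace `read()` from collectl ending in `proc_pid_status`, which is the same subset of lines that the replies in this thread quote.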

Andreas Dilger

2010-Jan-18 13:00 UTC

On 2010-01-18, at 19:59, Wojciech Turek wrote:
> RHEL4 Lustre-1.6.6
>
> Does the kernel panic below ring a bell with anyone?
>
> RIP: 0010:[<ffffffff801af8f0>] <ffffffff801af8f0>{proc_pid_status+534}
> Process collectl (pid: 13546, threadinfo 0000010416fc8000, task
> Call Trace:<ffffffff801acfd3>{proc_info_read+85}
>            <ffffffff80178dac>{vfs_read+207}
>            <ffffffff80179008>{sys_read+69}
>            <ffffffff80110236>{system_call+126}

This looks like collectl reading from a /proc entry after it was cleaned
up.  I think several such bugs were already fixed.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
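
The failure mode described in this reply, a reader touching a /proc entry whose backing data has already been torn down, can be illustrated in userspace. The following is only a sketch of the general pattern and of one common guard (a lock plus a "removed" flag); it is not Lustre's lprocfs code, and the names `ProcEntry` and `stress` are hypothetical. Python cannot reproduce a real use-after-free, so the freed data is modelled as `None`.

```python
import threading


class ProcEntry:
    """Hypothetical stand-in for a /proc entry backed by private data."""

    def __init__(self, name, data):
        self.name = name
        self._data = data
        self._lock = threading.Lock()
        self._removed = False

    def read(self):
        # The reader holds the lock for the whole access, so removal
        # cannot drop the data in the middle of a read.
        with self._lock:
            if self._removed:
                raise FileNotFoundError(self.name)
            return "%s: %s\n" % (self.name, self._data["state"])

    def remove(self):
        # Removal waits for any in-flight reader, marks the entry dead,
        # and only then drops the backing data.
        with self._lock:
            self._removed = True
            self._data = None


def stress(readers=8, reads_each=1000):
    """Race several readers against one removal; return unexpected errors."""
    entry = ProcEntry("status", {"state": "running"})
    errors = []

    def reader():
        for _ in range(reads_each):
            try:
                entry.read()
            except FileNotFoundError:
                return  # clean failure once the entry is gone
            except Exception as exc:  # would indicate a read of freed data
                errors.append(exc)
                return

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for thread in threads:
        thread.start()
    entry.remove()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    print("unexpected errors:", stress())
```

Without the flag check under the lock, a reader that had already looked up the entry would go on to index `None`, which is the Python analogue of the stale pointer dereference in the oops.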

Wojciech Turek

2010-Jan-18 15:09 UTC

Thanks, Andreas, for the quick answer. So upgrading to a newer version of
collectl should fix it?

Cheers

Wojciech

2010/1/18 Andreas Dilger <adilger at sun.com>:
> On 2010-01-18, at 19:59, Wojciech Turek wrote:
>>
>> RHEL4 Lustre-1.6.6
>>
>> Does the kernel panic below ring a bell with anyone?
>>
>> RIP: 0010:[<ffffffff801af8f0>] <ffffffff801af8f0>{proc_pid_status+534}
>> Process collectl (pid: 13546, threadinfo 0000010416fc8000, task
>> Call Trace:<ffffffff801acfd3>{proc_info_read+85}
>>            <ffffffff80178dac>{vfs_read+207}
>>            <ffffffff80179008>{sys_read+69}
>>            <ffffffff80110236>{system_call+126}
>
>
> This looks like collectl reading from a /proc entry after it was cleaned
> up.  I think several such bugs were already fixed.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>


-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517

Andreas Dilger

2010-Jan-18 15:28 UTC

On 2010-01-18, at 23:09, Wojciech Turek wrote:
> Thanks, Andreas, for the quick answer. So upgrading to a newer version of
> collectl should fix it?

No, it is a Lustre bug, not collectl.  I think a newer version of
Lustre has fixes in lprocfs to avoid such races.

> 2010/1/18 Andreas Dilger <adilger at sun.com>:
>> On 2010-01-18, at 19:59, Wojciech Turek wrote:
>>>
>>> RHEL4 Lustre-1.6.6
>>>
>>> Does the kernel panic below ring a bell with anyone?
>>>
>>> RIP: 0010:[<ffffffff801af8f0>] <ffffffff801af8f0>{proc_pid_status+534}
>>> Process collectl (pid: 13546, threadinfo 0000010416fc8000, task
>>> Call Trace:<ffffffff801acfd3>{proc_info_read+85}
>>>           <ffffffff80178dac>{vfs_read+207}
>>>           <ffffffff80179008>{sys_read+69}
>>>           <ffffffff80110236>{system_call+126}
>>
>>
>> This looks like collectl reading from a /proc entry after it was cleaned
>> up.  I think several such bugs were already fixed.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>>
>
>
>
> -- 
> --
> Wojciech Turek
>
> Assistant System Manager
>
> High Performance Computing Service
> University of Cambridge
> Email: wjt27 at cam.ac.uk
> Tel: (+)44 1223 763517

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
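
One detail of the original oops that the thread does not discuss: the last two words of the `Stack:` dump (`395f74646d5f6c6c 0072650065720037`) consist mostly of printable ASCII bytes. A short Python sketch for decoding them is below, assuming the usual little-endian byte order of x86_64; the helper names are hypothetical. The decoded text resembles the naming of Lustre MDS service threads, but that interpretation is an assumption and is not confirmed anywhere in the thread.

```python
import struct

# The last two 64-bit words of the "Stack:" dump in the first message.
STACK_WORDS = ["395f74646d5f6c6c", "0072650065720037"]


def words_to_bytes(words):
    """Lay the words out as they sit in little-endian (x86_64) memory."""
    return b"".join(struct.pack("<Q", int(word, 16)) for word in words)


def printable(raw):
    """Render bytes as text, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


raw = words_to_bytes(STACK_WORDS)
print(printable(raw))                # ll_mdt_97.re.er.
print(raw.split(b"\0")[0].decode())  # ll_mdt_97
```

The first NUL-terminated string on the stack is `ll_mdt_97`, which suggests that the `/proc/<pid>/status` read may have been for a thread of that name at the moment of the fault.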
