Christopher Huhn
2010-Apr-23 12:18 UTC
[Lustre-discuss] Kernel oops after cat on /proc/fs/lustre/mgs/MGS/exports/*/stats
Dear Lustre wizards,

we are experiencing problems on our MDS, and our Lustre expert is abroad (he just attended the LUG meeting).

One of the symptoms we observe is a reproducible kernel oops when viewing some stats files beneath /proc/fs/lustre/mgs/MGS/exports:

mds:~# cat /proc/fs/lustre/mgs/MGS/exports/10.12... at tcp/stats
Killed
mds:~# mds kernel: Oops: 0000 [38] SMP
Apr 23 13:23:19 mds kernel: Unable to handle kernel paging request at ffffffff00040024 RIP:
Apr 23 13:23:19 mds kernel: [<ffffffff883d6680>] :obdclass:lprocfs_stats_seq_show+0x80/0x1e0
Apr 23 13:23:19 mds kernel: PGD 203067 PUD 0
Apr 23 13:23:19 mds kernel: Oops: 0000 [38] SMP
Apr 23 13:23:20 mds kernel: CPU 7
Apr 23 13:23:20 mds kernel: Modules linked in: mds fsfilt_ldiskfs(F) mgs mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs xt_tcpudp iptable_filter ip_tables x_tables drbd cn button ac battery bonding xfs ipmi_si ipmi_devintf ipmi_msghandler serio_raw psmouse joydev pcspkr i2c_i801 i2c_core shpchp pci_hotplug evdev parport_pc parport ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid10 raid456 xor raid1 raid0 multipath linear md_mod sd_mod ide_cd cdrom ata_generic libata generic usbhid hid piix 3w_9xxx floppy ide_core ehci_hcd uhci_hcd e1000 scsi_mod thermal processor fan
Apr 23 13:23:20 mds kernel: Pid: 7293, comm: cat Tainted: GF 2.6.22+lustre1.6.7.2+0.credativ.etch.1 #2
Apr 23 13:23:20 mds kernel: RIP: 0010:[<ffffffff883d6680>] [<ffffffff883d6680>] :obdclass:lprocfs_stats_seq_show+0x80/0x1e0
Apr 23 13:23:20 mds kernel: RSP: 0018:ffff8103ba5f9e48 EFLAGS: 00010282
Apr 23 13:23:20 mds kernel: RAX: ffffffff00040004 RBX: 7fffffffffffffff RCX: 0000000000000006
Apr 23 13:23:20 mds kernel: RDX: 0101010101010101 RSI: 0000000000000000 RDI: 0000000000000000
Apr 23 13:23:20 mds kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000
Apr 23 13:23:20 mds kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 23 13:23:20 mds kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8108000a1760
Apr 23 13:23:20 mds kernel: FS: 00002b4a366786d0(0000) GS:ffff81081004b840(0000) knlGS:0000000000000000
Apr 23 13:23:20 mds kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 23 13:23:20 mds kernel: CR2: ffffffff00040024 CR3: 000000078f018000 CR4: 00000000000006e0
Apr 23 13:23:20 mds kernel: Process cat (pid: 7293, threadinfo ffff8103ba5f8000, task ffff8107dc299530)
Apr 23 13:23:20 mds kernel: Stack: 0000000000000202 ffffffff00000000 ffffffff00040004 ffff81067dae2640
Apr 23 13:23:20 mds kernel: 000000004bd18327 00000000000ca54d 0000000000000000 ffff81067dae2640
Apr 23 13:23:20 mds kernel: ffffffff00040004 0000000000040004 0000000000000400 0000000000000000
Apr 23 13:23:20 mds kernel: Call Trace:
Apr 23 13:23:20 mds kernel: [<ffffffff8029c0ac>] seq_read+0x105/0x28d
Apr 23 13:23:20 mds kernel: [<ffffffff80283f23>] vfs_read+0xcb/0x153
Apr 23 13:23:20 mds kernel: [<ffffffff802842bf>] sys_read+0x45/0x6e
Apr 23 13:23:20 mds kernel: [<ffffffff80209d8e>] system_call+0x7e/0x83
Apr 23 13:23:20 mds kernel:
Apr 23 13:23:20 mds kernel:
Apr 23 13:23:20 mds kernel: Code: 48 8b 50 20 48 8b 48 28 4c 03 60 10 4c 03 68 18 48 39 d3 48
Apr 23 13:23:20 mds kernel: RIP [<ffffffff883d6680>] :obdclass:lprocfs_stats_seq_show+0x80/0x1e0
mds kernel: CR2: ffffffff00040024
Apr 23 13:23:20 mds kernel: RSP <ffff8103ba5f9e48>
Apr 23 13:23:20 mds kernel: CR2: ffffffff00040024

In this case, the server and the affected client both run Lustre 1.6.7.2 on Debian Etch/x86_64. The behavior does not change after a client reboot.

Any hints on how to solve this would be greatly appreciated.

Kind regards,
Christopher

--
Christopher Huhn
Linux therapist

GSI Helmholtzzentrum fuer Schwerionenforschung GmbH
Planckstr. 1
64291 Darmstadt
http://www.gsi.de/
Wojciech Turek
2010-Apr-23 14:07 UTC
[Lustre-discuss] Kernel oops after cat on /proc/fs/lustre/mgs/MGS/exports/*/stats
Hi,

This is a known bug that is fixed in 1.8.2:
https://bugzilla.lustre.org/show_bug.cgi?id=21420

Best regards,
Wojciech

--
Wojciech Turek
Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
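
For anyone reading the trace in the original report, here is a minimal sketch in Python of one sanity check that can be run on the register dump. It is not part of either poster's analysis. The helper name `reg` and the trimmed `OOPS` excerpt are illustrative; the register values are copied from the report. The check compares the faulting address in CR2 with the value in RAX.

```python
import re

# Excerpt copied from the oops in the original report (syslog prefixes removed).
OOPS = """\
Unable to handle kernel paging request at ffffffff00040024 RIP:
RAX: ffffffff00040004 RBX: 7fffffffffffffff RCX: 0000000000000006
CR2: ffffffff00040024 CR3: 000000078f018000 CR4: 00000000000006e0
"""


def reg(name: str, text: str) -> int:
    """Return the 64-bit value printed for a named register in an oops dump."""
    match = re.search(rf"\b{name}: ([0-9a-f]{{16}})\b", text)
    if match is None:
        raise ValueError(f"register {name} not found")
    return int(match.group(1), 16)


rax = reg("RAX", OOPS)   # pointer value held in RAX at the time of the fault
cr2 = reg("CR2", OOPS)   # address whose access caused the page fault
offset = cr2 - rax
print(f"CR2 - RAX = {offset:#x}")
```

The difference is 0x20, which suggests a read at offset 0x20 from an invalid pointer held in RAX. The first bytes on the Code: line, 48 8b 50 20, encode `mov 0x20(%rax),%rdx`. That would match the faulting access if the dump starts at the faulting instruction, which is an assumption about how this kernel prints the Code: line. None of this replaces the bug report linked above.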