Ms. Megan Larko
2009-Mar-04 17:11 UTC
[Lustre-discuss] How to read/understand a Lustre error
Greetings,

I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
the OSS hosting the OSTs fails with a kernel panic.  The system runs
CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
There are no error messages in the /var/log/messages file for March 4
prior to the message printed below.  The last line in the
/var/log/messages file was a routine stamp from March 2.

How do I understand the "lock callback timer expired" message below?
After the dump the system shows "kernel panic" on the console and
requires a manual reboot.

Any tips and insight greatly appreciated.

megan

----------------------------------------------------------------------

Mar  4 09:42:57 oss4 kernel: LustreError: 0:0:(ldlm_lockd.c:210:waiting_locks_callback()) ### lock callback timer expired: evicting client 9e1d3bc1-201b-4d0b-cc1a-2d52d619c937@NET_0x50000c0a840d6_UUID nid 192.168.64.214@o2ib ns: filter-crew8-OST0004_UUID lock: ffff81039e9bdd80/0x99e7393d0850f39f lrc: 2/0,0 mode: PR/PR res: 1267155/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0x8a3b31c963e264e8 expref: 830 pid: 4250
Mar  4 09:42:59 oss4 kernel: LustreError: 4989:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107) req@ffff8102fdae7400 x4386790/t0 o101-><?>@<?>:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc -107/0
Mar  4 09:42:59 oss4 kernel: LustreError: 4989:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 3 previous similar messages
Mar  4 09:43:47 oss4 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 4206: it was inactive for 100s
Mar  4 09:43:47 oss4 kernel: Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 4206
Mar  4 09:43:47 oss4 kernel: ll_ost_66     D 0000000000000580     0  4206      1          4207  4205 (L-TLB)
Mar  4 09:43:47 oss4 kernel: ffff810424c99618 0000000000000046 0000000000000001 0000000000000080
Mar  4 09:43:47 oss4 kernel: 000000000000000a ffff810425134860 ffffffff802dcae0 0003a25c7b1d51cd
Mar  4 09:43:47 oss4 kernel: 000000000000052d ffff810425134a48 ffffffff00000000 ffff81042ed6f7e0
Mar  4 09:43:47 oss4 kernel: Call Trace:
Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bb1>] __mutex_lock_slowpath+0x55/0x90
Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bf1>] .text.lock.mutex+0x5/0x14
Mar  4 09:43:47 oss4 kernel: [<ffffffff8002d201>] shrink_icache_memory+0x40/0x1e6
Mar  4 09:43:47 oss4 kernel: [<ffffffff8003e778>] shrink_slab+0xdc/0x153
Mar  4 09:43:47 oss4 kernel: [<ffffffff800c2cd7>] try_to_free_pages+0x189/0x275
Mar  4 09:43:47 oss4 kernel: [<ffffffff8000efd1>] __alloc_pages+0x1a8/0x2ab
Mar  4 09:43:47 oss4 kernel: [<ffffffff80017026>] cache_grow+0x137/0x395
(...etc to end of kernel panic dump)
Mar  4 09:43:48 oss4 kernel: LustreError: dumping log to /tmp/lustre-log.1236177827.4206

A "lctl dl" after rebooting the computer:

[root@oss4 log]# lctl
lctl > dl
  0 UP mgc MGC192.168.64.210@o2ib 5df96fa8-528f-de53-c2e1-d4db598b057d 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter crew8-OST0000 crew8-OST0000_UUID 11
  3 UP obdfilter crew8-OST0001 crew8-OST0001_UUID 11
  4 UP obdfilter crew8-OST0002 crew8-OST0002_UUID 11
  5 UP obdfilter crew8-OST0003 crew8-OST0003_UUID 11
  6 UP obdfilter crew8-OST0004 crew8-OST0004_UUID 11
  7 UP obdfilter crew8-OST0005 crew8-OST0005_UUID 11
  8 UP obdfilter crew8-OST0006 crew8-OST0006_UUID 11
  9 UP obdfilter crew8-OST0007 crew8-OST0007_UUID 11
 10 UP obdfilter crew8-OST0008 crew8-OST0008_UUID 11
 11 UP obdfilter crew8-OST0009 crew8-OST0009_UUID 11
 12 UP obdfilter crew8-OST000a crew8-OST000a_UUID 11
 13 UP obdfilter crew8-OST000b crew8-OST000b_UUID 11

The computer comes up normally without errors.
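A side note on the last log line: the binary debug log that Lustre
dumped when the watchdog fired can be decoded into readable text with
"lctl debug_file".  A minimal sketch, assuming the dump file named in
the messages above (the output path here is arbitrary):

  [root@oss4 ~]# lctl debug_file /tmp/lustre-log.1236177827.4206 \
                 /tmp/lustre-log.1236177827.4206.txt

The decoded log may show what thread 4206 (ll_ost_66) was doing in
the seconds before the watchdog triggered.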
Brian J. Murrell
2009-Mar-04 17:29 UTC
[Lustre-discuss] How to read/understand a Lustre error
On Wed, 2009-03-04 at 12:11 -0500, Ms. Megan Larko wrote:

> Greetings,
>
> I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
> the OSS hosting the OSTs fails with a kernel panic.

To be clear, what you are reporting is not a kernel panic.  It is a
watchdog timeout.  Kernel panics halt the machine; watchdog timeouts
do not, although both print stack traces, so they are easily
confused.

> The system runs
> CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.

I would suggest upgrading to 1.6.7.  We fix quite a number of bugs
with each point release we do, and given that you are 3 behind, that
is a lot of bugs.

> How do I understand the "lock callback timer expired" message below?

A client was asked to give back a lock it held and timed out doing
so.  That usually indicates a bug or a network failure.

> After the dump the system shows "kernel panic" on console and requires
> a manual reboot.

Maybe you are getting a panic, but there is no evidence of that in
what you pasted below, just the watchdog timeout and its stack trace.

> Any tips and insight greatly appreciated.

Really.  If at all possible, upgrade.

b.
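As a first check on the network-failure case mentioned above, the OSS
can try to reach the evicted client directly over LNET.  A minimal
sketch, assuming the client nid from the eviction message in the
original post:

  [root@oss4 ~]# lctl ping 192.168.64.214@o2ib

If the ping fails or hangs, the o2ib fabric deserves a look before
Lustre does.  If the network is healthy but clients are genuinely
slow to cancel locks, the obd timeout can be raised as a stopgap
(value in seconds; 100 is the 1.6.x default, and this assumes the
stock 1.6 /proc layout):

  [root@oss4 ~]# cat /proc/sys/lustre/timeout
  100
  [root@oss4 ~]# echo 300 > /proc/sys/lustre/timeout

Raising the timeout only hides a slow client or network; it is a
workaround, not a fix.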
Andreas Dilger
2009-Mar-05 19:26 UTC
[Lustre-discuss] How to read/understand a Lustre error
On Mar 04, 2009  12:11 -0500, Ms. Megan Larko wrote:

> I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
> the OSS hosting the OSTs fails with a kernel panic.  The system runs
> CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
> There are no error messages in the /var/log/messages file for March 4
> prior to the message printed below.  The last line in the
> /var/log/messages file was a routine stamp from March 2.
>
> How do I understand the "lock callback timer expired" message below?
> After the dump the system shows "kernel panic" on console and requires
> a manual reboot.

Note that a real kernel panic will NOT be written into
/var/log/messages, EVEN if you have "remote syslog", because at the
time of panic or OOM the kernel cannot write to the filesystem or
generate a normal TCP request from userspace.  If you have a serial
console attached to the OSS, or if you have "netconsole" or "netdump"
configured (special network consoles that can only send messages from
the kernel using a low-level network interface), you can capture the
just-before-panic messages on another node.

> Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bb1>] __mutex_lock_slowpath+0x55/0x90
> Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bf1>] .text.lock.mutex+0x5/0x14
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8002d201>] shrink_icache_memory+0x40/0x1e6
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8003e778>] shrink_slab+0xdc/0x153
> Mar  4 09:43:47 oss4 kernel: [<ffffffff800c2cd7>] try_to_free_pages+0x189/0x275
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8000efd1>] __alloc_pages+0x1a8/0x2ab
> Mar  4 09:43:47 oss4 kernel: [<ffffffff80017026>] cache_grow+0x137/0x395

This looks like a memory allocation problem/deadlock.  With 11 OSTs
on a single OSS that means (by default, if you didn't specify
otherwise) 11 * 400 MB = 4400 MB = 4.4 GB of memory JUST for the
journals of those filesystems.  If your OSS doesn't have at least
that much RAM (preferably about 16 GB) it will not be usable under
heavy load.

I would ask why you have so many OSTs on a single OSS.  Is this a
system where you need a lot of capacity, but not much performance?
Are the OSTs smaller than the maximum possible size (8 TB) and could
they be merged together?  Having fewer OSTs is usually best because
it reduces per-OST overhead like the journal, and also avoids free
space fragmentation.

If you really have 11x 8 TB OSTs (or near that limit) and you want to
keep them all on the OSS without enough RAM, but don't expect peak
performance out of all OSTs at the same time, or have relatively few
clients, then I would suggest reducing the journal size to 128 MB
(= 1408 MB total) or 64 MB (= 704 MB), which can be done on a cleanly
unmounted filesystem:

root> umount /dev/{ostdev}
root> e2fsck /dev/{ostdev}
root> tune2fs -O ^has_journal /dev/{ostdev}
root> tune2fs -j -J size=128 /dev/{ostdev}    # or "-J size=64"
root> mount /dev/{ostdev}

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
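One way to verify the journal size before and after the tune2fs
sequence above, and to check that the OSS has enough RAM to back all
of the journals.  A sketch using stock e2fsprogs against the same
unmounted /dev/{ostdev} placeholder (on ext3 the journal lives in
inode 8; exact output fields vary by e2fsprogs version):

root> debugfs -R "stat <8>" /dev/{ostdev} | grep -i size   # journal inode size
root> dumpe2fs -h /dev/{ostdev} | grep -i journal          # journal fields
root> grep MemTotal /proc/meminfo                          # RAM vs. sum of journals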