Ms. Megan Larko
2009-Mar-04 17:11 UTC
[Lustre-discuss] How to read/understand a Lustre error
Greetings,

I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
the OSS hosting the OSTs fails with a kernel panic.  The system runs
CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
There are no error messages in the /var/log/messages file for March 4
prior to the message printed below.  The last line in the
/var/log/messages file was a routine stamp from March 2.

How do I understand the "lock callback timer expired" message below?
After the dump the system shows "kernel panic" on the console and
requires a manual reboot.

Any tips and insight greatly appreciated.

megan

----------------------------------------------------------------------

Mar  4 09:42:57 oss4 kernel: LustreError: 0:0:(ldlm_lockd.c:210:waiting_locks_callback()) ### lock callback timer expired: evicting client 9e1d3bc1-201b-4d0b-cc1a-2d52d619c937@NET_0x50000c0a840d6_UUID nid 192.168.64.214@o2ib ns: filter-crew8-OST0004_UUID lock: ffff81039e9bdd80/0x99e7393d0850f39f lrc: 2/0,0 mode: PR/PR res: 1267155/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 20 remote: 0x8a3b31c963e264e8 expref: 830 pid: 4250
Mar  4 09:42:59 oss4 kernel: LustreError: 4989:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107) req@ffff8102fdae7400 x4386790/t0 o101-><?>@<?>:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc -107/0
Mar  4 09:42:59 oss4 kernel: LustreError: 4989:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 3 previous similar messages
Mar  4 09:43:47 oss4 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 4206: it was inactive for 100s
Mar  4 09:43:47 oss4 kernel: Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 4206
Mar  4 09:43:47 oss4 kernel: ll_ost_66     D 0000000000000580     0  4206      1          4207  4205 (L-TLB)
Mar  4 09:43:47 oss4 kernel: ffff810424c99618 0000000000000046 0000000000000001 0000000000000080
Mar  4 09:43:47 oss4 kernel: 000000000000000a ffff810425134860 ffffffff802dcae0 0003a25c7b1d51cd
Mar  4 09:43:47 oss4 kernel: 000000000000052d ffff810425134a48 ffffffff00000000 ffff81042ed6f7e0
Mar  4 09:43:47 oss4 kernel: Call Trace:
Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bb1>] __mutex_lock_slowpath+0x55/0x90
Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bf1>] .text.lock.mutex+0x5/0x14
Mar  4 09:43:47 oss4 kernel: [<ffffffff8002d201>] shrink_icache_memory+0x40/0x1e6
Mar  4 09:43:47 oss4 kernel: [<ffffffff8003e778>] shrink_slab+0xdc/0x153
Mar  4 09:43:47 oss4 kernel: [<ffffffff800c2cd7>] try_to_free_pages+0x189/0x275
Mar  4 09:43:47 oss4 kernel: [<ffffffff8000efd1>] __alloc_pages+0x1a8/0x2ab
Mar  4 09:43:47 oss4 kernel: [<ffffffff80017026>] cache_grow+0x137/0x395
(...etc to end of kernel panic dump)
Mar  4 09:43:48 oss4 kernel: LustreError: dumping log to /tmp/lustre-log.1236177827.4206

A "lctl dl" after rebooting the computer:

[root@oss4 log]# lctl
lctl > dl
  0 UP mgc MGC192.168.64.210@o2ib 5df96fa8-528f-de53-c2e1-d4db598b057d 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter crew8-OST0000 crew8-OST0000_UUID 11
  3 UP obdfilter crew8-OST0001 crew8-OST0001_UUID 11
  4 UP obdfilter crew8-OST0002 crew8-OST0002_UUID 11
  5 UP obdfilter crew8-OST0003 crew8-OST0003_UUID 11
  6 UP obdfilter crew8-OST0004 crew8-OST0004_UUID 11
  7 UP obdfilter crew8-OST0005 crew8-OST0005_UUID 11
  8 UP obdfilter crew8-OST0006 crew8-OST0006_UUID 11
  9 UP obdfilter crew8-OST0007 crew8-OST0007_UUID 11
 10 UP obdfilter crew8-OST0008 crew8-OST0008_UUID 11
 11 UP obdfilter crew8-OST0009 crew8-OST0009_UUID 11
 12 UP obdfilter crew8-OST000a crew8-OST000a_UUID 11
 13 UP obdfilter crew8-OST000b crew8-OST000b_UUID 11

The computer comes up normally without errors.
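A side note on the last log line: the binary debug log that Lustre
dumped when the watchdog fired can be decoded into readable text with
"lctl debug_file".  A minimal sketch, assuming the dump file named in
the messages above (the output path here is arbitrary):

  [root@oss4 ~]# lctl debug_file /tmp/lustre-log.1236177827.4206 \
                 /tmp/lustre-log.1236177827.4206.txt

The decoded log may show what thread 4206 (ll_ost_66) was doing in
the seconds before the watchdog triggered.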
Brian J. Murrell
2009-Mar-04 17:29 UTC
[Lustre-discuss] How to read/understand a Lustre error
On Wed, 2009-03-04 at 12:11 -0500, Ms. Megan Larko wrote:

> Greetings,
>
> I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
> the OSS hosting the OSTs fails with a kernel panic.

To be clear, what you are reporting is not a kernel panic.  It is a
watchdog timeout.  Kernel panics halt the machine; watchdog timeouts
do not, although both print stack traces, so they are easily
confused.

> The system runs
> CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.

I would suggest upgrading to 1.6.7.  We fix quite a number of bugs
with each point release we do, and given that you are 3 behind, that
is a lot of bugs.

> How do I understand the "lock callback timer expired" message below?

A client was asked to give back a lock it held and timed out doing
so.  That usually indicates a bug or a network failure.

> After the dump the system shows "kernel panic" on console and requires
> a manual reboot.

Maybe you are getting a panic, but there is no evidence of that in
what you pasted below, just the watchdog timeout and its stack trace.

> Any tips and insight greatly appreciated.

Really.  If at all possible, upgrade.

b.
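As a first check on the network-failure case mentioned above, the OSS
can try to reach the evicted client directly over LNET.  A minimal
sketch, assuming the client nid from the eviction message in the
original post:

  [root@oss4 ~]# lctl ping 192.168.64.214@o2ib

If the ping fails or hangs, the o2ib fabric deserves a look before
Lustre does.  If the network is healthy but clients are genuinely
slow to cancel locks, the obd timeout can be raised as a stopgap
(value in seconds; 100 is the 1.6.x default, and this assumes the
stock 1.6 /proc layout):

  [root@oss4 ~]# cat /proc/sys/lustre/timeout
  100
  [root@oss4 ~]# echo 300 > /proc/sys/lustre/timeout

Raising the timeout only hides a slow client or network; it is a
workaround, not a fix.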
Andreas Dilger
2009-Mar-05 19:26 UTC
[Lustre-discuss] How to read/understand a Lustre error
On Mar 04, 2009  12:11 -0500, Ms. Megan Larko wrote:

> I have a Lustre OSS with eleven (0-11) OSTs.  Every once in a while
> the OSS hosting the OSTs fails with a kernel panic.  The system runs
> CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
> There are no error messages in the /var/log/messages file for March 4
> prior to the message printed below.  The last line in the
> /var/log/messages file was a routine stamp from March 2.
>
> How do I understand the "lock callback timer expired" message below?
> After the dump the system shows "kernel panic" on console and requires
> a manual reboot.

Note that a real kernel panic will NOT be written into
/var/log/messages, EVEN if you have "remote syslog", because at the
time of panic or OOM the kernel cannot write to the filesystem or
generate a normal TCP request from userspace.  If you have a serial
console attached to the OSS, or if you have "netconsole" or "netdump"
configured (special network consoles that can only send messages from
the kernel using a low-level network interface), you can capture the
just-before-panic messages on another node.

> Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bb1>] __mutex_lock_slowpath+0x55/0x90
> Mar  4 09:43:47 oss4 kernel: [<ffffffff80061bf1>] .text.lock.mutex+0x5/0x14
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8002d201>] shrink_icache_memory+0x40/0x1e6
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8003e778>] shrink_slab+0xdc/0x153
> Mar  4 09:43:47 oss4 kernel: [<ffffffff800c2cd7>] try_to_free_pages+0x189/0x275
> Mar  4 09:43:47 oss4 kernel: [<ffffffff8000efd1>] __alloc_pages+0x1a8/0x2ab
> Mar  4 09:43:47 oss4 kernel: [<ffffffff80017026>] cache_grow+0x137/0x395

This looks like a memory allocation problem/deadlock.  With 11 OSTs
on a single OSS that means (by default, if you didn't specify
otherwise) 11 * 400 MB = 4400 MB = 4.4 GB of memory JUST for the
journals of those filesystems.  If your OSS doesn't have at least
that much RAM (preferably about 16 GB) it will not be usable under
heavy load.

I would ask why you have so many OSTs on a single OSS.  Is this a
system where you need a lot of capacity, but not much performance?
Are the OSTs smaller than the maximum possible size (8 TB) and could
they be merged together?  Having fewer OSTs is usually best because
it reduces per-OST overhead like the journal, and also avoids free
space fragmentation.

If you really have 11x 8 TB OSTs (or near that limit) and you want to
keep them all on the OSS without enough RAM, but don't expect peak
performance out of all OSTs at the same time, or have relatively few
clients, then I would suggest reducing the journal size to 128 MB
(= 1408 MB total) or 64 MB (= 704 MB), which can be done on a cleanly
unmounted filesystem:

root> umount /dev/{ostdev}
root> e2fsck /dev/{ostdev}
root> tune2fs -O ^has_journal /dev/{ostdev}
root> tune2fs -j -J size=128 /dev/{ostdev}    # or "-J size=64"
root> mount /dev/{ostdev}

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
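One way to verify the journal size before and after the tune2fs
sequence above, and to check that the OSS has enough RAM to back all
of the journals.  A sketch using stock e2fsprogs against the same
unmounted /dev/{ostdev} placeholder (on ext3 the journal lives in
inode 8; exact output fields vary by e2fsprogs version):

root> debugfs -R "stat <8>" /dev/{ostdev} | grep -i size   # journal inode size
root> dumpe2fs -h /dev/{ostdev} | grep -i journal          # journal fields
root> grep MemTotal /proc/meminfo                          # RAM vs. sum of journals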