Roger Spellman
2010-Aug-08  19:19 UTC
[Lustre-discuss] LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19)
Hi,
We have a customer that is down right now, and is not able to run jobs.
The file system was up and running fine for weeks.  Then, on 8/1, one of
the OSTs was having IB problems.  I unmounted the OST, fixed the IB problem
(reseated the cable), then tried mounting the OST, but the mount never
completed.
I tried a soft reboot, but that failed.  So, I did another hard reboot.
When it came back up, I ran tunefs.lustre, and it gave this strange output:
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
   Read previous values:
Target:
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x70
              (needs_index first_time update )
Persistent mount opts:
Parameters:
tunefs.lustre: exiting with 22 (Invalid argument)
I realized that something was corrupted.  So, I mounted it as ldiskfs, removed
the last_rcvd file, and ran:
tunefs.lustre --verbose --erase-param --mgsnode=192.168.2.11@o2ib
--mgsnode=192.168.2.12@o2ib
--writeconf --fsname=tslstr --ost --index=1 /dev/mapper/map0
I then ran tunefs.lustre on this node, and it looked as expected, namely:
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
   Read previous values:
Target:     tslstr-OST0001
Index:      1
Lustre FS:  tslstr
Mount type: ldiskfs
Flags:      0x142
              (OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
   Permanent disk data:
Target:     tslstr-OST0001
Index:      1
Lustre FS:  tslstr
Mount type: ldiskfs
Flags:      0x142
              (OST update writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
exiting before disk write.
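The Flags values in the two tunefs.lustre outputs above are bitmasks, and the names printed in parentheses are the bits that are set. The following is a minimal sketch of that decoding, not the tool's own code. The bit values in the table are assumptions inferred from the two outputs in this post (0x70 and 0x142); only the five names that appear here are included, and any other bit is reported as unknown.

```python
# Hypothetical helper: decode the "Flags:" value printed by tunefs.lustre.
# The bit assignments below are assumptions inferred from the two outputs
# quoted in this thread; they are not copied from the Lustre headers.
LDD_FLAGS = [
    (0x0002, "OST"),
    (0x0010, "needs_index"),
    (0x0020, "first_time"),
    (0x0040, "update"),
    (0x0100, "writeconf"),
]


def decode_flags(value):
    """Return the names of the set bits, lowest bit first."""
    names = [name for bit, name in LDD_FLAGS if value & bit]
    known = 0
    for bit, _ in LDD_FLAGS:
        known |= bit
    unknown = value & ~known
    if unknown:
        # Bits this sketch has no name for are reported, not hidden.
        names.append("unknown(0x%x)" % unknown)
    return names


if __name__ == "__main__":
    for value in (0x70, 0x142):
        print(hex(value), " ".join(decode_flags(value)))
```

Read this way, the first output (0x70) describes a target that has no server type or index recorded, while the second (0x142) describes an OST with a pending writeconf.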
I remounted the OST, and I am now getting:
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: MGC192.168.2.11@o2ib: Reactivating import
LustreError: 137-5: UUID ''tslstr-OST0001_UUID'' is not
available  for connect (no target)
LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing
error (-19)
req@ffff81011207c800 x913142/t0 o8-><?>@<?>:0/0 lens 304/0 e 0
to 0 dl 1281120645 ref 1 fl
Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID ''tslstr-OST0001_UUID'' is not
available  for connect (no target)
LustreError: 5921:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing
error (-19)
req@ffff81011207c400 x913148/t0 o8-><?>@<?>:0/0 lens 304/0 e 0
to 0 dl 1281120670 ref 1 fl
Interpret:/0/0 rc -19/0
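The numeric codes in this post can be read as standard errno values: kernel-side messages such as "processing error (-19)" and "rc -19" use negated errno numbers, and tunefs.lustre printed its exit status 22 together with the text "Invalid argument". A minimal sketch for looking such codes up follows; the helper name describe_rc is hypothetical, and the mapping assumes it runs on a Linux host, where errno numbering matches the server's.

```python
import errno
import os


def describe_rc(rc):
    """Map a return code (negative or positive) to its errno name and text."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -19 is the rc from the LustreError lines; 22 is the tunefs.lustre exit status.
    for rc in (-19, 22):
        name, text = describe_rc(rc)
        print(rc, name, text)
```

On Linux, 19 is ENODEV, which is consistent with the accompanying "not available for connect (no target)" message.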
The customer is unable to make any progress right now.  Any help would be
greatly appreciated.
Thanks.
Roger
Alexey Lyashkov
2010-Aug-09  14:10 UTC
[Lustre-discuss] LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19)
Hi Roger,
the command line and output look correct, but

On Aug 8, 2010, at 22:19, Roger Spellman wrote:
> tunefs.lustre --verbose --erase-param --mgsnode=192.168.2.11@o2ib --mgsnode=192.168.2.12@o2ib
> --writeconf --fsname=tslstr --ost --index=1 /dev/mapper/map0
>
> I then ran tunefs.lustre on this node, and it looked as expected, namely:
>
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>    Read previous values:
> Target:     tslstr-OST0001
> Index:      1
> Lustre FS:  tslstr
> Mount type: ldiskfs
> Flags:      0x142
>               (OST update writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
>
>    Permanent disk data:
> Target:     tslstr-OST0001
> Index:      1
> Lustre FS:  tslstr
> Mount type: ldiskfs
> Flags:      0x142
>               (OST update writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
>
> exiting before disk write.

That message says that nothing was written to disk. But that message is printed only with the "--print", "--noformat", or "-n" command-line option.
The other messages are the result of the lost config info update.