Roger Spellman
2010-Aug-08 19:19 UTC
[Lustre-discuss] LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19)
Hi,

We have a customer that is down right now and is not able to run jobs.

The file system was up and running fine for weeks. Then, on 8/1, one of the OSTs was having IB problems. I unmounted the OST, fixed the IB problem (reseated the cable), then tried mounting the OST, but the mount never completed. I tried a soft reboot, but that failed, so I did a hard reboot. When it came back up, I ran tunefs.lustre, and it gave this strange output:

    checking for existing Lustre data: found CONFIGS/mountdata
    Reading CONFIGS/mountdata

    Read previous values:
    Target:
    Index:      unassigned
    Lustre FS:  lustre
    Mount type: ldiskfs
    Flags:      0x70
                (needs_index first_time update )
    Persistent mount opts:
    Parameters:

    tunefs.lustre: exiting with 22 (Invalid argument)

I realized that something was corrupted. So I mounted it as ldiskfs, removed the last_rcvd file, and ran:

    tunefs.lustre --verbose --erase-param --mgsnode=192.168.2.11@o2ib --mgsnode=192.168.2.12@o2ib --writeconf --fsname=tslstr --ost --index=1 /dev/mapper/map0

I then ran tunefs.lustre on this node, and it looked as expected, namely:

    checking for existing Lustre data: found CONFIGS/mountdata
    Reading CONFIGS/mountdata

    Read previous values:
    Target:     tslstr-OST0001
    Index:      1
    Lustre FS:  tslstr
    Mount type: ldiskfs
    Flags:      0x142
                (OST update writeconf )
    Persistent mount opts: errors=remount-ro,extents,mballoc
    Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib

    Permanent disk data:
    Target:     tslstr-OST0001
    Index:      1
    Lustre FS:  tslstr
    Mount type: ldiskfs
    Flags:      0x142
                (OST update writeconf )
    Persistent mount opts: errors=remount-ro,extents,mballoc
    Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib

    exiting before disk write.
I remounted the OST, and I am now getting:

    LDISKFS-fs: file extents enabled
    LDISKFS-fs: mballoc enabled
    Lustre: MGC192.168.2.11@o2ib: Reactivating import
    LustreError: 137-5: UUID 'tslstr-OST0001_UUID' is not available for connect (no target)
    LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19) req@ffff81011207c800 x913142/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1281120645 ref 1 fl Interpret:/0/0 rc -19/0
    LustreError: 137-5: UUID 'tslstr-OST0001_UUID' is not available for connect (no target)
    LustreError: 5921:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19) req@ffff81011207c400 x913148/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1281120670 ref 1 fl Interpret:/0/0 rc -19/0

The customer is unable to make any progress right now. Any help would be greatly appreciated.

Thanks.

Roger
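A note for readers decoding the numbers in the messages above: the -19 in the LustreError lines and the 22 printed by tunefs.lustre look like standard Linux errno values (kernel code conventionally returns them negated). The following is a minimal Python sketch for looking them up, assuming the Linux errno table; the helper name describe_errno is hypothetical and is not part of Lustre.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value.

    Accepts kernel-style negated codes such as -19 as well as positive ones.
    """
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    # The two codes that appear in this thread.
    for code in (-19, 22):
        print(code, describe_errno(code))
```

On Linux, -19 should resolve to ENODEV, which fits the "no target" text in the 137-5 message, and 22 to EINVAL, which matches the "Invalid argument" that tunefs.lustre printed.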
Alexey Lyashkov
2010-Aug-09 14:10 UTC
[Lustre-discuss] LustreError: 5920:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-19)
Hi Roger,

The command line and output look correct, but

On Aug 8, 2010, at 22:19, Roger Spellman wrote:

> tunefs.lustre --verbose --erase-param --mgsnode=192.168.2.11@o2ib --mgsnode=192.168.2.12@o2ib
> --writeconf --fsname=tslstr --ost --index=1 /dev/mapper/map0
>
> I then ran tunefs.lustre on this node, and it looked as expected, namely:
>
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
> Read previous values:
> Target:     tslstr-OST0001
> Index:      1
> Lustre FS:  tslstr
> Mount type: ldiskfs
> Flags:      0x142
>             (OST update writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
>
> Permanent disk data:
> Target:     tslstr-OST0001
> Index:      1
> Lustre FS:  tslstr
> Mount type: ldiskfs
> Flags:      0x142
>             (OST update writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=192.168.2.11@o2ib mgsnode=192.168.2.12@o2ib
>
> exiting before disk write.

That message says that no updates were written to the disk. But that message is printed only with the "--print", "--noformat", or "-n" command-line option. The other messages are the result of the config info update being lost.
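The "Flags:" values quoted in this thread (0x70 and 0x142) are bitmasks, and tunefs.lustre prints the matching flag names in parentheses next to them. A small Python sketch of that decoding follows. The bit values are an assumption recalled from the LDD_F_* constants in Lustre's lustre_disk.h, not something stated in this thread, although they do reproduce both outputs quoted here. The names LDD_FLAGS and decode_flags are hypothetical.

```python
# Assumed bit assignments (recalled from the LDD_F_* constants in Lustre's
# lustre_disk.h; verify against the source tree for your Lustre version).
LDD_FLAGS = {
    0x0001: "MDT",
    0x0002: "OST",
    0x0004: "MGS",
    0x0010: "needs_index",
    0x0020: "first_time",
    0x0040: "update",
    0x0100: "writeconf",
}


def decode_flags(value: int) -> list[str]:
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(LDD_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    # The two Flags values that appear in the tunefs.lustre output above.
    for flags in (0x70, 0x142):
        print(hex(flags), " ".join(decode_flags(flags)))
```

Under these assumed values, 0x70 decodes to needs_index, first_time, and update (the state of the corrupted target), and 0x142 to OST, update, and writeconf, which agrees with what tunefs.lustre printed in both cases.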