For reasons unknown to me, I'm unable to umount an OST on one of my two OSSes. This just started.

oss02:~ # ps -ef|grep umount
root 8335 6373 0 11:45 pts/0 00:00:00 umount -t lustre -a
root 10016 8463 0 12:05 pts/1 00:00:00 grep mount

oss02:~ # lctl dl
  0 UP mgc MGC10.200.20.59@tcp 52c2288f-b787-aa06-8956-b56a0b1f38cb 5
  1 UP ost OSS OSS_uuid 3
  2 ST obdfilter i3_lfs3-OST0001 i3_lfs3-OST0001_UUID 1

oss02:~ # tail -5 /var/log/messages
May 2 11:54:21 oss02 kernel: LustreError: 6919:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 20 previous similar messages
May 2 12:02:55 oss02 kernel: LustreError: 137-5: UUID 'i3_lfs3-OST0001_UUID' is not available for connect (stopping)
May 2 12:02:55 oss02 kernel: LustreError: Skipped 40 previous similar messages
May 2 12:02:55 oss02 kernel: LustreError: 6960:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-19) req@ffff810228532400 x45949/t0 o8-><?>@<?>:-1 lens 304/0 ref 0 fl Interpret:/0/0 rc -19/0
May 2 12:02:55 oss02 kernel: LustreError: 6960:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 40 previous similar messages

The above was available and in use before the attempt to umount.

The OSS knows that something is amiss:

oss02:~ # cat /proc/fs/lustre/health_check
LBUG
NOT HEALTHY

The client, of course, is hanging now but I get this error.

samba02:~ # lfs df
UUID 1K-blocks Used Available Use% Mounted on
i3_lfs3-MDT0000_UUID 5127239040 293477480 4833761560 5% /mnt/lustre/i3_lfs3[MDT:0]
error: llapi_obd_statfs failed: Bad address (-14)
samba02:~ #

After I power cycle oss02 (the OSS which had the mount that wouldn't release), I still get the same:

samba02:~ # lfs df
UUID 1K-blocks Used Available Use% Mounted on
i3_lfs3-MDT0000_UUID 5127239040 293477480 4833761560 5% /mnt/lustre/i3_lfs3[MDT:0]
error: llapi_obd_statfs failed: Bad address (-14)

and I can't run 'lfs check osts'; in fact it produces:

May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import. req@ffff8101e6c7c200 x46820/t0 o400->i3_lfs3-OST0000_UUID@<NULL>:6 lens 64/64 ref 1 fl Rpc:N/0/0 rc 0/0
May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
May 2 12:53:59 samba02 kernel: Lustre: 18105:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 18105
May 2 12:53:59 samba02 kernel: lfs R running task 0 18105 12088 (NOTLB)
May 2 12:53:59 samba02 kernel: 0000000000000000 ffffffff8840741c 0000000000000004 ffff8101d7642ac0
May 2 12:53:59 samba02 kernel: ffff8101efe779a0 ffffffff8016079c ffff8101e6c7c200 ffff8101e6c7c200
May 2 12:53:59 samba02 kernel: ffff8101d3a609b8 ffffffff883fb72b
May 2 12:53:59 samba02 kernel: Call Trace: <ffffffff8840741c>{:ptlrpc:lustre_pack_request+1052}
May 2 12:53:59 samba02 kernel: <ffffffff8016079c>{filemap_nopage+387} <ffffffff883fb72b>{:ptlrpc:ptlrpc_queue_wait+571}
May 2 12:53:59 samba02 kernel: <ffffffff883f92c4>{:ptlrpc:ptlrpc_prep_req_pool+1796}
May 2 12:53:59 samba02 kernel: <ffffffff88426d4c>{:ptlrpc:lprocfs_wr_ping+428} <ffffffff802d968d>{__down_read+18}
May 2 12:53:59 samba02 kernel: <ffffffff8836f29b>{:obdclass:lprocfs_fops_write+91}
May 2 12:53:59 samba02 kernel: <ffffffff80181803>{vfs_write+215} <ffffffff80181dca>{sys_write+69}
May 2 12:53:59 samba02 kernel: <ffffffff8010ad3e>{system_call+126}
May 2 12:53:59 samba02 kernel: LustreError: dumping log to /tmp/lustre-log.1209754439.18105

thanks
JR

On Fri, 2008-05-02 at 14:56 -0400, jrs wrote:
>
> The OSS knows that something is amiss:
>
> oss02:~ # cat /proc/fs/lustre/health_check
> LBUG
> NOT HEALTHY

This means that you have hit an LBUG. Somewhere in the messages log on the OSS there should be an LBUG() message, most likely preceded by an "ASSERTION" message. Try searching bugzilla for matching situations.

b.
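
The search suggested above could be done roughly as follows. This is only a minimal sketch, not a Lustre tool: the function name `find_lbugs` and the five-line context window are assumptions made for illustration, and it simply looks for the substrings "LBUG" and "ASSERTION" in syslog-style lines.

```python
from collections import deque


def find_lbugs(lines, context=5):
    """Yield (lbug_line, preceding_lines, assertion_lines) for each LBUG line.

    `lines` is any iterable of log lines, e.g. an open /var/log/messages.
    `context` (an arbitrary choice here) is how many earlier lines to keep.
    """
    recent = deque(maxlen=context)  # sliding window of the previous lines
    for raw in lines:
        line = raw.rstrip("\n")
        if "LBUG" in line:
            preceding = list(recent)
            # Lines in the window that look like a failed assertion, if any.
            assertions = [p for p in preceding if "ASSERTION" in p]
            yield line, preceding, assertions
        recent.append(line)
```

To use it on a real log, open the file and iterate, e.g. `for hit, before, asserts in find_lbugs(open("/var/log/messages")): print(hit)`. An LBUG does not have to follow an assertion; in the client trace from the original post the line before the LBUG is an "Uninitialized import" message, which is why the sketch returns all preceding lines and not only the ASSERTION ones.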

> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import. req@ffff8101e6c7c200
> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
> [...]
> May 2 12:53:59 samba02 kernel: LustreError: dumping log to /tmp/lustre-log.1209754439.18105
>

Can you please open a new bug report and attach /tmp/lustre-log.1209754439.18105 to it?

--
Alex Lyashkov <Alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems

On May 4, 2008, at 2:37 AM, Alex Lyashkov wrote:
>> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import. req@ffff8101e6c7c200
>> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
>> [...]
>> May 2 12:53:59 samba02 kernel: LustreError: dumping log to /tmp/lustre-log.1209754439.18105
>>
> can you please open new bug report and
> submit /tmp/lustre-log.1209754439.18105 to bug?

This looks like https://bugzilla.lustre.org/show_bug.cgi?id=15505 comment 5 number 2. As Alex says, the lustre logs will be essential in fixing this.

-Marc

----
D. Marc Stearman
LC Lustre Administration Lead
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641

Here's the attached log.

Thanks
John

D. Marc Stearman wrote:
> On May 4, 2008, at 2:37 AM, Alex Lyashkov wrote:
>
>>> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import. req@ffff8101e6c7c200
>>> May 2 12:53:59 samba02 kernel: LustreError: 18105:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
>>> [...]
>>> May 2 12:53:59 samba02 kernel: LustreError: dumping log to /tmp/lustre-log.1209754439.18105
>>>
>> can you please open new bug report and
>> submit /tmp/lustre-log.1209754439.18105 to bug?
>
> This looks like https://bugzilla.lustre.org/show_bug.cgi?id=15505
> comment 5 number 2. As Alex says, the lustre logs will be essential
> in fixing this.
>
> -Marc

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1209754439.18105.bz2
Type: application/octet-stream
Size: 110792 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080505/bfeacf64/attachment-0001.obj
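
For anyone reading the kernel messages earlier in this thread: the negative numbers in them, such as "(-19)", "rc -19/0" and "(-14)", are negated Linux errno values. The sketch below shows one way to decode them with Python's standard `errno` and `os` modules. The helper names `decode_rc` and `codes_in_line` and the regular expression are assumptions made for this illustration; they only cover the two notations that appear in this thread, not every Lustre message format.

```python
import errno
import os
import re

# Matches "rc -19/" style fields and "(-14)" style codes.
# Only negative values are matched, so "rc 0/0" is ignored.
RC_PATTERN = re.compile(r"rc (-\d+)/|\((-\d+)\)")


def decode_rc(rc):
    """Return (symbolic name, description) for a negated errno value."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def codes_in_line(line):
    """Return the distinct negative return codes in one log line, in order."""
    found = []
    for match in RC_PATTERN.finditer(line):
        value = int(match.group(1) or match.group(2))
        if value not in found:
            found.append(value)
    return found
```

On Linux, `decode_rc(-19)` gives the name ENODEV, which fits the OSS reporting that the OST is "not available for connect (stopping)", and `decode_rc(-14)` gives EFAULT, which is the "Bad address" text that `lfs df` printed on the client.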