Another artifact of this bug is that lfs df fails once it reaches the disabled OST:
samba01:~ # lfs df
UUID                  1K-blocks      Used  Available Use% Mounted on
i3_lfs4-MDT0000_UUID 5127312276 293484496 4833827780   5% /mnt/lustre/i3_lfs4[MDT:0]
i3_lfs4-OST0000_UUID 5768202536 295289400 5472913136   5% /mnt/lustre/i3_lfs4[OST:0]
i3_lfs4-OST0001_UUID 5768202536 296678080 5471524456   5% /mnt/lustre/i3_lfs4[OST:1]
i3_lfs4-OST0002_UUID 5768201600 293605428 5474596172   5% /mnt/lustre/i3_lfs4[OST:2]
i3_lfs4-OST0003_UUID 5768201600 293605432 5474596168   5% /mnt/lustre/i3_lfs4[OST:3]
i3_lfs4-OST0004_UUID 5768201600 293477420 5474724180   5% /mnt/lustre/i3_lfs4[OST:4]
error: llapi_obd_statfs failed: Bad address (-14)
I have additional OSTs numbered after the one bad one (OST0005), and the
listing stops before it reaches them.
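To make the gap easier to spot on a larger filesystem, here is a minimal sketch (plain Python, not part of Lustre) that scans saved lfs df output and reports the first OST index that is missing from the listing. It assumes the row format shown above and that the four characters after "OST" are a hexadecimal index; the function names are my own, purely for illustration.

```python
import re

# Matches device rows such as "i3_lfs4-OST0004_UUID 5768201600 ...".
# Assumption: the four characters after MDT/OST are a hexadecimal index.
ROW = re.compile(r"^\S+-(MDT|OST)([0-9a-fA-F]{4})_UUID\s")


def listed_ost_indices(lfs_df_output):
    """Return the sorted OST indices that appear in `lfs df` text."""
    indices = set()
    for line in lfs_df_output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(1) == "OST":
            indices.add(int(match.group(2), 16))
    return sorted(indices)


def first_missing_index(indices):
    """Return the first gap in a 0-based, sorted index sequence."""
    for expected, actual in enumerate(indices):
        if actual != expected:
            return expected
    return len(indices)
```

Fed the output above, this should point at index 5, which matches the OST I disabled.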
Would rebooting the MDTs help?
The logs on the client say:
May 14 15:27:22 samba01 kernel: Lustre: setting import i3_lfs4-OST0005_UUID INACTIVE by administrator request
May 14 15:27:22 samba01 kernel: Lustre: i3_lfs4-OST0005-osc-ffff8101e2d6dc00.osc: set parameter active=0
May 14 15:27:22 samba01 kernel: LustreError: 4143:0:(lov_obd.c:140:lov_connect_obd()) not connecting OSC i3_lfs4-OST0005_UUID; administratively disabled
which seems normal.
Curiously, the MDS says:
May 14 15:29:02 mds01 kernel: Lustre: i3_lfs4-MDT0000: haven't heard from client 4dc7492d-7669-ecae-a4b5-bca2891c2dc0 (at 10.200.20.63@tcp) in 7087 seconds. I think it's dead, and I am evicting it.
This is the same client as above, which otherwise seems to be functioning
properly. Both OSSs log the same message. I've rebooted the client, with
no effect.
Thanks
John
jrs wrote:
> After disabling an OST with:
>
> lctl conf_param i3_lfs4-OST0005.osc.active=0
>
> one of my clients now hangs when running:
>
> samba01:~ # lfs check osts
> i3_lfs4-OST0000-osc-ffff8101da370800 active.
> i3_lfs4-OST0001-osc-ffff8101da370800 active.
> i3_lfs4-OST0002-osc-ffff8101da370800 active.
> i3_lfs4-OST0003-osc-ffff8101da370800 active.
> i3_lfs4-OST0004-osc-ffff8101da370800 active.
>
> The above has been running for 10 minutes.
> The load on the machine has been driven up to 1.0
> (it's a dual-core box).
>
> In /var/log/messages I see:
>
> May 14 10:13:09 samba01 kernel: LustreError:
> 4006:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized
> import. req@ffff8101e8648400 x
> 76/t0 o400->i3_lfs4-OST0005_UUID@<NULL>:6 lens 64/64 ref 1 fl
> Rpc:N/0/0
> rc 0/0
> May 14 10:13:09 samba01 kernel: LustreError:
> 4006:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
> May 14 10:13:09 samba01 kernel: Lustre:
> 4006:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for
> process 4006
> May 14 10:13:09 samba01 kernel: lfs R running task 0
> 4006 3872 (NOTLB)
> May 14 10:13:09 samba01 kernel: ffff8101e7e046c0 0000000000000086
> ffff8101da36a780 ffff8101dd8cdb80
> May 14 10:13:09 samba01 kernel: 0000000000000001 00007fffefa8395f
> ffffffff8837a29b 0000004b9300f2ed
> May 14 10:13:09 samba01 kernel: ffff8101dd8cdb80 0000000000000001
> May 14 10:13:09 samba01 kernel: Call Trace:
> <ffffffff8837a29b>{:obdclass:lprocfs_fops_write+91}
> May 14 10:13:09 samba01 kernel: <ffffffff80181803>{vfs_write+215}
> <ffffffff80181dca>{sys_write+69}
> May 14 10:13:09 samba01 kernel: <ffffffff8010ad3e>{system_call+126}
> May 14 10:13:09 samba01 kernel: LustreError: dumping log to
> /tmp/lustre-log.1210781589.4006
>
>
> I've attached the log file referred to above.
>
> This might be the same bug as:
> https://bugzilla.lustre.org/show_bug.cgi?id=12565
>
> Is there any workaround?
>
> Thanks,
> John
>