Erik Froese
2010-Aug-14 00:11 UTC
[Lustre-discuss] Lost OSTs, remounted, now /proc/fs/lustre/obdfilter/$UUID/ is empty
Hello,

We had a problem with our disk controller that required a reboot. Two of our OSTs remounted and went through the recovery window, but clients hang when trying to access them. Also, /proc/fs/lustre/obdfilter/$UUID/ is empty for that OST's UUID.

LDISKFS FS on dm-5, internal journal on dm-5:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-5 with ordered data mode
Lustre: 16377:0:(filter.c:990:filter_init_server_data()) RECOVERY: service scratch-OST0007, 281 recoverable clients, 0 delayed clients, last_rcvd 55834575088
Lustre: scratch-OST0007: Now serving scratch-OST0007 on /dev/mapper/ost_scratch_7 with recovery enabled
Lustre: scratch-OST0007: Will be in recovery for at least 5:00, or until 281 clients reconnect
Lustre: 6799:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) scratch-OST0007: 280 recoverable clients remain
Lustre: 6799:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) Skipped 279 previous similar messages
Lustre: scratch-OST0007.ost: set parameter quota_type=ug
Lustre: 7305:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) scratch-OST0007: 276 recoverable clients remain
Lustre: 7305:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) Skipped 3 previous similar messages
Lustre: 7304:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) scratch-OST0007: 203 recoverable clients remain
Lustre: 7304:0:(ldlm_lib.c:1788:target_queue_last_replay_reply()) Skipped 72 previous similar messages
Lustre: scratch-OST0007: Recovery period over after 0:57, of 281 clients 281 recovered and 0 were evicted.

[root@oss2 ~]# mount | grep lustre
/dev/mapper/ost_scratch_8 on /lustre/scratch/ost_8 type lustre (rw)
/dev/mapper/ost_scratch_9 on /lustre/scratch/ost_9 type lustre (rw)
/dev/mapper/ost_scratch_7 on /lustre/scratch/ost_7 type lustre (rw)

[root@oss2 ~]# ls -l /proc/fs/lustre/obdfilter/scratch-OST0007/
total 0

e2fsck reported an incorrect free inode count and corrected it. It didn't help the /proc situation.

Any ideas? This is Lustre 1.8.3 on RHEL.

Erik
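
P.S. For completeness, a minimal sketch of the follow-up checks run on the OSS (assuming the standard lctl tooling shipped with Lustre 1.8; the OST name is taken from the logs above, and the exact output will differ per site):

# list the OBD devices the OSS thinks are attached
lctl dl

# same information straight from procfs
cat /proc/fs/lustre/devices

# recovery state for the affected OST, if the obdfilter entries exist at all
lctl get_param obdfilter.scratch-OST0007.recovery_status

# any Lustre/LDISKFS errors logged since the remount
dmesg | grep -i lustre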