I'm evaluating lustre. I'm trying what I think is a basic/simple ethernet config. with MDT and OST on the same node. Can someone tell me if the following (~150 second recovery occurring when small 190 GB OST is re-mounted) is expected behavior or if I'm missing something? I thought I would send this and continue with the eval while awaiting a response.

I'm using lustre release 1.6.4.2 with the vanilla 2.6.18.8 kernel with a Scientific Linux 5 (derived from RHEL5) distro with e2fsprogs 1.40.4.cfs1.

I'm doing the following:

    aaa()
    {   set -x
        dmesg -c >/dev/null
        mkfs.lustre --fsname datafs --mdt --mgs --reformat /dev/sda1
        mkfs.lustre --fsname datafs --ost --mgsnode=pool4@tcp --reformat /dev/sda2
        e2label /dev/sda1
        e2label /dev/sda2
        mount.lustre /dev/sda1 /mnt/data/mdt
        mount.lustre /dev/sda2 /mnt/data/ost0
        dmesg -c >dmesg.0
        mount.lustre pool4@tcp:/datafs /mnt/datafs
        dmesg -c >dmesg.1
        umount /mnt/datafs
        umount /mnt/data/ost0
        umount /mnt/data/mdt
        e2label /dev/sda1
        e2label /dev/sda2
        dmesg -c >dmesg.2
        mount.lustre /dev/sda1 /mnt/data/mdt
        mount.lustre /dev/sda2 /mnt/data/ost0
        dmesg -c >dmesg.3
        while cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status \
            | egrep 'RECOVERING|time remaining';do sleep 30;done
        mount.lustre pool4@tcp:/datafs /mnt/datafs
    }
    aaa 2>&1 | tee aaa.0; dmesg -c >dmesg.4

The files dmesg.{0,1,2,3,4} and aaa.0 are available at:
http://fnapcf.fnal.gov/~ron/lustre-1.6.4.2-dmesg3-e2fsprog/

Here is aaa.0 which shows the recovery:

    + dmesg -c
    + mkfs.lustre --fsname datafs --mdt --mgs --reformat /dev/sda1
    WARNING: MDS group upcall is not set, use 'NONE'

       Permanent disk data:
    Target:     datafs-MDTffff
    Index:      unassigned
    Lustre FS:  datafs
    Mount type: ldiskfs
    Flags:      0x75
                  (MDT MGS needs_index first_time update )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters:

    device size = 95367MB
    formatting backing filesystem ldiskfs on /dev/sda1
            target name  datafs-MDTffff
            4k blocks    0
            options      -J size=400 -i 4096 -I 512 -q -O dir_index -F
    mkfs_cmd = mkfs.ext2 -j -b 4096 -L datafs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/sda1
    Writing CONFIGS/mountdata
    + mkfs.lustre --fsname datafs --ost --mgsnode=pool4@tcp --reformat /dev/sda2

       Permanent disk data:
    Target:     datafs-OSTffff
    Index:      unassigned
    Lustre FS:  datafs
    Mount type: ldiskfs
    Flags:      0x72
                  (OST needs_index first_time update )
    Persistent mount opts: errors=remount-ro,extents,mballoc
    Parameters: mgsnode=192.168.241.247@tcp

    device size = 190734MB
    formatting backing filesystem ldiskfs on /dev/sda2
            target name  datafs-OSTffff
            4k blocks    0
            options      -J size=400 -i 16384 -I 256 -q -O dir_index -F
    mkfs_cmd = mkfs.ext2 -j -b 4096 -L datafs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/sda2
    Writing CONFIGS/mountdata
    + e2label /dev/sda1
    datafs-MDTffff
    + e2label /dev/sda2
    datafs-OSTffff
    + mount.lustre /dev/sda1 /mnt/data/mdt
    + mount.lustre /dev/sda2 /mnt/data/ost0
    + dmesg -c
    + mount.lustre pool4@tcp:/datafs /mnt/datafs
    + dmesg -c
    + umount /mnt/datafs
    + umount /mnt/data/ost0
    + umount /mnt/data/mdt
    + e2label /dev/sda1
    datafs-MDT0000
    + e2label /dev/sda2
    datafs-OST0000
    + dmesg -c
    + mount.lustre /dev/sda1 /mnt/data/mdt
    + mount.lustre /dev/sda2 /mnt/data/ost0
    + dmesg -c
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time remaining'
    status: RECOVERING
    time remaining: 250
    + sleep 30
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time remaining'
    status: RECOVERING
    time remaining: 245
    + sleep 30
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time remaining'
    status: RECOVERING
    time remaining: 215
    + sleep 30
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time remaining'
    status: RECOVERING
    time remaining: 185
    + sleep 30
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time rg'
    status: RECOVERING
    time remaining: 5
    + sleep 30
    + cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status
    + egrep 'RECOVERING|time remaining'
    + mount.lustre pool4@tcp:/datafs /mnt/datafs

Thanks,
Ron
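
The polling loop in the post runs until the OST stops reporting RECOVERING, however long that takes. A minimal sketch of the same loop with an upper bound on the wait is given here. It assumes only the two lines of recovery_status that appear in aaa.0 (`status:` and `time remaining:`); the function name `wait_for_recovery` and its arguments are hypothetical and not part of Lustre.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): poll a recovery_status file
# until it no longer contains RECOVERING, or give up after MAX_WAIT seconds.
# usage: wait_for_recovery FILE [INTERVAL] [MAX_WAIT]
wait_for_recovery() {
    status_file=$1
    interval=${2:-30}
    max_wait=${3:-600}
    waited=0
    # A missing or unreadable file ends the loop too, as in the original loop.
    while grep -q 'RECOVERING' "$status_file" 2>/dev/null; do
        # Print the remaining time the target reports, if that line is present.
        grep 'time remaining' "$status_file"
        if [ "$waited" -ge "$max_wait" ]; then
            echo "still recovering after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

With the paths from the post it might be called as `wait_for_recovery /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status 30 600` before mounting the client.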
Brian J. Murrell
2008-Feb-05 20:19 UTC
[Lustre-discuss] obdfilter/datafs-OST0000/recovery_status
On Tue, 2008-02-05 at 11:41 -0800, Ron wrote:
> I'm evaluating lustre. I'm trying what I think is a basic/simple
> ethernet config. with MDT and OST on the same node. Can someone tell
> me if the following (~150 second recovery occurring when small 190 GB
> OST is re-mounted) is expected behavior or if I'm missing something?

It's expected given that you mounted the MDT first. There is nothing technically wrong with this (mountconf allows you to mount in any order) except that when the OSTs mount, recovery will be needed. To avoid this, mount the OSTs first.

b.
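
The remount order suggested in this reply can be sketched as follows. Device paths and mount points are the ones from Ron's script; the `run` wrapper and the `start_targets` name are hypothetical, and by default the wrapper only prints each command so the order can be checked without touching any device. The sketch covers the remount case only. The first mount after mkfs.lustre, where the OST still has to register with the MGS on /dev/sda1, is left as in the original script.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Remount order suggested in the reply: OST before MDT, client last.
start_targets() {
    run mount.lustre /dev/sda2 /mnt/data/ost0
    run mount.lustre /dev/sda1 /mnt/data/mdt
    run mount.lustre pool4@tcp:/datafs /mnt/datafs
}
```

Calling `start_targets` prints the three mount commands in order; `DRY_RUN=0` in the environment would run them instead.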
Andreas Dilger
2008-Feb-05 20:59 UTC
[Lustre-discuss] obdfilter/datafs-OST0000/recovery_status
On Feb 05, 2008 15:19 -0500, Brian J. Murrell wrote:
> On Tue, 2008-02-05 at 11:41 -0800, Ron wrote:
> > I'm evaluating lustre. I'm trying what I think is a basic/simple
> > ethernet config. with MDT and OST on the same node. Can someone tell
> > me if the following (~150 second recovery occurring when small 190 GB
> > OST is re-mounted) is expected behavior or if I'm missing something?
>
> It's expected given that you mounted the MDT first. There is nothing
> technically wrong with this (mountconf allows you to mount in any order)
> except that when the OSTs mount, recovery will be needed. To avoid
> this, mount the OSTs first.

More importantly, _unmount_ the MDT first, since it is a "client" of the OSTs, and the OSTs are waiting for this client to reconnect when they restart.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
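
Ron's script unmounted /mnt/data/ost0 before /mnt/data/mdt, which is the reverse of what this reply recommends. A minimal sketch of the recommended shutdown order, using the mount points from that script, is given here. The `run` wrapper and the `stop_targets` name are hypothetical, and the wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Shutdown order from the reply: the MDT is a client of the OST,
# so it is unmounted before the OST. The filesystem client goes first.
stop_targets() {
    run umount /mnt/datafs
    run umount /mnt/data/mdt
    run umount /mnt/data/ost0
}
```

Calling `stop_targets` prints the three umount commands in the order they would run.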