The problem was resolved by restarting the MDT service machine. I don't
know if it made a difference, but the OST had been deactivated there via an
lctl command. It would not activate, perhaps because the volume was not
mounted on the OSS, but restarting the service cleared that volatile state.
Subsequent operations on the volume, beginning with mkfs.lustre, then
succeeded without a hitch.
bob
On 11/9/2011 4:08 PM, Bob Ball wrote:
> I'm hoping someone can help me out here. We are running Lustre 1.8.4
> under SL5.7 (now, it has slowly upgraded over time from SL5.3 or SL5.4
> and it started out at Lustre 1.8.3). A newly installed OSS running
> SL5.7 does not seem to show this issue when making a new OST (not
> reusing the index as in this case). However, we were having
> underlying file system issues on one OST of this older server, so we
> drained that OST of all files using lfs_migrate, saved all the
> information such as LAST_ID, recreated the virtual disk on the
> underlying Dell MD1000 shelf (PERC-6 controller, RAID-5 on 9 750GB
> disks, 128kB stripe), and then, following a full init of the vdisk,
> tried to make the lustre file system:
>
> [root@umfs06 reformat]# mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0
> --fsname=umt3 --reformat --index=25 --mkfsoptions="-i 2000000"
> --reformat
> --mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256"
> /dev/sdk
>
> Permanent disk data:
> Target: umt3-OST0019
> Index: 25
> Lustre FS: umt3
> Mount type: ldiskfs
> Flags: 0x62
> (OST first_time update )
> Persistent mount opts: errors=remount-ro,extents,mballoc,stripe=256
> Parameters: mgsnode=10.10.1.140@tcp
>
> device size = 5719040MB
> 2 6 18
> formatting backing filesystem ldiskfs on /dev/sdk
> target name umt3-OST0019
> 4k blocks 1464074240
> options -i 2000000 -J size=400 -I 256 -q -O
> dir_index,extents,uninit_groups -F
> mkfs_cmd = mke2fs -j -b 4096 -L umt3-OST0019 -i 2000000 -J size=400 -I
> 256 -q -O dir_index,extents,uninit_groups -F /dev/sdk 1464074240
> mkfs.lustre: Unable to mount /dev/sdk: Invalid argument
>
> mkfs.lustre FATAL: failed to write local files
> mkfs.lustre: exiting with 22 (Invalid argument)
>
> =================>
> /var/log/messages contains
>
> 2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867384]
> LDISKFS-fs (sdk): ldiskfs_check_descriptors: Inode bitmap for group
> 984 not in group (block 28049409)!
> 2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867392]
> LDISKFS-fs (sdk): group descriptors corrupted!
>
> ==================>
> This has happened multiple times now. The group and block numbers in the
> message have changed from try to try, but the error itself has not.
> Following a system reboot this morning, I was able to get this
> to complete and restored LAST_ID, etc., but at mount time it failed
> and corrupted the underlying volume, so e2fsck had to be run.
> Wash, rinse, repeat. So, as a last try, I did it all over from
> scratch, with the result above.
>
> I'm at a loss to know what to do. Before the volume was wiped and
> recreated, I was able to mount it as "-t ldiskfs" without a problem,
> then remount it afterwards as "-t lustre". The rpm set is listed below.
> The other 11 volumes on this OSS are served just fine.
>
> Does anyone have any advice about what to try here?
>
> Thanks,
> bob
>
> [root@umfs06 reformat]# rpm -qa|grep lustre
> (none):lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3.x86_64
> (none):kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
> (none):kernel-devel-2.6.18-164.11.1.el5_lustre.1.8.3.x86_64
> (none):lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
> 0:kernel-module-openafs-2.6.18-194.3.1.el5_lustre.1.8.4-1.4.14-80.sl5.x86_64
> (none):kernel-2.6.18-164.11.1.el5_lustre.1.8.3.x86_64
> (none):lustre-tests-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
> (none):kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
> (none):lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3.x86_64
> (none):lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
> (none):kernel-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
> (none):lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
>
> [root@umfs06 reformat]# rpm -qa|grep e2fs
> (none):e2fsprogs-devel-1.39-33.el5.x86_64
> (none):e2fsprogs-1.41.10.sun2-0redhat.x86_64
> (none):e2fsprogs-libs-1.39-33.el5.i386
>
> [root@umfs06 reformat]# uname -r
> 2.6.18-194.3.1.el5_lustre.1.8.4
>
> [root@umfs06 reformat]# cat /etc/redhat-release
> Scientific Linux SL release 5.7 (Boron)
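A note on the `stripe=256` mount option in the quoted mkfs.lustre command: assuming the PERC-6 "128kB stripe" is the per-disk stripe element size, and that the ldiskfs `stripe=` value is counted in 4 KiB filesystem blocks (the ext4 convention, matching `mke2fs -b 4096` above), 256 corresponds to exactly one full RAID-5 stripe on this 9-disk vdisk. The arithmetic below is a minimal sketch of that calculation under those assumptions; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumed geometry of the recreated vdisk (from the message above):
# RAID-5 across 9 disks, 128 kB per-disk stripe element, 4 kB ldiskfs blocks.
disks=9
chunk_kb=128
block_kb=4

# RAID-5 spends one disk's worth of each stripe on parity.
data_disks=$((disks - 1))

# One full stripe of data, then expressed in filesystem blocks.
full_stripe_kb=$((data_disks * chunk_kb))
stripe_blocks=$((full_stripe_kb / block_kb))

echo "stripe=$stripe_blocks"
```

Running it prints `stripe=256`, the value used in the mount options; a different disk count or element size would change the result accordingly.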