Roger Sersted
2012-Sep-27 22:17 UTC
[Lustre-discuss] After Upgrade 1.6.7.2->1.8.8, OSTs won't mount
I upgraded from Lustre 1.6.7.2 to 1.8.8 by swapping out all 6 OSSes and replacing them with new hardware. The MDT was moved to another system using the backup/restore procedure in the Lustre 1.8 manual (tar with setfattr and getfattr).

Old: CentOS 5.5 x86_64, Lustre 1.6.7.2 (from Sun)
New: CentOS 5.8 x86_64, Lustre 1.8.8 (from Whamcloud)

The MDS mounts the MDT (combined MGT) just fine. However, the OSSes are having problems. I ran e2fsck on two different OSSes (each with a different OST); one e2fsck corrected a few errors, the other was clean. But neither OSS can mount its respective OST.

The errors indicate a problem with the FS journal (internal journal). I tried to drop it with

    tune2fs -O ^has_journal /dev/sdp

but it ran indefinitely. On a normal FS it should take just a few seconds. An strace showed it was continually seeking through the FS, e.g.

    lseek(3, 61471719424, SEEK_SET) = 61471719424
    read(3,"\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096

I've rebooted the machine and made sure all the Lustre drivers were loaded, and they were. I also downgraded e2fsprogs from e2fsprogs-1.42.3.wc3 to 1.41.90.wc3, thinking I had hit an obscure code bug; no change.

After re-reading some of the messages: do I need to convert these ext3/ldiskfs filesystems to ext4, e.g.

    tune2fs -O extents,uninit_bg,dir_index /dev/sdp

Thanks,
Roger S.
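Before changing any features with tune2fs, it may help to see which ones are currently set on the OST. The header printed by `dumpe2fs -h` includes a "Filesystem features:" line. The following is a minimal sketch that extracts that line and compares it with the features named in the question above. The function name `parse_features` and the sample header text are hypothetical and purely illustrative; the sample is not output from the system described here.

```python
def parse_features(dumpe2fs_header):
    """Return the set of feature names from `dumpe2fs -h` header text."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "filesystem features":
            return set(value.split())
    return set()  # no features line found


# Illustrative header fragment only (an assumption, NOT taken from /dev/sdp).
sample = (
    "Filesystem volume name:   <none>\n"
    "Filesystem features:      has_journal ext_attr resize_inode dir_index "
    "filetype sparse_super large_file\n"
)

features = parse_features(sample)
wanted = {"extents", "uninit_bg", "dir_index"}

print("has_journal set:", "has_journal" in features)
print("features not yet set:", sorted(wanted - features))
```

In practice the text would come from saving the output of `dumpe2fs -h /dev/sdp` and passing it to the function; whether enabling the missing features is appropriate for an ldiskfs OST is a separate question that this sketch does not answer.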
Here are some command results:

===============================================
[root@apslstr07 ~]# tunefs.lustre --verbose --writeconf /dev/sdp
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre1-OST0003
Index:      3
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp

   Permanent disk data:
Target:     lustre1-OST0003
Index:      3
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x102
              (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp

tunefs.lustre: Unable to mount /dev/sdp: Invalid argument
tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)

============================================================
[root@apslstr07 ~]# mount -v -t ldiskfs /dev/sdp /lustre
mount: wrong fs type, bad option, bad superblock on /dev/sdp,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

from /var/log/messages:
Sep 27 17:00:10 apslstr07 kernel: LDISKFS-fs (sdp): no journal found

============================================================
[root@apslstr07 log]# mount -v -t lustre /dev/sdp /lustre
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdp
arg[5] = /lustre
source = /dev/sdp (/dev/sdp), target = /lustre
options = rw
mounting device /dev/sdp at /lustre, flags=0 options=device=/dev/sdp
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument retries left: 0
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
from /var/log/messages:
Sep 27 17:01:13 apslstr07 kernel: Lustre: Build Version: jenkins-wc1--PRISTINE-2.6.18-308.4.1.el5_lustre
Sep 27 17:01:13 apslstr07 kernel: Lustre: Listener bound to ib0:172.17.1.107:987:mlx4_0
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.17.1.107@o2ib [8/64/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.16.1.107@tcp [8/256/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Accept secure, port 988
Sep 27 17:01:14 apslstr07 kernel: Lustre: Lustre Client File System; http://www.lustre.org/
Sep 27 17:01:14 apslstr07 kernel: LDISKFS-fs (sdp): no journal found
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1307:server_kernel_mount()) premount /dev/sdp:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1633:server_fill_super()) Unable to mount device /dev/sdp: -22
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-22)
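
For anyone reading the log lines above: the negative numbers (-22, -19) are negated errno values returned by the kernel. A minimal sketch for decoding them with Python's standard `errno` and `os` modules follows; the helper name `decode_kernel_rc` is hypothetical, and the message text depends on the platform's C library.

```python
import errno
import os


def decode_kernel_rc(rc):
    """Map a negative kernel return code to (symbolic name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two codes reported by server_kernel_mount() in the log above.
for rc in (-22, -19):
    name, message = decode_kernel_rc(rc)
    print(rc, name, message)
```

The -22 result matches the "Invalid argument" text that tunefs.lustre and mount.lustre print; -19 is ENODEV, which is reported here for the ldiskfs2 attempt rather than for ldiskfs itself.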