Norimichi SUZUKI
2011-May-02 10:44 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
Hi,

Because of network trouble, our MDS crashed.
After that I can't mount the MDT (/dev/mapper/mpath1p1).

[root@mds1 ~]# mount -t lustre /dev/mapper/mpath1p1 /mds
mount.lustre: mount /dev/mapper/mpath1p1 at /mds failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

The syslog shows the following.

May 2 19:16:39 mds1 kernel: LDISKFS-fs (dm-1): ldiskfs_check_descriptors: Checksum for group 0 failed (20132!=16032)
May 2 19:16:39 mds1 kernel: LDISKFS-fs (dm-1): group descriptors corrupted!
May 2 19:16:39 mds1 multipathd: dm-1: umount map (uevent)
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:1292:server_kernel_mount()) premount /dev/mapper/mpath1p1:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:1618:server_fill_super()) Unable to mount device /dev/mapper/mpath1p1: -22
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-22)

I've seen similar cases on this list, so I tried e2fsck.

[root@mds1 log]# e2fsck -fp /dev/mapper/mpath1p1
e2fsck: MMP: fsck being run while trying to open /dev/mapper/mpath1p1
lustre-MDT0000:
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 32768 <device>

[root@mds1 log]# e2fsck -b 32768 /dev/mapper/mpath1p1
e2fsck 1.41.12.2.ora3 (23-Feb-2011)
e2fsck: Bad magic number in super-block while trying to open /dev/mapper/mpath1p1

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 4294967294 <device>

But when I run e2fsck with the -n option, I get a lot of messages like this.

Group descriptor 44662 checksum is invalid. IGNORED.
Group descriptor 44663 checksum is invalid. IGNORED.
Group descriptor 44664 checksum is invalid. IGNORED.
...

Our environment:

OS: CentOS v5.5 x64

[root@mds1 log]# rpm -qa | grep lustre
kernel-debuginfo-common-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4
lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
kernel-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
kernel-debuginfo-2.6.18-194.3.1.el5_lustre.1.8.4

I also updated e2fsprogs.

e2fsprogs-1.41.12.2.ora3-0redhat.x86_64.rpm
e2fsprogs-devel-1.41.12.2.ora3-0redhat.x86_64.rpm

[root@mds1 ~]# debugfs /dev/mapper/mpath1p1
debugfs 1.41.12.2.ora3 (23-Feb-2011)
debugfs: ls
 2 (12) .    2 (12) ..    11 (20) lost+found    238583809 (16) CONFIGS
 415629313 (12) ROOT    1006206977 (16) PENDING    1148682241 (12) LOGS
 763035649 (16) OBJECTS    12 (20) last_rcvd    13 (20) lov_objid
 14 (20) health_check    15 (3920) CATALOGS
debugfs: stats
Filesystem volume name:   lustre-MDT0000
Last mounted on:          /
Filesystem UUID:          6d67c826-440d-41c2-8548-ed510c008db4
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype mmp sparse_super large_file uninit_bg
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1463844864
Block count:              1463838711
Reserved block count:     73191935
Free blocks:              1280643105
Free inodes:              1463843707
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      674
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         32768
Inode blocks per group:   4096
Filesystem created:       Wed Oct 27 20:35:54 2010
Last mount time:          Fri Oct 29 18:36:44 2010
Last write time:          Mon May 2 19:23:20 2011
Mount count:              6
Maximum mount count:      26
Last checked:             Wed Oct 27 20:35:54 2010
Check interval:           15552000 (6 months)
Next check after:         Mon Apr 25 20:35:54 2011
Lifetime writes:          698 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               512
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      a6d9f773-352f-415c-8f30-3b63f3d4d2f7
Journal backup:           inode blocks
MMP block number:         5129
MMP update interval:      1
Directories:              8
 Group 0: block bitmap at 1025, inode bitmap at 1026, inode table at 1027
          27639 free blocks, 32753 free inodes, 2 used directories, 0 unused inodes
          [Checksum 0x3ea0]
 Group 1: block bitmap at 33793, inode bitmap at 33794, inode table at 33795
          27645 free blocks, 32768 free inodes, 0 used directories, 0 unused inodes
          [Checksum 0x8e73]
 Group 2: block bitmap at 65536, inode bitmap at 65537, inode table at 65538
          28670 free blocks, 32768 free inodes, 0 used directories, 0 unused inodes
          [Checksum 0xc7a1]
...

Can anyone please give me advice?

Thanks in advance.

Norimichi Suzuki
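A small cross-check of the numbers in this message, offered as a sketch rather than a diagnosis. It converts the two values from the kernel's "Checksum for group 0 failed (20132!=16032)" line to hex so they can be compared with the checksum debugfs prints for group 0, and derives the number of block groups from the "Block count" and "Blocks per group" fields. The variable names are illustrative only.

```shell
# Values copied from the syslog line and the debugfs "stats" output above.
first_hex=$(printf '0x%04x' 20132)     # first number in the kernel message
second_hex=$(printf '0x%04x' 16032)    # second number in the kernel message

blocks=1463838711                      # "Block count"
blocks_per_group=32768                 # "Blocks per group"

# Round up: the last group may be only partly filled.
groups=$(( (blocks + blocks_per_group - 1) / blocks_per_group ))

echo "first=$first_hex second=$second_hex groups=$groups"
```

The second value comes out as 0x3ea0, the same checksum debugfs shows for group 0, which suggests it is the value stored on disk while the first is what the kernel computed. The group count is 44673, so the descriptors 44662 to 44664 reported by e2fsck -n are valid group numbers near the end of the filesystem.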
Brett Worth
2011-May-02 12:26 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
On 05/02/2011 08:44 PM, Norimichi SUZUKI wrote:
> Because of network trouble, our mds was crashed.
> After that I can't mount mdt(/dev/mapper/mpath1p1).
>
> [root@mds1 ~]# mount -t lustre /dev/mapper/mpath1p1 /mds

Is it possible that the /dev/mapper/mpath1 LUN is not the one you expect it to be? Could the discovery process have presented a different partition at that location? fdisk -l might tell you what is what.

Brett
Norimichi SUZUKI
2011-May-02 13:33 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
Hi Brett,

Thank you for your e-mail.

[root@mds1 ~]# fdisk -l /dev/mapper/mpath1

WARNING: GPT (GUID Partition Table) detected on '/dev/mapper/mpath1'! The util fdisk doesn't support GPT. Use GNU Parted.

WARNING: The size of this disk is 6.0 TB (5995883397120 bytes).
DOS partition table format can not be used on drives for volumes
larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
partition table format (GPT).

Disk /dev/mapper/mpath1: 5995.8 GB, 5995883397120 bytes
255 heads, 63 sectors/track, 728957 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

              Device Boot      Start         End      Blocks   Id  System
/dev/mapper/mpath1p1               1      267350  2147483647+  ee  EFI GPT

If you have any ideas, please give me suggestions.

Norimichi Suzuki
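One rough way to approach the "is this the right LUN?" question using only numbers already posted in the thread: compare the filesystem size implied by the debugfs superblock dump ("Block count" times "Block size") with the disk size fdisk reports. This is a sketch with illustrative variable names, and it is only a plausibility check. Note also that the partition row fdisk prints here is the protective 0xee entry of a GPT disk, so, as the warning says, parted would be needed to see the real partition table.

```shell
# Values copied from the debugfs "stats" output and the fdisk output in this thread.
fs_blocks=1463838711        # "Block count"
block_size=4096             # "Block size"
lun_bytes=5995883397120     # size of /dev/mapper/mpath1 according to fdisk

# Needs 64-bit shell arithmetic, which is what sh/bash provide on x86_64 Linux.
fs_bytes=$(( fs_blocks * block_size ))
slack_bytes=$(( lun_bytes - fs_bytes ))

echo "filesystem=$fs_bytes bytes LUN=$lun_bytes bytes slack=$slack_bytes bytes"
```

The filesystem described by the superblock fills the LUN to within a few tens of kilobytes, which is consistent with the device being the one the MDT was created on, although it does not prove it.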
Johann Lombardi
2011-May-02 13:56 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
On Mon, May 02, 2011 at 07:44:57PM +0900, Norimichi SUZUKI wrote:
> [root@mds1 log]# e2fsck -fp /dev/mapper/mpath1p1
> e2fsck: MMP: fsck being run while trying to open /dev/mapper/mpath1p1
> lustre-MDT0000:

Is e2fsck already running? If not, you can run 'tune2fs -f -E clear-mmp /dev/mapper/mpath1p1' and run e2fsck again.

Johann
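The sequence Johann describes can be sketched as a small shell function. This is an illustration under assumptions, not a vetted recovery procedure: the function and variable names (recover_mdt, run, DEV, DRY_RUN) are made up for this example, the pgrep check only sees processes on the local node, and the tune2fs option is spelled exactly as in the message above. MMP exists to stop two nodes from using the same device at once, so on a failover pair the other server has to be checked by hand before the MMP block is cleared.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed.
DEV="${DEV:-/dev/mapper/mpath1p1}"   # device name from this thread; adjust as needed
DRY_RUN="${DRY_RUN:-1}"              # set DRY_RUN=0 to really run the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_mdt() {
    # Johann's first question: is e2fsck already running? (local node only)
    if command -v pgrep >/dev/null 2>&1 && pgrep -x e2fsck >/dev/null 2>&1; then
        echo "e2fsck is already running; not clearing MMP" >&2
        return 1
    fi

    # Clear the stale MMP state left behind by the crash.
    run tune2fs -f -E clear-mmp "$DEV" || return 1

    # Re-run the check. A non-zero exit status from e2fsck does not
    # necessarily mean failure: 1 means errors were found and corrected.
    run e2fsck -fp "$DEV"
}
```

Calling recover_mdt with the defaults only prints the two commands, which makes it easy to review them before running with DRY_RUN=0 as root on the server that owns the device.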
Norimichi SUZUKI
2011-May-02 14:47 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
Hi Johann,

Thank you for your advice.
I executed those commands as you said.
Now e2fsck appears to be running.

Thanks in advance.

Norimichi Suzuki
Norimichi SUZUKI
2011-May-03 05:26 UTC
[Lustre-discuss] MDT crash : group descriptors corrupted!
Hi Johann,

Thank you for your advice.
e2fsck has finished.
Now I can mount the MDT without any errors.
Our Lustre volume is available again, as before.

Thanks a lot.

Norimichi Suzuki