Gavin Flower
2011-Feb-15 03:14 UTC
physical size of the device inconsistent with superblock, after RAID problems
Hi,

I would appreciate advice on recovering from the following situation, after an aborted mdadm resizing operation and subsequent recovery actions:

/dev/md1: The filesystem size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616 blocks
Either the superblock or the partition table is likely to be corrupt!

/dev/md1: UNEXPECTED INCONSISTENCY: RUN fsck manually (i.e. without -a or -p options)

fsck.ext4 -f -n /dev/md1 output:

e2fsck 1.41.12 (17-May-2010)
The filesystem size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -9626 -(9728--9752) +(405344--405369)
Fix? no

/dev/md1: ********** WARNING: Filesystem still has errors **********

/dev/md1: 1693644/19202048 files (0.3% non-contiguous), 54273929/76799952 blocks

Note that the original size, according to mdadm, was not a multiple of 512KB, so I reduced it to the largest multiple of 512KB less than the original size, using the --size option of mdadm. My second attempt to reshape, using the 512KB chunk size, then started okay. The previous chunk size was 64KB.

Note that I am using Fedora 14, up to date as of Friday February 11th, and that there are five 500GB drives, with 3 RAID-6 arrays:

/dev/md0 swap
/dev/md1 mostly user data (the problematic one)
/dev/md2 distribution & O/S files

plus /boot on a non-RAID ext4 partition.

Sequence of events:

I reshaped /dev/md1 using mdadm, without first reducing the size of the ext4 filesystem.

The reshape of /dev/md1 was about 20% through when I killed it.

The system appeared okay.

I rebooted a few minutes later, but shortly after I selected the kernel, the boot stopped and I was dropped into a shell.
With the help of Neil Brown, I made some progress, and the reshape of /dev/md1 appeared to complete without error. However, on the next reboot I got the INCONSISTENCY message.

Will it be safe to simply accept fsck's offer to fix, or are there other things I should do?

Thanks,
Gavin

--
All Adults share the Responsibility to help Raise Today's Children, for they are Tomorrow's Society!
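The two block counts fsck reports are consistent with the size reduction described above. Below is a minimal sketch of that arithmetic, assuming 4 KiB ext4 blocks and that the filesystem originally filled the whole array (neither is stated in the message); `chunk_aligned_array` is a hypothetical helper. RAID-6 on 5 devices stores data on 3 of them, so each member is rounded down to a multiple of the 512KB chunk, and the array is 3 times that.

```python
from typing import Tuple

# Assumptions (not stated in the thread): 4 KiB ext4 blocks, and the
# filesystem originally filled the whole array exactly.
KIB_PER_BLOCK = 4                 # assumed ext4 block size, in KiB
CHUNK_KIB = 512                   # new chunk size from the thread
RAID_DEVICES = 5
DATA_DEVICES = RAID_DEVICES - 2   # RAID-6 uses two devices' worth for parity


def chunk_aligned_array(fs_blocks: int) -> Tuple[int, int, int]:
    """Hypothetical helper: return (per-device KiB, array KiB, array blocks)
    after rounding each member down to a multiple of the chunk size."""
    array_kib = fs_blocks * KIB_PER_BLOCK
    per_device_kib = array_kib // DATA_DEVICES
    aligned_kib = (per_device_kib // CHUNK_KIB) * CHUNK_KIB
    new_array_kib = aligned_kib * DATA_DEVICES
    return aligned_kib, new_array_kib, new_array_kib // KIB_PER_BLOCK


per_dev, array_kib, blocks = chunk_aligned_array(76799952)
print(per_dev, array_kib, blocks)   # prints: 102399488 307198464 76799616
```

Under these assumptions the results match the Used Dev Size (102399488) and Array Size (307198464) that mdadm --detail reports later in the thread, and the last value is exactly the "physical size" fsck complains about.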
Gavin Flower
2011-Feb-17 23:53 UTC
physical size of the device inconsistent with superblock, after RAID problems
Hi Neil,

As of a minute ago, my post to ext3-users at redhat.com still had not been published there, even though I emailed it 4 days ago!

I finally bit the bullet and went ahead.

I accepted the fixes fsck offered for the block bitmap differences, and rebooted.

There were still problems: the discrepancy in the filesystem size remained. So I ran the command:

resize2fs -p /dev/md1 76799616

I used the smaller of the two block counts because:
(a) I needed to reduce the filesystem size, since I had already reduced the RAID size (I _SHOULD_ have done this first, before resizing the RAID), and
(b) it is reported as the 'physical' size of the device, so it is likely to be the correct value IMHO.

The system then came up successfully after a reboot, and I was able to log in as normal.

There appeared to be no loss of data, though I did not do an exhaustive, systematic check. However, several users have logged on successfully, the machine is playing its part as gateway to the Internet, and squid appears to be providing its normal functionality.

Neil, your help and encouragement was/is greatly appreciated!

Thanks,
Gavin

--- On Tue, 15/2/11, Gavin Flower <gavinflower at yahoo.com> wrote:

From: Gavin Flower <gavinflower at yahoo.com>
Subject: physical size of the device inconsistent with superblock, after RAID problems
To: ext3-users at redhat.com
Cc: neilb at suse.de, linux-raid at vger.kernel.org
Date: Tuesday, 15 February, 2011, 16:14

Hi, I would appreciate advice recovering from the following situation, after an aborted mdadm resizing operation and subsequent recovery actions:
[...]
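The choice of 76799616 can be cross-checked with simple arithmetic: a filesystem that is larger than its device has to be shrunk to at most floor(device size / block size) blocks. A minimal sketch follows; `shrink_target` is a hypothetical helper, and on a real system the device size might come from `blockdev --getsize64 /dev/md1` and the block count and block size from `dumpe2fs -h /dev/md1`. The 4096-byte block size is an assumption, consistent with the numbers in this thread. The sketch only does the arithmetic; it says nothing about whether the shrink is safe for the data.

```python
from typing import Optional


def shrink_target(device_bytes: int, fs_blocks: int, block_size: int) -> Optional[int]:
    """Hypothetical helper: return the block count to give resize2fs when the
    filesystem (fs_blocks * block_size bytes) is larger than the device,
    or None if it already fits."""
    device_blocks = device_bytes // block_size   # count whole blocks only
    if fs_blocks <= device_blocks:
        return None                             # filesystem already fits
    return device_blocks


# mdadm reports "Array Size : 307198464" (KiB) for /dev/md1 in this thread;
# the 4096-byte ext4 block size is an assumption.
array_bytes = 307198464 * 1024
target = shrink_target(array_bytes, 76799952, 4096)
print(target, 76799952 - target)   # prints: 76799616 336
```

So the superblock claimed 336 blocks more than the device could hold, and the "physical" count reported by fsck is the largest block count that fits.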
NeilBrown
2011-Feb-18 01:51 UTC
physical size of the device inconsistent with superblock, after RAID problems
On Thu, 17 Feb 2011 15:53:11 -0800 (PST) Gavin Flower <gavinflower at yahoo.com> wrote:

> So I ran the command:
>
> resize2fs -p /dev/md1 76799616
>
> [...]
>
> The system the came up successfully after a reboot, and I was able to log in as normal.

Excellent! I'm glad you found a way through.

As you didn't really trim very much from your device it is certainly possible that no critical data was there. Quite possibly resize2fs would have told you if there was (I certainly hope it would have done).

NeilBrown
Gavin Flower
2011-Feb-18 03:50 UTC
physical size of the device inconsistent with superblock, after RAID problems
--- On Fri, 18/2/11, NeilBrown <neilb at suse.de> wrote:

> From: NeilBrown <neilb at suse.de>
> Subject: Re: physical size of the device inconsistent with superblock, after RAID problems
> To: "Gavin Flower" <gavinflower at yahoo.com>
> Cc: ext3-users at redhat.com, linux-raid at vger.kernel.org
> Date: Friday, 18 February, 2011, 14:51
>
> As you didn't really trim very much from your device it is certainly possible
> that no critical data was there.

Hi Neil,

Having about 26% spare capacity on md1 (the problematic RAID 6; see the df output below) probably (?) meant that trimming a tiny fraction of a percent from the end was unlikely to lose anything.

However, since md1 actually resides on 5 real physical drives, reality is almost certainly more complicated; possibly that explains the bitmap discrepancies (now I'm firmly outside my area of expertise!).

# df
Filesystem           1K-blocks      Used  Available Use% Mounted on
/dev/md2            1097254408  27547660 1013969456   3% /
tmpfs                  4097108       772    4096336   1% /dev/shm
/dev/sda1              1032088    129800     849860  14% /boot
/dev/md1             302377920 212244524   74773476  74% /data

# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Thu Dec 3 13:05:02 2009
     Raid Level : raid6
     Array Size : 307198464 (292.97 GiB 314.57 GB)
  Used Dev Size : 102399488 (97.66 GiB 104.86 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb 18 15:09:50 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : 6f1176ae:a0ad6cac:bfe78010:bc810f04
         Events : 0.3389728

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       66        2      active sync   /dev/sde2
       3       8       50        3      active sync   /dev/sdd2
       4       8       34        4      active sync   /dev/sdc2
#

Cheers,
Gavin
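Both percentages in the reply above can be recomputed from figures already quoted in this thread: the df line for /dev/md1 and the two block counts before and after resize2fs. A small sketch; the names are illustrative, and the round-up rule is the one GNU df uses for Use% (used / (used + available), so reserved blocks are excluded).

```python
# Figures copied from the df output and the fsck/resize2fs block counts in this thread.
USED_KIB = 212244524      # df "Used" for /dev/md1
AVAIL_KIB = 74773476      # df "Available" for /dev/md1
OLD_BLOCKS = 76799952     # block count in the superblock before resize2fs
NEW_BLOCKS = 76799616     # block count passed to resize2fs


def df_use_percent(used: int, avail: int) -> int:
    """Use% as GNU df reports it: used / (used + available), rounded up."""
    total = used + avail
    return (used * 100 + total - 1) // total


use_pct = df_use_percent(USED_KIB, AVAIL_KIB)
trimmed = OLD_BLOCKS - NEW_BLOCKS
trimmed_pct = trimmed * 100 / OLD_BLOCKS

print(f"Use%: {use_pct}%, spare: {100 - use_pct}%")
print(f"trimmed: {trimmed} blocks ({trimmed_pct:.5f}% of the filesystem)")
# prints:
# Use%: 74%, spare: 26%
# trimmed: 336 blocks (0.00044% of the filesystem)
```

The trimmed fraction says nothing about which blocks were in use; it only confirms how small the cut was relative to the free space.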