in 5 out of 6 e2fsck's I do after an OSS crash, I get one free blocks
count wrong and often a bitmap in a group that wants to be corrected.

is this normal?
or is it an ldiskfs or an e2fsck bug?

rhel5 x86_64
e2fsprogs-1.40.11.sun1-0redhat
kernel-lustre-smp-2.6.18-92.1.10.el5_lustre.1.6.6

cheers,
robin

[root@sox2 ~]# e2fsck -f /dev/md5
e2fsck 1.40.11.sun1 (17-June-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -107639
Fix<y>? yes

Free blocks count wrong for group #3 (19179, counted=19180).
Fix<y>? yes

Free blocks count wrong (8199819, counted=8199820).
Fix<y>? yes


system-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
system-OST0001: 133986/3055616 files (1.2% non-contiguous), 4007188/12207008 blocks
[root@sox2 ~]# e2fsck -f /dev/md6
e2fsck 1.40.11.sun1 (17-June-2008)
home-OST0001: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #3 (23432, counted=23433).
Fix<y>? yes

Free blocks count wrong (131098913, counted=131098914).
Fix<y>? yes


home-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
home-OST0001: 26848/33513472 files (2.4% non-contiguous), 2934270/134033184 blocks
[root@sox2 ~]# e2fsck -f /dev/md7
e2fsck 1.40.11.sun1 (17-June-2008)
apps-OST0001: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #3 (23432, counted=23433).
Fix<y>? yes

Free blocks count wrong (34865220, counted=34865221).
Fix<y>? yes


apps-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
apps-OST0001: 45904/9166848 files (3.9% non-contiguous), 1794027/36659248 blocks
[root@sox2 ~]#
[root@sox2 ~]# e2fsck -f /dev/md15
e2fsck 1.40.11.sun1 (17-June-2008)
system-OST0000: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #3 (20647, counted=20648).
Fix<y>? yes

Free blocks count wrong (8115827, counted=8115828).
Fix<y>? yes


system-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
system-OST0000: 134002/3055616 files (1.2% non-contiguous), 4091180/12207008 blocks
[root@sox2 ~]# e2fsck -f /dev/md16
e2fsck 1.40.11.sun1 (17-June-2008)
home-OST0000: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

home-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
home-OST0000: 26831/33513472 files (2.1% non-contiguous), 2951394/134033184 blocks
[root@sox2 ~]# e2fsck -f /dev/md17
e2fsck 1.40.11.sun1 (17-June-2008)
apps-OST0000: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #3 (3046, counted=3047).
Fix<y>? yes

Free blocks count wrong (34976431, counted=34976432).
Fix<y>? yes


apps-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
apps-OST0000: 45798/9166848 files (3.7% non-contiguous), 1682816/36659248 blocks
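One way to see the pattern across these runs is to tabulate the gap between the recorded and the counted free-block values. The following is only a minimal sketch and not part of e2fsprogs: the function name `free_block_deltas` and the regular expression are assumptions based on the message layout shown in the output above, and the sample text is copied from the md5 and md17 runs.

```python
import re

# Matches the pass-5 messages as they appear in the output above, e.g.
#   "Free blocks count wrong for group #3 (19179, counted=19180)."
#   "Free blocks count wrong (8199819, counted=8199820)."
# The pattern is an assumption derived from this thread's output only.
PATTERN = re.compile(
    r"Free blocks count wrong(?: for group #(\d+))? \((\d+), counted=(\d+)\)"
)


def free_block_deltas(log_text):
    """Return (group, recorded, counted, delta) tuples.

    group is None for the filesystem-wide total.
    """
    rows = []
    for m in PATTERN.finditer(log_text):
        group = int(m.group(1)) if m.group(1) is not None else None
        recorded, counted = int(m.group(2)), int(m.group(3))
        rows.append((group, recorded, counted, counted - recorded))
    return rows


# Lines copied from the md5 and md17 runs above.
SAMPLE = """\
Free blocks count wrong for group #3 (19179, counted=19180).
Free blocks count wrong (8199819, counted=8199820).
Free blocks count wrong for group #3 (3046, counted=3047).
Free blocks count wrong (34976431, counted=34976432).
"""

if __name__ == "__main__":
    for group, recorded, counted, delta in free_block_deltas(SAMPLE):
        label = "total" if group is None else f"group #{group}"
        print(f"{label}: recorded={recorded} counted={counted} delta={delta:+d}")
```

For the sample lines every delta is +1: each affected filesystem recorded exactly one fewer free block than e2fsck counted, both in the group and in the filesystem-wide total.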
On Feb 19, 2009 20:42 -0500, Robin Humble wrote:
> in 5 out of 6 e2fsck's I do after an OSS crash, I get one free blocks
> count wrong and often a bitmap in a group that wants to be corrected.
>
> is this normal?
> or is it an ldiskfs or an e2fsck bug?

Do you have the "MMP" feature enabled?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Fri, Feb 20, 2009 at 02:10:50PM -0700, Andreas Dilger wrote:
>On Feb 19, 2009 20:42 -0500, Robin Humble wrote:
>> in 5 out of 6 e2fsck's I do after an OSS crash, I get one free blocks
>> count wrong and often a bitmap in a group that wants to be corrected.
>>
>> is this normal?
>> or is it an ldiskfs or an e2fsck bug?
>
>Do you have the "MMP" feature enabled?

no, MMP is off.

there is a small chance that this is the first time the partitions have
been fsck'd since MMP was turned off though - I can't be sure about that.

we have MMP off because when e2fsck or tune2fs crashes (e.g. out of
memory, or when tune2fs goes recursively looking for journal devices
that don't exist) then it makes the MMP'd partition unusable.

cheers,
robin
On Feb 21, 2009 01:09 -0500, Robin Humble wrote:
> On Fri, Feb 20, 2009 at 02:10:50PM -0700, Andreas Dilger wrote:
> >On Feb 19, 2009 20:42 -0500, Robin Humble wrote:
> >> in 5 out of 6 e2fsck's I do after an OSS crash, I get one free blocks
> >> count wrong and often a bitmap in a group that wants to be corrected.
> >>
> >> is this normal?
> >> or is it an ldiskfs or an e2fsck bug?
> >
> >Do you have the "MMP" feature enabled?
>
> no, MMP is off.
>
> there is a small chance that this is the first time the partitions have
> been fsck'd since MMP was turned off though - I can't be sure about that.

That would probably be the cause - the MMP function uses a single block,
and it needs to be freed by e2fsck when the feature is disabled.  We
should probably fix tune2fs to do this at the time MMP is turned off.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Sat, Feb 21, 2009 at 04:13:49PM -0700, Andreas Dilger wrote:
>On Feb 21, 2009 01:09 -0500, Robin Humble wrote:
>> On Fri, Feb 20, 2009 at 02:10:50PM -0700, Andreas Dilger wrote:
>> >On Feb 19, 2009 20:42 -0500, Robin Humble wrote:
>> >> in 5 out of 6 e2fsck's I do after an OSS crash, I get one free blocks
>> >> count wrong and often a bitmap in a group that wants to be corrected.
>> >>
>> >> is this normal?
>> >> or is it an ldiskfs or an e2fsck bug?
>> >
>> >Do you have the "MMP" feature enabled?
>>
>> no, MMP is off.
>>
>> there is a small chance that this is the first time the partitions have
>> been fsck'd since MMP was turned off though - I can't be sure about that.
>
>That would probably be the cause - the MMP function uses a single block,
>and it needs to be freed by e2fsck when the feature is disabled.  We
>should probably fix tune2fs to do this at the time MMP is turned off.

awesome diagnosis!

# e2fsck -f /dev/md0
e2fsck 1.40.11.sun1 (17-June-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
short-OST0000: 13/366190592 files (7.7% non-contiguous), 22998875/1464758400 blocks
# tune2fs -O ^mmp /dev/md0
tune2fs 1.40.11.sun1 (17-June-2008)
# e2fsck -f /dev/md0
e2fsck 1.40.11.sun1 (17-June-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (31222, counted=31223).
Fix<y>? yes

Free blocks count wrong (1441759525, counted=1441759526).
Fix<y>? yes


short-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
short-OST0000: 13/366190592 files (7.7% non-contiguous), 22998874/1464758400 blocks

cheers,
robin
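The figures in this last test add up in a way that fits the single-block explanation. As a small arithmetic check (a sketch only: the variable and function names are mine, and all values are copied from the /dev/md0 output above):

```python
# Values copied from the short-OST0000 (/dev/md0) output above.
TOTAL_BLOCKS = 1464758400

used_before = 22998875      # used blocks in the first e2fsck run (MMP still on)
free_recorded = 1441759525  # free count e2fsck found recorded after tune2fs -O ^mmp
used_after = 22998874       # used blocks in the summary line after the fix
free_counted = 1441759526   # free count as recomputed by e2fsck


def is_consistent(used, free, total=TOTAL_BLOCKS):
    """True when used + free blocks account for the whole filesystem."""
    return used + free == total


if __name__ == "__main__":
    # The recorded free count still matches the pre-tune2fs usage ...
    print(is_consistent(used_before, free_recorded))   # True
    # ... and the corrected count matches the post-fix usage.
    print(is_consistent(used_after, free_counted))     # True
    # Exactly one block moved from "used" to "free".
    print(used_before - used_after, free_counted - free_recorded)  # 1 1
```

This does not prove that the freed block is the MMP block. It only shows that the counts are consistent with one block being released after `tune2fs -O ^mmp`, which is what the diagnosis predicts.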