Martin Mokrejs
2007-May-18 09:06 UTC
2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
Hi,

I just tried the 2.6.22-rc1 candidate to test whether some bug I have hit in the past still exists. I had been using 2.6.20.6 so far. So I cleanly rebooted into the new kernel; after the machine came up I tried to experiment with the bug, and had to reboot again to play with kernel command-line parameters. Unfortunately, on the next reboot fsck was scheduled on my filesystem after 38 clean mounts. :( And the problem started. The fsck found some unused inodes, but probably did not know where they belong, so it deleted them automagically. Finally, the fsck died because it could not find some '..' entry.

Here is what happened, retyped from photos taken with my camera. ;)

/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570614) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570603) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5586948) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5586957) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode 5570561. CLEARED.
Unconnected directory inode 5570567 (...)

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

Turning off the power, booting back into 2.6.20.6 and, obviously, running the same fsck gives me:

/dev/hda3 contains a file system with errors, check forced.
Missing '..' in directory inode 5570587.

/dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

What do you recommend I do now? I cannot say which fsck version it is, but I can tell you this is a Gentoo Linux box in the ~x86 tree, so whatever is in the "unstable" branch. :(

I use the ext2/ext3 Windows driver from http://www.fs-driver.org/ to access the filesystem. Even now, when the filesystem should be marked as dirty, I can access it from Windows and see the files. Does extfs.sys ignore the mark? ;) Anyway, since then there has been a directory 'Recycled' at the top level of the filesystem. ;-)

I also remember that one of the system packages in Gentoo recently installed some kind of hash into the filesystem, or hashing support, something like that. Sorry, I do not remember the details. I am just thinking about what could have made fsck think something was wrong.

I would like to restore the filesystem to the state it was in before fsck ran. How can I do that? No, I do not have a backup of the filesystem. :(

I am subscribed to the mailing lists, but please Cc: me anyway.

Many thanks,
Martin
Martin Mokrejs
2007-May-18 13:51 UTC
2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > Hi,
> > I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > hit in the past still exists. I had been using 2.6.20.6 so far. So I
> > cleanly rebooted into the new kernel; after the machine came up I
> > tried to experiment with the bug, and had to reboot again to play with
> > kernel command-line parameters. Unfortunately, on the next reboot fsck
> > was scheduled on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where they belong, so it deleted them automagically. Finally, the
> > fsck died because it could not find some '..' entry.
> >
> > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > 5570561. CLEARED.
> > Unconnected directory inode 5570567 (...)
> >
> > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> > (i.e., without -a or -p options)
> >
>
> This means that e2fsck has reached a point where it needs user
> intervention. So you should not run e2fsck with -p, -a or -y options.
> Look up the e2fsck man page for more on this.

Yeah, stupid init.d script in Gentoo. I will report it to Gentoo as well, but how can I revert the changes? Can you say which directories were affected?

Thanks,
Martin
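Kalpak's point about preen mode can be made concrete through e2fsck's exit status. The e2fsck man page documents the status as a sum of flags: 0 (no errors), 1 (errors corrected), 2 (errors corrected, reboot advised), 4 (errors left uncorrected), 8 (operational error), 16 (usage or syntax error), 32 (canceled by user), 128 (shared library error). A preen run that stops with "RUN fsck MANUALLY" should leave the "uncorrected" flag set, and that is what a boot script reacts to. The following is a minimal Python sketch under that reading; the helper names `decode_e2fsck_status` and `needs_manual_fsck` are hypothetical, and this is not the logic of Gentoo's actual init script.

```python
from __future__ import annotations

# Exit-status flags as listed in the e2fsck(8) man page. The status a boot
# script sees is the sum (bitwise OR) of the flags that apply.
E2FSCK_FLAGS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> list[str]:
    """Translate an e2fsck exit status into the meanings of its set flags."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_FLAGS.items() if status & bit]


def needs_manual_fsck(status: int) -> bool:
    """True when errors were left uncorrected, as happens when a preen run
    stops with 'UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY'."""
    return bool(status & 4)


if __name__ == "__main__":
    for status in (0, 1, 4):
        print(status, decode_e2fsck_status(status), needs_manual_fsck(status))
```

A boot script following this pattern would keep booting for statuses 0 and 1 (preen fixed what it safely could) and stop for a manual repair whenever flag 4 is set, which is the situation Martin ran into.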
Martin Mokrejs
2007-May-18 14:32 UTC
2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 07:38:18PM +0530, Kalpak Shah wrote:
> On Fri, 2007-05-18 at 15:51 +0200, Martin Mokrejs wrote:
> > On Fri, May 18, 2007 at 05:17:06PM +0530, Kalpak Shah wrote:
> > > On Fri, 2007-05-18 at 11:06 +0200, Martin Mokrejs wrote:
> > > > Hi,
> > > > I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > > > hit in the past still exists. I had been using 2.6.20.6 so far. So I
> > > > cleanly rebooted into the new kernel; after the machine came up I
> > > > tried to experiment with the bug, and had to reboot again to play with
> > > > kernel command-line parameters. Unfortunately, on the next reboot fsck
> > > > was scheduled on my filesystem after 38 clean mounts. :( And the problem
> > > > started. The fsck found some unused inodes, but probably did not know
> > > > where they belong, so it deleted them automagically. Finally, the
> > > > fsck died because it could not find some '..' entry.
> > > >
> > > > /dev/hda3: Entry '..' in .../??? (5701636) has deleted/unused inode
> > > > 5570561. CLEARED.
> > > > Unconnected directory inode 5570567 (...)
> > > >
> > > > /dev/hda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> > > > (i.e., without -a or -p options)
> > > >
> > >
> > > This means that e2fsck has reached a point where it needs user
> > > intervention. So you should not run e2fsck with -p, -a or -y options.
> > > Look up the e2fsck man page for more on this.
> >
> > Yeah, stupid init.d script in Gentoo. I will report it to Gentoo as well,
> > but how can I revert the changes? Can you say which directories were affected?
>
> No, there is nothing wrong with your script; most problems get solved by
> -a or -p, and hence your init.d script is correct in using these options.
>
> I don't understand what you mean by reverting your changes.

I would like to boot with another (previous, tested) kernel and run another, stable fsck version. True, I cannot say how ext3 ended up with a broken directory, but before making changes to the filesystem I would certainly boot with a tested kernel and tools.

>
> An unconnected directory inode means that this directory (inode 5570567)
> does not have a valid ".." entry (which is the backpointer to its
> parent). So this directory will be moved to lost+found.

And what about those original "errors"? Didn't those modifications in turn cause this?

/dev/hda3 has been mounted 38 times without being checked, check forced
HTREE directory inode 1163319 has an invalid root node.
HTREE INDEX CLEARED
Entry '..' in .../??? (5570587) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570620) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570625) has deleted/unused inode 5570561. CLEARED.
/dev/hda3: Entry '..' in .../??? (5570567) has deleted/unused inode 5570561. CLEARED.
[cut]

Martin
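Kalpak's explanation of an unconnected directory can be illustrated with a toy model. The sketch below is not e2fsck's code: it assumes a directory's parent is simply whichever in-use directory has an entry naming it, treats any in-use directory without such a parent (other than the root, inode 2) as unconnected, and reattaches it in lost+found under the name `#<inode>`, which is how e2fsck names reconnected entries. Clearing the stale entries themselves and handling directory loops are left out. The helper names are hypothetical, and so is the tree in the usage example; only the inode numbers 5570561, 5570567 and 5570587 are borrowed from the log above.

```python
from __future__ import annotations

ROOT_INO = 2  # the root directory is inode 2 on ext2/ext3


def find_parents(entries: dict[int, dict[str, int]], in_use: set[int]) -> dict[int, int]:
    """Map each directory inode to the in-use directory whose entry names it."""
    parent: dict[int, int] = {}
    for dir_ino, subdirs in entries.items():
        if dir_ino not in in_use:
            continue  # entries inside a freed directory no longer count
        for child in subdirs.values():
            parent[child] = dir_ino
    return parent


def reconnect_unconnected(entries: dict[int, dict[str, int]], in_use: set[int],
                          lost_found: int) -> list[int]:
    """Attach every in-use directory that has no in-use parent to lost+found.

    Returns the reconnected inodes in ascending order. Each one gets an entry
    named '#<inode>' in the lost+found directory. Directory loops and the
    clearing of stale entries are not modeled in this sketch.
    """
    parent = find_parents(entries, in_use)
    orphans = sorted(
        ino for ino in entries
        if ino in in_use and ino != ROOT_INO and ino not in parent
    )
    for ino in orphans:
        entries.setdefault(lost_found, {})[f"#{ino}"] = ino
    return orphans


if __name__ == "__main__":
    # Hypothetical tree: directory 5570561 has been freed, so the directories
    # it contained no longer have a parent that vouches for them.
    tree = {
        2: {"home": 100, "lost+found": 11},
        11: {},
        100: {"old": 5570561},
        5570561: {"a": 5570567, "b": 5570587},
        5570567: {},
        5570587: {"c": 5586948},
        5586948: {},
    }
    used = set(tree) - {5570561}
    print(reconnect_unconnected(tree, used, lost_found=11))
    print(tree[11])
```

In this model 5570567 and 5570587 end up in lost+found, while 5586948 needs no separate treatment: once its parent 5570587 is reconnected, it is reachable from the root again.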
Martin Mokrejs
2007-May-18 14:35 UTC
2.6.22-rc1 killed my ext3 filesystem cleanly unmounted
On Fri, May 18, 2007 at 04:20:39PM +0200, Jesper Juhl wrote:
> On 18/05/07, Martin Mokrejs <mmokrejs at ribosome.natur.cuni.cz> wrote:
> > Hi,
> > I just tried the 2.6.22-rc1 candidate to test whether some bug I have
> > hit in the past still exists. I had been using 2.6.20.6 so far. So I
> > cleanly rebooted into the new kernel; after the machine came up I
> > tried to experiment with the bug, and had to reboot again to play with
> > kernel command-line parameters. Unfortunately, on the next reboot fsck
> > was scheduled on my filesystem after 38 clean mounts. :( And the problem
> > started. The fsck found some unused inodes, but probably did not know
> > where they belong, so it deleted them automagically. Finally, the
> > fsck died because it could not find some '..' entry.
> >
>
> How do you know that the corruption was caused by 2.6.21-rc1 ?
> Isn't it possible that the corruption was created by an earlier
> kernel, but only detected when a forced fsck was run - which just
> happened to be while you were running 2.6.21-rc1 ...
>
> My point is that, as far as I can see, there's nothing tying
> 2.6.21-rc1 specifically to this corruption... or?

You might be right, but I thought the kernel was the more probable cause, as that is what I have changed recently. ;) Or maybe someone can at least say "No, no changes to be considered between 2.6.20.6 and 2.6.22-rc1." ;)

Martin
Theodore Tso
2007-May-22 18:01 UTC
fs periodic check (was Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted)
On Sun, May 20, 2007 at 07:55:26PM +0000, Pavel Machek wrote:
> > #1, This is why periodic checks are a good thing; it catches problems
> > that could stay hidden and result in data loss sooner rather than later.
>
> Actually, I see something funny with periodic checks here. It has been
> claiming 'filesystem check on next boot' for >10 boots now.
>
> It is a Sharp Zaurus machine, and the filesystem tends to _never_ be
> unmounted correctly (broken scripts), so I get journal replay each
> time.

The Sharp Zaurus is a PDA which is almost always running on battery, right? You need to add to /etc/e2fsck.conf:

    [options]
        defer_check_on_battery = false

See the e2fsck.conf man page for more details, but basically, e2fsck was optimized for x86 laptops that have such lousy battery life that people generally try to run on AC adapters to avoid killing the laptop battery --- and for which spinning hard drive platters for an extended time to fsck a 100GB drive might not be such a hot idea. So we try to defer the periodic fsck until the laptop is back on AC power. But for a PDA with flash storage, which is almost always running on battery, you'll want to change the default using e2fsck.conf.

- Ted
Jan Kara
2007-May-28 12:38 UTC
fs periodic check (was Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted)
Hello,

> But here's what I've got:
>
> root@spitz:/home/pavel# fsck.ext2 -f /dev/hda3
> e2fsck 1.38 (30-Jun-2005)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 371989 has illegal block(s). Clear<y>? yes
>
> Illegal block #2 (134217728) in inode 371989. CLEARED.
> Pass 2: Checking directory structure
>
> i_file_acl for inode 371988 (/home/root/misc/zaurus/smail) is 131072,
> should be zero.
> Clear<y>? yes
>
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -339972 +471044
> Fix<y>? yes
>
> Free blocks count wrong for group #10 (13882, counted=13883).
> Fix<y>? yes
>
> ...kernel 2.6.16-preempt (on zaurus). Filesystem should have been clean -- I was
> using it till crash for half a year, but that's what journal is for,
> right?
> ...
> But I guess this is almost impossible to debug?

Actually, your case doesn't seem to be hard. The first block number is 0x8000000 and the second one 0x20000. So something is flipping your bits...

Honza
--
Jan Kara <jack at suse.cz>
SuSE CR Labs
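Jan's observation can be checked mechanically: a positive integer `v` has exactly one bit set when `v & (v - 1) == 0`. The Python sketch below (the helper name `single_bit` is hypothetical) confirms that 134217728 is 0x8000000 (bit 27) and 131072 is 0x20000 (bit 17). Where a field was most likely zero, as e2fsck states outright for i_file_acl, a lone set bit is the classic signature of a flipped bit rather than random garbage. The same test also shows that the two block numbers in the bitmap fix-up, 339972 and 471044, happen to differ in exactly one bit (again bit 17); the thread does not establish whether that is related.

```python
from typing import Optional


def single_bit(value: int) -> Optional[int]:
    """Return the index of the only set bit in value, or None if there isn't exactly one."""
    if value > 0 and value & (value - 1) == 0:
        return value.bit_length() - 1
    return None


# Numbers taken from the e2fsck output quoted above.
suspects = {
    "illegal block #2 in inode 371989": 134217728,
    "i_file_acl of inode 371988": 131072,
}

for label, value in suspects.items():
    print(f"{label}: {value} = {hex(value)}, single set bit: {single_bit(value)}")

# Block bitmap differences: -339972 +471044. XOR keeps only the bits that differ.
print("bitmap fix-up differs in bit:", single_bit(339972 ^ 471044))
```

Running it reports bit 27 for the illegal block, and bit 17 for both the i_file_acl value and the bitmap pair.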
Theodore Tso
2007-May-29 02:55 UTC
fs periodic check (was Re: 2.6.22-rc1 killed my ext3 filesystem cleanly unmounted)
On Thu, May 24, 2007 at 05:39:11PM +0000, Pavel Machek wrote:
> Right. Could we get a more helpful message here? 'Filesystem check on
> next boot on AC power'?

So "(check deferred; on battery)" wasn't explicit enough? I guess I assumed that users would understand that the opposite of "on battery" was "on AC power". I guess I could say "(check deferred 'til on AC power)" if people think it would be clearer.

> Or maybe keep counting, and when we reach 2x mount-count-limit,
> force a fsck, battery power or not?

We do that already. It's just tough to make all of that fit in an 80-column status message. :-)

- Ted
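The policy Ted describes across his two replies can be summarized in code: once the mount count reaches the configured maximum a check is due, but with `defer_check_on_battery` enabled (the default) a machine on battery is only checked once the count reaches twice the maximum. The e2fsck.conf man page describes the option as doubling the check interval, time- or mount-based, while on battery. The Python sketch below models only the mount-count case; the `Superblock` class and `periodic_check_due` function are hypothetical names, not e2fsprogs code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Superblock:
    mount_count: int      # mounts since the last full check
    max_mount_count: int  # limit that triggers a periodic check (<= 0 disables it)


def periodic_check_due(sb: Superblock, on_battery: bool,
                       defer_check_on_battery: bool = True) -> tuple[bool, str]:
    """Decide whether a mount-count-based periodic check should run now.

    Models the policy from the thread: on battery, the check is deferred
    until the mount count reaches twice the configured limit.
    """
    if sb.max_mount_count <= 0 or sb.mount_count < sb.max_mount_count:
        return False, "clean"
    limit = sb.max_mount_count
    if on_battery and defer_check_on_battery:
        limit *= 2  # effectively double the interval while on battery
    if sb.mount_count < limit:
        return False, "check deferred; on battery"
    return True, (f"has been mounted {sb.mount_count} times "
                  "without being checked, check forced")


if __name__ == "__main__":
    zaurus = Superblock(mount_count=25, max_mount_count=20)  # hypothetical numbers
    print(periodic_check_due(zaurus, on_battery=True))
    print(periodic_check_due(zaurus, on_battery=True, defer_check_on_battery=False))
```

For the Zaurus case, setting `defer_check_on_battery = false` in /etc/e2fsck.conf corresponds to passing `defer_check_on_battery=False` here, so the check runs as soon as the normal limit is reached instead of waiting for twice that count.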