thr3ads.net - Ext3 users - A solution to Kernel Panic ... on ext3 only! [May 2002]

Uwe Dippel

2002-May-20 08:57 UTC

A solution to Kernel Panic ... on ext3 only!

Here I don't want to start a discussion, but rather share a *solution*
that took me some days to come up with.
The whole thing started with a power outage. After the reboot, my server
(ext3) came back with the dreaded file system error (Ctrl-D to reboot or
password for maintenance) - if you've never seen this, consider yourself
lucky!
In any case, I ran fsck as prescribed, but abandoned the effort after
having to type 'Y' more than 100 times, and restarted fsck with '-y'.
*Never* ever do this on ext3, as you will see later!
It started, but then informed me about " ... too many errors" or so.
Next, after the reboot, it came up with a kernel panic: No init found.
This is almost the time for a re-install, isn't it!? No, I tried the
repair option first. Bad luck: while reading my nice root partition
(hda6), it complained "Error mounting filesystem on hda6: Invalid
argument", and "You don't have any Linux partitions. Press return ...",
though I could *see* all files on hda6 nicely at the shell. I checked
fstab: okay. I could even mount /dev/hda6 on /mnt/help; it looked pretty
sane. Though I was losing my sanity ... !
Finally, enlightenment struck, and here are the problem and the
solution:
The "fsck -y" (see above) had not been able to handle all the errors and
made the journal unusable. This is why all rescue and boot attempts ended
in disarray: the corrupted journal made the partition look invalid as
ext3, though it was not so bad. I only had to convert it to ext2, have
fsck repair all the errors and finally recreate the journal, effectively
reconverting it to ext3. It has been running without problems ever since.
I am even inclined to consider that behaviour a bug, since a somewhat
minor problem made things worse (kernel panic!) unnecessarily. It seems
the journal got corrupted not by the outage but simply by too many
automated 'Y' answers *during repair* !? At least, it had been okay for
the first boot after the outage. Remarkable!

Uwe
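
The repair sequence described above (drop the journal, check the
filesystem as ext2, recreate the journal) might look roughly like the
following from a rescue shell, with the partition unmounted. This is only
a sketch: the post does not list the exact commands that were used, the
names rebuild_journal, run and DRY_RUN are made up for illustration, and
/dev/hda6 is simply the root partition named in the post.

```shell
#!/bin/sh
# Sketch: rebuild an ext3 journal on an UNMOUNTED partition.
# rebuild_journal, run and DRY_RUN are illustrative names, not part of e2fsprogs.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

rebuild_journal() {
    dev=$1
    # 1. Drop the journal, turning the filesystem back into plain ext2.
    #    (tune2fs may refuse if the journal still needs recovery.)
    run tune2fs -O '^has_journal' "$dev" || return 1
    # 2. Force a full check of the resulting ext2 filesystem.
    #    e2fsck exits with 1 or 2 after fixing errors; 4 and above means trouble.
    run e2fsck -f "$dev"
    rc=$?
    [ "$rc" -lt 4 ] || return "$rc"
    # 3. Create a fresh journal, making the filesystem ext3 again.
    run tune2fs -j "$dev"
}
```

Called as "rebuild_journal /dev/hda6"; with DRY_RUN=1 set beforehand it
only prints the three commands instead of running them.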

Stephen C. Tweedie

2002-May-20 11:12 UTC

Re: A solution to Kernel Panic ... on ext3 only!

On Mon, May 20, 2002 at 04:57:19PM +0800, Uwe Dippel wrote:
> Here I don't want to start a discussion, but rather share a *solution*
> that took me some days to come up with.
> The whole stuff started with a power-outage. After reboot, my server
> (ext3) came back with the dreaded file system error (Ctrl-D to reboot or
> password for maintenance) - if you've never seen this, consider yourself
> lucky!
> In any case, I did the fsck as prescribed, but abandoned the effort
> after having to type the 'Y' for more than 100 times; restarting fsck
> with '-y'. *Never* ever do this on ext3, as you will see later!

What sort of errors was fsck finding?

> It started but then informed me about " ... too many errors" or so.
> Next, after reboot, it came with a kernel-panic: No init found.

So the root fs had failed to load...

> This is almost the time for re-install, isn't it!? No, I tried the
> repair before. Bad luck, while reading my nice root-partition (hda6), it
> complained "Error mounting filesystem on hda6: Invalid argument"

Yep.  Did the kernel give any error log as to why it failed to mount?

> I only had to convert it to ext2, have it
> repair all the errors and finally recreate the journal; effectively
> reconvert it to ext3. It is been running ever since without problem.
> I am even pondering to consider that behaviour a bug

If fsck left the fs unmountable, then yes, it is a bug.  The question
is, what damage did it fail to repair?  We'd need a lot more detail in
your bug report to be sure of that.

Cheers,
 Stephen

Theodore Ts'o

2002-May-22 16:32 UTC

Re: A solution to Kernel Panic ... on ext3 only!

On Mon, May 20, 2002 at 04:57:19PM +0800, Uwe Dippel wrote:
> In any case, I did the fsck as prescribed, but abandoned the effort
> after having to type the 'Y' for more than 100 times; restarting fsck
> with '-y'. *Never* ever do this on ext3, as you will see later!
> It started but then informed me about " ... too many errors" or so.

There is no "too many errors" printed by e2fsck.  The closest is "too
many illegal blocks" in an inode, which is followed by an offer to
clear the inode entirely.  If an inode has a corrupted indirect block,
this can cause e2fsck to offer to give up on the inode altogether.
This is normal behaviour, and shouldn't have caused any problem (unless
the system really needed the corrupted inode for its boot scripts, of
course).

> Next, after reboot, it came with a kernel-panic: No init found.
> This is almost the time for re-install, isn't it!? No, I tried the
> repair before. Bad luck, while reading my nice root-partition (hda6), it
> complained "Error mounting filesystem on hda6: Invalid argument", and
> "You don't have any Linux partitions. Press return ..."; though I could
> *see* all files on hda6 nicely at the shell. I checked fstab: okay. I
> could even mount /dev/hda6 to /mnt/help; looking pretty sane. Though I
> was loosing out on my sanity ... !

What distribution are you using (e.g., RedHat, Mandrake, Debian,
etc.), and do you know how the boot scripts were trying to mount the
root partition?  Based on what you reported, it sounds like the boot
scripts were only trying to mount it as ext3, and not as anything else.
Normally the initrd scripts are set up to load the ext3 module, if
necessary, and then attempt a mount of the filesystem without
specifying the filesystem type.  That way, the kernel will
automatically attempt to mount the filesystem first as ext3, and then
if that fails, it will fall back and attempt a mount as ext2.

> The "fsck -y" (see above) had not been able to handle all the errors and
> made the journal unusable. This is why all rescue and booting ended in
> disarray: the corrupted journal made the partition look invalid as ext3,
> though it was not so bad. I only had to convert it to ext2, have it
> repair all the errors and finally recreate the journal; effectively
> reconvert it to ext3. It is been running ever since without problem.
> I am even pondering to consider that behaviour a bug, since a somewhat
> minor problem made things worse (kernel panic!) unnecessarily. It seems
> the journal got corrupted not by the outage but by simply too many
> automated 'Y' *during repair* !? At least, it had been okay for the
> first boot after the outage. Remarkable!

There are some filesystem corruptions that will cause e2fsck to offer
to delete the journal, after which it will explicitly state that the
journal inode has been removed, and that the filesystem has been
reconverted to ext2.  It doesn't take a lot of filesystem errors, and
this will happen with or without "fsck -y" (that's just a red
herring).  Certain specific filesystem corruptions simply corrupt
the journal file, and cause e2fsck to need to remove it.
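
Whether a filesystem still carries a journal can be read from its
superblock feature list. The sketch below assumes the "Filesystem
features:" line printed by "dumpe2fs -h" (or "tune2fs -l"); the helper
name has_journal is made up for illustration, and it reads the listing
from standard input so that it can be tried without touching a real
device.

```shell
#!/bin/sh
# Sketch: report whether a superblock listing mentions the has_journal feature.
# Feed it the output of `dumpe2fs -h DEVICE`; has_journal is an illustrative name.

has_journal() {
    # Look only at the "Filesystem features:" line and test each listed feature.
    awk -F: '
        $1 == "Filesystem features" {
            n = split($2, feat, " ")
            for (i = 1; i <= n; i++)
                if (feat[i] == "has_journal") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

A possible use from a rescue shell would be
"dumpe2fs -h /dev/hda6 2>/dev/null | has_journal && echo ext3 || echo ext2".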

It's possible that there might have been some subtle filesystem
corruption where e2fsck doesn't clear out the journal inode and
reconvert the filesystem to ext2, but you haven't given us enough
information to know exactly what the filesystem corruption was.  (See
the man page for e2fsck, in the REPORTING BUGS section, for a
discussion of the sort of information that is really needed for me
to be able to reproduce, find, and fix e2fsck bugs.)

As I said, "fsck -y" is a red herring.  All -y does is the equivalent
of your typing 'y' to every question asked by e2fsck; it just saves a
little keyboard wear and tear.  The real question is how your
filesystem was corrupted, and how e2fsck reacted to that particular
form of filesystem corruption.  E2fsck *should* be able to handle
just about any form of filesystem corruption, for ext2 or ext3, and it
surely shouldn't make things worse.  But of course, no software is
bug-free, and you may have found some new and innovative way for your
filesystems to be corrupted, which e2fsck doesn't deal with correctly.
But I would need a lot more information to be able to figure out
exactly what happened.

						- Ted

Theodore Ts'o

2002-May-24 18:41 UTC

Re: A solution to Kernel Panic ... on ext3 only!

On Thu, May 23, 2002 at 10:08:03AM +0800, Uwe Dippel wrote:
> Please accept, that I wrote the stuff as closely as possible. I had even
> taken a few notes. If you are not a filesystem hacker, you'll probably
> just try to repair the whole stuff and this is what I did. You are
> probably right on the "too many illegal blocks". In any case, I answered
> all with '-y' and found the situation as described after reboot:
> "Mounting /proc filesystem
> Creating root device
> Mounting root filesystem
> ext3: No journal on filesystem on ide0(3,6)
> mount: error 22 mounting ext3
> pivotroot: pivot_root(/sysroot, /sysroot/initrd) failed:
> Freeing unused kernel memeory: 220k free
> Kernel panic: No init found"
> (Please don't bash me for mistakes copying this manually twice!)

OK, given what you've told me, I suspect you can reproduce the failure
simply by converting your root partition back to ext2 (boot a
rescue system, and then run the command "tune2fs -O ^has_journal" on it
to remove the journal), but leaving the fstab entry as ext3.

I don't have a recent Red Hat system hanging around, so I can't
test it myself, but it seems pretty clear from what you've
described.  (I probably should have a RH machine around, but it's a
pain to keep up with security package updates for two different
distributions, and all of my machines are running Debian these days.
I should have one, though, since it's handy for debugging these sorts
of boot script-specific problems, which tend to be distribution
specific.)

Anyway, what happens is that the fstab entry says that the filesystem
is ext3, so only an ext3 mount is attempted.  Since the ext3 mount
fails, the root filesystem is never mounted, which makes the
pivotroot operation fail, and that's why you're getting the "No init
found" error message.

One way of making things more robust is to use an /etc/fstab entry
which looks like this:

/dev/hda1      /        ext3,ext2    defaults        1 1

This will cause mount and fsck to first try ext3, and then ext2, so
that in the case where the filesystem is converted back to ext2 due to
a filesystem error, the system will actually come back up cleanly.
The only gotcha here is that the code to support having a list of
filesystems in /etc/fstab is relatively new, and I don't know what
version of Red Hat you're running; if it's too old, it might not have
sufficiently new versions of mount and fsck (in the util-linux
and e2fsprogs packages, respectively).
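
The effect of such a comma-separated type list can be sketched as a loop
that tries each type in turn and stops at the first successful mount.
This only illustrates the fallback order and is not how mount itself is
implemented; mount_first_of and the MOUNT_CMD override are invented
names, the latter so that the loop can be exercised without a real
device.

```shell
#!/bin/sh
# Sketch: try each filesystem type from a comma-separated list until one mounts.
# mount_first_of and MOUNT_CMD are illustrative names, not part of util-linux.

mount_first_of() {
    types=$1 dev=$2 dir=$3
    old_ifs=$IFS
    IFS=,
    set -- $types          # split "ext3,ext2" into positional parameters
    IFS=$old_ifs
    for t in "$@"; do
        if ${MOUNT_CMD:-mount} -t "$t" "$dev" "$dir" 2>/dev/null; then
            echo "$t"      # report which type succeeded
            return 0
        fi
    done
    return 1               # no type in the list could be mounted
}
```

For example, "mount_first_of ext3,ext2 /dev/hda6 /mnt/help" would print
ext2 for a filesystem whose journal has been removed.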

Another way of solving the problem would be to add something like the
following to Red Hat's initrd linuxrc, before it tries to mount
the real root partition:

ext3root=`awk '{ if (($2 == "/") && ($3 == "ext3")) {print $1;}}' /etc/fstab`
if test -n "$ext3root" ; then
	tune2fs -j "$ext3root" > /dev/null
fi

The tune2fs command is a no-op if the filesystem already has a
journal, but if it doesn't, it will automatically add a journal to the
filesystem.  So this is nice, both because it allows Red Hat to
recover from a missing journal automatically, but also because it
means it becomes very simple to convert your root filesystem to ext3.
All you need to do is to edit /etc/fstab so that root's fs type is
ext3, and then reboot the system.
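
The awk lookup in the snippet above can be tried on its own against a
copy of fstab. The variant below is a sketch: the function name
ext3_root_device and the optional file argument are additions for
illustration, and it also skips comment lines, which the one-liner above
does not bother with.

```shell
#!/bin/sh
# Sketch: print the device of the root entry if fstab declares it as ext3.
# ext3_root_device is an illustrative name; the file defaults to /etc/fstab.

ext3_root_device() {
    awk '
        /^[ \t]*#/ { next }                       # ignore comment lines
        $2 == "/" && $3 == "ext3" { print $1 }    # fields: device, mount point, type
    ' "${1:-/etc/fstab}"
}
```

If the function prints a device, that device is what would be handed to
"tune2fs -j"; if it prints nothing, root is not declared as ext3 and
nothing needs to be done.
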
> And here my enhancement proposal drops in: The data were quite well,
> only mounting as ext3 failed. And this was non-trivial to find out,
> since - as described - the 'Repair' option of the Install-CDROMs also
> complained about this "...Invalid Argument ..... You don't have any
> Linux partitions". Re-install is option of choice, you might think; but:
> no, deleting the journal and a normal fsck on the resulting ext2 brought
> everything back to normal

Yes, you're right.  The system ought to be more robust about the
discrepancy between what's in /etc/fstab and whether the filesystem is
ext3 or ext2.  However, this is a distribution specific problem, not
one which I can fix in e2fsprogs.  So what I'd suggest you do is to
file an enhancement request with Red Hat.  (Or Stephen, who works for
Red Hat, may be able to file this enhancement request for you.)

I will file a similar enhancement request for Debian.

						- Ted

UWE HEINZ RUDI DIPPEL

2002-May-25 11:55 UTC

RE: A solution to Kernel Panic ... on ext3 only!

Dear all,

thanks for your comments. Now I really feel I ought to have documented
what I did better, and once again I ask for your forgiveness for this!

*If* and only *if* I remember well, Theodore, contrary to what you wrote in
your first mail, fsck had not degraded the ext3 to ext2 gracefully, and there
was a journal left (for deletion), which I deleted.
Secondly, and here I ask Stephen before messing up the machine again: if it
had been a clean, journal-less ext2, the repair option should not have said
"You don't have any Linux partitions", should it? There must have been
something very wrong with hda6 for it to come up with this message. Here
it might be helpful to know which conditions make the repair option
issue this statement. It had even made me run fdisk to see whether the
partitions had been destroyed; it returned a clear '83', which finally
motivated me to continue. fdisk was of the opinion '83' and 'repair' of the
opinion '!=83' ?? Strange!

Even if we don't manage to reconstruct the situation: Thanks again for all
your efforts!

Uwe
