thr3ads.net - Ext3 users - Re: [long] major problems on fs; e2fsck running out of memory [Jun 2014]

Keith Keller

2014-Jun-02 02:43 UTC

Re: [long] major problems on fs; e2fsck running out of memory

Hi Bodo and Ted,

Thank you both for your responses; they confirm what I thought might be
the case.  Knowing that, I can try to proceed with your suggestions.  I
do have some followup questions for you:


On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
> Unfortunately, there has been a huge number of bug fixes for ext4's
> online resize since 2.6.32 and 1.42.11.  It's quite possible that you
> hit one of them.

Would this scenario be explained by these bugs?  I'd expect that if a
resize2fs failed, it would report a problem pretty quickly.  (But
perhaps that's the nature of some of these bugs.)

> Well, actually it's not quite that simple.  There are multiple passes
> to e2fsck, and the first pass is estimated to be 70% of the total
> e2fsck run.  So 51.8% reported by the progress means e2fsck had gotten
> 74% of the way through pass 1.  So that would mean that it had got
> through the inodes up to about 3.9TB into the file system.

Aha!  Thanks for the clarification.  That's certainly well more than the
original fs size.
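
The arithmetic in the quoted paragraph can be reproduced in a few lines.
This is only a sketch: the 70% weight for pass 1 is the figure given in
the message above, and the function name is made up for illustration.

```python
def pass1_fraction(overall_percent, pass1_weight_percent=70.0):
    """Return how far through pass 1 e2fsck is, as a value from 0.0 to 1.0.

    Assumes, per the message above, that pass 1 accounts for
    pass1_weight_percent of the overall progress figure.
    """
    if not 0.0 <= overall_percent <= pass1_weight_percent:
        raise ValueError("overall progress is outside pass 1")
    return overall_percent / pass1_weight_percent


if __name__ == "__main__":
    frac = pass1_fraction(51.8)
    print(f"pass 1 is about {frac:.0%} complete")
```

With the 51.8% reported by the progress indicator this gives 0.74, which
matches the 74% quoted above.
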

> That being said, it's pretty clear that portions of the inode table
> and block group descriptors were badly corrupted.  So I suspect there
> isn't going to be much that can be done to try to repair the file
> system completely.  If there are specific files you need to recover,
> I'd suggest trying to recover them first before trying to do anything
> else.  The good news is that around 75% of your files can probably be
> recovered.

So, now when I try to mount, I get an error:

# mount -o ro -t ext4 /dev/mapper/vg1--sdb-lv_vz /vz/
mount: Stale NFS file handle

That's clearly a spurious error, so I checked dmesg:

# dmesg|tail
[159891.219387] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42252 failed (36703!=0)
[159891.219586] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42253 failed (51517!=0)
[159891.219786] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42254 failed (51954!=0)
[159891.220025] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42496 failed (37296!=0)
[159891.220225] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42497 failed (31921!=0)
[159891.220451] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42498 failed (2993!=0)
[159891.220650] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42499 failed (59056!=0)
[159891.220850] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42500 failed (28571!=22299)
[159891.225762] EXT4-fs (dm-0): get root inode failed
[159891.227436] EXT4-fs (dm-0): mount failed

and before that there are many other checksum failed errors.  When I
try a rw mount I get these messages instead:

[160052.031554] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 0 failed (43864!=0)
[160052.031782] EXT4-fs (dm-0): group descriptors corrupted!
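
With many such lines in dmesg, a short script can summarize which block
groups are affected.  The following is a rough sketch, not a tool from
the thread: the function names are invented, and the reading of the two
numbers as "computed != stored" is an assumption about the kernel
message rather than something stated here.

```python
import re

# Matches e.g. "Checksum for group 42500 failed (28571!=22299)".
# \s+ tolerates a line break between the group number and "failed".
CSUM_RE = re.compile(r"Checksum for group (\d+)\s+failed \((\d+)!=(\d+)\)")


def parse_checksum_failures(log_text):
    """Return a list of (group, computed, stored) tuples from dmesg text.

    ASSUMPTION: the first number is the checksum the kernel computed and
    the second is the value stored in the group descriptor.
    """
    return [tuple(int(x) for x in m.groups()) for m in CSUM_RE.finditer(log_text)]


def summarize(failures):
    """Count failures and how many have a second (stored) value of zero."""
    groups = [group for group, _, _ in failures]
    return {
        "count": len(failures),
        "stored_zero": sum(1 for _, _, stored in failures if stored == 0),
        "first_group": min(groups) if groups else None,
        "last_group": max(groups) if groups else None,
    }


# Two lines copied from the log above, wrapped the way a mail client might.
SAMPLE = """\
[159891.220650] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42499
failed (59056!=0)
[159891.220850] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42500
failed (28571!=22299)
"""

if __name__ == "__main__":
    print(summarize(parse_checksum_failures(SAMPLE)))
```

Run against the full dmesg output, the summary would show the range of
groups involved and how many descriptors carry a zero second value,
which may help when describing the damage on the list.
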

Are there any other options I can try to force the mount so I can try to
get to the changed files?  If that'll be challenging, I'll just sacrifice
those files, but if it'd be relatively straightforward I'd like to make
the attempt.

Thanks again!

--keith

-- 
kkeller@wombat.san-francisco.ca.us

Keith Keller

2014-Jun-02 02:56 UTC

Re: [long] major problems on fs; e2fsck running out of memory

Hi again all,

I apologize for not asking this in my first message; I just remembered
the question after sending.

On Sun, Jun 01, 2014 at 07:43:12PM -0700, Keith Keller wrote:
>
> On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
> > Unfortunately, there has been a huge number of bug fixes for ext4's
> > online resize since 2.6.32 and 1.42.11.  It's quite possible that you
> > hit one of them.
>
> Would this scenario be explained by these bugs?  I'd expect that if a
> resize2fs failed, it would report a problem pretty quickly.  (But
> perhaps that's the nature of some of these bugs.)

I have a very similar second server which has undergone a similar chain
of events, an initial ~2.5tb fs followed by a resize later.  I believe
that it has been fsck'd since the resize (but don't quote me on that).
Am I likely to run into this issue with this fs?  And if I do, what
steps should I do differently (e.g., use the latest e2fsck right away;
don't e2fsck, get files off quickly, and mke2fs; something else)?

--keith

-- 
kkeller@wombat.san-francisco.ca.us

Theodore Ts'o

2014-Jun-02 03:24 UTC

Re: [long] major problems on fs; e2fsck running out of memory

On Sun, Jun 01, 2014 at 07:43:12PM -0700, Keith Keller wrote:
>
> That's clearly a spurious error, so I checked dmesg:
>
> [159891.225762] EXT4-fs (dm-0): get root inode failed
> [159891.227436] EXT4-fs (dm-0): mount failed

The "get root inode failed" is rather unfortunate.

Try running "debugfs /dev/dm-0"

and then use the "stat /" command.

You can use debugfs to look at the file system and recover individual
files without needing to mount it.  However, if the root directory has
been compromised, that makes using debugfs quite a bit more difficult.
You can look at inodes by inode number by surrounding them with angle
brackets.  For example, if you want to look at inode 12345, you could
say "stat <12345>", and if inode 12345 is a directory, you could list
it via "ls <12345>", etc.  See the debugfs man page for more details.
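
The inode-number syntax can also be used non-interactively.  The sketch
below only builds and prints command lines; it does not run them.  It
relies on debugfs's -R option (run one request and exit) and -c option
(catastrophic mode, which opens the file system read-only).  The helper
names and the output path are hypothetical, and the device name is the
one used in this thread.

```python
import shlex


def debugfs_argv(device, request, catastrophic=True):
    """Build an argv list for a single debugfs request.

    -R runs one request and exits; -c opens the file system in
    catastrophic mode, which also forces it to be opened read-only.
    """
    argv = ["debugfs"]
    if catastrophic:
        argv.append("-c")
    argv += ["-R", request, device]
    return argv


def by_inode(command, inode_number, *extra):
    """Format a request that addresses an inode as <N>, e.g. 'stat <12345>'."""
    return " ".join([command, f"<{inode_number}>", *extra])


if __name__ == "__main__":
    device = "/dev/dm-0"  # device name taken from the thread
    requests = (
        by_inode("stat", 12345),
        by_inode("ls", 12345),
        by_inode("dump", 12345, "/tmp/recovered.bin"),  # hypothetical output path
    )
    for request in requests:
        # Printed rather than executed; run by hand as root if appropriate.
        print(shlex.join(debugfs_argv(device, request)))
```

Printing the commands first makes it easy to review them before
touching a damaged device.  Whether any of them succeed depends on how
much of the inode table is still readable.
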

       	   	     	       	   	   - Ted

Keith Keller

2014-Jun-02 03:54 UTC

Re: [long] major problems on fs; e2fsck running out of memory

On Sun, Jun 01, 2014 at 11:24:51PM -0400, Theodore Ts'o wrote:
>
> The "get root inode failed" is rather unfortunate.

Heh, I like your understatement.  :)  I think this helps answer part of
my questions in my second email: I should probably try to preserve
changes made since the last backup before getting too deep into a tricky
e2fsck.  At one point the fs was still mountable, so I could have tried
to copy files off first.  (In a physical failure scenario it's exactly
what I'd have done, but I wasn't thinking of that in this case.)

> Try running "debugfs /dev/dm-0"
>
> and then use the "stat /" command.

No happiness:

# ./e2fsprogs-1.42.10/debugfs/debugfs /dev/dm-0
debugfs 1.42.10 (18-May-2014)
debugfs:  stat /
stat: A block group is missing an inode table while reading inode 2

My hunch is that it would take a large and lucky effort to try to get
anything useful off this fs.  Does that seem like a reasonable guess?
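
The error names inode 2, which is the root directory.  In ext2/3/4 an
inode's block group follows from its number and the per-group inode
count, so the root inode always lives in group 0, the same group whose
descriptor checksum failed in the rw mount attempt earlier in the
thread.  A minimal sketch of that mapping follows; the value 8192 for
inodes per group is only a placeholder, and the real figure would come
from the "Inodes per group" line of dumpe2fs -h.

```python
def locate_inode(inode_number, inodes_per_group):
    """Return (block_group, index_within_group) for an ext2/3/4 inode.

    Inode numbers start at 1, so subtract 1 before dividing.
    """
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    if inodes_per_group < 1:
        raise ValueError("inodes_per_group must be positive")
    return divmod(inode_number - 1, inodes_per_group)


if __name__ == "__main__":
    INODES_PER_GROUP = 8192  # placeholder, not taken from this file system
    group, index = locate_inode(2, INODES_PER_GROUP)
    print(f"root inode: group {group}, slot {index} of the inode table")
```

The same function gives the group to inspect for any other inode
number, which can help decide whether a given file sits in a damaged
region.
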

--keith

-- 
kkeller@wombat.san-francisco.ca.us

Eric Sandeen

2014-Jun-02 15:51 UTC

Re: [long] major problems on fs; e2fsck running out of memory

On 6/1/14, 9:43 PM, Keith Keller wrote:
> Hi Bodo and Ted,
> 
> Thank you both for your responses; they confirm what I thought might be
> the case.  Knowing that, I can try to proceed with your suggestions.  I
> do have some followup questions for you:
> 
> 
> On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
>> Unfortunately, there has been a huge number of bug fixes for ext4's
>> online resize since 2.6.32 and 1.42.11.  It's quite possible that you
>> hit one of them.
> 
> Would this scenario be explained by these bugs?  I'd expect that if a
> resize2fs failed, it would report a problem pretty quickly.  (But
> perhaps that's the nature of some of these bugs.)

Well, for what it's worth, there have been several resize fixes shipped
in RHEL6/CentOS 6, so it's not just vanilla 1.42.11 or 2.6.32.  But we
walk a fine line between too much churn and risk on one side and fixing
the serious problems on the other, so it's possible that you hit an
unfixed case.  I think it's fairly hard to know without a reproducer.
Your corruption looks bad enough that I tend to agree with Bodo that it
may be some more fundamental underlying storage problem.

However, some semi-recent fixes, for example:

resize2fs: reserve all metadata blocks for flex_bg file systems

have yet to make it into RHEL6 (they will soon...)

-Eric

Bodo Thiesen

2014-Jun-02 20:52 UTC

Re: [long] major problems on fs; e2fsck running out of memory

* Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:

Hi Keith

> I have a very similar second server which has undergone a similar chain
> of events, an initial ~2.5tb fs followed by a resize later.  I believe
> that it has been fsck'd since the resize (but don't quote me on that).
> Am I likely to run into this issue with this fs?  And if I do, what
> steps should I do differently (e.g., use the latest e2fsck right away;
> don't e2fsck, get files off quickly, and mke2fs; something else)?

umount and then e2fsck -f -n -C 0
(the -C 0 is only for the progress bar)

If it reports the fs to be clean, or shows only a handful of errors like
b_size wrong or deleted inode has zero dtime, you should be safe.  (To
be sure, you might want to post that output here and ask before removing
the -n to fix those errors.)  If it reports tons of errors, or the
errors include invalid blocks or checksum errors, mount -o ro, back up
everything, and then mke2fs.
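
When the read-only check is scripted, the e2fsck exit status gives a
first indication of the result.  The sketch below decodes the status
bits listed in the e2fsck man page; the function names are invented for
illustration.  It cannot tell a handful of harmless errors from serious
corruption, so the printed output still has to be read as described
above.

```python
# Exit status bits as listed in the e2fsck man page.
E2FSCK_BITS = {
    1: "errors corrected",
    2: "errors corrected, reboot recommended",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_BITS.items() if status & bit]


def needs_attention(status):
    """True if a read-only check (e2fsck -f -n) reported anything at all."""
    return status != 0


if __name__ == "__main__":
    for status in (0, 4, 12):
        print(status, decode_e2fsck_status(status))
```

The status itself could be obtained with something like
subprocess.run(["e2fsck", "-f", "-n", "-C", "0", device]).returncode on
an unmounted device, where device is whatever block device holds the
file system.
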

Regards, Bodo
