Hello,

We have ten 1 TB drives hosting a multi-device btrfs filesystem,
configured with raid1+0 for both data and metadata. After some package
upgrades over the weekend I restarted the system and it did not come
back up. I booted from a rescue disk and ran btrfsck (the "next"
branch from Chris's git repository). Unfortunately btrfsck aborts on
every single drive with errors like this:

parent transid verify failed on 12050980864 wanted 377535 found 128327
parent transid verify failed on 12074557440 wanted 422817 found 126691
parent transid verify failed on 12057542656 wanted 422786 found 126395
parent transid verify failed on 12075556864 wanted 423004 found 126691
bad block 12095545344
parent transid verify failed on 12079190016 wanted 422826 found 105147
leaf parent key incorrect 12097544192
bad block 12097544192

I'm running Ubuntu 10.04 (Lucid) with the lts-backport x86_64 kernel,
2.6.35-23-server.

Attempting to mount the filesystem blocks indefinitely, with
/var/log/messages filling up with the 'parent transid verify' errors.

IIUC the 'btrfs-select-super' utility is not really helpful in our
case. At this point, my only priority is to somehow rescue the data
from the filesystem. I'd really appreciate it if someone on the list
could help me out.

I'm happy to provide any other information required. Please CC me on
replies as I'm not subscribed to the list.

Thanks,
Diwaker
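For reference, the diagnosis described above amounts to roughly the
following sketch. The device names /dev/sd[a-j] are hypothetical
stand-ins for the ten member drives, and ./btrfsck is the binary built
from the "next" branch mentioned in the message:

  # Run the experimental fsck against each member device in turn;
  # on this filesystem every run aborts with 'parent transid verify failed'.
  for dev in /dev/sd[a-j]; do
      echo "=== btrfsck $dev ==="
      ./btrfsck "$dev"
  done

  # Attempt the mount in the background (it blocks), watching the
  # kernel log for the transid errors as they stream in:
  mount -t btrfs /dev/sda /mnt &
  tail -f /var/log/messages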
Help, anyone? Sorry for the quick repost, but there was some important
data on that filesystem that I don't have a backup for. I'd really
appreciate any pointers that can help recover the data.

Searching through the archives, it seems others have faced similar
issues due to sudden power outages. AFAIK we did not have any power
outage.

I've run badblocks on all of the 10 drives and three of them had a few
bad blocks. I'm inclined to rule out bad disks as the root cause. In
any case, isn't this exactly the kind of situation btrfs should
protect users against?

'btrfsck' aborts on all of the drives. I've tried running it with
'-s 1' as well as '-s 2' with no success. Does that mean that none of
the drives have any copy of the superblock intact?

Diwaker

On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> [original message quoted in full; snipped]
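The superblock attempts described above would look something like this
sketch (again with hypothetical device names); '-s 1' and '-s 2' are
the only mirror numbers tried, as in the message:

  # Try the primary superblock and the two backup copies on each drive.
  for dev in /dev/sd[a-j]; do
      echo "=== $dev ==="
      ./btrfsck "$dev"          # primary superblock
      ./btrfsck -s 1 "$dev"     # first backup copy
      ./btrfsck -s 2 "$dev"     # second backup copy
  done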
I can't help you with your problem, but: it is a really, really bad
idea to store data without a backup on a filesystem that is still in
some kind of alpha stage. (Don't get me wrong, I like btrfs and you
guys do a really good job, but the lack of a working fsck keeps btrfs
at that stage in my eyes.) I can't believe there are people out there
who do such stupid things :/

Felix

On 08. February 2011 - 12:25, Diwaker Gupta wrote:
> Date: Tue, 8 Feb 2011 12:25:55 -0800
> From: Diwaker Gupta <diwaker@maginatics.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Re: Error mounting multi-device fs after restart
>
> [previous message quoted in full; snipped]
On Tuesday 08 of February 2011 21:25:55 Diwaker Gupta wrote:
> Searching through the archives, it seems others have faced similar
> issues due to sudden power outages. AFAIK we did not have any power
> outage.

SysRq+B will have the same effect; an OOPS or BUG will have a similar
effect.

> I've run badblocks on all of the 10 drives and three of them had a few
> bad blocks. I'm inclined to rule out bad disks as the root cause. In
> any case, isn't this exactly the kind of situation btrfs should
> protect users against?

And in the end it will. Unfortunately, at the moment it will only
report in dmesg that the read data doesn't match the stored checksum.
If you have redundancy in place it will try to read the other copy of
the data. That's it.

As a side note, if a drive made in the past 5 years has bad blocks
detectable by `badblocks`, it's long gone; it was probably silently
corrupting data for a long time already.

> 'btrfsck' aborts on all of the drives. I've tried running it with
> '-s 1' as well as '-s 2' with no success. Does that mean that none of
> the drives have any copy of the superblock intact?

-s 1 and -s 2 will try to read backup copies of the superblock, not
superblock copies on other devices. The regular code should perform
the latter by itself.

> On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> > [snip]
> > Attempting to mount the filesystem blocks indefinitely, with
> > /var/log/messages getting filled with the 'parent transid verify'
> > errors.

Define *indefinitely*. Are the drives not working? If the drives are
working, have you tried waiting 2-3 days, possibly longer? 10TB is a
*lot* of data.

> > IIUC the 'btrfs-select-super' utility is not really helpful in our
> > case. At this point, my only priority is to somehow rescue the data
> > from the filesystem. I'd really appreciate if someone on the list
> > could help me out.

Getting the FS mountable is your best bet at the moment (apart from
diving into the drive with dd in one hand and hexdump in the other...).

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
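The distinction drawn above can be checked directly on disk. The
following is a minimal sketch, assuming the standard btrfs superblock
locations (primary at 64 KiB, backup copies at 64 MiB and 256 GiB on
sufficiently large devices) and the 8-byte magic string "_BHRfS_M" at
byte offset 64 within each superblock; /dev/sda is a hypothetical
member drive:

  # Peek at each on-disk superblock copy and print its magic string.
  # A garbled or empty magic suggests that copy is damaged or absent.
  DEV=/dev/sda
  for off in $((64*1024)) $((64*1024*1024)) $((256*1024*1024*1024)); do
      magic=$(dd if="$DEV" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
      echo "superblock at offset $off: magic='$magic'"
  done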
> Define *indefinitely*.

Meaning the messages continued for as long as the system was under
observation.

> Are the drives not working?

I believe they are, in the sense that I can read data off them using
'dd', inspect partition tables, etc.

> If the drives are working, have you tried waiting 2-3 days, possibly longer?
> 10TB is a *lot* of data

The system was running overnight when I first hit the problem. On
subsequent reboots, I've waited less than half an hour. Usually the
mount is instantaneous, so I wasn't sure whether waiting would help at
all, and the error messages did not indicate that the system could
recover at that stage. If there's even a slight chance that the fs
would eventually mount, I'm happy to let it run for a day or two. Note
that if I mount using the 'degraded' option, the mount succeeds but
subsequent attempts to read the data fail.

> getting the FS mountable is your best bet at the moment (apart from
> diving into the drive with dd in one hand and hexdump in the other...)

sigh, I feared as much.

Diwaker
On Tue, Feb 8, 2011 at 3:59 PM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> [snip]
> If there's even a slight chance that the fs would eventually mount,
> I'm happy to let it run for a day or two. Note that if I mount using
> the 'degraded' option, the mount succeeds but subsequent attempts to
> read the data fail.

Huh. How do those attempts fail?

Try mounting ro, or degraded,ro, and reading the data off. That worked
for me recently on a broken btrfs raid10 (and didn't on another one,
so your mileage may vary). There's also the perpetually imminent fsck
development, which might save the day.
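Concretely, the suggestion above is something like the following
sketch; the device name and destination path are hypothetical, and
'ro' and 'degraded' are the mount options named in this thread:

  # Mount read-only, falling back to degraded,ro, then copy the data off.
  mkdir -p /mnt/rescue
  mount -t btrfs -o ro /dev/sda /mnt/rescue \
    || mount -t btrfs -o degraded,ro /dev/sda /mnt/rescue

  # If the mount holds, pull everything somewhere safe before anything else.
  rsync -a /mnt/rescue/ /backup/rescued/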
> > Note that if I mount using the 'degraded' option, the mount succeeds
> > but subsequent attempts to read the data fail.
>
> Huh. How do those attempts fail?

The same way as when I try a regular mount: the read blocks and I see
a continuous stream of 'parent transid verify failed' messages in
dmesg.

> Try mounting ro, or degraded,ro, and reading the data off. That
> worked for me recently on a broken btrfs raid10 (and didn't on another
> one, so your mileage may vary).

OK, I'll give these a shot. I still don't quite understand what it
means when btrfsck aborts: if it can't find the superblock on any of
the drives, how would btrfs ever be able to mount the fs?

Diwaker