Hello,

We have ten 1 TB drives hosting a multi-device btrfs filesystem,
configured with raid1+0 for both data and metadata. After some package
upgrades over the weekend I restarted the system and it did not come
back up. I booted from a rescue disk and ran btrfsck (the "next"
branch from Chris's git repository). Unfortunately btrfsck aborts on
every single drive with errors like this:

parent transid verify failed on 12050980864 wanted 377535 found 128327
parent transid verify failed on 12074557440 wanted 422817 found 126691
parent transid verify failed on 12057542656 wanted 422786 found 126395
parent transid verify failed on 12075556864 wanted 423004 found 126691
bad block 12095545344
parent transid verify failed on 12079190016 wanted 422826 found 105147
leaf parent key incorrect 12097544192
bad block 12097544192

I'm running Ubuntu 10.04 (Lucid) with the lts-backport x86_64 kernel,
2.6.35-23-server.

Attempting to mount the filesystem blocks indefinitely, with
/var/log/messages filling up with the 'parent transid verify' errors.

IIUC the 'btrfs-select-super' utility is not really helpful in our
case. At this point, my only priority is to somehow rescue the data
from the filesystem. I'd really appreciate it if someone on the list
could help me out.

I'm happy to provide any other information required. Please CC me on
replies as I'm not subscribed to the list.

Thanks,
Diwaker
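For reference, the diagnosis described above amounts to roughly the
following sketch. The device names /dev/sd[a-j] are hypothetical
stand-ins for the ten member drives, and ./btrfsck is the binary built
from the "next" branch mentioned in the message:

  # Run the experimental fsck against each member device in turn;
  # on this filesystem every run aborts with 'parent transid verify failed'.
  for dev in /dev/sd[a-j]; do
      echo "=== btrfsck $dev ==="
      ./btrfsck "$dev"
  done

  # Attempt the mount in the background (it blocks), watching the
  # kernel log for the transid errors as they stream in:
  mount -t btrfs /dev/sda /mnt &
  tail -f /var/log/messages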
Help, anyone? Sorry for the quick repost, but there was some important
data on that filesystem that I don't have a backup for. I'd really
appreciate any pointers that can help recover the data.

Searching through the archives, it seems others have faced similar
issues due to sudden power outages. AFAIK we did not have any power
outage.

I've run badblocks on all of the 10 drives and three of them had a few
bad blocks. I'm inclined to rule out bad disks as the root cause. In
any case, isn't this exactly the kind of situation btrfs should
protect users against?

'btrfsck' aborts on all of the drives. I've tried running it with
'-s 1' as well as '-s 2' with no success. Does that mean that none of
the drives have any copy of the superblock intact?

Diwaker

On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> [original message quoted in full; snipped]
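The superblock attempts described above would look something like this
sketch (again with hypothetical device names); '-s 1' and '-s 2' are
the only mirror numbers tried, as in the message:

  # Try the primary superblock and the two backup copies on each drive.
  for dev in /dev/sd[a-j]; do
      echo "=== $dev ==="
      ./btrfsck "$dev"          # primary superblock
      ./btrfsck -s 1 "$dev"     # first backup copy
      ./btrfsck -s 2 "$dev"     # second backup copy
  done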
I can't help you with your problem, but: it is a really, really bad
idea to store data without a backup on a filesystem that is still in
some kind of alpha stage. (Don't get me wrong, I like btrfs and you
guys do a really good job, but the lack of a working fsck keeps btrfs
at that stage in my eyes.) I can't believe there are people out there
who do such stupid things :/

Felix

On 08. February 2011 - 12:25, Diwaker Gupta wrote:
> Date: Tue, 8 Feb 2011 12:25:55 -0800
> From: Diwaker Gupta <diwaker@maginatics.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Re: Error mounting multi-device fs after restart
>
> [previous message quoted in full; snipped]
On Tuesday 08 of February 2011 21:25:55 Diwaker Gupta wrote:
> Searching through the archives, it seems others have faced similar
> issues due to sudden power outages. AFAIK we did not have any power
> outage.

SysRq+B will have the same effect; an OOPS or BUG will have a similar
effect.

> I've run badblocks on all of the 10 drives and three of them had a few
> bad blocks. I'm inclined to rule out bad disks as the root cause. In
> any case, isn't this exactly the kind of situation btrfs should
> protect users against?

And in the end it will. Unfortunately, at the moment it will only
report in dmesg that the read data doesn't match the stored checksum.
If you have redundancy in place it will try to read the other copy of
the data. That's it.

As a side note, if a drive made in the past 5 years has bad blocks
detectable by `badblocks`, it's long gone; it was probably silently
corrupting data for a long time already.

> 'btrfsck' aborts on all of the drives. I've tried running it with
> '-s 1' as well as '-s 2' with no success. Does that mean that none of
> the drives have any copy of the superblock intact?

-s 1 and -s 2 will try to read backup copies of the superblock, not
superblock copies on other devices. The regular code should perform
the latter by itself.

> On Mon, Feb 7, 2011 at 11:46 AM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> > [snip]
> > Attempting to mount the filesystem blocks indefinitely, with
> > /var/log/messages getting filled with the 'parent transid verify'
> > errors.

Define *indefinitely*. Are the drives not working? If the drives are
working, have you tried waiting 2-3 days, possibly longer? 10TB is a
*lot* of data.

> > IIUC the 'btrfs-select-super' utility is not really helpful in our
> > case. At this point, my only priority is to somehow rescue the data
> > from the filesystem. I'd really appreciate if someone on the list
> > could help me out.

Getting the FS mountable is your best bet at the moment (apart from
diving into the drive with dd in one hand and hexdump in the other...).

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
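The distinction drawn above can be checked directly on disk. The
following is a minimal sketch, assuming the standard btrfs superblock
locations (primary at 64 KiB, backup copies at 64 MiB and 256 GiB on
sufficiently large devices) and the 8-byte magic string "_BHRfS_M" at
byte offset 64 within each superblock; /dev/sda is a hypothetical
member drive:

  # Peek at each on-disk superblock copy and print its magic string.
  # A garbled or empty magic suggests that copy is damaged or absent.
  DEV=/dev/sda
  for off in $((64*1024)) $((64*1024*1024)) $((256*1024*1024*1024)); do
      magic=$(dd if="$DEV" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
      echo "superblock at offset $off: magic='$magic'"
  done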
> Define *indefinitely*.

Meaning the messages continued for as long as the system was under
observation.

> Are the drives not working?

I believe they are, in the sense that I can read data off them using
'dd', inspect partition tables, etc.

> If the drives are working, have you tried waiting 2-3 days, possibly longer?
> 10TB is a *lot* of data

The system was running overnight when I first hit the problem. On
subsequent reboots, I've waited less than half an hour. Usually the
mount is instantaneous, so I wasn't sure whether waiting would help at
all, and the error messages did not indicate that the system could
recover at that stage. If there's even a slight chance that the fs
would eventually mount, I'm happy to let it run for a day or two. Note
that if I mount using the 'degraded' option, the mount succeeds but
subsequent attempts to read the data fail.

> getting the FS mountable is your best bet at the moment (apart from
> diving into the drive with dd in one hand and hexdump in the other...)

sigh, I feared as much.

Diwaker
On Tue, Feb 8, 2011 at 3:59 PM, Diwaker Gupta <diwaker@maginatics.com> wrote:
> [snip]
> If there's even a slight chance that the fs would eventually mount,
> I'm happy to let it run for a day or two. Note that if I mount using
> the 'degraded' option, the mount succeeds but subsequent attempts to
> read the data fail.

Huh. How do those attempts fail?

Try mounting ro, or degraded,ro, and reading the data off. That worked
for me recently on a broken btrfs raid10 (and didn't on another one,
so your mileage may vary). There's also the perpetually imminent fsck
development, which might save the day.
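Concretely, the suggestion above is something like the following
sketch; the device name and destination path are hypothetical, and
'ro' and 'degraded' are the mount options named in this thread:

  # Mount read-only, falling back to degraded,ro, then copy the data off.
  mkdir -p /mnt/rescue
  mount -t btrfs -o ro /dev/sda /mnt/rescue \
    || mount -t btrfs -o degraded,ro /dev/sda /mnt/rescue

  # If the mount holds, pull everything somewhere safe before anything else.
  rsync -a /mnt/rescue/ /backup/rescued/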
> > Note that if I mount using the 'degraded' option, the mount succeeds
> > but subsequent attempts to read the data fail.
>
> Huh. How do those attempts fail?

The same way as when I try a regular mount: the read blocks and I see
a continuous stream of 'parent transid verify failed' messages in
dmesg.

> Try mounting ro, or degraded,ro, and reading the data off. That
> worked for me recently on a broken btrfs raid10 (and didn't on another
> one, so your mileage may vary).

OK, I'll give these a shot. I still don't quite understand what it
means when btrfsck aborts: if it can't find the superblock on any of
the drives, how would btrfs ever be able to mount the fs?

Diwaker