Marc MERLIN
2012-Oct-29 17:48 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
First, I used another tool to see what the FS looked like, partly in the hope of getting a list of subvolumes without mounting it:

gandalfthegreat:~# btrfs-calc-size /dev/mapper/bootdsk
Calculating size of root tree
    180.00KB total size, 0.00 inline data, 1 nodes, 44 leaves, 2 levels
Calculating size of extent tree
    387.90MB total size, 0.00 inline data, 1423 nodes, 97879 leaves, 4 levels
Calculating size of csum tree
    440.88MB total size, 0.00 inline data, 1425 nodes, 111441 leaves, 4 levels
Calculating size of fs tree
    20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels

Then I figured I'd try mounting all the active snapshots one by one, and they worked:

[330514.202529] device label btrfs_pool2 devid 1 transid 39698 /dev/dm-0
[330514.203337] device label btrfs_pool1 devid 1 transid 145479 /dev/dm-1
[330629.438572] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330629.439208] btrfs: use lzo compression
[330629.439213] btrfs: not using ssd allocation scheme
[330629.439216] btrfs: disk space caching is enabled
[330653.208718] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330658.854162] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330661.786204] btrfs: unlinked 25 orphans
[330708.314984] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330708.675443] btrfs: unlinked 165 orphans
[330721.558581] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330721.583214] btrfs: unlinked 9 orphans

After that, I was able to mount the root (volid 0) without a crash, and my filesystem looks fine again.

So as far as I can tell, my filesystem is not badly corrupted, and there was just a small bit that triggered a bug in the mounting code. Somehow mounting subvolumes separately cleared the state that triggered the bug, which I can't quite explain.

If someone cares, I made a dd image of the FS to a file on a backup server, but if not, I'll just delete it.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                  .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
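A minimal sketch, not part of the original message, for summarizing a kernel log like the one pasted in this message. It assumes the two message formats appear exactly as shown ("device label ... transid N" and "btrfs: unlinked N orphans"); the function name and the sample constant are made up for illustration.

```python
import re

# Patterns assume the exact message wording pasted in this thread.
DEVICE_RE = re.compile(r"device label (\S+) devid (\d+) transid (\d+) (\S+)")
ORPHAN_RE = re.compile(r"btrfs: unlinked (\d+) orphans")

# A subset of the kernel lines from the message, copied verbatim.
SAMPLE_LOG = """\
[330514.202529] device label btrfs_pool2 devid 1 transid 39698 /dev/dm-0
[330514.203337] device label btrfs_pool1 devid 1 transid 145479 /dev/dm-1
[330629.438572] device label btrfs_pool1 devid 1 transid 145479 /dev/mapper/bootdsk
[330661.786204] btrfs: unlinked 25 orphans
[330708.314984] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330708.675443] btrfs: unlinked 165 orphans
[330721.558581] device label btrfs_pool1 devid 1 transid 145480 /dev/mapper/bootdsk
[330721.583214] btrfs: unlinked 9 orphans
"""


def summarize_mount_log(text):
    """Return (highest transid seen per label, total orphans unlinked)."""
    transids = {}
    orphans = 0
    for line in text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            label, transid = match.group(1), int(match.group(3))
            transids[label] = max(transid, transids.get(label, 0))
            continue
        match = ORPHAN_RE.search(line)
        if match:
            orphans += int(match.group(1))
    return transids, orphans


if __name__ == "__main__":
    per_label, total_orphans = summarize_mount_log(SAMPLE_LOG)
    print(per_label, total_orphans)
```

On the lines above, this shows the transid for btrfs_pool1 advancing by one across the mounts and the orphan cleanups adding up across the three snapshot mounts.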
Marc MERLIN
2012-Oct-30 15:46 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
On Mon, Oct 29, 2012 at 10:48:02AM -0700, Marc MERLIN wrote:
> Then, I figured, I'd try mounting all the active snapshots one by one,
> and they worked:
>
> After that, I was able to mount the root (volid 0) without a crash and
> my filesystem looks fine again.

Ok, I was wrong.

What happened is that my SSD is crapping out and failing to write after a certain number of uptime hours. I just had the same problem happen again yesterday.

Turns out btrfs-zero-log does fix the problem, but because it output the errors I saw, I thought it did nothing and forgot that I had run it.

So
1) btrfs-zero-log does fix the problem
2) my drive causes btrfs to reliably enter a state where the filesystem becomes unmountable and crashes the kernel on the next mount.

It would be nice if the kernel refused to mount instead of crashing, or even ran the equivalent of btrfs-zero-log automatically when necessary.

Details below if that helps.

gandalfthegreat:~# btrfs-calc-size /dev/mapper/bootdsk
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=7949122546735189447
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
read block failed check_tree_block
Calculating size of root tree
    216.00KB total size, 0.00 inline data, 1 nodes, 53 leaves, 2 levels
Calculating size of extent tree
    390.99MB total size, 0.00 inline data, 1443 nodes, 98651 leaves, 4 levels
Calculating size of csum tree
    458.78MB total size, 0.00 inline data, 1472 nodes, 115976 leaves, 4 levels
Calculating size of fs tree
    20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels

gandalfthegreat:~# btrfs-find-root /dev/mapper/bootdsk
Super think's the tree root is at 147779584, chunk root 20979712
Found tree root at 147779584

gandalfthegreat:~# btrfs filesystem show
Label: 'btrfs_pool1'  uuid: 92584fa9-85cd-4df6-b182-d32198b76a0b
    Total devices 1 FS bytes used 344.85GB
    devid 1 size 441.70GB used 441.70GB path /dev/dm-1

Label: 'btrfs_pool2'  uuid: 04071703-df6b-4022-9632-6c3aeabff206
    Total devices 1 FS bytes used 654.12GB
    devid 1 size 872.51GB used 872.51GB path /dev/dm-0

Btrfs Btrfs v0.19

gandalfthegreat:~# btrfs-zero-log /dev/mapper/bootdsk
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=7949122546735189447
Check tree block failed, want=259264512, have=12301165138967429629
Check tree block failed, want=259264512, have=12301165138967429629
read block failed check_tree_block

gandalfthegreat:~# btrfs-calc-size /dev/mapper/bootdsk
Calculating size of root tree
    216.00KB total size, 0.00 inline data, 1 nodes, 53 leaves, 2 levels
Calculating size of extent tree
    390.99MB total size, 0.00 inline data, 1443 nodes, 98651 leaves, 4 levels
Calculating size of csum tree
    458.78MB total size, 0.00 inline data, 1472 nodes, 115976 leaves, 4 levels
Calculating size of fs tree
    20.00KB total size, 0.00 inline data, 1 nodes, 4 leaves, 2 levels

gandalfthegreat:~# btrfs-zero-log /dev/mapper/bootdsk
gandalfthegreat:~#
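A minimal sketch, not part of the original message, of the recovery order described here: clear the log tree on the unmounted device, then retry the mount. The mount point /mnt and the function names are hypothetical. The sketch only prints the commands unless dry_run is turned off. Note that zeroing the log discards the log tree, so writes that had only reached the log may be lost, and that newer btrfs-progs ship this tool as "btrfs rescue zero-log" rather than the standalone btrfs-zero-log used in this thread.

```python
import shlex
import subprocess


def zero_log_plan(device, mountpoint, tool=("btrfs-zero-log",)):
    """Return the recovery commands, in order, as argv lists.

    Nothing is executed here. `mountpoint` is a hypothetical path chosen
    by the caller; the device must not be mounted when the plan is run.
    """
    return [
        [*tool, device],                # clear the log tree first
        ["mount", device, mountpoint],  # then retry the mount
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching the disk.
    run_plan(zero_log_plan("/dev/mapper/bootdsk", "/mnt"))
```

Passing tool=("btrfs", "rescue", "zero-log") builds the same plan for newer btrfs-progs.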
Sander
2012-Oct-31 09:24 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
Marc MERLIN wrote (ao):
> What happened is that my SSD is crapping out and failing to write after
> a certain number of uptime hours.

What model of SSD is that, if I may ask?

Sander
Marc MERLIN
2012-Oct-31 15:40 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
On Wed, Oct 31, 2012 at 10:24:40AM +0100, Sander wrote:
> Marc MERLIN wrote (ao):
> > What happened is that my SSD is crapping out and failing to write after
> > a certain number of uptime hours.
>
> What model of SSD is that, if I may ask?

My first one, a Crucial C300, just died with all my data on it after about 3 months.
Then I spent 2-3 weeks trying to get acceptable performance (i.e. faster than a HD) out of 2 Samsung 830s (you might remember some spam from me here about them, back when I thought it might be a btrfs issue).
Now, I have an OCZ Vertex 4.

That said, it's working fine again for now after I went back to kernel 3.5.3 (down from 3.6.3). It hasn't been long enough to say for sure, but there is a remote possibility that changes in 3.6 actually caused my drive to freeze after several hours of use.
That happened 3 times. In 2 of those, btrfs did not manage to write all its data before access was cut off, and I got the bug I reported here, which in turn crashes any kernel you try to mount the FS with.
Cleaning the log manually fixed it both times so far.

For now, I'll stick with 3.5.3 for a while to make sure my drive is actually ok (it seems to be after all), and once I'm happy that it's the case, I'll go back to 3.6.3 with serial console remote logging and try to capture the full sata failure I got with 3.6.3.

Marc
Sander
2012-Nov-01 10:56 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
Marc MERLIN wrote (ao):
> That said, it's working fine again for now after I went back to kernel 3.5.3
> (down from 3.6.3). It hasn't been long enough to say for sure, but there is
> a remote possibility that changes in 3.6 actually caused my drive to freeze
> after several hours of use.
> That happened 3 times. In 2 of those, btrfs did not manage to
> write all its data before access was cut off, and I got the bug I reported
> here, which in turn crashes any kernel you try to mount the FS with.
> Cleaning the log manually fixed it both times so far.
>
> For now, I'll stick with 3.5.3 for a while to make sure my drive is actually
> ok (it seems to be after all), and once I'm happy that it's the case, I'll go
> back to 3.6.3 with serial console remote logging and try to capture the full
> sata failure I got with 3.6.3.

Thanks for the info. You could put some load on the ssd to see if you can trigger an issue under 3.6.3(+) with btrfs filesystem scrub or badblocks (in the default non-destructive mode).

Can you collect SMART data (with smartctl) from the ssd?

Sander
Marc MERLIN
2012-Nov-01 16:16 UTC
Re: Need help mounting laptop corrupted root btrfs. Kernel BUG at fs/btrfs/volumes.c:3707 - FIXED
On Thu, Nov 01, 2012 at 11:56:18AM +0100, Sander wrote:
> > For now, I'll stick with 3.5.3 for a while to make sure my drive is actually
> > ok (it seems to be after all), and once I'm happy that it's the case, I'll go
> > back to 3.6.3 with serial console remote logging and try to capture the full
> > sata failure I got with 3.6.3.
>
> Thanks for the info. You could put some load on the ssd to see if you
> can trigger an issue under 3.6.3(+) with btrfs filesystem scrub or
> badblocks (in the default non-destructive mode).

I'll try this in a few days, once I've confirmed that my SSD is still 100% stable under 3.5.3 (so far it is). After that, I'll go back to 3.6.3 and see what it takes to crash it.

But as per my original report and http://marc.merlins.org/tmp/crash.jpg this does look like a sata layer problem, which btrfs isn't responsible for.

There is also still the unaddressed bug that, when this does happen, btrfs can end up in a state where the filesystem is unmountable until it is fixed manually.

> Can you collect SMART data (with smartctl) from the ssd?

I did actually have a look, but to be honest, SSDs have pretty useless SMART data overall. Mine's likely a bit worse than average, even.

gandalfthegreat:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.3-amd64-preempt-noide-20120903] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ-VERTEX4
Serial Number:    OCZ-26W4VJ3SP32E1WC2
LU WWN Device Id: 5 e83a97 59be3b57e
Firmware Version: 1.5
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Nov 1 09:14:43 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   006   000   000    Old_age   Offline  -           6
  3 Spin_Up_Time            0x0000   100   100   000    Old_age   Offline  -           0
  4 Start_Stop_Count        0x0000   100   100   000    Old_age   Offline  -           0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline  -           8
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline  -           1210
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline  -           240
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline  -           8019542246
233 Media_Wearout_Indicator 0x0000   099   000   000    Old_age   Offline  -           99

SMART Error Log not supported
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging

Marc
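A minimal sketch, not part of the original message, for pulling the attribute table out of "smartctl -a" output so that raw values such as Reallocated_Sector_Ct can be compared between runs. It assumes the ten-column row layout shown in this message and plain integer raw values; drives that report raw values with extra text would need more careful parsing. The function name and sample constant are made up for illustration.

```python
# Three rows copied from the smartctl output in this message, plus the header.
SAMPLE_ROWS = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline  -  8
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline  -  1210
233 Media_Wearout_Indicator 0x0000   099   000   000    Old_age   Offline  -  99
"""


def parse_smart_attributes(text):
    """Map attribute name -> dict with id, value, worst, thresh and raw."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have exactly 10 columns and start with a numeric ID;
        # the header row starts with "ID#" and is skipped by the isdigit test.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),  # assumes a plain integer raw value
        }
    return attrs


if __name__ == "__main__":
    for name, attr in parse_smart_attributes(SAMPLE_ROWS).items():
        print(f"{attr['id']:>3} {name:<24} raw={attr['raw']}")
```

Saving the parsed dictionary after each run makes it easy to spot whether the reallocated sector count grows over time, which the one-off dump above cannot show.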