thr3ads.net - Btrfs devel - Debian 3.7.1 BTRFS crash [Mar 2013]

Russell Coker

2013-Mar-13 01:38 UTC

Debian 3.7.1 BTRFS crash

I have a workstation running the Debian packaged 3.7.1 kernel from 24th 
December last year.  After some period of uptime (maybe months) it crashed and 
mounted the root filesystem read-only.  Now when I boot it the root filesystem 
gets mounted read-only.

I have attached the dmesg output from the last boot.

The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot
it's all a single encrypted BTRFS filesystem.

Any suggestions on what I should do next?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

Harald Glatt

2013-Mar-13 01:56 UTC

Re: Debian 3.7.1 BTRFS crash

If you care about the data, create a backup if you haven't already
done so. Then you can try btrfsck; maybe you are in luck!

On Wed, Mar 13, 2013 at 2:38 AM, Russell Coker <russell@coker.com.au> wrote:
> I have a workstation running the Debian packaged 3.7.1 kernel from 24th
> December last year.  After some period of uptime (maybe months) it crashed and
> mounted the root filesystem read-only.  Now when I boot it the root filesystem
> gets mounted read-only.
>
> I have attached the dmesg output from the last boot.
>
> The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot
> it's all a single encrypted BTRFS filesystem.
>
> Any suggestions on what I should do next?
>
> --
> My Main Blog         http://etbe.coker.com.au/
> My Documents Blog    http://doc.coker.com.au/

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Sandeen

2013-Mar-13 02:03 UTC

Re: Debian 3.7.1 BTRFS crash

On 3/12/13 8:38 PM, Russell Coker wrote:
> I have a workstation running the Debian packaged 3.7.1 kernel from 24th
> December last year.  After some period of uptime (maybe months) it crashed and
> mounted the root filesystem read-only.  Now when I boot it the root filesystem
> gets mounted read-only.
>
> I have attached the dmesg output from the last boot.
>
> The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot
> it's all a single encrypted BTRFS filesystem.
>
> Any suggestions on what I should do next?

Not offhand, but I took a look at the logs, and maybe this will help the
people who are more guru-like than I am.

First you hit:

[   37.175750] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8
[   37.176435] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8

which led to an aborted transaction and an attempt at graceful shutdown:

[   37.176478] WARNING: at /build/buildd-linux_3.7.1-1~experimental.1-amd64-lU7Aeh/linux-3.7.1/fs/btrfs/super.c:246 __btrfs_abort_transaction+0x4c/0xcf [btrfs]()
[   37.176481] btrfs: Transaction aborted
...
[   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
[   37.176791] btrfs is forced readonly
[   37.176793] btrfs: run_one_delayed_ref returned -5


in the end, despite that attempt at graceful exit, you hit:

[   37.937174] kernel BUG at /build/buildd-linux_3.7.1-1~experimental.1-amd64-lU7Aeh/linux-3.7.1/fs/btrfs/transaction.c:1753!

because in btrfs_clean_old_snapshots(), btrfs_drop_snapshot() failed, and

		BUG_ON(ret < 0);

it doesn't handle that well.

I have no idea what btrfsck might do, but it seems like if there is a corrupt
leaf, that might be in order.  I might make a device image first, as well.

-Eric


Jérôme Poulin

2013-Mar-13 05:07 UTC

Re: Debian 3.7.1 BTRFS crash

On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
> [   37.176791] btrfs is forced readonly
> [   37.176793] btrfs: run_one_delayed_ref returned -5

It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
2 months for each. ddrescue'ing the SSD would probably give better
chances of recovery and give BTRFS/btrfsck a chance to write correctly
to the newly copied image.

Bart Noordervliet

2013-Mar-13 10:56 UTC

Re: Debian 3.7.1 BTRFS crash

On Wed, Mar 13, 2013 at 6:07 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each.

USB flash drives are rubbish for any filesystem except FAT32 and then
still only gracefully accept large sequential writes. A few years ago
I thought it would be a good idea to put the root partition of a few
of my small Debian servers on USB flash, so that the harddisks could
spin down at night and I could easily prepare and switch a new
Debian-version. However, each and every USB stick got trashed within a
year, no matter which brand, size or product line and despite
specifically formatting them ext3 without a journal. I now use low-end
but recent series of SSDs and have had no such problems any more. I
don't use btrfs on them as yet, but ext4 even with a journal is doing
just fine.

Regards,

Bart

Swâmi Petaramesh

2013-Mar-13 11:31 UTC

Re: Debian 3.7.1 BTRFS crash

On 13/03/2013 11:56, Bart Noordervliet wrote:
> USB flash drives are rubbish for any filesystem except FAT32 and then
> still only gracefully accept large sequential writes. A few years ago
> I thought it would be a good idea to put the root partition of a few
> of my small Debian servers on USB flash, so that the harddisks could
> spin down at night and I could easily prepare and switch a new
> Debian-version. However, each and every USB stick got trashed within a
> year

I have an ARM box that runs a little Debian server (typically an
advanced NAS), it uses an USB key as an ext2 root filesystem. Everything
but big storage is there, and it's been up and running 24/7 for 3+ years
without any USB key incident...

The USB key is a cheap 1 GB Verbatim I purchased from the nearest drugstore ;-)

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Don't bother looking: I'm not on Facebook.


Eric Sandeen

2013-Mar-13 13:47 UTC

Re: Debian 3.7.1 BTRFS crash

On 3/13/13 12:07 AM, Jérôme Poulin wrote:
> On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
>> [   37.176791] btrfs is forced readonly
>> [   37.176793] btrfs: run_one_delayed_ref returned -5
>
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each. ddrescue'ing the SSD would probably give better
> chances of recovery and give BTRFS/btrfsck a chance to write correctly
> to the newly copied image.

On what do you base that theory?  I suppose it could be, but nothing
in the logs necessarily suggests that.  The "IO failure" is because
the fs shut down, went readonly, and subsequent IOs got -EIO,
I think.

-Eric
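
For readers decoding the log: the -5 in "run_one_delayed_ref returned -5" is a negated errno value, which is how kernel functions conventionally report failure. A minimal sketch for looking such a code up, assuming a Linux system (where errno 5 is EIO); the helper name is illustrative only:

```python
import errno
import os

def decode_kernel_return(ret):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    code = -ret  # kernel functions return -errno on failure
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, message = decode_kernel_return(-5)
print(name, "-", message)
```

This only identifies the error code; it says nothing about whether the underlying cause was the device or the filesystem shutting itself down.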


Russell Coker

2013-Mar-13 14:03 UTC

Re: Debian 3.7.1 BTRFS crash

On Thu, 14 Mar 2013, Eric Sandeen <sandeen@redhat.com> wrote:
> > It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> > burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> > 2 months for each. ddrescue'ing the SSD would probably give better
> > chances of recovery and give BTRFS/btrfsck a chance to write correctly
> > to the newly copied image.
>
> On what do you base that theory?  I suppose it could be, but nothing
> in the logs necessarily suggests that.  The "IO failure" is because
> the fs shut down, went readonly, and subsequent IOs got -EIO,
> I think.

I've just used nc to transfer the filesystem to another system, there were no
read errors so I don't think that a SSD hardware failure is the problem here.

I'm now getting similar problems running a 3.8 kernel with the filesystem on a
loopback device.  I'll provide more information soon.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

Chris Mason

2013-Mar-13 19:19 UTC

Re: Debian 3.7.1 BTRFS crash

On Wed, Mar 13, 2013 at 08:03:53AM -0600, Russell Coker wrote:
> On Thu, 14 Mar 2013, Eric Sandeen <sandeen@redhat.com> wrote:
> > > It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> > > burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> > > 2 months for each. ddrescue'ing the SSD would probably give better
> > > chances of recovery and give BTRFS/btrfsck a chance to write correctly
> > > to the newly copied image.
> >
> > On what do you base that theory?  I suppose it could be, but nothing
> > in the logs necessarily suggests that.  The "IO failure" is because
> > the fs shut down, went readonly, and subsequent IOs got -EIO,
> > I think.
>
> I've just used nc to transfer the filesystem to another system, there were no
> read errors so I don't think that a SSD hardware failure is the problem here.
>
> I'm now getting similar problems running a 3.8 kernel with the filesystem on a
> loopback device.  I'll provide more information soon.

Bad key ordering is pretty rare, and it usually means memory
corruptions.  Are you reproducing this on the same machine or a
different one?

-chris
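
To illustrate why a bad key order points toward memory corruption: Btrfs keeps the items in a leaf sorted by key, where a key is the triple (objectid, type, offset) compared in that order. The sketch below is not the kernel's code; the key values and the helper name are made up for illustration. It shows that flipping a single bit in one stored objectid is enough to break the ordering at a specific slot.

```python
def first_bad_slot(keys):
    """Return the first slot whose key is not strictly greater than the
    previous slot's key, or None if the leaf is correctly ordered."""
    for slot in range(1, len(keys)):
        if keys[slot - 1] >= keys[slot]:  # tuples compare field by field
            return slot
    return None

# Hypothetical leaf: (objectid, type, offset) triples in sorted order.
leaf = [(256, 1, 0), (256, 12, 256), (257, 1, 0), (257, 108, 0), (258, 1, 0)]
print(first_bad_slot(leaf))

# Simulate a single-bit memory error in the objectid stored at slot 2.
corrupted = list(leaf)
objectid, item_type, offset = corrupted[2]
corrupted[2] = (objectid ^ (1 << 8), item_type, offset)  # 257 becomes 1
print(first_bad_slot(corrupted))
```

A check of this kind can only report that the ordering is broken; it cannot tell whether the bit flipped in RAM before the block was written or on the storage device afterwards.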


Russell Coker

2013-Mar-14 06:36 UTC

Re: Debian 3.7.1 BTRFS crash

On Thu, 14 Mar 2013, Chris Mason <chris.mason@fusionio.com> wrote:
> Bad key ordering is pretty rare, and it usually means memory
> corruptions.  Are you reproducing this on the same machine or a
> different one?

I've attached a kernel message log of mounting it on another system (which
incidentally has ECC RAM) running the Debian package of kernel 3.8.2.  The end
result of this was a system on which the sync command blocked in D state
indefinitely and which couldn't be rebooted in any way other than a hardware
reset.

After that I ran btrfsck (which reported lots of errors) and it appeared to
mount correctly.  I haven't yet tried to verify the integrity of the contents.

I've now run memtest86+ on the original system and it reported some memory
errors.  I'm now in the process of trying to determine what parts of the
hardware failed.

So while the original corrupted filesystem was probably no fault of BTRFS, the
fact that another system with no hardware problem failed to operate correctly
after trying to mount it seems to be a bug.

Thanks for your advice.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

Martin Steigerwald

2013-Mar-14 09:48 UTC

Re: Debian 3.7.1 BTRFS crash

Hi Jérôme,

On Wednesday, 13 March 2013, Jérôme Poulin wrote:
> On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> > [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
> > [   37.176791] btrfs is forced readonly
> > [   37.176793] btrfs: run_one_delayed_ref returned -5
>
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each. ddrescue'ing the SSD would probably give better
> chances of recovery and give BTRFS/btrfsck a chance to write correctly
> to the newly copied image.

Well, the Intel SSD 320 in this ThinkPad T520 so far didn't seem to notice
any significant abuse due to BTRFS in use:

smartctl-a-2013-03-14
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5250
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203778

The value above has always been in that range. According to a PDF from Intel, the
Media_Wearout_Indicator below is the important one.

233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408


I have had about 20 GB for / on BTRFS since the beginning. That's almost 2 years now.

Now I also have 200 GB for /home on BTRFS, since a month or two ago. Granted, this is
more data, but unless it is backed by observed I/O patterns or similar, I suggest
being careful about claiming that BTRFS abuses SSDs based only on your
own experience, and asking it as a question if you do not know
for sure.

According to my irregular data points I also see no significant increase in
wear after I moved /home to BTRFS, although it is a bit premature to
say for sure. I will continue to keep an eye on it:

martin@merkaba:~/Computer/Merkaba/Intel SSD 320> for F in $(ls smartctl*) ; do echo "$F" | cut -c1-21 ; egrep "(Wear|Host_Writes|Erase_Fail_Count|Power_On_Hours)" "$F" ; done
smartctl-a-2011-05-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
smartctl-a-2011-05-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       324
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       324
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       325
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19160
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19160
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       320
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19041
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19041
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271
Now that's funny. The Intel SSD went back in time? From 325 to 271 power-on
hours in half a year. I knew I had a time machine somewhere. I just forgot
where it is. :)

172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169

First occurrence of erase failures, but the count didn't rise after that.

No other error-related occurrences in other values so far :)

225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66759
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66759
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
smartctl-a-2012-07-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2444
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       128105
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       314
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       128105
smartctl-a-2012-07-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2443
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       127984
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       314
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       127984
smartctl-a-2012-07-30
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2582
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       131072
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203604
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       131072
smartctl-a-2012-12-02
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4023
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       170107
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203703
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       170107
smartctl-a-2013-02-22
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5010
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198165
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203768
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198165
smartctl-a-2013-02-22
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5010
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198163
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203768
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198163
smartctl-a-2013-03-14
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5250
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203778
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408

Where there is more than one data point on a single day, they are from before
and after self-tests.

Basically the wear-related values didn't change much at all. The indicative
Media_Wearout_Indicator didn't change at all.

I leave about 20 GB of the 300 GB free at most times. According to a paper
from Intel this helps long-term performance, and from my understanding of
how SSDs work it also helps longevity.
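
The raw Host_Writes_32MiB counter in the readings above can be turned into a total volume written. A minimal sketch, assuming the usual ten-column smartctl attribute layout with the raw value in the last column, and assuming (from the attribute name) that each unit is 32 MiB; the function names are illustrative:

```python
def parse_attributes(text):
    """Parse smartctl attribute rows into {name: raw_value}.
    If a name appears more than once, the first occurrence is kept."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs.setdefault(fields[1], int(fields[9]))
    return attrs

def host_writes_tib(raw_units, unit_mib=32):
    """Convert a Host_Writes_32MiB raw counter to TiB."""
    return raw_units * unit_mib / (1024 * 1024)

# Rows taken from the 2013-03-14 reading.
sample = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5250
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
"""

attrs = parse_attributes(sample)
tib = host_writes_tib(attrs["Host_Writes_32MiB"])
print(f"{tib:.2f} TiB written in {attrs['Power_On_Hours']} power-on hours")
```

Under those assumptions the 2013-03-14 reading corresponds to a little under 6.2 TiB of host writes.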

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Chris Mason

2013-Mar-14 13:04 UTC

Re: Debian 3.7.1 BTRFS crash

On Thu, Mar 14, 2013 at 12:36:09AM -0600, Russell Coker wrote:
> On Thu, 14 Mar 2013, Chris Mason <chris.mason@fusionio.com> wrote:
> > Bad key ordering is pretty rare, and it usually means memory
> > corruptions.  Are you reproducing this on the same machine or a
> > different one?
>
> I've attached a kernel message log of mounting it on another system (which
> incidentally has ECC RAM) running the Debian package of kernel 3.8.2.  The end
> result of this was a system on which the sync command blocked in D state
> indefinitely and which couldn't be rebooted in any way other than a hardware
> reset.

Just to make sure I've got the sequence right, this is mounting the same
corrupted image on a second system?

The end result of that should be some messages about the bad blocks we
found and then the FS forced readonly.  If not, you're right, there is
definitely a bug there.

-chris

Norbert Scheibner

2013-Mar-14 19:04 UTC

Re: Debian 3.7.1 BTRFS crash

On 13.03.2013 at 12:31, Swâmi Petaramesh <swami@petaramesh.org> wrote:
> On 13/03/2013 11:56, Bart Noordervliet wrote:
>> USB flash drives are rubbish for any filesystem except FAT32 and then
>> still only gracefully accept large sequential writes. A few years ago
>> I thought it would be a good idea to put the root partition of a few
>> of my small Debian servers on USB flash, so that the harddisks could
>> spin down at night and I could easily prepare and switch a new
>> Debian-version. However, each and every USB stick got trashed within a
>> year
>
> I have an ARM box that runs a little Debian server (typically an
> advanced NAS), it uses an USB key as an ext2 root filesystem. Everything
> but big storage is there, and it's been up and running 24/7 for 3+ years
> without any USB key incident...

The difference is the fs. Ext3 uses a journal which uses always the same
physical sectors on disc. If the disc is a hard disk, it does not matter,
rewrites are no problem for platters. If it is an modern SSD, the
SSD-controller takes care and redirects the writes to different physical
sectors. USB-sticks have no smart controller and so the writes hit
always the same physical sector, it's like burning a hole in the flash
chip. If the commit time is standard for desktops set to 5 seconds, then
a whole year means a lot of writes to the same sector on an USB-stick.
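
As a rough back-of-the-envelope check of this argument (a sketch under stated assumptions, not a measurement): with one journal commit every 5 seconds, and assuming the worst case that every commit rewrites the same physical block with no wear leveling at all, the yearly rewrite count can be compared against a flash endurance figure. The endurance value below is a hypothetical placeholder, not a datasheet number.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def commits_per_year(commit_interval_s=5):
    """Number of journal commits in one year of continuous uptime."""
    return SECONDS_PER_YEAR // commit_interval_s

def days_until_worn(endurance_cycles, commit_interval_s=5):
    """Days until one block reaches its endurance limit, assuming every
    commit rewrites that same block (worst case, no wear leveling)."""
    return endurance_cycles * commit_interval_s / 86400

ASSUMED_ENDURANCE = 10_000  # hypothetical program/erase cycles per block

print(commits_per_year())
print(round(days_until_worn(ASSUMED_ENDURANCE), 2))
```

This is a deliberate worst case. A journal is only written when there is something to commit, and a device may do at least some wear leveling, so real lifetimes can be far longer than this bound suggests.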

Regards
   Norbert


Martin Steigerwald

2013-Mar-14 23:17 UTC

Re: Debian 3.7.1 BTRFS crash

On Thursday, 14 March 2013, Norbert Scheibner wrote:
> On 13.03.2013 at 12:31, Swâmi Petaramesh <swami@petaramesh.org> wrote:
> > On 13/03/2013 11:56, Bart Noordervliet wrote:
> >> USB flash drives are rubbish for any filesystem except FAT32 and then
> >> still only gracefully accept large sequential writes. A few years ago
> >> I thought it would be a good idea to put the root partition of a few
> >> of my small Debian servers on USB flash, so that the harddisks could
> >> spin down at night and I could easily prepare and switch a new
> >> Debian-version. However, each and every USB stick got trashed within a
> >> year
> >
> > I have an ARM box that runs a little Debian server (typically an
> > advanced NAS), it uses an USB key as an ext2 root filesystem.
> > Everything but big storage is there, and it's been up and running 24/7
> > for 3+ years without any USB key incident...
>
> The difference is the fs. Ext3 uses a journal which uses always the same
> physical sectors on disc. If the disc is a hard disk, it does not matter,
> rewrites are no problem for platters. If it is an modern SSD, the
> SSD-controller takes care and redirects the writes to different physical
> sectors. USB-sticks have no smart controller and so the writes hit
> always the same physical sector, it's like burning a hole in the flash
> chip. If the commit time is standard for desktops set to 5 seconds, then
> a whole year means a lot of writes to the same sector on an USB-stick.

Are you sure that modern, high quality USB sticks don't do any wear
leveling?

On some SD cards there is some FAT optimization in place[1][2], i.e. good
random access at the beginning of the drive, where the FAT table and thus random
I/O metadata accesses are. Ext3 places metadata elsewhere - I believe in about
the middle of the partition.

[1] Flash memory card design, FAT optimization

https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey

[2] Arnd Bergmann, Optimizing Linux with cheap flash drives

https://lwn.net/Articles/428584/

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
