thr3ads.net - Btrfs devel - Hardware failure or btrfs issue? [Jul 2013]

Peter Chant

2013-Jul-01 22:56 UTC

Hardware failure or btrfs issue?

Sirs,

my recently slowing file system is now going read only after trying a 
defrag or other operation.  I'm wondering whether this is the result of
a hardware failure or a btrfs or some other issue.  Output of dmesg:

[  127.750401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.750494] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.750590] Process btrfs-cleaner (pid: 1346, threadinfo ffff8800687ec000, task ffff88006d742a00)
[  127.750704] Stack:
[  127.750733]  ffff880068024c38 ffff88006a9a0438 ffff8800687ede48 ffff880069928800
[  127.750850]  ffff88006d742a00 ffff88006d742a00 ffff88006d742a00 0000000000000000
[  127.750968]  ffff8800687edeb8 ffffffff812b8c29 ffff880069928800 0000000000000000
[  127.751085] Call Trace:
[  127.751122]  [<ffffffff812b8c29>] cleaner_kthread+0xa9/0x120
[  127.751200]  [<ffffffff812b8b80>] ? write_dev_flush.part.107+0xc0/0xc0
[  127.751289]  [<ffffffff81069450>] kthread+0xc0/0xd0
[  127.751354]  [<ffffffff81069390>] ? kthread_create_on_node+0x130/0x130
[  127.751444]  [<ffffffff816976dc>] ret_from_fork+0x7c/0xb0
[  127.751516]  [<ffffffff81069390>] ? kthread_create_on_node+0x130/0x130
[  127.751602] Code: 44 28 3f 85 c0 7f 83 31 d2 31 f6 4c 89 ff e8 f7 c5 fe ff eb 84 0f 1f 44 00 00 48 83 c4 18 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 66 66 66 66 90 48
[  127.752207] RIP  [<ffffffff812c1611>] btrfs_clean_old_snapshots+0x131/0x140
[  127.752305]  RSP <ffff8800687ede38>
[  127.752371] ---[ end trace cc41fa39a41b468e ]---
[  127.862825] btrfs: corrupt leaf, bad key order: block=2837196627968,root=1, slot=121
[  127.862938] ------------[ cut here ]------------
[  127.863009] WARNING: at fs/btrfs/super.c:255 __btrfs_abort_transaction+0xdf/0x100()
[  127.863110] Hardware name: System Product Name
[  127.863171] btrfs: Transaction aborted
[  127.863222] Modules linked in: usblp pl2303 usbserial hid_generic usbhid hid usb_storage lp ppdev parport_pc parport snd_hda_codec_via sp5100_tco acpi_cpufreq mperf freq_table kvm_amd kvm evdev radeon ttm drm_kms_helper psmouse drm serio_raw agpgart i2c_algo_bit microcode snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc i2c_piix4 snd_timer snd atl1e ohci_hcd via_rhine i2c_core shpchp soundcore ehci_pci ehci_hcd mii wmi k10temp asus_atk0110 processor thermal_sys hwmon button
[  127.864073] Pid: 1347, comm: btrfs-transacti Tainted: G D      3.9.3 #1
[  127.864167] Call Trace:
[  127.864204]  [<ffffffff8104614f>] warn_slowpath_common+0x7f/0xc0
[  127.864285]  [<ffffffff81046246>] warn_slowpath_fmt+0x46/0x50
[  127.864370]  [<ffffffff812962ef>] __btrfs_abort_transaction+0xdf/0x100
[  127.864460]  [<ffffffff812a71f2>] __btrfs_free_extent+0x242/0x870
[  127.864543]  [<ffffffff813046bc>] ? btrfs_merge_delayed_refs+0x1fc/0x3c0
[  127.870518]  [<ffffffff812ab59b>] run_clustered_refs+0x50b/0xc40
[  127.876503]  [<ffffffff81303813>] ? find_ref_head+0x83/0xf0
[  127.882501]  [<ffffffff812af6b0>] btrfs_run_delayed_refs+0xe0/0x570
[  127.882503]  [<ffffffff812bfb9a>] btrfs_commit_transaction+0xea/0xad0
[  127.882505]  [<ffffffff81069b90>] ? finish_wait+0x80/0x80
[  127.882513]  [<ffffffff812b8605>] transaction_kthread+0x1a5/0x220
[  127.882517]  [<ffffffff812b8460>] ? btree_readpage_end_io_hook+0x2a0/0x2a0
[  127.882520]  [<ffffffff81069450>] kthread+0xc0/0xd0
[  127.882521]  [<ffffffff81069390>] ? kthread_create_on_node+0x130/0x130
[  127.882523]  [<ffffffff816976dc>] ret_from_fork+0x7c/0xb0
[  127.882524]  [<ffffffff81069390>] ? kthread_create_on_node+0x130/0x130
[  127.882525] ---[ end trace cc41fa39a41b468f ]---
[  127.882527] BTRFS error (device sdb) in __btrfs_free_extent:5394: IO failure
[  127.882528] btrfs: run_one_delayed_ref returned -5
[  127.882529] BTRFS error (device sdb) in btrfs_run_delayed_refs:2565: IO failure
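
In a trace like this one, the line that identifies the damaged metadata is the "corrupt leaf" report. A minimal sketch in Python of pulling the block, root and slot numbers out of saved dmesg text follows. The helper name find_corrupt_leaves is hypothetical, and the pattern is derived only from the single message shown here, so it is an assumption that other kernel versions word the message the same way.

```python
import re
from typing import Dict, List, Union

# Pattern based on the one "corrupt leaf" line in this thread (assumption:
# other kernels may format it differently). \s* tolerates mail line-wrapping.
CORRUPT_LEAF = re.compile(
    r"btrfs: corrupt leaf, (?P<reason>[^:]+):\s*"
    r"block=(?P<block>\d+),\s*root=(?P<root>\d+),\s*slot=(?P<slot>\d+)"
)


def find_corrupt_leaves(log_text: str) -> List[Dict[str, Union[str, int]]]:
    """Return one record per corrupt-leaf message found in log_text."""
    records: List[Dict[str, Union[str, int]]] = []
    for match in CORRUPT_LEAF.finditer(log_text):
        records.append({
            "reason": match.group("reason").strip(),
            "block": int(match.group("block")),
            "root": int(match.group("root")),
            "slot": int(match.group("slot")),
        })
    return records


# The message from this thread, including the line wrap added by the mailer.
sample = (
    "[  127.862825] btrfs: corrupt leaf, bad key order: \n"
    "block=2837196627968,root=1, slot=121\n"
)

for record in find_corrupt_leaves(sample):
    print(record)
```

Feeding it the saved output of dmesg gives a short list of affected blocks that can be quoted in a report to the list.
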

Not that I've done anything other than a cursory check but it looks like
the read only data is fine.

Pete

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hugo Mills

2013-Jul-02 07:29 UTC

Re: Hardware failure or btrfs issue?

On Mon, Jul 01, 2013 at 11:56:30PM +0100, Peter Chant wrote:
> Sirs,
> 
> my recently slowing file system is now going read only after trying
> a defrag or other operation.  I'm wondering whether this is the
> result of a hardware failure or a btrfs or some other issue.  Output
> of dmesg:
[snip]
> [  127.862825] btrfs: corrupt leaf, bad key order:
> block=2837196627968,root=1, slot=121
[snip]

   This is usually an indication that you have bad hardware -- I'd
suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
anything, can be done to fix the error on the disk right now.
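
As background on why "bad key order" tends to point at hardware: items in a btrfs leaf are kept sorted by key, and a key is an (objectid, type, offset) triple compared in that order. One flipped bit in memory is enough to break the sort before the leaf is written out. A toy sketch in Python, using made-up key values and hypothetical helper names, and not the kernel's actual check:

```python
from typing import List, Tuple

# A btrfs key as an (objectid, type, offset) triple. Python compares tuples
# element by element, which matches the objectid/type/offset ordering.
Key = Tuple[int, int, int]


def first_bad_slot(keys: List[Key]) -> int:
    """Return the first slot whose key is not strictly greater than the
    key in the slot before it, or -1 if the leaf is correctly ordered."""
    for slot in range(1, len(keys)):
        if keys[slot] <= keys[slot - 1]:
            return slot
    return -1


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-bit memory error by toggling one bit of value."""
    return value ^ (1 << bit)


# Made-up keys in ascending order, standing in for a healthy leaf.
leaf: List[Key] = [(256, 1, 0), (256, 12, 256), (257, 1, 0), (258, 1, 0)]

# Flip bit 9 of the objectid in slot 1: 256 becomes 768.
damaged = list(leaf)
objectid, key_type, offset = damaged[1]
damaged[1] = (flip_bit(objectid, 9), key_type, offset)

print(first_bad_slot(leaf))     # -1, ordering intact
print(first_bad_slot(damaged))  # 2, slot 2 now sorts before slot 1
```

The data the keys point at may be untouched, which fits a file system that is still readable but refuses to commit further transactions.
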
> Not that I've done anything other than a cursory check but it looks
> like the read only data is fine.

   Might be a good idea to use that to refresh your backups, just in
case my prediction about the fixability is correct.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==  PGP
key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- "How deep will this sub go?" "Oh,  she'll go all the way to ---
                    the bottom if we don't stop her."

Peter Chant

2013-Jul-02 17:36 UTC

Re: Hardware failure or btrfs issue?

On 07/02/2013 08:29 AM, Hugo Mills wrote:
> This is usually an indication that you have bad hardware -- I'd
> suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
> anything, can be done to fix the error on the disk right now.

Thanks, appreciated.

Hmm.  I've got one stick of RAM out of the machine due to testing, as I
had some freezes last week.
If it were one of the RAM, PSU and CPU then I'm unsure why this IO issue
only surfaces on the HDD and not the SSD.  I ordered a new HDD last
night, before reading your post.  If it's not the disk I'll go raid1.
If it is the disk then I'll probably find out.

>> Not that I've done anything other than a cursory check but it looks
>> like the read only data is fine.
>     Might be a good idea to use that to refresh your backups, just in
> case my prediction about the fixability is correct.

Well, first option is to drop in the new disk, freshly format it and
copy the data across (not add it as a second disk).  If that fails, the last
backup was Wednesday.  I've not done much of note since then apart from
trying to fix the disk issues.


Hugo Mills

2013-Jul-02 17:48 UTC

Re: Hardware failure or btrfs issue?

On Tue, Jul 02, 2013 at 06:36:48PM +0100, Peter Chant wrote:
> On 07/02/2013 08:29 AM, Hugo Mills wrote:
> >This is usually an indication that you have bad hardware -- I'd
> >suggest testing RAM, PSU, CPU in that order. I'm not sure what, if
> >anything, can be done to fix the error on the disk right now.
> 
> Thanks, appreciated.
> 
> Hmm.  I've got one stick of RAM out of the machine due to testing, as
> I had some freezes last week.

   So the damage probably happened then, if that stick is bad.
Filesystems have this irritating habit of remembering things done to
them across reboots. :)

   Hugo.

> If it were one of the RAM, PSU and CPU then I'm unsure why this IO
> issue only surfaces on the HDD and not the SSD.  I ordered a new HDD
> last night, before reading your post.  If it's not the disk I'll go
> raid1.  If it is the disk then I'll probably find out.
> 
> >>Not that I've done anything other than a cursory check but it looks
> >>like the read only data is fine.
> >    Might be a good idea to use that to refresh your backups, just in
> >case my prediction about the fixability is correct.
> 
> Well, first option is to drop in the new disk, freshly format it and
> copy the data across (not add it as a second disk).  If that fails, the
> last backup was Wednesday.  I've not done much of note since then
> apart from trying to fix the disk issues.
> 
> 
-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==  PGP
key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The glass is neither half-full nor half-empty; it is twice as ---  
                        large as it needs to be.

Peter Chant

2013-Jul-02 21:37 UTC

Re: Hardware failure or btrfs issue?

On 07/02/2013 06:48 PM, Hugo Mills wrote:
> So the damage probably happened then, if that stick is bad.
> Filesystems have this irritating habit of remembering things done to
> them across reboots. :)
>
>    Hugo.

The action before the defrag was to delete 48 hours' worth of
hourly snapshots.  I was wondering if the numerous snapshots were what
was making defrag so painfully slow.  Not that I know anything about
btrfs internals, but I suspect that is a major enough action to catch out
any random corruption if there was any.  I think I'll restrict snapshots
to once or twice a day at most, unless that really should cause no issue.

Pete
