thr3ads.net - Btrfs devel - csum failed during rebalance [Jun 2013]

John Haller

2013-Jun-03 02:05 UTC

csum failed during rebalance

Hi,

I added a new drive to an existing RAID 0 array. Every
attempt to rebalance the array fails:
# btrfs filesystem balance /share/bd8
ERROR: error during balancing '/share/bd8' - Input/output error
# dmesg | tail
btrfs: found 1 extents
btrfs: relocating block group 10752513540096 flags 1
btrfs: found 5 extents
btrfs: found 5 extents
btrfs: relocating block group 10751439798272 flags 1
btrfs: found 1 extents
btrfs: found 1 extents
btrfs: relocating block group 10048138903552 flags 1
btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028

An earlier rebalance attempt had the same csum error on a different inode:
btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028

Every rebalance attempt fails the same way, but with a different inum.

Here is the array:
# btrfs filesystem show
Label: 'bd8'  uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
        Total devices 4 FS bytes used 7.37TB
        devid    4 size 3.64TB used 52.00GB path /dev/sde
        devid    1 size 3.64TB used 3.32TB path /dev/sdf1
        devid    3 size 3.64TB used 2.92TB path /dev/sdc
        devid    2 size 3.64TB used 2.97TB path /dev/sdb

While I didn't finish the scrub, no errors were found:
# btrfs scrub status -d /share/bd8
scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
scrub device /dev/sdf1 (id 1) status
        scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
        total bytes scrubbed: 845.53GB with 0 errors
scrub device /dev/sdb (id 2) status
        scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
        total bytes scrubbed: 869.38GB with 0 errors
scrub device /dev/sdc (id 3) status
        scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
        total bytes scrubbed: 706.04GB with 0 errors
scrub device /dev/sde (id 4) history
        scrub started at Sun Jun  2 12:48:36 2013 and finished after 0 seconds
        total bytes scrubbed: 0.00 with 0 errors

Mount options:
/dev/sdf1 on /share/bd8 type btrfs (rw,flushoncommit)

Kernel 3.9.4

John
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

John Haller

2013-Jun-04 02:45 UTC

Re: csum failed during rebalance

On Sun, Jun 2, 2013 at 9:05 PM, John Haller <john.h.haller@gmail.com> wrote:
> Hi,
>
> I added a new drive to an existing RAID 0 array. Every
> attempt to rebalance the array fails:
> # btrfs filesystem balance /share/bd8
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
>
> An earlier rebalance attempt had the same csum error on a different inode:
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
>
> Every rebalance attempt fails the same way, but with a different inum.
>
Final scrub results:
btrfs: checksum error at logical 9524548104192 on dev /dev/sdc, sector
5252042552, root 5, inode 6754, offset 14188032000, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
btrfs: unable to fixup (regular) error at logical 9524548104192 on dev /dev/sdc
btrfs: checksum error at logical 9531724152832 on dev /dev/sdc, sector
5266058272, root 5, inode 6755, offset 1801699328, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
btrfs: unable to fixup (regular) error at logical 9531724152832 on dev /dev/sdc
btrfs: checksum error at logical 9628551053312 on dev /dev/sdc, sector
5455173312, root 5, inode 6757, offset 10686889984, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
btrfs: unable to fixup (regular) error at logical 9628551053312 on dev /dev/sdc
btrfs: checksum error at logical 9645596147712 on dev /dev/sdc, sector
5488464512, root 5, inode 6757, offset 22100770816, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
btrfs: unable to fixup (regular) error at logical 9645596147712 on dev /dev/sdc
btrfs: checksum error at logical 9662878707712 on dev /dev/sdc, sector
5522219512, root 5, inode 6758, offset 1697771520, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
btrfs: unable to fixup (regular) error at logical 9662878707712 on dev /dev/sdc
btrfs: checksum error at logical 9967720464384 on dev /dev/sdc, sector
6117613568, root 5, inode 6767, offset 19135102976, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
btrfs: unable to fixup (regular) error at logical 9967720464384 on dev /dev/sdc
btrfs: checksum error at logical 10048360648704 on dev /dev/sdc,
sector 6275113928, root 5, inode 6771, offset 4187852800, length 4096,
links 1 (path: bd6/...)
btrfs: bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
btrfs: unable to fixup (regular) error at logical 10048360648704 on dev /dev/sdc
btrfs: checksum error at logical 6748601905152 on dev /dev/sdb, sector
6199388792, root 5, inode 6782, offset 7338291200, length 4096, links
1 (path: bd6/...)
btrfs: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 1, gen 0


Reading one of the files yields the following in dmesg:
btrfs csum failed ino 6782 off 7338291200 csum 444044750 private 1783974550

However, none of these match the inodes from the original csum failures.

So what is causing the problem at offset 221745152, which didn't show up
in the scrub but is associated with multiple inodes?
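
One way to relate the balance messages to the scrub output is to test whether the repeated offset is relative to the block group being relocated rather than to a user file. The sketch below assumes that interpretation, which nobody in this thread confirms. It adds the offset from the csum failure to the start of the block group named in the last "relocating block group" line, then looks for the result among the logical addresses the scrub reported. All numbers are copied from the dmesg output quoted in this thread; the variable names are illustrative only.

```python
# Values copied from the dmesg output quoted in this thread.
block_group_start = 10048138903552  # "relocating block group 10048138903552 flags 1"
csum_fail_offset = 221745152        # "csum failed ino 365 off 221745152"

# Logical addresses the scrub reported as checksum errors.
scrub_logical_errors = [
    9524548104192, 9531724152832, 9628551053312, 9645596147712,
    9662878707712, 9967720464384, 10048360648704, 6748601905152,
]

# Assumption: the offset is measured from the start of the block group
# being relocated, so the sum should be a logical address.
candidate = block_group_start + csum_fail_offset
match = candidate in scrub_logical_errors

print(candidate, match)
```

If the assumption holds, the sum lands on one of the addresses the scrub flagged on /dev/sdc, which would explain why the same offset appears under different inode numbers on each balance attempt.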

John

John Haller

2013-Jun-19 23:55 UTC

Re: csum failed during rebalance

On Sun, Jun 2, 2013 at 9:05 PM, John Haller <john.h.haller@...> wrote:
> Hi,
>
> I added a new drive to an existing RAID 0 array. Every
> attempt to rebalance the array fails:
> # btrfs filesystem balance /share/bd8
> ERROR: error during balancing '/share/bd8' - Input/output error
> # dmesg | tail
> btrfs: found 1 extents
> btrfs: relocating block group 10752513540096 flags 1
> btrfs: found 5 extents
> btrfs: found 5 extents
> btrfs: relocating block group 10751439798272 flags 1
> btrfs: found 1 extents
> btrfs: found 1 extents
> btrfs: relocating block group 10048138903552 flags 1
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
>
> An earlier rebalance attempt had the same csum error on a different inode:
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
>
> Every rebalance attempt fails the same way, but with a different inum.
>
> Here is the array:
> # btrfs filesystem show
> Label: 'bd8'  uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
>         Total devices 4 FS bytes used 7.37TB
>         devid    4 size 3.64TB used 52.00GB path /dev/sde
>         devid    1 size 3.64TB used 3.32TB path /dev/sdf1
>         devid    3 size 3.64TB used 2.92TB path /dev/sdc
>         devid    2 size 3.64TB used 2.97TB path /dev/sdb
>
> While I didn't finish the scrub, no errors were found:
> # btrfs scrub status -d /share/bd8
> scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
> scrub device /dev/sdf1 (id 1) status
>         scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 845.53GB with 0 errors
> scrub device /dev/sdb (id 2) status
>         scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 869.38GB with 0 errors
> scrub device /dev/sdc (id 3) status
>         scrub resumed at Sun Jun  2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 706.04GB with 0 errors
> scrub device /dev/sde (id 4) history
>         scrub started at Sun Jun  2 12:48:36 2013 and finished after 0 seconds
>         total bytes scrubbed: 0.00 with 0 errors
>
> Mount options:
> /dev/sdf1 on /share/bd8 type btrfs (rw,flushoncommit)
>
> Kernel 3.9.4
>
> John
After cleaning up the errors found by the scrub, the balance succeeded.
The dmesg failure messages from the balance did not help in finding the
bad sectors; only the scrub messages in dmesg pointed to the right files.
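
Since the scrub messages carry the device and inode for each bad block, they can be summarized mechanically. The following is a minimal sketch, assuming the log lines have been rejoined onto single lines (the mail client wrapped them above) and that they follow the message format shown in this thread. The function name `inodes_by_device` and the regular expression are illustrative, not part of any btrfs tool.

```python
import re
from collections import defaultdict

# Pattern modeled on the scrub messages quoted in this thread.
SCRUB_ERR = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,\s]+),\s+"
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+)"
)


def inodes_by_device(log_text):
    """Return {device: sorted list of inode numbers with checksum errors}."""
    result = defaultdict(set)
    for m in SCRUB_ERR.finditer(log_text):
        result[m.group("dev")].add(int(m.group("inode")))
    return {dev: sorted(inodes) for dev, inodes in result.items()}


# Three of the messages from this thread, rejoined onto single lines.
sample = (
    "btrfs: checksum error at logical 9628551053312 on dev /dev/sdc, "
    "sector 5455173312, root 5, inode 6757, offset 10686889984, "
    "length 4096, links 1 (path: bd6/...)\n"
    "btrfs: checksum error at logical 9645596147712 on dev /dev/sdc, "
    "sector 5488464512, root 5, inode 6757, offset 22100770816, "
    "length 4096, links 1 (path: bd6/...)\n"
    "btrfs: checksum error at logical 6748601905152 on dev /dev/sdb, "
    "sector 6199388792, root 5, inode 6782, offset 7338291200, "
    "length 4096, links 1 (path: bd6/...)\n"
)

print(inodes_by_device(sample))
```

Inode 6757 appears twice in the scrub output but is listed once per device here, which gives the set of files to restore from backup.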

Now the question is why the balance left things so unbalanced compared
with the above:
# btrfs scrub status /share/bd8
scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
        scrub started at Mon Jun 17 23:07:01 2013 and finished after 39209 seconds
        total bytes scrubbed: 7.49TB with 0 errors

# btrfs filesystem show
Label: 'bd8'  uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
        Total devices 4 FS bytes used 7.49TB
        devid    4 size 3.64TB used 1.99TB path /dev/sdf
        devid    1 size 3.64TB used 3.32TB path /dev/sdg1
        devid    3 size 3.64TB used 1.99TB path /dev/sdc
        devid    2 size 3.64TB used 1.97TB path /dev/sdb

Btrfs v0.20-rc1

It appears that devid 1 was never balanced. Note that the drive
numbers are different because I still have the backup device connected
which had the originals of corrupted files. The filesystem started
with devid 1, was filled to the above capacity, and the other drives
were added later, so it didn't start as a RAID 0 system. The metadata
is RAID 1.
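
The per-device change can be quantified from the two `btrfs filesystem show` listings in this thread. This is a rough sketch, assuming the "GB" and "TB" figures are binary units (1 TB = 1024 GB) and matching devices by devid, since the device paths changed between the two listings. The helper `used_tb_by_devid` is hypothetical.

```python
import re

DEV_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)(TB|GB)\s+used\s+([\d.]+)(TB|GB)\s+path\s+(\S+)"
)
TO_TB = {"TB": 1.0, "GB": 1.0 / 1024}  # assumption: binary units


def used_tb_by_devid(show_output):
    """Map devid -> allocated space in TB, parsed from 'filesystem show' text."""
    used = {}
    for devid, _size, _size_unit, amount, unit, _path in DEV_LINE.findall(show_output):
        used[int(devid)] = float(amount) * TO_TB[unit]
    return used


before = used_tb_by_devid("""
        devid    4 size 3.64TB used 52.00GB path /dev/sde
        devid    1 size 3.64TB used 3.32TB path /dev/sdf1
        devid    3 size 3.64TB used 2.92TB path /dev/sdc
        devid    2 size 3.64TB used 2.97TB path /dev/sdb
""")
after = used_tb_by_devid("""
        devid    4 size 3.64TB used 1.99TB path /dev/sdf
        devid    1 size 3.64TB used 3.32TB path /dev/sdg1
        devid    3 size 3.64TB used 1.99TB path /dev/sdc
        devid    2 size 3.64TB used 1.97TB path /dev/sdb
""")

delta = {devid: after[devid] - before[devid] for devid in sorted(before)}
for devid, change in delta.items():
    print(f"devid {devid}: {before[devid]:.2f}TB -> {after[devid]:.2f}TB ({change:+.2f}TB)")
```

On these figures the allocation on devid 1 is unchanged, while devids 2, 3 and 4 end up within about 0.02TB of each other.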

John

Chris Murphy

2013-Jun-20 00:54 UTC

Re: csum failed during rebalance

On Jun 19, 2013, at 5:55 PM, John Haller <john.h.haller@gmail.com> wrote:
>
> It appears that devid 1 was never balanced. Note that the drive
> numbers are different because I still have the backup device connected
> which had the originals of corrupted files. The filesystem started
> with devid 1, was filled to the above capacity, and the other drives
> were added later, so it didn't start as a RAID 0 system.
Did you balance with -dconvert raid0 when you added the additional drives?
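
The profile question can be partly checked against the `flags` value printed in the "relocating block group" messages earlier in the thread. The sketch below decodes that value using the block group flag bits from the kernel's btrfs headers. It assumes the number in the message is the raw block group flags field; the function `decode_flags` is illustrative.

```python
# Block group type and profile bits as defined in the kernel's btrfs headers.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_flags(flags):
    """Return the names of all bits set in a block group flags value."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


print(decode_flags(1))   # value seen in "relocating block group ... flags 1"
print(decode_flags(9))   # what a RAID0 data block group would carry
```

A value of 1 has only the DATA bit set and no RAID profile bit, which would suggest that the block groups being relocated use the single profile rather than RAID 0.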


Chris Murphy
