thr3ads.net - Btrfs devel - error count [Aug 2013]

Russell Coker

2013-Aug-04 11:42 UTC

error count

I've got a 3TB SATA disk that is known to have problems (it failed in a zpool
for one of my clients).  For test purposes I'm running a BTRFS RAID-1 on two
partitions on that disk, bad for performance and not something you'd normally
do but good for testing.

BTRFS recovers from read errors quite well and gives informative log messages.

But it doesn't seem possible to get a count of the number of errors.  I think
that at the minimum I should be able to get a count of the number of errors
from a device since it was attached to the system.  I think that the ideal
would be to have an error count stored on the device and available to the
sysadmin.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bart Noordervliet

2013-Aug-04 13:37 UTC

Re: error count

Hi Russell,

a sufficiently up-to-date kernel and btrfs tool will provide the
'btrfs device stats' command, which should give you the info you want.

Regards,

Bart



Chris Samuel

2013-Aug-10 04:58 UTC

Re: error count

On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:
> a sufficiently up-to-date kernel and btrfs tool will provide the
> 'btrfs device stats' command, which should give you the info you want.

This is what it looks like:

chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs    0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
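
For anyone who wants to consume this output from a script, here is a minimal Python sketch that turns the "[device].counter value" lines into a dictionary. The function name parse_device_stats and the regular expression are illustrative assumptions, not part of btrfs-progs, and the format is inferred only from the output shown in this thread.

```python
import re

# Matches lines such as "[/dev/sdb3].read_io_errs    0".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<count>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter_name: count}} from 'btrfs device stats' output.

    Hypothetical helper: the line format is assumed from the sample output.
    """
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            # Ignore anything that is not a counter line.
            continue
        stats.setdefault(match["dev"], {})[match["name"]] = int(match["count"])
    return stats


SAMPLE = """\
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs    0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
"""

if __name__ == "__main__":
    print(parse_device_stats(SAMPLE))
```

Lines that do not match the counter format are skipped rather than treated as errors, so stray diagnostics mixed into the output would not break the parse.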

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

Russell Coker

2013-Aug-10 09:19 UTC

Re: error count

On Sat, 10 Aug 2013, Chris Samuel <chris@csamuel.org> wrote:
> On Sun, 4 Aug 2013 03:37:22 PM Bart Noordervliet wrote:
> > a sufficiently up-to-date kernel and btrfs tool will provide the
> > 'btrfs device stats' command, which should give you the info you want.
>
> This is what it looks like:
>
> chris@quad:~/Downloads/Linux/FileSystems/BtrFS/btrfs-progs$ sudo ./btrfs device stats /srv/DR
> [/dev/sdb3].write_io_errs   0
> [/dev/sdb3].read_io_errs    0
> [/dev/sdb3].flush_io_errs   0
> [/dev/sdb3].corruption_errs 0
> [/dev/sdb3].generation_errs 0

Thanks Chris and Bart.

Would it be possible to get the man page updated to include a brief
description of those errors?  The first three are somewhat obvious in meaning
(although not obvious in how they would happen) and the fourth is very
obvious.  But what does generation_errs mean?  I'm seeing some on one system.
Should I be concerned?  If I write a Nagios check, which ones should be
warnings and which ones errors?

Also, why does it give the following errors about trying to open /dev/sr0 when
using a BTRFS RAID-1 filesystem?  Below is for a RAID-1 over /dev/sdb and
/dev/sdc.

# btrfs device stats /dev/sdb
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
# btrfs device stats /dev/sdc
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0

Why is it even searching for the other part when only a single device is
specified?  And why can't it give stats without checking /dev/sr0 when
checking a single device, when it can do so while checking all devices?

# btrfs device stats /dev/sdd1
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd1].write_io_errs   0
[/dev/sdd1].read_io_errs    136
[/dev/sdd1].flush_io_errs   0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
# btrfs device stats /dev/sdd2
failed to open /dev/sr0: No medium found
failed to open /dev/sr0: No medium found
[/dev/sdd2].write_io_errs   0
[/dev/sdd2].read_io_errs    0
[/dev/sdd2].flush_io_errs   0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
# btrfs device stats /mnt/backup/
[/dev/sdd1].write_io_errs   0
[/dev/sdd1].read_io_errs    136
[/dev/sdd1].flush_io_errs   0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdd2].write_io_errs   0
[/dev/sdd2].read_io_errs    0
[/dev/sdd2].flush_io_errs   0
[/dev/sdd2].corruption_errs 0
[/dev/sdd2].generation_errs 0
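
A Nagios-style check over output like the above could be sketched as follows. The exit codes 0, 1 and 2 are the usual Nagios plugin values for OK, WARNING and CRITICAL. Everything else is an assumption made for illustration: the function name check, the message format, and especially the SEVERITY table, since nothing in this thread settles which counters should warn and which should be critical.

```python
import re

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes

# Hypothetical policy, chosen only for illustration; adjust to taste.
SEVERITY = {
    "write_io_errs": CRITICAL,
    "read_io_errs": WARNING,
    "flush_io_errs": CRITICAL,
    "corruption_errs": WARNING,
    "generation_errs": WARNING,
}

# Matches lines such as "[/dev/sdd1].read_io_errs    136".
STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)$")


def check(stats_output, severity=SEVERITY):
    """Return (exit_code, message) for the given 'btrfs device stats' text."""
    status, problems = OK, []
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            # Skip non-counter lines such as "failed to open /dev/sr0: ...".
            continue
        device, name, count = match.group(1), match.group(2), int(match.group(3))
        if count > 0:
            # Counter names missing from the policy default to WARNING.
            status = max(status, severity.get(name, WARNING))
            problems.append(f"{device} {name}={count}")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    detail = ": " + ", ".join(problems) if problems else ""
    return status, f"BTRFS {label}{detail}"
```

In a real plugin the text would come from running 'btrfs device stats' on the mount point, the message would be printed, and the exit code passed to sys.exit(). Checking the mount point rather than individual devices also avoids the /dev/sr0 noise seen above.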


Thanks.


Chris Samuel

2013-Aug-10 13:38 UTC

Re: error count

On Sat, 10 Aug 2013 07:19:27 PM Russell Coker wrote:
> But what does generation_errs mean?  I'm seeing some on one system.
> Should I be concerned?  If I write a Nagios check which ones should be
> warnings and which ones errors?

All I know is that ioctl.h says:

BTRFS_DEV_STAT_GENERATION_ERRS, /* an indication that blocks have not
                                 * been written */

Looking at the kernel code, that only seems to get incremented during a scrub.
The code that does that says:

                } else if (generation != le64_to_cpu(h->generation)) {
                        sblock->header_error = 1;
                        sblock->generation_error = 1;
                }

The generation there is from the btrfs inode structure, the header says:

        /* full 64 bit generation number, struct vfs_inode doesn't have a big
         * enough field for this.
         */
        u64 generation;

The wiki says:

https://btrfs.wiki.kernel.org/index.php/Glossary

# generation 
#   An internal counter which updates for each transaction. When a
# metadata block is written (using copy on write), current generation
# is stored in the block, so that blocks which are too new (and hence
# possibly inconsistent) can be identified.

and:

https://btrfs.wiki.kernel.org/index.php/Btrfs_design

# Everything that points to a btree block also stores the generation
# field it expects that block to have. This allows Btrfs to detect
# phantom or misplaced writes on the media.
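
The idea in those two wiki excerpts can be illustrated with a toy Python model. This is not Btrfs code: the Block and Pointer classes, the dictionary standing in for a disk, and the function generation_matches are all invented for illustration. It only shows why comparing the expected generation with the one stored in the block reveals a write that never reached the media.

```python
from dataclasses import dataclass


@dataclass
class Block:
    generation: int  # generation recorded in the block's own header


@dataclass
class Pointer:
    address: int
    expected_generation: int  # generation the referrer expects at that address


def generation_matches(disk, pointer):
    """True when the block at pointer.address carries the expected generation."""
    block = disk.get(pointer.address)
    return block is not None and block.generation == pointer.expected_generation


# The referrer was updated in transaction 42, but the block write was lost,
# so the old copy from transaction 41 is still what sits on the "disk".
disk = {4096: Block(generation=41)}
pointer = Pointer(address=4096, expected_generation=42)

if __name__ == "__main__":
    if not generation_matches(disk, pointer):
        print("generation mismatch: count it as a generation error")
```

A mismatch here corresponds to the branch in the scrub code quoted above that sets generation_error.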

HTH!

> Also why does it give the following errors about trying to open /dev/sr0
> when  using a BTRFS RAID-1 filesystem?  Below is for a RAID-1 over /dev/sdb
> and /dev/sdc.

I don't get that here, I'm building btrfs-progs from git at commit
194aa4a1bd6447bb545286d0bcb0b0be8204d79f (July 5th), aka:

btrfs-progs$ git describe --tags
v0.20-rc1-358-g194aa4a

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

