I suspect that something is wrong with one of my disks. This is the output from iostat:

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    2.0   18.9   38.1   160.9  0.0  0.1    0.1    3.2   0   6   0   0   0   0 c5d0
    2.7   18.8   59.3   160.9  0.0  0.1    0.2    3.2   0   6   0   0   0   0 c5d1
    0.0   36.8    1.1  3593.7  0.0  0.1    0.0    2.9   0   8   0   0   0   0 c6t66d0
    0.0   38.2    0.0  3693.7  0.0  0.2    0.0    4.6   0  12   0   0   0   0 c6t70d0
    0.0   38.1    0.0  3693.7  0.0  0.1    0.0    2.4   0   5   0   0   0   0 c6t74d0
    0.0   42.0    0.0  4155.4  0.0  0.0    0.0    0.6   0   2   0   0   0   0 c6t76d0
    0.0   36.9    0.0  3593.7  0.0  0.1    0.0    1.4   0   3   0   0   0   0 c6t78d0
    0.0   41.7    0.0  4155.4  0.0  0.0    0.0    1.2   0   4   0   0   0   0 c6t80d0

The disk in question is c6t70d0 - it shows consistently higher %b and asvc_t
than the other disks in the pool. The output is from a 'zfs receive' after about 3 hours.
The two c5dx disks are the 'rpool' mirror, the others belong to the 'backup' pool.

admin@master:~# zpool status
  pool: backup
 state: ONLINE
  scan: scrub repaired 0 in 5h7m with 0 errors on Tue Jan 31 04:55:31 2012
config:

        NAME         STATE     READ WRITE CKSUM
        backup       ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c6t78d0  ONLINE       0     0     0
            c6t66d0  ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c6t70d0  ONLINE       0     0     0
            c6t74d0  ONLINE       0     0     0
          mirror-2   ONLINE       0     0     0
            c6t76d0  ONLINE       0     0     0
            c6t80d0  ONLINE       0     0     0

errors: No known data errors

admin@master:~# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  4.53T  1.37T  3.16T    30%  1.00x  ONLINE  -

admin@master:~# uname -a
SunOS master 5.11 oi_148 i86pc i386 i86pc

Should I be worried? And what other commands can I use to investigate further?
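[A quick note on the output above: a single iostat invocation reports averages since boot, so sampling at intervals is one reasonable way to confirm that the imbalance on c6t70d0 is persistent rather than historical. The exact flags and counts below are only an illustrative choice, not something prescribed in the thread:]

# Extended statistics plus error counters, every 10 seconds for 6 samples;
# -n uses cXtYdZ names, -z hides devices that are completely idle
iostat -xnez 10 6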
On Wed, 1 Feb 2012, Jan Hellevik wrote:

> The disk in question is c6t70d0 - it shows consistently higher %b and asvc_t
> than the other disks in the pool. The output is from a 'zfs receive' after about 3 hours.
> The two c5dx disks are the 'rpool' mirror, the others belong to the 'backup' pool.

Are all of the disks the same make and model? What type of chassis are the
disks mounted in? Is it possible that the environment that this disk
experiences is somehow different than the others (e.g. due to vibration)?

> Should I be worried? And what other commands can I use to investigate further?

It is difficult to say if you should be worried.

Be sure to do 'iostat -xe' to see if there are any accumulating errors
related to the disk.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
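[In the same spirit, a per-device error summary shows the same soft/hard/transport counters together with the drive's vendor, model and serial number. This is only a sketch; the device name is simply the suspect disk from the earlier output:]

# One-shot error and identity summary for the suspect disk
iostat -En c6t70d0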
Hi!

On Feb 1, 2012, at 7:43 PM, Bob Friesenhahn wrote:

> On Wed, 1 Feb 2012, Jan Hellevik wrote:
>> The disk in question is c6t70d0 - it shows consistently higher %b and asvc_t
>> than the other disks in the pool. The output is from a 'zfs receive' after about 3 hours.
>> The two c5dx disks are the 'rpool' mirror, the others belong to the 'backup' pool.
>
> Are all of the disks the same make and model? What type of chassis are the
> disks mounted in? Is it possible that the environment that this disk
> experiences is somehow different than the others (e.g. due to vibration)?

They are different makes - I try to make pairs of different brands to minimise risk.

The disks are in a Rackable Systems enclosure (disk shelf?). 16 disks, all SATA.
Connected to a SASUC8I controller on the server.

This is a backup server I recently put together to keep backups from my main server.
I put in the disks from the old 'backup' pool and have started a 2TB zfs send/receive
from my main server. So far things look ok; it is just the somewhat high values on
that one disk that worry me a little.

>> Should I be worried? And what other commands can I use to investigate further?
>
> It is difficult to say if you should be worried.
>
> Be sure to do 'iostat -xe' to see if there are any accumulating errors related to the disk.

This is the most current output from iostat. It has been running a zfs receive for
more than a day. No errors. zpool status also reports no errors.

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    8.1   18.7  142.5   180.4  0.0  0.1    0.1    3.2   0   8   0   0   0   0 c5d0
   10.2   18.7  186.3   180.4  0.0  0.1    0.1    3.3   0   9   0   0   0   0 c5d1
    0.0   36.7    0.0  3595.8  0.0  0.1    0.0    3.2   0   9   0   0   0   0 c6t66d0
    0.0   36.0    0.0  3642.2  0.0  0.1    0.0    3.9   0  12   0   0   0   0 c6t70d0
    0.0   36.1    0.0  3642.2  0.0  0.1    0.0    2.9   0   5   0   0   0   0 c6t74d0
    0.0   39.6    0.0  4071.8  0.0  0.0    0.0    0.7   0   2   0   0   0   0 c6t76d0
    0.2    0.0    0.3     0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c6t77d0
    0.2   36.8    0.3  3595.8  0.0  0.1    0.0    1.9   0   4   0   0   0   0 c6t78d0
    0.2    0.0    0.3     0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c6t79d0
    0.2   39.6    0.3  4071.6  0.0  0.1    0.0    1.6   0   5   0   0   0   0 c6t80d0
    0.2    0.0    0.3     0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c6t81d0

admin@master:/export/home/admin$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  4.53T  2.17T  2.36T    47%  1.00x  ONLINE  -

admin@master:/export/home/admin$ zpool status
  pool: backup
 state: ONLINE
  scan: scrub repaired 0 in 5h7m with 0 errors on Tue Jan 31 04:55:31 2012
config:

        NAME         STATE     READ WRITE CKSUM
        backup       ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c6t78d0  ONLINE       0     0     0
            c6t66d0  ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c6t70d0  ONLINE       0     0     0
            c6t74d0  ONLINE       0     0     0
          mirror-2   ONLINE       0     0     0
            c6t76d0  ONLINE       0     0     0
            c6t80d0  ONLINE       0     0     0

errors: No known data errors
Hi Jan,

These commands will tell you if FMA faults are logged:

# fmdump
# fmadm faulty

This command will tell you if errors are accumulating on this disk:

# fmdump -eV | more

Thanks,

Cindy

On 02/01/12 11:20, Jan Hellevik wrote:
> I suspect that something is wrong with one of my disks.
>
> This is the output from iostat:
>
>                             extended device statistics       ---- errors ---
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
>     2.0   18.9   38.1   160.9  0.0  0.1    0.1    3.2   0   6   0   0   0   0 c5d0
>     2.7   18.8   59.3   160.9  0.0  0.1    0.2    3.2   0   6   0   0   0   0 c5d1
>     0.0   36.8    1.1  3593.7  0.0  0.1    0.0    2.9   0   8   0   0   0   0 c6t66d0
>     0.0   38.2    0.0  3693.7  0.0  0.2    0.0    4.6   0  12   0   0   0   0 c6t70d0
>     0.0   38.1    0.0  3693.7  0.0  0.1    0.0    2.4   0   5   0   0   0   0 c6t74d0
>     0.0   42.0    0.0  4155.4  0.0  0.0    0.0    0.6   0   2   0   0   0   0 c6t76d0
>     0.0   36.9    0.0  3593.7  0.0  0.1    0.0    1.4   0   3   0   0   0   0 c6t78d0
>     0.0   41.7    0.0  4155.4  0.0  0.0    0.0    1.2   0   4   0   0   0   0 c6t80d0
>
> The disk in question is c6t70d0 - it shows consistently higher %b and asvc_t
> than the other disks in the pool. The output is from a 'zfs receive' after about 3 hours.
> The two c5dx disks are the 'rpool' mirror, the others belong to the 'backup' pool.
>
> admin@master:~# zpool status
>   pool: backup
>  state: ONLINE
>   scan: scrub repaired 0 in 5h7m with 0 errors on Tue Jan 31 04:55:31 2012
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         backup       ONLINE       0     0     0
>           mirror-0   ONLINE       0     0     0
>             c6t78d0  ONLINE       0     0     0
>             c6t66d0  ONLINE       0     0     0
>           mirror-1   ONLINE       0     0     0
>             c6t70d0  ONLINE       0     0     0
>             c6t74d0  ONLINE       0     0     0
>           mirror-2   ONLINE       0     0     0
>             c6t76d0  ONLINE       0     0     0
>             c6t80d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> admin@master:~# zpool list
> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> backup  4.53T  1.37T  3.16T    30%  1.00x  ONLINE  -
>
> admin@master:~# uname -a
> SunOS master 5.11 oi_148 i86pc i386 i86pc
>
> Should I be worried? And what other commands can I use to investigate further?
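[Since fmdump -eV can be very long on a busy system, one rough way to watch whether new error reports accumulate during the receive is to compare counts over time. This is only a sketch built from the same FMA tools, nothing specific to this system is assumed:]

# One line per error report in the fault manager's error log;
# re-run later and compare the count
fmdump -e | wc -l

# Summary of diagnosed faults, including ones already repaired or acquitted
fmadm faulty -a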
On Wed, 1 Feb 2012, Jan Hellevik wrote:

>>
>> Are all of the disks the same make and model?
>
> They are different makes - I try to make pairs of different brands to minimise risk.

Does your pairing maintain the same pattern of disk type across all the
pairings?

Some modern disks use 4k sectors while others still use 512 bytes. If the
slow disk is a 4k sector model but the others are 512 byte models, then
that would certainly explain a difference.

Assuming that a couple of your disks are still unused, you could try
replacing the suspect drive with an unused drive (via zfs command) to see
if the slowness goes away. You could also make that vdev a triple-mirror
since it is very easy to add/remove drives from a mirror vdev. Just make
sure that your zfs syntax is correct so that you don't accidentally add a
single-drive vdev to the pool (oops!). These sorts of things can be tested
with zfs commands without physically moving/removing drives or endangering
your data.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
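[In concrete terms, the steps Bob describes would look roughly like the commands below; c6t82d0 is a made-up name standing in for whatever spare disk is available:]

# Grow mirror-1 into a three-way mirror by attaching the spare to an existing member
zpool attach backup c6t74d0 c6t82d0

# After the resilver completes, the suspect disk can be dropped again
zpool detach backup c6t70d0

# Or swap the suspect disk out in a single step
zpool replace backup c6t70d0 c6t82d0

# Careful: "zpool add" creates a new top-level vdev instead of attaching to
# an existing mirror -- that is the single-drive-vdev mistake Bob warns about.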
On Feb 1, 2012, at 8:07 PM, Bob Friesenhahn wrote:

> On Wed, 1 Feb 2012, Jan Hellevik wrote:
>>>
>>> Are all of the disks the same make and model?
>>
>> They are different makes - I try to make pairs of different brands to minimise risk.
>
> Does your pairing maintain the same pattern of disk type across all the pairings?

Not 100% sure I understand what you mean (English is not my first language).
These are the disks:

mirror-0: wd15ears + hd154ui
mirror-1: wd15ears + hd154ui
mirror-2: wd20ears + hd204ui

Two pairs of 1.5TB and one pair of 2.0TB. I would like to have pairs of the same
size, but these were the disks I had available, and since it is a backup pool I
do not think it matters that much. If the flooding hadn't tripled the price of
disks I would probably buy a few more, but not at the current price level. :-(

I am waiting for a replacement 1.5TB disk and will replace the 'bad' one as soon
as I get it.

> Some modern disks use 4k sectors while others still use 512 bytes. If the slow
> disk is a 4k sector model but the others are 512 byte models, then that would
> certainly explain a difference.

AVAILABLE DISK SELECTIONS:
       0. c5d0 <?????xH?????????????0?0"??? cyl 14590 alt 2 hd 255 sec 63>
       1. c5d1 <?????xH?????????????0?0"??? cyl 14590 alt 2 hd 255 sec 63>
       2. c6t66d0 <ATA-WDC WD15EARS-00Z-0A80-1.36TB>
       3. c6t67d0 <ATA-SAMSUNG HD501LJ-0-12-465.76GB>
       4. c6t68d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>
       5. c6t69d0 <ATA-SAMSUNG HD501LJ-0-12-465.76GB>
       6. c6t70d0 <ATA-WDC WD15EARS-00Z-0A80-1.36TB>
       7. c6t71d0 <ATA-SAMSUNG HD501LJ-0-13-465.76GB>
       8. c6t72d0 <ATA -WDC WD6400AAKS--3B01 cyl 38909 alt 2 hd 255 sec 126>
       9. c6t73d0 <ATA-SAMSUNG HD501LJ-0-13-465.76GB>
      10. c6t74d0 <ATA-SAMSUNG HD154UI-1118-1.36TB>
      11. c6t75d0 <ATA-SAMSUNG HD501LJ-0-11-465.76GB>
      12. c6t76d0 <ATA-SAMSUNG HD204UI-0001-1.82TB>
      13. c6t77d0 <ATA-SAMSUNG HD501LJ-0-11-465.76GB>
      14. c6t78d0 <ATA-SAMSUNG HD154UI-1118-1.36TB>
      15. c6t79d0 <ATA-SAMSUNG HD501LJ-0-11-465.76GB>
      16. c6t80d0 <ATA-WDC WD20EARS-00M-AB51-1.82TB>
      17. c6t81d0 <ATA-SAMSUNG HD501LJ-0-11-465.76GB>

mirror-0
       2. c6t66d0 <ATA-WDC WD15EARS-00Z-0A80-1.36TB>
      14. c6t78d0 <ATA-SAMSUNG HD154UI-1118-1.36TB>
mirror-1
       6. c6t70d0 <ATA-WDC WD15EARS-00Z-0A80-1.36TB>
      10. c6t74d0 <ATA-SAMSUNG HD154UI-1118-1.36TB>
mirror-2
      12. c6t76d0 <ATA-SAMSUNG HD204UI-0001-1.82TB>
      16. c6t80d0 <ATA-WDC WD20EARS-00M-AB51-1.82TB>

You can see that mirror-0 and mirror-1 have identical disk pairs.

BTW: Can someone explain why this:
       8. c6t72d0 <ATA -WDC WD6400AAKS--3B01 cyl 38909 alt 2 hd 255 sec 126>
is not shown the same way as this:
       4. c6t68d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>

Why the cylinder/sector in line 8?

> Assuming that a couple of your disks are still unused, you could try replacing
> the suspect drive with an unused drive (via zfs command) to see if the slowness
> goes away. You could also make that vdev a triple-mirror since it is very easy
> to add/remove drives from a mirror vdev. Just make sure that your zfs syntax is
> correct so that you don't accidentally add a single-drive vdev to the pool
> (oops!). These sorts of things can be tested with zfs commands without
> physically moving/removing drives or endangering your data.

If I had available disks, I would. As of now, they are all busy. :-)
Thanks for the advice!

> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
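[On the 4k-sector question, one way to see how the pool is actually treating sector size is to look at the ashift recorded for each top-level vdev (ashift=9 means 512-byte allocation units, ashift=12 means 4 KiB). A quick check, assuming zdb on oi_148 behaves like other contemporary builds:]

# Dump the cached pool configuration and pick out the per-vdev allocation shift
zdb -C backup | grep ashift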
Hello Jan,

> BTW: Can someone explain why this:
>        8. c6t72d0 <ATA -WDC WD6400AAKS--3B01 cyl 38909 alt 2 hd 255 sec 126>
> is not shown the same way as this:
>        4. c6t68d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>
>
> Why the cylinder/sector in line 8?

As far as I know, this depends on the label on the disk: SMI or EFI.

What does prtvtoc show you?

S0013(root)#~> prtvtoc /dev/dsk/<diskname>s2
* /dev/dsk/<diskname>s2 partition map
*
* Dimensions:
*     512 bytes/sector
* 2097152 sectors
* 2097085 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256   2080479   2080734
       8     11    00    2080735     16384   2097118    <<<< indicates EFI label

S0013(root)#~> prtvtoc /dev/dsk/<diskname>s2
* /dev/dsk/c1t0d0s2 (volume "ROOTDISK") partition map
*
* Dimensions:
*     512 bytes/sector
*     255 sectors/track
*      16 tracks/cylinder
*    4080 sectors/cylinder
*   38309 cylinders
*   38307 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0 156292560 156292559
       2      5    00          0 156292560 156292559    <<<<<< indicates SMI label

      19. c0t<diskname>d0 <SUN-SOLARIS-1-1.00GB>
      24. c1t<diskname>d0 <DEFAULT cyl 38307 alt 2 hd 16 sec 255>  ROOTDISK

Regards,
Christian
Hi!

You were right. It turns out that the disks were not part of a pool yet. One of
them had previously been used in a pool on another machine, but the other had
been used somewhere else (Ubuntu or OS X), and that explains it. After I put
them to use in a pool, 'format' shows what I expected:

       4. c6t68d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>
          /pci@0,0/pci1022,9603@2/pci1000,3140@0/sd@44,0
       8. c6t72d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>
          /pci@0,0/pci1022,9603@2/pci1000,3140@0/sd@48,0

Thank you for the explanation!

On Feb 3, 2012, at 12:02 PM, Christian Meier wrote:

> Hello Jan,
>
> I'm not sure if you saw my answer, because I answered to the mailing list.
>
>> BTW: Can someone explain why this:
>>        8. c6t72d0 <ATA -WDC WD6400AAKS--3B01 cyl 38909 alt 2 hd 255 sec 126>
>> is not shown the same way as this:
>>        4. c6t68d0 <ATA-WDC WD6400AAKS-2-3B01-596.17GB>
>>
>> Why the cylinder/sector in line 8?
>
> As far as I know, this depends on the label on the disk: SMI or EFI.
>
> What does prtvtoc show you?
>
> S0013(root)#~> prtvtoc /dev/dsk/<diskname>s2
> * /dev/dsk/<diskname>s2 partition map
> *
> * Dimensions:
> *     512 bytes/sector
> * 2097152 sectors
> * 2097085 accessible sectors
> *
> * Flags:
> *   1: unmountable
> *  10: read-only
> *
> * Unallocated space:
> *       First     Sector    Last
> *       Sector     Count    Sector
> *          34       222       255
> *
> *                          First     Sector    Last
> * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
>        0      4    00        256   2080479   2080734
>        8     11    00    2080735     16384   2097118    <<<< indicates EFI label
>
> S0013(root)#~> prtvtoc /dev/dsk/<diskname>s2
> * /dev/dsk/c1t0d0s2 (volume "ROOTDISK") partition map
> *
> * Dimensions:
> *     512 bytes/sector
> *     255 sectors/track
> *      16 tracks/cylinder
> *    4080 sectors/cylinder
> *   38309 cylinders
> *   38307 accessible cylinders
> *
> * Flags:
> *   1: unmountable
> *  10: read-only
> *
> *                          First     Sector    Last
> * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
>        0      2    00          0 156292560 156292559
>        2      5    00          0 156292560 156292559    <<<<<< indicates SMI label
>
>       19. c0t<diskname>d0 <SUN-SOLARIS-1-1.00GB>
>       24. c1t<diskname>d0 <DEFAULT cyl 38307 alt 2 hd 16 sec 255>  ROOTDISK
>
> Regards,
> Christian