thr3ads.net - zfs discuss - [zfs-discuss] strange pool disks usage pattern [Oct 2009]

Maurilio Longo

2009-Oct-02 07:41 UTC

[zfs-discuss] strange pool disks usage pattern

Hi,

I have a PC with a MARVELL AOC-SAT2-MV8 controller and a pool made up of six
disks in raid-z, plus a hot spare.

<pre>
-bash-3.2$ /sbin/zpool status
  pool: nas
 state: ONLINE
 scrub: scrub in progress for 9h4m, 81,59% done, 2h2m to go
config:

        NAME        STATE     READ WRITE CKSUM
        nas         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
        spares
          c2t7d0    AVAIL

errors: No known data errors
</pre>

Now, the problem is that when running

iostat -Cmnx 10

(or with any other interval), I sometimes see a complete stall of disk I/O
because one disk in the pool (not always the same one) is 100% busy.

<pre>

$ iostat -Cmnx 10 

   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,3    0,0    2,0  0,0  0,0    0,0    0,1   0   0 c1
    0,0    0,3    0,0    2,0  0,0  0,0    0,0    0,1   0   0 c1t0d0
 1852,1  297,0 13014,9 4558,4  9,2  1,6    4,3    0,7   2 158 c2
  311,8   61,3 2185,3  750,7  2,0  0,3    5,5    0,7  17  25 c2t0d0
  309,5   34,7 2207,2  769,5  1,6  0,5    4,7    1,4  41  47 c2t1d0
  309,3   36,3 2173,0  770,0  1,0  0,3    2,9    0,7  18  26 c2t2d0
  296,0   65,5 2057,3  749,2  2,1  0,2    5,9    0,6  16  23 c2t3d0
  313,3   64,1 2187,3  748,8  1,7  0,2    4,6    0,5  15  21 c2t4d0
  311,9   35,1 2204,8  770,1  0,7  0,2    2,1    0,5  11  17 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,4   14,7    3,2   30,4  0,0  0,2    0,0   13,2   0   2 c1
    0,4   14,7    3,2   30,4  0,0  0,2    0,0   13,2   0   2 c1t0d0
    1,7    0,0   58,9    0,0  3,0  1,0 1766,4  593,1   2 101 c2
    0,3    0,0    7,7    0,0  0,0  0,0    0,3    0,4   0   0 c2t0d0
    0,3    0,0   11,5    0,0  0,0  0,0    4,4    8,4   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,4    0,0   14,1    0,0  0,0  0,0    0,4    6,6   0   0 c2t3d0
    0,4    0,0   14,1    0,0  0,0  0,0    0,3    2,5   0   0 c2t4d0
    0,3    0,0   11,5    0,0  0,0  0,0    3,6    6,9   0   0 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    3,1    0,0    3,1  0,0  0,0    0,0    0,7   0   0 c1
    0,0    3,1    0,0    3,1  0,0  0,0    0,0    0,7   0   0 c1t0d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0   2 100 c2
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t0d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t3d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t4d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,1    0,0    0,4  0,0  0,0    0,0    1,2   0   0 c1
    0,0    0,1    0,0    0,4  0,0  0,0    0,0    1,2   0   0 c1t0d0
    0,0   29,5    0,0  320,2  3,4  1,0  113,9   34,6   2 102 c2
    0,0    6,9    0,0   63,3  0,1  0,0   12,6    0,7   0   0 c2t0d0
    0,0    4,4    0,0   65,5  0,0  0,0    8,7    0,8   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,0    7,4    0,0   62,7  0,1  0,0   15,4    0,8   1   1 c2t3d0
    0,0    6,8    0,0   63,6  0,1  0,0   13,2    0,7   0   0 c2t4d0
    0,0    4,0    0,0   65,1  0,0  0,0    7,9    0,7   0   0 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,3    0,0    2,4  0,0  0,0    0,0    0,1   0   0 c1
    0,0    0,3    0,0    2,4  0,0  0,0    0,0    0,1   0   0 c1t0d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0   2 100 c2
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t0d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t3d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t4d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,5    3,5    3,8   17,0  0,0  0,0    0,0    0,9   0   0 c1
    0,5    3,5    3,8   17,0  0,0  0,0    0,0    0,9   0   0 c1t0d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0   2 100 c2
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t0d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t3d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t4d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,1    4,2    0,8    6,5  0,0  0,0    0,0    1,2   0   0 c1
    0,1    4,2    0,8    6,5  0,0  0,0    0,0    1,2   0   0 c1t0d0
   93,2   84,7 1630,3  439,2  4,5  1,3   25,5    7,1   3 127 c2
   15,0   15,5  247,8   60,5  0,3  0,1    9,5    2,1   5   7 c2t0d0
   14,5   10,7  281,3   63,8  0,3  0,1   12,2    2,7   6   7 c2t1d0
   16,8   16,6  321,0  129,1  3,1  0,9   92,7   28,4  96  95 c2t2d0
   17,2   15,5  262,7   60,8  0,3  0,1    9,6    2,1   6   7 c2t3d0
   16,5   15,1  237,1   61,0  0,3  0,1    8,8    1,9   6   6 c2t4d0
   13,3   11,4  280,5   64,1  0,3  0,1   10,7    2,6   6   7 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    3,9    0,0   30,3  0,0  0,0    0,0    0,1   0   0 c1
    0,0    3,9    0,0   30,3  0,0  0,0    0,0    0,1   0   0 c1t0d0
 2941,0  561,1 22458,3 5278,8 11,4  2,4    3,3    0,7   4 235 c2
  499,2  118,7 3773,8  859,2  2,7  0,4    4,3    0,7  35  44 c2t0d0
  503,7   69,8 3916,4  898,9  1,4  0,4    2,4    0,7  31  38 c2t1d0
  473,3   70,7 3800,4  899,3  1,2  0,4    2,1    0,7  28  35 c2t2d0
  500,9  113,8 3725,2  861,9  2,6  0,4    4,2    0,7  33  41 c2t3d0
  485,3  119,5 3537,0  861,0  2,4  0,4    4,0    0,7  32  40 c2t4d0
  478,7   68,7 3705,4  898,5  1,2  0,4    2,2    0,7  28  37 c2t5d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
</pre>

In this case it was c2t2d0 and it blocked the pool for 30 or 40 seconds.
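
For anyone who wants to catch these stalls automatically, here is a minimal Python sketch (my own illustration, not a tool used in this thread). It scans `iostat -xn`-style rows and flags disks showing the pattern above: an I/O sitting in `actv`, no reads or writes completing, and `%b` pinned at 100. The function names, the 99% threshold and the device-name pattern are assumptions; the sample rows are copied from the stalled intervals in the output above.

```python
import re

COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b"]
# Matches per-disk names such as c2t2d0, but not controller totals such as c2.
DISK_NAME = re.compile(r"^c\d+t\d+d\d+$")


def parse_row(line):
    """Return a dict for one iostat data row, or None for headers and other text."""
    fields = line.split()
    if len(fields) != len(COLUMNS) + 1:
        return None
    try:
        # This locale prints decimal commas (0,3), so normalise them first.
        values = [float(f.replace(",", ".")) for f in fields[:-1]]
    except ValueError:
        return None
    row = dict(zip(COLUMNS, values))
    row["device"] = fields[-1]
    return row


def stalled_disks(lines, busy_threshold=99.0):
    """List disks that hold an active I/O but completed nothing in the interval."""
    stalled = []
    for line in lines:
        row = parse_row(line)
        if row is None or not DISK_NAME.match(row["device"]):
            continue
        no_throughput = row["r/s"] == 0 and row["w/s"] == 0
        if no_throughput and row["actv"] >= 1 and row["%b"] >= busy_threshold:
            stalled.append(row["device"])
    return stalled


SAMPLE = """\
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0   2 100 c2
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t1d0
    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t3d0
"""

print(stalled_disks(SAMPLE.splitlines()))
```

On the sample rows this reports only c2t2d0; the controller total (c2) is skipped on purpose, since it merely mirrors the stuck disk.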

/var/adm/messages does not contain anything related to the pool.

What can it be?

Thanks.

Maurilio.
-- 
This message posted from opensolaris.org

Carson Gaspar

2009-Oct-02 08:23 UTC

Maurilio Longo wrote:
> Hi,
>
> I have a pc with a MARVELL AOC-SAT2-MV8 controller and a pool made up of
> six disks in a raid-z pool with a hot spare.
...
> Now, the problem is that issuing an
>
> iostat -Cmnx 10
>
> or any other time interval, I've seen, sometimes, a complete stall of disk
> I/O due to a disk in the pool (not always the same) being 100% busy.
...
> In this case it was c2t2d0 and it blocked the pool for 30 or 40 seconds.
>
> /var/adm/messages does not contain anything related to the pool.
>
> What can it be?

This usually means you have either a driver bug, a bad controller, or a bad disk.

The marvell driver bug sometimes manifested in this way, but you would have seen
bus resets in your error logs.

Given you have exactly one outstanding transaction on the "stuck" disk, I
suspect the disk is busy doing error recovery.

Speaking from my recent extremely painful experience, replace that disk ASAP.

-- 
Carson

Maurilio Longo

2009-Oct-02 09:15 UTC

Carson,

the strange thing is that this is happening on several disks (can it be that
they are all failing?)

What is the controller bug you're talking about? I'm running
snv_114 on this PC, so it is fairly recent.

Best regards.

Maurilio.

Carson Gaspar

2009-Oct-02 09:19 UTC

Maurilio Longo wrote:
> the strange thing is that this is happening on several disks (can it be that
> are all failing?)

Possible, but less likely. I'd suggest running some disk I/O tests and comparing
the drive error counters before and after.

> What is the controller bug you're talking about? I'm running snv_114 on this
> pc, so it is fairly recent.

There was a bug in the marvell driver for the controller used on the X4500 that
caused bus hangs / resets. It was fixed around U6, so it should be long gone
from OpenSolaris. But perhaps there's a different bug?

You could also have a firmware bug on your disks. You might try lowering the
number of tagged commands per disk and see if that helps at all.

-- 
Carson

Maurilio Longo

2009-Oct-02 09:43 UTC

> Possible, but less likely. I'd suggest running some
> disk I/O tests, looking at
> the drive error counters before/after.

These disks are only a few months old and are scrubbed weekly, with no errors so far.

I did try to use smartmontools, but it can neither report SMART logs nor start
SMART tests, so I don't know how to look at the disks' internal state.

> You could also have a firmware bug on your disks. You
> might try lowering the
> number of tagged commands per disk and see if that
> helps at all.

From man marvell88sx I read that this driver has no tunable parameters, so I
don't know how I could change the NCQ depth.

Best regards.

Maurilio.

Carson Gaspar

2009-Oct-02 09:45 UTC

Maurilio Longo wrote:
> I did try to use smartmontools, but it cannot report SMART logs nor start
> SMART tests, so I don't know how to look at their internal state.

Really? That's odd...

>> You could also have a firmware bug on your disks. You might try lowering
>> the number of tagged commands per disk and see if that helps at all.
>
> from man marvell88sx I read that this driver has no tunable parameters, so I
> don't know how I could change NCQ depth.

ZFS has a per block device outstanding IO tunable - I think it's in the evil
tuning guide.

-- 
Carson

Robert Milkowski

2009-Oct-02 11:54 UTC

Maurilio Longo wrote:
> Carson,
>
> the strange thing is that this is happening on several disks (can it be
> that are all failing?)
>
> What is the controller bug you're talking about? I'm running snv_114 on
> this pc, so it is fairly recent.
>
> Best regards.
>
> Maurilio.

See 'iostat -En' output.

Maurilio Longo

2009-Oct-02 13:15 UTC

Milek,

this is it

<pre>
# iostat -En
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3808110AS      Revision: D    Serial No:
Size: 80,03GB <80026361856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 91 Predictive Failure Analysis: 0
c2t0d0           Soft Errors: 0 Hard Errors: 11 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 101 Predictive Failure Analysis: 0
c2t1d0           Soft Errors: 0 Hard Errors: 4 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 96 Predictive Failure Analysis: 0
c2t2d0           Soft Errors: 0 Hard Errors: 69 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 105 Predictive Failure Analysis: 0
c2t3d0           Soft Errors: 0 Hard Errors: 5 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 96 Predictive Failure Analysis: 0
c2t4d0           Soft Errors: 0 Hard Errors: 90 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 96 Predictive Failure Analysis: 0
c2t5d0           Soft Errors: 0 Hard Errors: 30 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 96 Predictive Failure Analysis: 0
c2t7d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
Size: 1000,20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 94 Predictive Failure Analysis: 0
#
</pre>
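
To follow the earlier suggestion of comparing error counters before and after an I/O test, the first line of each device block above can be parsed and two snapshots diffed. This is only a sketch under my own assumptions: the function names are invented, the regular expression is written against the output format shown above, and the "after" snapshot uses made-up numbers purely to show the comparison.

```python
import re

# First line of each device block in `iostat -En` output.
DEVICE_LINE = re.compile(
    r"^(\S+)\s+Soft Errors: (\d+) Hard Errors: (\d+) Transport Errors: (\d+)"
)


def error_counters(text):
    """Map device name -> dict of soft/hard/transport error counts."""
    counters = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            soft, hard, transport = (int(n) for n in match.groups()[1:])
            counters[match.group(1)] = {
                "soft": soft, "hard": hard, "transport": transport,
            }
    return counters


def hard_error_growth(before, after):
    """Return {device: increase in hard errors} for devices whose count went up."""
    growth = {}
    for device, counts in after.items():
        previous = before.get(device, {"hard": 0})["hard"]
        if counts["hard"] > previous:
            growth[device] = counts["hard"] - previous
    return growth


# Lines copied from the output above (other lines of each block are ignored).
BEFORE = """\
c2t2d0           Soft Errors: 0 Hard Errors: 69 Transport Errors: 0
Vendor: ATA      Product: ST31000333AS     Revision: CC1H Serial No:
c2t4d0           Soft Errors: 0 Hard Errors: 90 Transport Errors: 0
c2t7d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""
# Hypothetical later snapshot: the value 72 is invented for this example.
AFTER = BEFORE.replace("Hard Errors: 69", "Hard Errors: 72")

print(hard_error_growth(error_counters(BEFORE), error_counters(AFTER)))
```

A counter that keeps growing on one disk during the test would point at that disk (or its cable or port); counters growing on all disks at once would point more towards the controller or driver.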

What are hard errors?

Maurilio.

Maurilio Longo

2009-Oct-02 13:18 UTC

Carson,

they're Seagate ST31000340AS drives with firmware release CC1H, which, from
some quick googling, should have no known firmware bugs.

Anyway, setting NCQ depth to 1

# echo zfs_vdev_max_pending/W0t1 | mdb -kw

did not solve the problem :(

Maurilio.

Maurilio Longo

2009-Oct-02 13:20 UTC

Erratum:

they're ST31000333AS, not 340AS.

Maurilio.

Richard Elling

2009-Oct-02 17:55 UTC

For the archives...

On Oct 2, 2009, at 12:41 AM, Maurilio Longo wrote:
> <pre>
>
> $ iostat -Cmnx 10
>
>   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0,0    0,3    0,0    2,0  0,0  0,0    0,0    0,1   0   0 c1
>    0,0    0,3    0,0    2,0  0,0  0,0    0,0    0,1   0   0 c1t0d0
> 1852,1  297,0 13014,9 4558,4  9,2  1,6    4,3    0,7   2 158 c2
>  311,8   61,3 2185,3  750,7  2,0  0,3    5,5    0,7  17  25 c2t0d0
>  309,5   34,7 2207,2  769,5  1,6  0,5    4,7    1,4  41  47 c2t1d0
>  309,3   36,3 2173,0  770,0  1,0  0,3    2,9    0,7  18  26 c2t2d0
>  296,0   65,5 2057,3  749,2  2,1  0,2    5,9    0,6  16  23 c2t3d0
>  313,3   64,1 2187,3  748,8  1,7  0,2    4,6    0,5  15  21 c2t4d0
>  311,9   35,1 2204,8  770,1  0,7  0,2    2,1    0,5  11  17 c2t5d0
>    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 c2t7d0
>                    extended device statistics
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0,4   14,7    3,2   30,4  0,0  0,2    0,0   13,2   0   2 c1
>    0,4   14,7    3,2   30,4  0,0  0,2    0,0   13,2   0   2 c1t0d0
>    1,7    0,0   58,9    0,0  3,0  1,0 1766,4  593,1   2 101 c2
>    0,3    0,0    7,7    0,0  0,0  0,0    0,3    0,4   0   0 c2t0d0
>    0,3    0,0   11,5    0,0  0,0  0,0    4,4    8,4   0   0 c2t1d0
>    0,0    0,0    0,0    0,0  3,0  1,0    0,0    0,0 100 100 c2t2d0
This is a symptom of an I/O getting dropped in the data path.
You can clearly see 1 IOP in the actv queue (which is the queue
between the interface card and the target). The %b (busy) figure is
calculated as the percentage of time that at least one IOP is in
the actv queue. The higher-level device drivers have timeouts
and will try to reset and re-issue IOPs as needed.
  -- richard

> </pre>
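
As a worked illustration of the explanation above, here is a small Python sketch of how the columns relate. It is my reading of the numbers, not taken from the iostat source: `%b` as the share of the interval with at least one I/O in the actv queue, and the service times via Little's law (average time in a queue = average queue length / completion rate). The function names are invented for the example.

```python
def percent_busy(seconds_with_active_io, interval_seconds):
    """%b: share of the interval with at least one I/O in the actv queue."""
    return 100.0 * seconds_with_active_io / interval_seconds


def avg_time_ms(avg_queue_length, completions_per_second):
    """Little's law: average time in a queue = queue length / completion rate."""
    if completions_per_second == 0:
        # Nothing completed, so there is no average to report; the stuck
        # disk's rows in the quoted output show 0,0 in this situation.
        return 0.0
    return 1000.0 * avg_queue_length / completions_per_second


# One I/O stuck on the device for a whole 10 second interval.
print(percent_busy(10, 10))

# Controller c2 in the second interval: wait 3,0, actv 1,0, r/s 1,7, w/s 0,0.
print(round(avg_time_ms(3.0, 1.7), 1))   # compare with the reported wsvc_t 1766,4
print(round(avg_time_ms(1.0, 1.7), 1))   # compare with the reported asvc_t 593,1
```

The computed values land close to the reported ones (the inputs are already rounded by iostat). The same relation is consistent with the stuck disk showing 100 %b and asvc_t 0,0: its one I/O never completes, so it keeps the device "busy" without contributing a service time.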

Maurilio Longo

2009-Oct-05 06:52 UTC

Richard,

thanks for the explanation.

So can we say that the problem is the disks losing a command now and then
under stress?

Best regards.

Maurilio.

Richard Elling

2009-Oct-05 16:32 UTC

On Oct 4, 2009, at 11:52 PM, Maurilio Longo wrote:
> Richard,
>
> thanks for the explanation.
>
> So can we say that the problem is in the disks losing a command now
> and then under stress?

It may be the disks or the HBA. I'll bet a steak dinner it is the HBA.
  -- richard

Maurilio Longo

2009-Oct-05 16:48 UTC

Richard,

it is the same controller used inside Sun's Thumpers; it could be a
problem with my unit (which is a couple of years old now), though.

Is there something I can do to find out if I owe you that steak? :)

Thanks.

Maurilio.

Maurilio Longo

2009-Oct-08 12:28 UTC

By the way,

there are more than fifty bugs logged for marvell88sx, many of them about
problems with DMA handling and/or driver behaviour under stress.

Can it be that I'm stumbling upon something along these lines?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6826483

Maurilio.