Hi,

My OmniOS host is experiencing slow ZFS writes (around 30 times slower than
normal). iostat reports the errors below even though the pool is healthy. This
started about four days ago, although no changes were made to the system. Are
the hard disks faulty? Please help.

root@host:~# zpool status -v
  pool: test
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool
        will no longer be accessible on software that does not support
        feature flags.
config:

        NAME         STATE     READ WRITE CKSUM
        test         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz1-3   ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
          raidz1-4   ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
          raidz1-5   ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
          raidz1-6   ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
        spares
          c5t10d0    AVAIL
          c5t11d0    AVAIL
          c2t45d0    AVAIL
          c2t46d0    AVAIL
          c2t47d0    AVAIL

root@host:~# iostat -En
c4t0d0  Soft Errors: 0 Hard Errors: 5 Transport Errors: 0
Vendor: iDRAC    Product: Virtual CD      Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 5 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c3t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: LCDRIVE         Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t0d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC    Product: Virtual Floppy  Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0

root@host:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major

Host        : host
Platform    : PowerEdge-R810
Product_sn  :

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=test
              faulted but still in service
Problem in  : zfs://pool=test
              faulted but still in service

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
              more information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.

Regards,
Ram
> root@host:~# fmadm faulty
> --------------- ------------------------------------ -------------- ---------
> TIME            EVENT-ID                              MSG-ID         SEVERITY
> --------------- ------------------------------------ -------------- ---------
> Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a  ZFS-8000-HC    Major
>
> Host        : host
> Platform    : PowerEdge-R810
> Product_sn  :
>
> Fault class : fault.fs.zfs.io_failure_wait
> Affects     : zfs://pool=test
>               faulted but still in service
> Problem in  : zfs://pool=test
>               faulted but still in service
>
> Description : The ZFS pool has experienced currently unrecoverable I/O
>               failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
>               more information.
>
> Response    : No automated response will be taken.
>
> Impact      : Read and write I/Os cannot be serviced.
>
> Action      : Make sure the affected devices are connected, then run
>               'zpool clear'.

The pool looks healthy to me, but it isn't very well balanced. Have you been
adding new VDEVs over time to grow it? Check whether some of the VDEVs are
fuller than others. I don't have an OI/illumos system available at the moment,
but IIRC this can be done with zpool iostat -v. Older versions of ZFS striped
across all VDEVs regardless of fill, which slowed down write speeds rather
horribly once some VDEVs were nearly full (>90%). This shouldn't be the case
with OmniOS, but it *may* be the case with an old zpool version; I don't know.

I'd check the fill rate of the VDEVs first, then perhaps try to upgrade the
zpool - unless you need to be able to import it on an older zpool version
(on S10 or similar).

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of xenotypic etymology. In most cases adequate and
relevant synonyms exist in Norwegian.
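As a minimal sketch of the checks Roy suggests (the pool name "test" is taken
from the original post; zpool list -v may not exist on very old tool versions,
and zpool upgrade is one-way - an upgraded pool can no longer be imported on
software without feature-flag support, as the zpool status warning says):

root@host:~# zpool iostat -v test   # per-vdev alloc/free and I/O counters
root@host:~# zpool list -v test     # per-vdev capacity summary, if supported
root@host:~# zpool upgrade          # list pools not at the latest on-disk version
root@host:~# zpool upgrade test     # upgrade this pool (irreversible)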
Hi Roy,

You are right, so it looks like a data-distribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year, after which we
added 24 more disks and created additional vdevs. The initial vdevs have
filled up, and so the write speed declined. Now, how do I find the files that
sit on a given vdev or disk? That way I can remove them and copy them back to
redistribute the data. Is there any other way to solve this?

Total capacity of pool: 98 TB
Used: 44 TB
Free: 54 TB

root@host:~# zpool iostat -v
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
test        54.0T  62.7T     52  1.12K  2.16M  5.78M
  raidz1    11.2T  2.41T     13     30   176K   146K
    c2t0d0      -      -      5     18  42.1K  39.0K
    c2t1d0      -      -      5     18  42.2K  39.0K
    c2t2d0      -      -      5     18  42.5K  39.0K
    c2t3d0      -      -      5     18  42.9K  39.0K
    c2t4d0      -      -      5     18  42.6K  39.0K
  raidz1    13.3T   308G     13    100   213K   521K
    c2t5d0      -      -      5     94  50.8K   135K
    c2t6d0      -      -      5     94  51.0K   135K
    c2t7d0      -      -      5     94  50.8K   135K
    c2t8d0      -      -      5     94  51.1K   135K
    c2t9d0      -      -      5     94  51.1K   135K
  raidz1    13.4T  19.1T      9    455   743K  2.31M
    c2t12d0     -      -      3    137  69.6K   235K
    c2t13d0     -      -      3    129  69.4K   227K
    c2t14d0     -      -      3    139  69.6K   235K
    c2t15d0     -      -      3    131  69.6K   227K
    c2t16d0     -      -      3    141  69.6K   235K
    c2t17d0     -      -      3    132  69.5K   227K
    c2t18d0     -      -      3    142  69.6K   235K
    c2t19d0     -      -      3    133  69.6K   227K
    c2t20d0     -      -      3    143  69.6K   235K
    c2t21d0     -      -      3    133  69.5K   227K
    c2t22d0     -      -      3    143  69.6K   235K
    c2t23d0     -      -      3    133  69.5K   227K
  raidz1    2.44T  16.6T      5    103   327K   485K
    c2t24d0     -      -      2     48  50.8K  87.4K
    c2t25d0     -      -      2     49  50.7K  87.4K
    c2t26d0     -      -      2     49  50.8K  87.3K
    c2t27d0     -      -      2     49  50.8K  87.3K
    c2t28d0     -      -      2     49  50.8K  87.3K
    c2t29d0     -      -      2     49  50.8K  87.3K
    c2t30d0     -      -      2     49  50.8K  87.3K
  raidz1    8.18T  10.8T      5    295   374K  1.54M
    c2t31d0     -      -      2    131  58.2K   279K
    c2t32d0     -      -      2    131  58.1K   279K
    c2t33d0     -      -      2    131  58.2K   279K
    c2t34d0     -      -      2    132  58.2K   279K
    c2t35d0     -      -      2    132  58.1K   279K
    c2t36d0     -      -      2    133  58.3K   279K
    c2t37d0     -      -      2    133  58.2K   279K
  raidz1    5.42T  13.6T      5    163   383K   823K
    c2t38d0     -      -      2     61  59.4K   146K
    c2t39d0     -      -      2     61  59.3K   146K
    c2t40d0     -      -      2     61  59.4K   146K
    c2t41d0     -      -      2     61  59.4K   146K
    c2t42d0     -      -      2     61  59.3K   146K
    c2t43d0     -      -      2     62  59.2K   146K
    c2t44d0     -      -      2     62  59.3K   146K

On Mon, Feb 11, 2013 at 10:23 PM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:

> The pool looks healthy to me, but it isn't very well balanced. Have you
> been adding new VDEVs over time to grow it? Check whether some of the
> VDEVs are fuller than others. [...] I'd check the fill rate of the VDEVs
> first, then perhaps try to upgrade the zpool - unless you need to be able
> to import it on an older zpool version (on S10 or similar).
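For reference, the fill level of each top-level vdev implied by this output
(alloc / (alloc + free)) is roughly:

  raidz1-0: 11.2T / 13.6T  ~ 82%
  raidz1-1: 13.3T / 13.6T  ~ 98%
  raidz1-3: 13.4T / 32.5T  ~ 41%
  raidz1-4: 2.44T / 19.0T  ~ 13%
  raidz1-5: 8.18T / 19.0T  ~ 43%
  raidz1-6: 5.42T / 19.0T  ~ 28%

so the two original vdevs are indeed close to full while the newer ones are
mostly empty, which matches Roy's suspicion.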
Ram Chander wrote:
> Hi Roy,
> You are right, so it looks like a data-distribution issue. Initially
> there were two vdevs with 24 disks (disks 0-23) for close to a year,
> after which we added 24 more disks and created additional vdevs. The
> initial vdevs have filled up, and so the write speed declined. Now, how
> do I find the files that sit on a given vdev or disk? That way I can
> remove them and copy them back to redistribute the data. Is there any
> other way to solve this?

The only way is to avoid the problem in the first place by not mixing vdev
sizes in a pool.

--
Ian.
On 2013-02-12 10:32, Ian Collins wrote:
> Ram Chander wrote:
>> Hi Roy,
>> You are right, so it looks like a data-distribution issue. Initially
>> there were two vdevs with 24 disks (disks 0-23) for close to a year,
>> after which we added 24 more disks and created additional vdevs. The
>> initial vdevs have filled up, and so the write speed declined. Now,
>> how do I find the files that sit on a given vdev or disk? That way I
>> can remove them and copy them back to redistribute the data. Is there
>> any other way to solve this?
>>
> The only way is to avoid the problem in the first place by not mixing
> vdev sizes in a pool.

Well, that imbalance is there - in the zpool status printout we see raidz1
top-level vdevs of 5, 5, 12, 7, 7 and 7 disks, plus 5 spares, which sums up
to 48 ;)

Depending on disk size, it is possible that the top-level vdev sizes in
gigabytes were kept the same (i.e. a raidz set with twice as many disks of
half the size), but we have no information on that detail and it is unlikely.
Even then, with the disk sets in one pool, this would still unbalance the
load across spindles and I/O buses.

Besides all that, with the "older" top-level vdevs being fuller than the
"newer" ones, there is an imbalance that would not have been avoided by
keeping vdev sizes equal: writes into the newer vdevs are likely to quickly
find available "holes", while writes into the older ones are more fragmented
and a longer search is needed to find a hole - if not outright gang-block
fragmentation. These two effects are, I believe, the basis of the performance
drop on "full" pools, the deciding factor being the mix of I/O patterns and
the fragmentation of data and holes.

I think there were developments in illumos ZFS to direct more writes onto
devices with more available space; I am not sure whether the average write
latency of a top-level vdev is monitored and taken into account in
write-targeting decisions (which would also cover the case of failing devices
that take longer to respond), nor which portions have been completed and
integrated into common illumos-gate.

As was suggested, you can use "zpool iostat -v 5" to monitor I/O to the pool
with a breakdown per top-level vdev and per disk, and look for patterns
there. Keep in mind, however, that in a healthy raidz set you should see
reads only from the data disks of a particular stripe; parity is not read
unless a checksum mismatch occurs. On average, data is laid out across all
disks so that there is no "dedicated" parity disk, but with small I/Os you
are likely to notice this effect.

If the budget permits, I'd suggest building (or leasing) another system with
balanced disk sets and replicating all the data onto it, then repurposing the
older system - for example, as a backup of the newer box (after remaking its
disk layout as well).

As for the question of which files are on the older disks: as a rule of thumb
you can compare file creation/modification times with the date when you
expanded the pool ;) Closer inspection could be done with a zdb walk to print
the DVA block addresses of a file's blocks (the DVA includes the number of
the top-level vdev), but that would take some time - first to determine which
files to inspect (likely some band of sizes) and then to do the zdb walks.

Good luck,
//Jim
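To make the zdb walk Jim mentions concrete, here is a rough sketch; the
dataset name test/data and the file path are made up for illustration. On ZFS
the inode number reported by ls -i is the object number zdb expects, and each
block pointer in the dump carries DVAs of the form <vdev:offset:asize>, where
the leading number is the top-level vdev index (0 and 1 being the nearly full
original raidz sets in this pool):

root@host:~# ls -i /test/data/largefile.bin        # inode number = ZFS object number
root@host:~# zdb -ddddd test/data <object-number>  # dump the object, including block pointers

In the zdb output, look at the DVA[...] fields of the L0 (data) blocks; the
first colon-separated number tells you which top-level vdev each block was
allocated on.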
Jim Klimov wrote:
> On 2013-02-12 10:32, Ian Collins wrote:
>> Ram Chander wrote:
>>> Hi Roy,
>>> You are right, so it looks like a data-distribution issue. Initially
>>> there were two vdevs with 24 disks (disks 0-23) for close to a year,
>>> after which we added 24 more disks and created additional vdevs. The
>>> initial vdevs have filled up, and so the write speed declined. Now,
>>> how do I find the files that sit on a given vdev or disk? That way I
>>> can remove them and copy them back to redistribute the data. Is there
>>> any other way to solve this?
>>>
>> The only way is to avoid the problem in the first place by not mixing
>> vdev sizes in a pool.

I was a bit quick off the mark there - I didn't notice that some vdevs were
older than others.

> Well, that imbalance is there - in the zpool status printout we see raidz1
> top-level vdevs of 5, 5, 12, 7, 7 and 7 disks, plus 5 spares, which sums
> up to 48 ;)

The vdev sizes are about (including parity space) 14, 14, 22, 19, 19 and
19 TB respectively, and 127 TB total. So even if the data were balanced, the
performance of this pool would still start to degrade once ~84 TB (about 2/3
full) are used. The only viable long-term solution is a rebuild, or putting
bigger drives in the two smallest vdevs.

In the short term, when I've had similar issues I used zfs send to copy a
large filesystem within the pool, renamed the copy to the original name and
deleted the original. This can be repeated until you have an acceptable
distribution.

One last thing: unless this is some form of backup pool, or the data on it
isn't important, avoid raidz vdevs in such a large pool!

--
Ian.
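One possible way to carry out the copy-and-rename approach Ian describes,
sketched for a hypothetical filesystem test/data; it needs enough free space
to hold a second copy while both exist, and the copy should be verified
before anything is destroyed:

root@host:~# zfs snapshot -r test/data@rebalance
root@host:~# zfs send -R test/data@rebalance | zfs receive -u test/data.new   # -u: don't mount the copy yet
root@host:~# zfs destroy -r test/data             # only after verifying test/data.new
root@host:~# zfs rename test/data.new test/data
root@host:~# zfs destroy -r test/data@rebalance   # drop the leftover snapshot

Because the copy is written with the pool in its current state, its blocks
are spread over all of today's vdevs, including the newer, mostly empty ones,
which is what redistributes the data.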