Anantha N. Srirama
2007-Jan-08 20:04 UTC
[zfs-discuss] Puzzling ZFS behavior with COMPRESS option
Our setup:

- E2900 (24 x 96); Solaris 10 Update 2 (aka 06/06)
- 2 x 2Gbps FC HBAs
- EMC DMX storage
- 50 x 64GB LUNs configured in 1 ZFS pool
- Many filesystems created with COMPRESS enabled; specifically, I've one that is 768GB

I'm observing the following puzzling behavior:

- We are currently creating a large (>1.4TB) and sparse dataset; most of the dataset contains repeating blanks (default/standard SAS dataset behavior).
- ls -l reports the file size as 1.4+TB, while du -sk reports the actual on-disk usage at around 65GB.
- My I/O on the system is pegged at 150+MB/s as reported by zpool iostat, and I've confirmed the same with iostat.

This is very confusing:

- ZFS is doing very good compression, as shown by the ratio of the reported file size to the on-disk usage (1.4TB vs 65GB).
- *Why on God's green earth am I observing such high I/O when ZFS is indeed compressing?* I can't believe that the program is actually generating I/O at the rate of (150MB/s * compressratio).

Any thoughts?
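[Editorial aside: the ls -l versus du -sk gap above can be reproduced with a minimal C sketch, not from the thread; the path and sizes are hypothetical. It writes a blank-filled file and compares the apparent length (st_size) with the allocated blocks (st_blocks); on a COMPRESS-enabled ZFS filesystem the second figure should come out far smaller than the first, just as the poster observed.]

/*
 * Minimal sketch: write repeating blanks (like the SAS dataset) and
 * compare apparent size with on-disk allocation. The path is a
 * placeholder; run it on a compression-enabled ZFS filesystem.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mtdc/somefs/blanks.dat";   /* hypothetical path */
    char buf[128 * 1024];
    memset(buf, ' ', sizeof (buf));                 /* repeating blanks */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 1024; i++)                  /* ~128MB of blanks */
        if (write(fd, buf, sizeof (buf)) < 0) { perror("write"); return 1; }
    close(fd);

    struct stat st;
    if (stat(path, &st) < 0) { perror("stat"); return 1; }
    printf("apparent size: %lld bytes\n", (long long)st.st_size);
    printf("on-disk size : %lld bytes\n", (long long)st.st_blocks * 512LL);
    return 0;
}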
Anantha N. Srirama
2007-Jan-08 20:21 UTC
[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option
Quick update: since my original post, I've confirmed via DTrace (the rwtop script in the DTrace toolkit) that the application is not generating 150MB/s * compressratio of I/O. What, then, is causing this much I/O on our system?
Neil Perrin
2007-Jan-08 20:52 UTC
[zfs-discuss] Puzzling ZFS behavior with COMPRESS option
Anantha N. Srirama wrote on 01/08/07 13:04:
> [setup details and observations snipped; see the original post above]
> Why on God's green earth am I observing such high I/O when ZFS is indeed compressing? I can't believe that the program is actually generating I/O at the rate of (150MB/s * compressratio).
>
> Any thoughts?

One possibility is that the data is being written synchronously (using O_DSYNC, fsync, etc.), in which case the ZFS Intent Log (ZIL) will write that data uncompressed to stable storage, to guard against a crash or power failure before the txg is committed.

Neil.
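[Editorial aside: a minimal sketch of the two write paths Neil contrasts. Whether SAS actually opens its datasets with O_DSYNC or calls fsync() is an assumption to be verified, for example with DTrace; the file names are placeholders.]

/*
 * Buffered writes sit in memory and are compressed when the txg is
 * written, while writes on an O_DSYNC descriptor (or writes followed
 * by fsync()) must be committed immediately through the ZIL, which
 * logs the data uncompressed.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void fill_blanks(int fd, size_t nbufs)
{
    char buf[128 * 1024];
    memset(buf, ' ', sizeof (buf));
    for (size_t i = 0; i < nbufs; i++)
        (void) write(fd, buf, sizeof (buf));
}

int main(void)
{
    /* Asynchronous path: data stays in memory until the txg commits. */
    int async_fd = open("/mtdc/somefs/async.dat",
        O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* Synchronous path: every write must reach stable storage via the ZIL. */
    int sync_fd = open("/mtdc/somefs/sync.dat",
        O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);

    if (async_fd < 0 || sync_fd < 0) { perror("open"); return 1; }

    fill_blanks(async_fd, 1024);    /* ~128MB, compressed at txg time      */
    fill_blanks(sync_fd, 1024);     /* ~128MB, logged uncompressed first   */

    /* fsync() on an otherwise buffered file forces the same ZIL commit. */
    (void) fsync(async_fd);

    close(async_fd);
    close(sync_fd);
    return 0;
}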
Bart Smaalders
2007-Jan-09 00:11 UTC
[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option
Anantha N. Srirama wrote:
> Quick update: since my original post, I've confirmed via DTrace (the rwtop script in the DTrace toolkit) that the application is not generating 150MB/s * compressratio of I/O. What, then, is causing this much I/O on our system?

Are you doing random I/O? Appending or overwriting?

- Bart

--
Bart Smaalders                 Solaris Kernel Performance
barts at cyber.eng.sun.com     http://blogs.sun.com/barts
Anantha N. Srirama
2007-Jan-09 13:46 UTC
[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option
I'll see if I can confirm what you are suggesting. Thanks.
Anantha N. Srirama
2007-Jan-10 00:05 UTC
[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option
I've some important information that should shed some light on this behavior:

This evening I created a new filesystem across the very same 50 disks, also with the COMPRESS attribute. My goal was to isolate some workload to the new filesystem, so I started moving a 100GB directory tree over to the new FS. While I was copying I was averaging around 25MB/s read and 25MB/s write, as expected. *Then I opened 'vi' and wrote out a new file in the new filesystem, and what I saw was shocking: my reads remained the same, but my writes shot up to the 150+MB/s range. This abnormal I/O pattern continued until 'vi' returned from the write request.* Here is the 'zpool iostat mtdc 30' output:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
mtdc         806G  2.48T     38    173  1.93M  7.52M
mtdc         806G  2.48T    188    228  15.0M  8.78M
mtdc         807G  2.48T    266    624  14.0M  16.5M
mtdc         807G  2.48T    286    670  17.1M  14.5M
mtdc         807G  2.48T    293  1.21K  18.2M  98.4M  <<-- vi activity, note mismatch in r/w rates
mtdc         808G  2.48T    457    560  35.5M  24.2M
mtdc         809G  2.48T    405    504  31.7M  26.3M
mtdc         809G  2.48T    328  1.37K  25.2M   152M  <<-- vi activity, note mismatch in r/w rates
mtdc         810G  2.48T    428    671  33.0M  48.0M
mtdc         811G  2.48T    463    500  35.9M  26.4M
mtdc         811G  2.48T    207  1.39K  16.5M   154M  <<-- vi activity, note mismatch in r/w rates
mtdc         812G  2.48T    310    878  23.9M  77.7M
mtdc         813G  2.48T    362    494  26.1M  25.3M
mtdc         813G  2.48T    381  1.05K  26.8M   103M
mtdc         814G  2.48T    347  1.33K  25.0M   135M
mtdc         815G  2.48T    288  1.38K  21.7M   150M
mtdc         815G  2.48T    425    513  32.7M  25.8M
mtdc         816G  2.47T    413    515  30.2M  25.1M
mtdc         817G  2.47T    341    512  21.9M  25.1M
mtdc         818G  2.47T    293    529  18.5M  25.5M
mtdc         818G  2.47T    344    508  23.4M  24.7M
mtdc         819G  2.47T    442    512  33.4M  24.1M
mtdc         820G  2.47T    385    483  28.3M  24.4M
mtdc         820G  2.47T    372    483  24.7M  24.7M
mtdc         821G  2.47T    347    535  23.0M  24.2M
mtdc         821G  2.47T    290    497  17.9M  24.9M
mtdc         823G  2.47T    349    517  20.0M  24.1M
mtdc         823G  2.47T    399    512  21.2M  24.5M
mtdc         824G  2.47T    383    612  19.3M  17.7M
mtdc         824G  2.47T    390    614  14.2M  17.5M
Neil Perrin
2007-Jan-10 00:30 UTC
[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option
Ah, vi does an fsync. So I suspect that this is bug:

    6413510 zfs: writing to ZFS filesystem slows down fsync() on other files in the same FS

Here's a snippet from the Evaluation:

-----------
ZFS keeps a list in memory of all transactions and will push *all* of them out on an fsync. This includes those not necessarily related to the znode being "fsunk". We consciously designed it this way to avoid possible problems with dependencies between znodes.

This behaviour could also explain the extra fsync load on jurassic (see 6404018), as ZFS can do much more IO for fsyncs. However, I still don't think it's the whole problem.

So it looks like we ought to just flush those changes to the specified znode, and work out the dependencies.
------------

This has been fixed since August and will be available in s10u4. Sorry.

Anantha N. Srirama wrote on 01/09/07 17:05:
> [experiment description and 'zpool iostat mtdc 30' output snipped; see the previous message]
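[Editorial aside: a minimal sketch, not from the bug report, of the pattern Neil describes for 6413510. On affected builds, an fsync() of one small file can push all pending transactions for the filesystem, so a tiny vi-style save pays for the large buffered copy still in flight. Paths and sizes are placeholders; timing the fsync() and watching zpool iostat is left to the reader.]

/*
 * One descriptor stands in for the 100GB directory copy (buffered,
 * uncommitted data), the other for the small file vi writes out.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int big = open("/mtdc/newfs/bigcopy.dat",
        O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int small = open("/mtdc/newfs/small.txt",
        O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (big < 0 || small < 0) { perror("open"); return 1; }

    char buf[128 * 1024];
    memset(buf, 'x', sizeof (buf));
    for (int i = 0; i < 4096; i++)          /* ~512MB of dirty, uncommitted data */
        (void) write(big, buf, sizeof (buf));

    (void) write(small, "hello\n", 6);

    /*
     * On builds with 6413510 this fsync() drags the big file's dirty
     * data out with it, producing a write burst far larger than 6 bytes.
     */
    (void) fsync(small);

    close(big);
    close(small);
    return 0;
}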