I just had a quick play with gzip compression on a filesystem and the result was the machine grinding to a halt while copying some large (.wav) files to it from another filesystem in the same pool. The system became very unresponsive, taking several seconds to echo keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty of grunt for this. Comments? Ian
Ian Collins wrote:
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.
>
> Comments?
>
> Ian

How big were the files, what OS build are you running and how
much memory on the system?  Were you copying in parallel?

- Bart

--
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com	http://blogs.sun.com/barts
Bart Smaalders wrote:
> Ian Collins wrote:
>> I just had a quick play with gzip compression on a filesystem and the
>> result was the machine grinding to a halt while copying some large
>> (.wav) files to it from another filesystem in the same pool.
>>
>> The system became very unresponsive, taking several seconds to echo
>> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
>> of grunt for this.
>>
>> Comments?
>>
>> Ian
>
> How big were the files, what OS build are you running and how
> much memory on the system?  Were you copying in parallel?
>
quadfx> uname -a
SunOS quadfx 5.11 snv_62 i86pc i386 i86pc

quadfx> psrinfo -v
Status of virtual processor 0 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:18:57.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.

The system has 8MB of RAM and I was using 'cp -r' to copy the directory.

The files are between 15 and 50MB.  It's worth pointing out that .wav
files only compress by a few percent.

Ian
On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
>
> The system has 8MB of RAM and I was using 'cp -r' to copy the directory.

Hm, that would be a record-breaking system. Did you mean 8 GB?

--
Regards,
Cyril
Cyril Plisko wrote:
> On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
>>
>> The system has 8MB of RAM and I was using 'cp -r' to copy the directory.
>
> Hm, that would be a record-breaking system. Did you mean 8 GB?
>
Oops.... Yes.
On May 2, 2007, at 10:36 PM, Ian Collins wrote:
> The files are between 15 and 50MB.  It's worth pointing out that .wav
> files only compress by a few percent.

Not entirely related to your maxed CPU problem, but....

gzip on PCM audio isn't, as you point out, going to earn you much of a
compression ratio. It's also very, very slow on this kind of data. If
you're looking to compress wav or other PCM-based audio formats for
storage space reasons, use FLAC. It's guaranteed that you'll get a far
better compression ratio and a quicker result for your troubles than you
will with gzip.

There are many technical reasons for this, but generally, FLAC knows
about audio down to the sample and is geared for the properties of PCM
audio. gzip just sees it as a generic data blob like any other, which
contributes to its inefficiencies in this case.

The downside is that, well, it's not an option at the ZFS level, but you
don't necessarily have to pre-decompress your FLAC-compressed WAV files
in order to listen to them :)

Hmm... a FLAC-based compression mech in ZFS for efficient (and lossless)
PCM audio storage...

/dale
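For anyone who wants to try Dale's suggestion, a rough comparison looks
like this -- the file names are just placeholders, and the exact savings
depend on the material:

  % ptime gzip -9 -c track01.wav > track01.wav.gz    # generic byte-stream compressor
  % ptime flac --best -o track01.flac track01.wav    # audio-aware, lossless
  % ls -l track01.wav track01.wav.gz track01.flac

flac(1) typically shrinks PCM music to somewhere between half and two
thirds of its original size, where gzip usually saves only a few percent,
as Ian saw.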
Dale Ghent wrote:
> On May 2, 2007, at 10:36 PM, Ian Collins wrote:
>
>> The files are between 15 and 50MB.  It's worth pointing out that .wav
>> files only compress by a few percent.
>
> Not entirely related to your maxed CPU problem, but....
>
I don't think it was a maxed CPU problem; only one core was loaded, and
the prstat numbers I could get (the reporting period was erratic) didn't
show anything nasty.

> gzip on PCM audio isn't, as you point out, going to earn you much of a
> compression ratio. It's also very, very slow on this kind of data. If
> you're looking to compress wav or other PCM-based audio formats for
> storage space reasons, use FLAC.

I was looking at some mixed content, just an experiment.

Ian
Ian,

On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
> I don't think it was a maxed CPU problem; only one core was loaded, and
> the prstat numbers I could get (the reporting period was erratic) didn't
> show anything nasty.

Do you have the output of 'mpstat 5'?

--
Just me,
Wire ...
Blog: <prstat.blogspot.com>
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.

I've observed the same behavior.

With my test I've used a zpool created on a 1GB file (on a UFS filesystem):

# mkfile 1G /var/tmp/vdev_for_gzip_pool
# zpool create gzip_pool /var/tmp/vdev_for_gzip_pool
# zfs set compression=gzip gzip_pool
# chown jk /gzip_pool

Now, when I run this command...

% dd bs=128k if=/dev/urandom of=/gzip_pool/junk

... the mouse cursor sometimes is frozen for two (or more) seconds. Same
with keyboard input. This is on an amd64 x2 box, 4GB memory, and a USB
keyboard and USB mouse.

Lots of system cpu time is used while the gzip compressed pool is filled:

% mpstat 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0   47   1  122  646  316  317   13   24    8    2  1024   3   3   0  94
  1   41   1  159  334  101  279   11   24    8    2   900   2   3   0  94
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    1   0 6860 7263  282   73    2    2   78    1    64   0  70   0  30
  1    0   0 6866 6870   10    4    9    1   85    0   121   0 100   0   0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    6  576  301  465    3   19   18  146   126   1   5   0  95
  1    0   0   36 1471 1276  410   29   20   24  115   335   0  59   0  41
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 5404 5823  309  113    2    2   57    1   420   0  56   0  44
  1    0   0 5409 5431   13    5   12    1   80    1   179   0 100   0   0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    1  529  300  281   17   13   10  103   274   0  64   0  36
  1    0   0    9 1348 1169  452    8   23   14  105   163   0   5   0  95
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 6186 6607  282   62    8    9   55   11   123   0  88   0  12
  1    0   0 6196 6259   53   78    4    3   80   12   132   0  75   0  25

A kernel profile seems to show that the kernel is busy with gzip'ing
(and busy with computing SHA1 hashes?):

# lockstat -kIW -D 20 sleep 20

Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 1802  46%  46% 0.00     1931 cpu[0]          mach_cpu_idle
  517  13%  60% 0.00     6178 cpu[1]          SHA1Transform
  482  12%  72% 0.00     1094 cpu[0]          deflate_slow
  328   8%  81% 0.00     1111 cpu[1]          longest_match
  104   3%  83% 0.00      940 cpu[0]          fill_window
   98   3%  86% 0.00    47357 cpu[1]          bcopy
   65   2%  87% 0.00     5438 cpu[1]          SHA1Update
   63   2%  89% 0.00      834 cpu[1]          bzero
   50   1%  90% 0.00     1042 cpu[0]          pqdownheap
   44   1%  92% 0.00      676 cpu[1]          Encode
   32   1%  92% 0.00     1136 cpu[0]          copy_block
   24   1%  93% 0.00     1214 cpu[1]          do_copy_fault_nta
   23   1%  94% 0.00   401205 cpu[0]          do_splx
   23   1%  94% 0.00      644 cpu[1]          hmac_encr
   22   1%  95% 0.00     1058 cpu[1]          z_adler32
   19   0%  95% 0.00     1208 cpu[0]+10       todpc_rtcget
   16   0%  96% 0.00      752 cpu[0]          SHA1Final
   14   0%  96% 0.00     1186 cpu[1]          mutex_enter
   12   0%  96% 0.00      642 cpu[1]          kcopy
   11   0%  97% 0.00      948 cpu[0]          page_create_va
-------------------------------------------------------------------------------

This message posted from opensolaris.org
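As a sanity check that the urandom data really isn't compressing (so the
CPU time being burned is pure overhead), the pool's compressratio
property can be queried; the value shown here is what you would expect
for incompressible input, not a figure taken from Jürgen's run:

  % zfs get compressratio gzip_pool
  NAME       PROPERTY       VALUE  SOURCE
  gzip_pool  compressratio  1.00x  -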
Darren J Moffat
2007-May-03 10:50 UTC
[zfs-discuss] Re: gzip compression throttles system?
Jürgen Keil wrote:
> A kernel profile seems to show that the kernel is busy with gzip'ing
> (and busy with computing SHA1 hashes?):
>
> # lockstat -kIW -D 20 sleep 20
>
> Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec)
>
> Count indv cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 1802 46% 46% 0.00 1931 cpu[0] mach_cpu_idle
> 517 13% 60% 0.00 6178 cpu[1] SHA1Transform
> 65 2% 87% 0.00 5438 cpu[1] SHA1Update
> 16 0% 96% 0.00 752 cpu[0] SHA1Final

The reason you are busy computing SHA1 hashes is that you are using
/dev/urandom. The implementation of drv/random uses SHA1 for mixing;
strictly speaking, it is the swrand provider that does that part.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/crypto/io/swrand.c

--
Darren J Moffat
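A quick way to separate the two costs is to pull from /dev/urandom with
ZFS out of the picture entirely; if swrand's SHA1 mixing is a large part
of the load, it will show up here too (a rough check, not a benchmark --
the count is arbitrary):

  % ptime dd bs=128k count=1000 if=/dev/urandom of=/dev/null
  # lockstat -kIW -D 10 sleep 10      # in another shell, while the dd runs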
Jürgen Keil
2007-May-03 11:53 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
> The reason you are busy computing SHA1 hashes is that you are using
> /dev/urandom. The implementation of drv/random uses SHA1 for mixing;
> strictly speaking, it is the swrand provider that does that part.

Ahh, ok.

So, instead of using dd reading from /dev/urandom all the time, I've now
used this quick C program to write one /dev/urandom block over and over
to the gzip compressed zpool:

================================
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int fd;
	char buf[128*1024];

	/* fill one 128K buffer with random bytes, once */
	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0) {
		perror("open /dev/urandom");
		exit(1);
	}
	if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
		perror("fill buf from /dev/urandom");
		exit(1);
	}
	close(fd);

	/* write the same buffer over and over until the pool is full */
	fd = open(argv[1], O_WRONLY|O_CREAT, 0666);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	for (;;) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			break;
		}
	}
	close(fd);
	exit(0);
}
================================

Avoiding the reads from /dev/urandom makes the effect even more
noticeable; the machine now "freezes" for 10+ seconds.

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3109 3616  316  196    5   17   48   45   245   0  85   0  15
  1    0   0 3127 3797  592  217    4   17   63   46   176   0  84   0  15
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3051 3529  277  201    2   14   25   48   216   0  83   0  17
  1    0   0 3065 3739  606  195    2   14   37   47   153   0  82   0  17
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3011 3538  316  242    3   26   16   52   202   0  81   0  19
  1    0   0 3019 3698  578  269    4   25   23   56   309   0  83   0  17

# lockstat -kIW -D 20 sleep 30

Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 2068  34%  34% 0.00     1767 cpu[0]          deflate_slow
 1506  25%  59% 0.00     1721 cpu[1]          longest_match
 1017  17%  76% 0.00     1833 cpu[1]          mach_cpu_idle
  454   7%  83% 0.00     1539 cpu[0]          fill_window
  215   4%  87% 0.00     1788 cpu[1]          pqdownheap
  152   2%  89% 0.00     1691 cpu[0]          copy_block
   89   1%  90% 0.00     1839 cpu[1]          z_adler32
   77   1%  92% 0.00    36067 cpu[1]          do_splx
   64   1%  93% 0.00     2090 cpu[0]          bzero
   62   1%  94% 0.00     2082 cpu[0]          do_copy_fault_nta
   48   1%  95% 0.00     1976 cpu[0]          bcopy
   41   1%  95% 0.00    62913 cpu[0]          mutex_enter
   27   0%  96% 0.00     1862 cpu[1]          build_tree
   19   0%  96% 0.00     1771 cpu[1]          gen_bitlen
   17   0%  96% 0.00     1744 cpu[0]          bi_reverse
   15   0%  97% 0.00     1783 cpu[0]          page_create_va
   15   0%  97% 0.00     1406 cpu[1]          fletcher_2_native
   14   0%  97% 0.00     1778 cpu[1]          gen_codes
   11   0%  97% 0.00      912 cpu[1]+6        ddi_mem_put8
    5   0%  97% 0.00     3854 cpu[1]          fsflush_do_pages
-------------------------------------------------------------------------------

It seems the same problem can be observed with "lzjb" compression, but
the pauses with lzjb are much shorter and the kernel consumes less system
cpu time with "lzjb" (which is expected, I think).

This message posted from opensolaris.org
Frank Hofmann
2007-May-03 12:00 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
I''m not quite sure what this test should show ? Compressing random data is the perfect way to generate heat. After all, compression working relies on input entropy being low. But good random generators are characterized by the opposite - output entropy being high. Even a good compressor, if operated on a good random generator''s output, will only end up burning cycles, but not reducing the data size. Hence, is the request here for the compressor module to ''adapt'', kind of first-pass check the input data whether it''s sufficiently low-entropy to warrant a compression attempt ? If not, then what ? FrankH. On Thu, 3 May 2007, J?rgen Keil wrote:>> The reason you are busy computing SHA1 hashes is you are using >> /dev/urandom. The implementation of drv/random uses >> SHA1 for mixing, >> actually strictly speaking it is the swrand provider that does that part. > > Ahh, ok. > > So, instead of using dd reading from /dev/urandom all the time, > I''ve now used this quick C program to write one /dev/urandom block > over and over to the gzip compressed zpool: > > ================================> #include <stdio.h> > #include <stdlib.h> > #include <fcntl.h> > > int > main(int argc, char **argv) > { > int fd; > char buf[128*1024]; > > fd = open("/dev/urandom", O_RDONLY); > if (fd < 0) { > perror("open /dev/urandom"); > exit(1); > } > if (read(fd, buf, sizeof(buf)) != sizeof(buf)) { > perror("fill buf from /dev/urandom"); > exit(1); > } > close(fd); > fd = open(argv[1], O_WRONLY|O_CREAT, 0666); > if (fd < 0) { > perror(argv[1]); > exit(1); > } > for (;;) { > if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { > break; > } > } > close(fd); > exit(0); > } > ================================> > > Avoiding the reads from /dev/urandom makes the effect even > more noticeable, the machine now "freezes" for 10+ seconds. 
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15 > 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15 > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17 > 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17 > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19 > 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17 > > # lockstat -kIW -D 20 sleep 30 > > Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec) > > Count indv cuml rcnt nsec Hottest CPU+PIL Caller > ------------------------------------------------------------------------------- > 2068 34% 34% 0.00 1767 cpu[0] deflate_slow > 1506 25% 59% 0.00 1721 cpu[1] longest_match > 1017 17% 76% 0.00 1833 cpu[1] mach_cpu_idle > 454 7% 83% 0.00 1539 cpu[0] fill_window > 215 4% 87% 0.00 1788 cpu[1] pqdownheap > 152 2% 89% 0.00 1691 cpu[0] copy_block > 89 1% 90% 0.00 1839 cpu[1] z_adler32 > 77 1% 92% 0.00 36067 cpu[1] do_splx > 64 1% 93% 0.00 2090 cpu[0] bzero > 62 1% 94% 0.00 2082 cpu[0] do_copy_fault_nta > 48 1% 95% 0.00 1976 cpu[0] bcopy > 41 1% 95% 0.00 62913 cpu[0] mutex_enter > 27 0% 96% 0.00 1862 cpu[1] build_tree > 19 0% 96% 0.00 1771 cpu[1] gen_bitlen > 17 0% 96% 0.00 1744 cpu[0] bi_reverse > 15 0% 97% 0.00 1783 cpu[0] page_create_va > 15 0% 97% 0.00 1406 cpu[1] fletcher_2_native > 14 0% 97% 0.00 1778 cpu[1] gen_codes > 11 0% 97% 0.00 912 cpu[1]+6 ddi_mem_put8 > 5 0% 97% 0.00 3854 cpu[1] fsflush_do_pages > ------------------------------------------------------------------------------- > > > It seems the same problem can be observed with "lzjb" compression, > but the pauses with lzjb are much shorter and the kernel consumes > less system cpu time with "lzjb" (which is expected, I think). > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
> I'm not quite sure what this test should show ?

I didn't try the test myself... but I think what it shows is a possible
problem: turning on compression can hang a machine.

Rayson

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output
> entropy being high.
> Even a good compressor, if operated on a good random generator's output,
> will only end up burning cycles, but not reducing the data size.
>
> Hence, is the request here for the compressor module to 'adapt', kind of
> first-pass check the input data whether it's sufficiently low-entropy to
> warrant a compression attempt ?
>
> If not, then what ?
>
> FrankH.
>
John Malone (At Home)
2007-May-03 13:21 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
I've also noticed brief hangs on my f4k laptop, build 63 with / on a ZFS
pool. I was tarring up /opt and putting it on a separate ZFS pool. This
second pool had compression set to gzip, and the machine would freeze for
long periods of time. Setting compression to gzip-9 was hardest on the
machine, while setting it to gzip (no compression level specified)
appeared better but still had hangs. I don't have any hard numbers,
though, but I can provide some if needed.

All the pools are on the one physical laptop hard drive. When I was able
to get some keystrokes in, the load was at 18 at one point. Killing the
tar process freed things up straightaway.

john

Rayson Ho wrote:
> On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
>> I'm not quite sure what this test should show ?
>
> I didn't try the test myself... but I think what it shows is a possible
> problem: turning on compression can hang a machine.
>
> Rayson
>
>> Compressing random data is the perfect way to generate heat.
>> After all, compression working relies on input entropy being low.
>> But good random generators are characterized by the opposite - output
>> entropy being high.
>> Even a good compressor, if operated on a good random generator's output,
>> will only end up burning cycles, but not reducing the data size.
>>
>> Hence, is the request here for the compressor module to 'adapt', kind of
>> first-pass check the input data whether it's sufficiently low-entropy to
>> warrant a compression attempt ?
>>
>> If not, then what ?
>>
>> FrankH.
>>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
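If hard numbers are wanted next time, the tools already used earlier in
the thread are enough; sampling while the tar runs into the gzip pool
would show whether it is the same mostly-system-time profile (the
5-second interval is just a suggestion):

  % mpstat 5
  % prstat -mL 5

prstat's SYS and LAT microstate columns in particular should show where
the stalled interactive processes are spending their time.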
Roch Bourbonnais
2007-May-03 16:35 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
with recent bits ZFS compression is now handled concurrently, with many
CPUs working on different records.
So this load will burn more CPUs and achieve its results (compression)
faster.

So the observed pauses should be consistent with that of a load
generating high system time.
The assumption is that compression now goes faster than when it was
single threaded.

Is this undesirable? We might seek a way to slow down compression in
order to limit the system load.

-r

On 3 May 07, at 14:21, Rayson Ho wrote:
> On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
>> I'm not quite sure what this test should show ?
>
> I didn't try the test myself... but I think what it shows is a possible
> problem: turning on compression can hang a machine.
>
> Rayson
>
>> Compressing random data is the perfect way to generate heat.
>> After all, compression working relies on input entropy being low.
>> But good random generators are characterized by the opposite - output
>> entropy being high.
>> Even a good compressor, if operated on a good random generator's
>> output,
>> will only end up burning cycles, but not reducing the data size.
>>
>> Hence, is the request here for the compressor module to 'adapt',
>> kind of
>> first-pass check the input data whether it's sufficiently low-
>> entropy to
>> warrant a compression attempt ?
>>
>> If not, then what ?
>>
>> FrankH.
>>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Wade.Stuart at fallon.com
2007-May-03 16:43 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
zfs-discuss-bounces at opensolaris.org wrote on 05/03/2007 11:35:24 AM:

>
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.

I think this may be a premature leap -- it is still undetermined whether
we are running up against an as-yet-unknown bug in the kernel
implementation of gzip used for this compression type. From my
understanding, the gzip code has been reused from an older kernel
implementation; it may be that this code has some issues with kernel
stuttering when used for ZFS compression that were not exposed by its
original usage.

If it turns out that it is just a case of a high CPU trade-off buying
faster compression times, then the talk of a tunable may make sense (if
that is even possible given the constraints of the gzip code in kernel
space).

-Wade
Jürgen Keil
2007-May-03 17:04 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> I'm not quite sure what this test should show ?

For me, the test shows how writing to a gzip compressed pool completely
kills interactive desktop performance. At least when using a USB keyboard
and mouse. (I've not yet tested with a ps/2 keyboard & mouse, or a SPARC
box.)

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output
> entropy being high. Even a good compressor, if operated on a good random
> generator's output, will only end up burning cycles, but not reducing the
> data size.

Whatever I write to the gzip compressed pool (128K of /dev/urandom random
data, 128K of a buffer filled completely with **** characters, or the
first 128K from /etc/termcap), the Xorg / Gnome desktop becomes
completely unusable while writing to such a gzip compressed zpool / zfs.

With an "lzjb" compressed zpool / zfs the system remains more or less
usable...

> Hence, is the request here for the compressor module
> to 'adapt', kind of first-pass check the input data whether it's
> sufficiently low-entropy to warrant a compression attempt ?
>
> If not, then what ?

I'm not yet sure what the problem is. But it sure would be nice if a gzip
compressed zpool / zfs wouldn't kill interactive desktop performance as
it does now.

This message posted from opensolaris.org
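Since the stalls track how expensive the compressor is, one easy sweep is
to try the other gzip levels on the same test pool; John's laptop report
already suggests gzip-9 behaves worse than plain gzip. The property only
affects newly written blocks, so the filler program has to be re-run
after each change:

  # zfs set compression=gzip-1 gzip_pool     # cheapest gzip level
  # zfs set compression=gzip-9 gzip_pool     # most expensive, for contrast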
Ian Collins
2007-May-03 20:20 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote:
>
> with recent bits ZFS compression is now handled concurrently with many
> CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
Would changing (selecting a smaller) filesystem record size have any effect?

> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow down compression in
> order to limit the system load.
>
I think you should, otherwise we have a performance throttle that scales
with the number of cores!

Ian
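Ian's record-size question is easy to test directly on the scratch pool
from Jürgen's recipe (whether it changes anything is exactly what is in
doubt; 32k is just an example value):

  # zfs set recordsize=32k gzip_pool

then re-run the filler against a fresh file. Each zio then hands the
compressor 32K records instead of 128K ones, so the individual deflate
calls get shorter but more numerous.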
johansen-osdev at sun.com
2007-May-03 20:21 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
A couple more questions here.

[mpstat]
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15
> 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17
> 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19
> 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17
>
> # lockstat -kIW -D 20 sleep 30
>
> Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)
>
> Count indv cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 2068 34% 34% 0.00 1767 cpu[0] deflate_slow
> 1506 25% 59% 0.00 1721 cpu[1] longest_match
> 1017 17% 76% 0.00 1833 cpu[1] mach_cpu_idle
> 454 7% 83% 0.00 1539 cpu[0] fill_window
> 215 4% 87% 0.00 1788 cpu[1] pqdownheap
<snip>

What do you have zfs compression set to?  The gzip level is tunable,
according to zfs set, anyway:

PROPERTY      EDIT  INHERIT   VALUES
compression    YES      YES   on | off | lzjb | gzip | gzip-[1-9]

You still have idle time in this lockstat (and mpstat).

What do you get for a lockstat -A -D 20 sleep 30?

Do you see anyone with long lock hold times, long sleeps, or excessive
spinning?

The largest numbers from mpstat are for interrupts and cross calls.
What does intrstat(1M) show?

Have you run dtrace to determine the most frequent cross-callers?

#!/usr/sbin/dtrace -s

sysinfo:::xcalls
{
	@a[stack(30)] = count();
}

END
{
	trunc(@a, 30);
}

is an easy way to do this.

-j
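For what it's worth, one way to drive that script (saving it as xcalls.d
is just a naming assumption) is to start it right before the write load
and interrupt it after half a minute or so; the END clause then prints
the 30 hottest cross-call stacks. intrstat(1M) just takes an interval:

  # dtrace -s xcalls.d -o xcalls.out     # ^C after ~30s of writing
  # intrstat 5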
Erblichs wrote:
> So, my first order would be to take 1GB or 10GB .wav files
> AND time both the kernel implementation of Gzip and the
> user application.  Approx the same times MAY indicate
> that the kernel implementation gzip funcs should
> be treated maybe more as interactive scheduling
> threads and that it is too high and blocks other
> threads or processes from executing.

If you just run gzip(1) against the files, you are operating on the whole
file, so you only incur startup costs once and are thus doing quite a
different compression from operating at the block level. A fairer
comparison would be to build a userland program that compresses and then
writes to disk in ZFS blocksize chunks; that way you are compressing the
same sizes of data and paying the startup cost every time, just like zio
has to do.

--
Darren J Moffat
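A crude stand-in for such a program, if nobody wants to write one, is to
chop a sample file into 128K pieces and gzip each piece separately, then
compare that against one streaming pass over the whole file. The paths
and chunk directory below are made up, and this only approximates what
zio does (gzip also adds a small per-file header to every chunk):

  % mkdir /tmp/chunks
  % split -b 131072 big.wav /tmp/chunks/c
  % ptime sh -c 'gzip -6 /tmp/chunks/c*'        # per-128K-block compression
  % ptime gzip -6 -c big.wav > /dev/null        # whole file, one pass

Roughly similar wall-clock times would suggest the per-block startup
overhead isn't the story and the cost really is the deflate work itself.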
Darren Moffat,

Yes and no. An earlier statement within this discussion was whether gzip
is appropriate for .wav files. This just gets a relative time to
compress, and relative sizes of the files after the compression.

My assumption is that gzip will run as a user app in one environment. The
normal r/w sys calls then take a user buffer. So, it would be hard to
believe that the .wav file won't be read one user buffer at a time. Yes,
it could be mmap'ed, but then it would have to be unmapped. Too many sys
calls, I think, for the app. Sorry, haven't looked at it for a while.

Overall, I am just trying to guess at the read-ahead delay versus the
user buffer versus the internal FS. The internal FS should take it
basically one FS block at a time (or do multiple blocks in parallel),
while the user app takes it anywhere from one buffer to one page size,
8k, at a time. So, because it reads one buffer at a time in a loop, with
a context switch from kernel to user each time, I would expect the gzip
app to be sloooower.

So, my first step is to keep it simple (KISS) and tell the group "what
happens if" we do this simple comparison. How many bytes/sec are
compressed? Are they approximately the same speed? Do you end up with the
same size file?

Mitchell Erblich
------------------

Darren J Moffat wrote:
>
> Erblichs wrote:
> > So, my first order would be to take 1GB or 10GB .wav files
> > AND time both the kernel implementation of Gzip and the
> > user application.  Approx the same times MAY indicate
> > that the kernel implementation gzip funcs should
> > be treated maybe more as interactive scheduling
> > threads and that it is too high and blocks other
> > threads or processes from executing.
>
> If you just run gzip(1) against the files, you are operating on the whole
> file, so you only incur startup costs once and are thus doing quite a
> different compression from operating at the block level. A fairer
> comparison would be to build a userland program that compresses and then
> writes to disk in ZFS blocksize chunks; that way you are compressing the
> same sizes of data and paying the startup cost every time, just like zio
> has to do.
>
> --
> Darren J Moffat
Jürgen Keil
2007-May-04 10:32 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here....> What do you have zfs compresison set to? The gzip level is > tunable, according to zfs set, anyway: > > PROPERTY EDIT INHERIT VALUES > compression YES YES on | off | lzjb | gzip | gzip-[1-9]I''ve used the "default" gzip compression level, that is I used zfs set compression=gzip gzip_pool> You still have idle time in this lockstat (and mpstat). > > What do you get for a lockstat -A -D 20 sleep 30?# lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk lockstat: warning: 723388 aggregation drops on CPU 0 lockstat: warning: 239335 aggregation drops on CPU 1 lockstat: warning: 62366 aggregation drops on CPU 0 lockstat: warning: 51856 aggregation drops on CPU 1 lockstat: warning: 45187 aggregation drops on CPU 0 lockstat: warning: 46536 aggregation drops on CPU 1 lockstat: warning: 687832 aggregation drops on CPU 0 lockstat: warning: 575675 aggregation drops on CPU 1 lockstat: warning: 46504 aggregation drops on CPU 0 lockstat: warning: 40874 aggregation drops on CPU 1 lockstat: warning: 45571 aggregation drops on CPU 0 lockstat: warning: 33422 aggregation drops on CPU 1 lockstat: warning: 501063 aggregation drops on CPU 0 lockstat: warning: 361041 aggregation drops on CPU 1 lockstat: warning: 651 aggregation drops on CPU 0 lockstat: warning: 7011 aggregation drops on CPU 1 lockstat: warning: 61600 aggregation drops on CPU 0 lockstat: warning: 19386 aggregation drops on CPU 1 lockstat: warning: 566156 aggregation drops on CPU 0 lockstat: warning: 105502 aggregation drops on CPU 1 lockstat: warning: 25362 aggregation drops on CPU 0 lockstat: warning: 8700 aggregation drops on CPU 1 lockstat: warning: 585002 aggregation drops on CPU 0 lockstat: warning: 645299 aggregation drops on CPU 1 lockstat: warning: 237841 aggregation drops on CPU 0 lockstat: warning: 20931 aggregation drops on CPU 1 lockstat: warning: 320102 aggregation drops on CPU 0 lockstat: warning: 435898 aggregation drops on CPU 1 lockstat: warning: 115 dynamic variable drops with non-empty dirty list lockstat: warning: 385192 aggregation drops on CPU 0 lockstat: warning: 81833 aggregation drops on CPU 1 lockstat: warning: 259105 aggregation drops on CPU 0 lockstat: warning: 255812 aggregation drops on CPU 1 lockstat: warning: 486712 aggregation drops on CPU 0 lockstat: warning: 61607 aggregation drops on CPU 1 lockstat: warning: 1865 dynamic variable drops with non-empty dirty list lockstat: warning: 250425 aggregation drops on CPU 0 lockstat: warning: 171415 aggregation drops on CPU 1 lockstat: warning: 166277 aggregation drops on CPU 0 lockstat: warning: 74819 aggregation drops on CPU 1 lockstat: warning: 39342 aggregation drops on CPU 0 lockstat: warning: 3556 aggregation drops on CPU 1 lockstat: warning: ran out of data records (use -n for more) Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 1726 37% 37% 0.00 2 vph_mutex+0x17e8 pvn_write_done+0x10c 1518 32% 69% 0.00 1 vph_mutex+0x17e8 hat_page_setattr+0x70 264 6% 75% 0.00 2 vph_mutex+0x2000 page_hashin+0xad 194 4% 79% 0.00 4 0xfffffffed2ee0a88 cv_wait+0x69 106 2% 81% 0.00 2 vph_mutex+0x2000 page_hashout+0xdd 91 2% 83% 0.00 4 0xfffffffed2ee0a88 taskq_dispatch+0x2c9 83 2% 85% 0.00 4 0xfffffffed2ee0a88 taskq_thread+0x1cb 83 2% 86% 0.00 1 0xfffffffec17a56b0 ufs_iodone+0x3d 47 1% 87% 0.00 4 0xfffffffec1e4ce98 vdev_queue_io+0x85 43 1% 88% 0.00 6 0xfffffffec139a2c0 trap+0xf66 38 1% 89% 0.00 6 0xfffffffecb5f8cd0 
cv_wait+0x69 37 1% 90% 0.00 4 0xfffffffec143ee90 dmult_deque+0x36 26 1% 91% 0.00 2 htable_mutex+0x108 htable_release+0x79 26 1% 91% 0.00 1 0xfffffffec17a56b0 ufs_putpage+0xa4 18 0% 91% 0.00 4 0xfffffffec00dca48 ghd_intr+0xa8 17 0% 92% 0.00 2 0xfffffffec00dca48 ghd_waitq_delete+0x35 12 0% 92% 0.00 2 htable_mutex+0x248 htable_release+0x79 11 0% 92% 0.00 8 0xfffffffec1e4ce98 vdev_queue_io_done+0x3b 10 0% 93% 0.00 3 0xfffffffec00dca48 ghd_transport+0x71 10 0% 93% 0.00 2 0xffffff00077dc138 page_get_mnode_freelist+0xdb ------------------------------------------------------------------------------- Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 78 47% 47% 0.00 31623 vph_mutex+0x17e8 pvn_write_done+0x10c 25 15% 62% 0.00 97897 0xfffffffed2ee0a88 cv_wait+0x69 13 8% 69% 0.00 70426 0xfffffffed2ee0a88 taskq_dispatch+0x2c9 10 6% 75% 0.00 30559 0xfffffffec17a56b0 ufs_iodone+0x3d 9 5% 81% 0.00 34569 0xfffffffec143ee90 dmult_deque+0x36 4 2% 83% 0.00 30579 0xfffffffec00dca48 ghd_waitq_delete+0x35 3 2% 85% 0.00 117726 0xfffffffec14365f0 ohci_hcdi_pipe_bulk_xfer+0x50 2 1% 86% 0.00 32324 pse_mutex+0x1a80 page_unlock+0x3b 1 1% 87% 0.00 381871279 ph_mutex+0xac0 page_create_va+0x334 1 1% 87% 0.00 34696 pse_mutex+0x1cc0 page_unlock+0x3b 1 1% 88% 0.00 29577 pse_mutex+0x1e80 page_unlock+0x3b 1 1% 89% 0.00 29324 pse_mutex+0x1580 page_unlock+0x3b 1 1% 89% 0.00 29029 pse_mutex+0x1400 page_unlock+0x3b 1 1% 90% 0.00 29806 pse_mutex+0x1700 page_unlock+0x3b 1 1% 90% 0.00 30873 pse_mutex+0x8c0 page_unlock+0x3b 1 1% 91% 0.00 34591 pse_mutex+0x700 page_unlock+0x3b 1 1% 92% 0.00 32196 pse_mutex+0x140 page_unlock+0x3b 1 1% 92% 0.00 31986 pse_mutex+0x440 page_unlock+0x3b 1 1% 93% 0.00 27500 pse_mutex+0xd00 page_unlock+0x3b 1 1% 93% 0.00 36400 pse_mutex+0xb00 page_unlock+0x3b ------------------------------------------------------------------------------- Spin lock spin: 252 events in 64.812 seconds (4 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 97 38% 38% 0.00 4 cpu0_disp disp_lock_enter+0x31 44 17% 56% 0.00 3 cpu0_disp disp_lock_enter_high+0x11 44 17% 73% 0.00 3 0xfffffffec1a82708 disp_lock_enter+0x31 36 14% 88% 0.00 3 xc_mbox_lock+0x11 mutex_vector_enter+0x4ee 16 6% 94% 0.00 4 0xfffffffec1a82708 disp_lock_enter_high+0x11 14 6% 100% 0.00 5 turnstile_table+0xd38 disp_lock_enter+0x31 1 0% 100% 0.00 4 turnstile_table+0xad8 disp_lock_enter+0x31 ------------------------------------------------------------------------------- R/W writer blocked by writer: 8281 events in 64.812 seconds (128 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 8281 100% 100% 0.00 20609451 0xfffffffec17a56a0 ufs_rwlock+0xfd ------------------------------------------------------------------------------- Adaptive mutex hold: 14476537 events in 64.812 seconds (223362 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 289666 2% 2% 0.00 1242 vph_mutex+0x2000 page_hashout+0xed 274484 2% 4% 0.00 1295 vph_mutex+0x2000 page_hashin+0xc6 256452 2% 6% 0.00 1359 vph_mutex+0x17e8 hat_page_setattr+0xbf 255344 2% 7% 0.00 1245 vph_mutex+0x17e8 pvn_write_done+0x14b 213470 1% 9% 0.00 1181 pcf+0x40 page_free+0x1ba 146344 1% 10% 0.00 4043 0xfffffffec14742a0 getblk_common+0x1e1 146344 1% 11% 0.00 1238 
0xfffffffec14742a0 brelse+0x15d 131168 1% 12% 0.00 1274 0xfffffffec142fdf0 deltamap_add+0x1f5 106877 1% 13% 0.00 1277 0xfffffffec17a56b0 ufs_putpage+0xf8 64537 0% 13% 0.00 1283 0xfffffffec0208cc8 free_vpmap+0x85 64508 0% 13% 0.00 1293 0xfffffffec0208d60 free_vpmap+0x85 61387 0% 14% 0.00 4756 0xfffffffec0208d08 get_free_vpmap+0x357 61261 0% 14% 0.00 4640 0xfffffffec0208da0 get_free_vpmap+0x357 59765 0% 15% 0.00 1294 0xfffffffec0208da0 free_vpmap+0x85 59731 0% 15% 0.00 1251 0xfffffffec0208d08 free_vpmap+0x85 57656 0% 16% 0.00 4686 0xfffffffec0208cc8 get_free_vpmap+0x357 57655 0% 16% 0.00 4555 0xfffffffec0208d60 get_free_vpmap+0x357 49703 0% 16% 0.00 1270 0xffffff00077dd718 page_ctr_sub+0x80 48208 0% 17% 0.00 1273 0xffffff00077dd720 page_ctr_sub+0x80 47333 0% 17% 0.00 1276 0xffffff00077dd710 page_ctr_sub+0x80 ------------------------------------------------------------------------------- Spin lock hold: 1092671 events in 64.812 seconds (16859 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 508900 47% 47% 0.00 2965 xc_mbox_lock+0x11 mutex_vector_exit+0xf1 292686 27% 73% 0.00 1353 sleepq_head+0x12f8 disp_lock_exit+0x56 32656 3% 76% 0.00 1537 pcicfg_mutex+0x1 mutex_vector_exit+0xf1 24638 2% 79% 0.00 1555 cpu0_disp disp_lock_exit_high+0x34 21397 2% 81% 0.00 1580 0xfffffffec1a82708 disp_lock_exit_high+0x34 13948 1% 82% 0.00 1787 cpu[0]+0xe8 disp_lock_exit_high+0x34 12960 1% 83% 0.00 1742 hres_lock dtrace_hres_tick+0x55 12763 1% 84% 0.00 1943 cpu[1]+0xe8 disp_lock_exit_high+0x34 8280 1% 85% 0.00 6327 turnstile_table+0xd38 disp_lock_exit+0x56 8268 1% 86% 0.00 4912 turnstile_table+0xd38 disp_lock_exit_high+0x34 5670 1% 86% 0.00 5425 sleepq_head+0x1ab8 disp_lock_exit+0x56 5237 0% 87% 0.00 2626 cpu0_disp disp_lock_exit_nopreempt+0x44 4854 0% 87% 0.00 1204 sleepq_head+0x1eb8 disp_lock_exit+0x56 4843 0% 88% 0.00 1223 sleepq_head+0x1a08 disp_lock_exit+0x56 4671 0% 88% 0.00 1382 sleepq_head+0xf58 disp_lock_exit+0x56 4599 0% 88% 0.00 1272 sleepq_head+0x1f98 disp_lock_exit+0x56 4308 0% 89% 0.00 3844 sleepq_head+0x1ab8 disp_lock_exit_nopreempt+0x44 4138 0% 89% 0.00 1405 cpu0_disp disp_lock_exit+0x56 4130 0% 90% 0.00 1353 sleepq_head+0x17c8 disp_lock_exit+0x56 4111 0% 90% 0.00 1526 0xfffffffec1a82708 disp_lock_exit+0x56 ------------------------------------------------------------------------------- R/W writer hold: 164764 events in 64.812 seconds (2542 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 125018 76% 76% 0.00 24819 0xfffffffec17a56a8 wrip+0x489 12533 8% 83% 0.00 7098 0xfffffffec17a56a8 ufs_write+0x54b 12532 8% 91% 0.00 1414263 0xfffffffec17a56a0 ufs_rwunlock+0x24 4729 3% 94% 0.00 1444 0xffffff01efb81d08 dnode_new_blkid+0x19d 3630 2% 96% 0.00 2009 0xfffffffec17a56a8 wrip+0x3cd 1648 1% 97% 0.00 31342 0xfffffffed09c3490 as_map_locked+0x305 1647 1% 98% 0.00 2829 0xfffffffec2b2bc88 segvn_extend_prev+0x149 438 0% 98% 0.00 40571 0xfffffffed09c33b0 as_unmap+0x25c 196 0% 99% 0.00 1468 0xfffffffed1ebc7d8 dnode_new_blkid+0x19d 89 0% 99% 0.00 1551 0xfffffffecba72700 spa_sync+0x4cc 89 0% 99% 0.00 1449 0xfffffffecb5f8cd8 txg_sync_thread+0x1c6 86 0% 99% 0.00 8430 0xfffffffecb5f8cd8 txg_sync_thread+0x186 61 0% 99% 0.00 910403 0xfffffffec142fef8 ldl_waito+0x6d 61 0% 99% 0.00 4446 0xfffffffec142fef8 get_write_bp+0xac 61 0% 99% 0.00 7958 0xfffffffec142fef8 inval_range+0xbd 53 0% 99% 0.00 1454 0xfffffffed37a2560 dnode_new_blkid+0x19d 40 0% 99% 0.00 1469 
0xfffffffed3e58ad8 dnode_new_blkid+0x19d 34 0% 99% 0.00 1549 0xfffffffed379ea78 dnode_new_blkid+0x19d 32 0% 99% 0.00 16405 0xfffffffec7bec560 zfs_getpage+0x2ff 24 0% 99% 0.00 1497 0xfffffffed37a32d0 dnode_new_blkid+0x19d ------------------------------------------------------------------------------- R/W reader hold: 290702 events in 64.812 seconds (4485 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 125018 43% 43% 0.00 28827 0xfffffffec1435648 wrip+0x49e 23000 8% 51% 0.00 188307 0xfffffffec17a56a8 ufs_putpages+0x340 22368 8% 59% 0.00 8206227 0xfffffffed2ee0a90 taskq_thread+0x1c3 14403 5% 64% 0.00 1508 0xfffffffed37a3518 dmu_zfetch_find+0x3c1 14403 5% 69% 0.00 6107 0xfffffffed37a32d0 dnode_hold_impl+0xbc 14403 5% 73% 0.00 6606 0xfffffffed37a32d0 dbuf_read+0x250 12533 4% 78% 0.00 10450 0xfffffffec1435648 ufs_write+0x554 6722 2% 80% 0.00 75280 0xfffffffed09c3490 as_fault+0x658 6704 2% 82% 0.00 70784 0xfffffffed1e95958 segvn_fault+0xbfc 6704 2% 85% 0.00 66973 0xfffffffec2b2bc88 segvn_fault+0xbf4 4766 2% 86% 0.00 6106 0xffffff01efb81d08 dbuf_dirty+0x39c 4734 2% 88% 0.00 3757 0xffffff01efb81d08 dbuf_read+0x250 4729 2% 90% 0.00 180997 0xffffffff76afd928 zfs_write+0x3f5 4728 2% 91% 0.00 21145 0xffffff01efb81d08 dmu_buf_hold_array_by_dnode+0x1eb 3630 1% 92% 0.00 5639 0xfffffffec1435648 wrip+0x3e2 3576 1% 94% 0.00 25200843 0xfffffffed2ee0b68 taskq_thread+0x1c3 2764 1% 95% 0.00 5248 0xfffffffed37a3a68 dbuf_read+0x250 2519 1% 96% 0.00 4934 0xfffffffec17a56a8 ufs_getpage+0x8d5 1379 0% 96% 0.00 68439 0xfffffffed09c33b0 as_fault+0x658 1229 0% 96% 0.00 56389 0xffffffff76b28d88 segvn_fault+0xbf4 -------------------------------------------------------------------------------> Do you see anyone with long lock hold times, long > sleeps, or excessive spinning?Not sure if the above lockstat shows anything excessive.> The largest numbers from mpstat are for interrupts > and cross calls. 
> What does intrstat(1M) show?The first two are for the idle machine; starting with #3 filling of the gzip compressed zpool starts: device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 1 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 nge#0 | 0 0.0 0 0.0 nvidia#0 | 60 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 158 0.0 nvidia#0 | 61 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 158 0.1 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 2 0.0 0 0.0 audiohd#0 | 4 0.0 0 0.0 ehci#0 | 4 0.0 0 0.0 nge#0 | 0 0.0 3547 0.3 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 35 0.1 pci-ide#1 | 0 0.0 3547 2.8 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 1 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 nge#0 | 0 0.0 2861 0.2 nvidia#0 | 60 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 2861 2.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 380 0.0 nvidia#0 | 61 0.3 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 380 0.3 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 3302 0.2 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 30 0.1 pci-ide#1 | 0 0.0 3302 2.6 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 3165 0.3 nvidia#0 | 61 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 3165 2.5 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 3 0.0 0 0.0 ehci#0 | 3 0.0 0 0.0 nge#0 | 0 0.0 304 0.0 nvidia#0 | 61 0.3 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 304 0.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 2 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 3878 0.3 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 49 0.1 pci-ide#1 | 0 0.0 3878 3.0 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 2679 0.2 nvidia#0 | 61 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 2679 2.1 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 1 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 3 0.0 0 0.0 ehci#0 | 3 0.0 0 0.0 nge#0 | 0 0.0 312 0.0 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 312 0.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 6403 0.5 nvidia#0 | 103 0.5 0 0.0 ohci#0 | 0 0.0 413 1.2 pci-ide#1 | 0 0.0 6403 5.1> Have you run dtrace to determine the most frequent > cross-callers? 
> > #!/usr/sbin/dtrace -s > > sysinfo:::xcalls > { > @a[stack(30)] = count(); > > trunc(@a, 30); > } > > is an easy way to do this.Seems to taskq_threads, running zfs'' gzip compression: dtrace: script ''/home/jk/src/dtrace/xcalls.d'' matched 3 probes ^C CPU ID FUNCTION:NAME 1 2 :END unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x1df unix`hat_unload+0x41 genunix`segkp_release_internal+0xb2 genunix`segkp_release+0xb6 genunix`schedctl_freepage+0x30 genunix`schedctl_proc_cleanup+0x5c genunix`exec_args+0x1c2 elfexec`elf32exec+0x410 genunix`gexec+0x36d genunix`exec_common+0x417 genunix`exece+0x1b unix`sys_syscall32+0x101 1 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f dtrace`dtrace_state_go+0x3ec dtrace`dtrace_ioctl+0x79c genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall+0x17b 1 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x1df unix`hat_unload+0x41 swrand`physmem_ent_gen+0x190 swrand`rnd_handler+0x29 genunix`callout_execute+0xb1 genunix`taskq_thread+0x1a7 unix`thread_start+0x8 24 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 genunix`segkp_release_internal+0xb2 genunix`segkp_release+0xb6 genunix`thread_free+0x238 genunix`thread_reap_list+0x1f genunix`thread_reaper+0x130 unix`thread_start+0x8 45 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 kstat`read_kstat_data+0x484 kstat`kstat_ioctl+0x4a genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall32+0x101 76 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d genunix`ddi_dma_mem_free+0x26 ehci`ehci_free_tw+0x45 ehci`ehci_deallocate_tw+0xb6 ehci`ehci_traverse_active_qtd_list+0xdf ehci`ehci_intr+0x162 unix`av_dispatch_autovect+0x78 unix`dispatch_hardint+0x2f unix`switch_sp_and_call+0x13 120 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f unix`dtrace_sync+0x18 dtrace`dtrace_state_deadman+0x18 genunix`cyclic_softint+0xc9 unix`cbe_low_level+0x17 unix`av_dispatch_softvect+0x5f unix`dispatch_softint+0x34 unix`switch_sp_and_call+0x13 149 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d rootnex`rootnex_teardown_copybuf+0x20 rootnex`rootnex_dma_unbindhdl+0xbe genunix`ddi_dma_unbind_handle+0x22 ata`ghd_dmafree_attr+0x29 ata`ata_disk_memfree+0x20 gda`gda_free+0x39 dadk`dadk_iodone+0xbf dadk`dadk_pktcb+0xa5 ata`ata_disk_complete+0x115 ata`ata_hba_complete+0x34 ata`ghd_doneq_process+0x85 unix`av_dispatch_softvect+0x5f unix`dispatch_softint+0x34 unix`switch_sp_and_call+0x13 290 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f dtrace`dtrace_ioctl+0x47f genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 
genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall+0x17b 300 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d genunix`ddi_dma_mem_free+0x26 ohci`ohci_free_tw+0x59 ohci`ohci_deallocate_tw_resources+0xa4 ohci`ohci_traverse_done_list+0xfc ohci`ohci_intr+0x1d0 unix`av_dispatch_autovect+0x78 unix`dispatch_hardint+0x2f unix`switch_sp_and_call+0x13 1020 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0x74 unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0x91 unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0xae unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0xcb unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 This message posted from opensolaris.org
Jürgen Keil
2007-May-04 12:06 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote:
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.

Is this done using the taskqs created in spa_activate()?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

These threads seem to be running the gzip compression code, and are
apparently started with a priority of maxclsyspri == 99.

> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow
> down compression in order to limit the system load.

Hmm, I see that the USB device drivers are also using taskqs; see file
usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c, function
usba_init_pipe_handle(). The USB device driver is using a priority of
minclsyspri == 60 (or "maxclsyspri - 5" == 94, in the case of isochronous
usb pipes):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427

Could this be a problem? That is, when the zfs taskq is filled with lots
of compression requests, is there no time left to run USB taskqs that
have a lower priority than zfs?

This message posted from opensolaris.org
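One way to check that theory directly (untested here, but gzip_compress
shows up in the kernel stacks earlier in the thread, so the fbt probe
should exist) is to record the scheduling priority of whichever threads
are actually running the compressor while the pool is being filled:

  # dtrace -n 'fbt:zfs:gzip_compress:entry { @pri[curthread->t_pri] = count(); }'

If the counts all land at priority 99, the compression taskq threads
really are running above the USB taskq threads at 60.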
Ian Collins writes:
 > Roch Bourbonnais wrote:
 > >
 > > with recent bits ZFS compression is now handled concurrently with many
 > > CPUs working on different records.
 > > So this load will burn more CPUs and achieve its results
 > > (compression) faster.
 > >
 > Would changing (selecting a smaller) filesystem record size have any effect?
 >

If the problem is that we just have a high kernel load compressing
blocks, then probably not. If anything, small records might be a tad less
efficient (thus needing more CPU).

 > > So the observed pauses should be consistent with that of a load
 > > generating high system time.
 > > The assumption is that compression now goes faster than when it was
 > > single threaded.
 > >
 > > Is this undesirable ? We might seek a way to slow down compression in
 > > order to limit the system load.
 > >
 > I think you should, otherwise we have a performance throttle that scales
 > with the number of cores!
 >

Again I wonder to what extent the issue becomes painful due to lack of
write throttling. Once we have that in, we should revisit this.

-r

 > Ian
 >
Jürgen Keil
2007-May-04 18:08 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here....
>
> You still have idle time in this lockstat (and mpstat).
>
> What do you get for a lockstat -A -D 20 sleep 30?
>
> Do you see anyone with long lock hold times, long
> sleeps, or excessive spinning?

Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5" commands
while writing to the gzip compressed zpool, and noticed these high mutex block
times:

Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    5  62%  62% 0.00 317300109 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 536870912 nsec bucket, count 5
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12%  75% 0.00 260247717 ph_mutex+0x1a40  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12%  88% 0.00 348135263 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 536870912 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12% 100% 0.00 109836587 ph_mutex+0x1380  page_find+0x82
      Time distribution: 134217728 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0x91 z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------

Or this one, a few seconds later:

Adaptive mutex block: 25 events in 5.208 seconds (5 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  12% 0.00 643510908 ph_mutex+0x13c0  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x18a z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  24% 0.00 210239890 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  36% 0.00 179186654 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  44% 0.00 649290823 ph_mutex+0x1680  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  52% 0.00 220946732 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  60% 0.00 199293221 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  64% 0.00 764285200 ph_mutex+0x1f00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  68% 0.00 761611341 ph_mutex+0x1f00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  72% 0.00 762076326 ph_mutex+0x1b00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  76% 0.00 762026717 ph_mutex+0x1b00  page_find+0x82
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0x74 z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  80% 0.00 681787674 ph_mutex+0x1680  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  84% 0.00 154470424 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  88% 0.00  61512189 ph_mutex+0x1380  page_find+0x82
      Time distribution: 67108864 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0xcb z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  92% 0.00 200988978 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  96% 0.00  11823303 ph_mutex+0x340   page_create_va+0x334
      Time distribution: 16777216 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4% 100% 0.00  15941350 ph_mutex+0x340   page_create_va+0x334
      Time distribution: 16777216 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------

This message posted from opensolaris.org
Ian Collins,

My two free cents..

If gzip were running in application space, most gzip implementations support
(perhaps with a recompile) a less expensive "deflation" level that would
consume fewer CPU cycles.

Secondly, if the file objects are being written locally, the writes to disk
are being done asynchronously and shouldn't really delay other processes or
slow down the system.

So, my first step would be to take 1GB or 10GB .wav files AND time both the
kernel implementation of gzip and the user application.  Approximately equal
times MAY indicate that the kernel gzip functions should be treated more like
interactive scheduling threads, and that their priority is too high and blocks
other threads or processes from executing.

Mitchell Erblich
Sr Software Engineer
----------------

Ian Collins wrote:
>
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.
>
> Comments?
>
> Ian
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
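For a quick feel for what deflate costs per record at different levels, a small
user-space sketch along the lines Mitchell suggests.  This is illustrative
only: the 128K record size, the file name, and the levels are arbitrary, and
the in-kernel zmod path is not identical to calling zlib from a program.
Build with "cc -o deflatecost deflatecost.c -lz".

/*
 * Rough user-space estimate of the CPU cost of deflate at several levels
 * on one record's worth of .wav data.  Sketch only; not the kernel path.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

#define RECSZ   (128 * 1024)    /* one ZFS-sized record of audio data */
#define ITERS   50              /* repeat to get a measurable interval */

int
main(int argc, char **argv)
{
        static unsigned char in[RECSZ], out[RECSZ + RECSZ / 2];
        int levels[] = { 1, 6, 9 };     /* gzip-1, default, gzip-9 */
        FILE *f = fopen(argc > 1 ? argv[1] : "test.wav", "r");
        size_t n;
        int i, j;

        if (f == NULL || (n = fread(in, 1, RECSZ, f)) == 0) {
                perror("read");
                return (1);
        }
        fclose(f);

        for (i = 0; i < 3; i++) {
                uLongf outlen = sizeof (out);
                clock_t t0 = clock();

                for (j = 0; j < ITERS; j++) {
                        outlen = sizeof (out);
                        (void) compress2(out, &outlen, in, n, levels[i]);
                }
                printf("level %d: %lu -> %lu bytes, %.2f ms CPU per record\n",
                    levels[i], (unsigned long)n, (unsigned long)outlen,
                    (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC / ITERS);
        }
        return (0);
}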
Jürgen Keil
2007-May-07 10:16 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here.
>
> [mpstat]
>
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3109  3616  316  196    5   17   48   45   245    0  85   0  15
> >   1    0   0 3127  3797  592  217    4   17   63   46   176    0  84   0  15
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3051  3529  277  201    2   14   25   48   216    0  83   0  17
> >   1    0   0 3065  3739  606  195    2   14   37   47   153    0  82   0  17
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3011  3538  316  242    3   26   16   52   202    0  81   0  19
> >   1    0   0 3019  3698  578  269    4   25   23   56   309    0  83   0  17
> ...
> The largest numbers from mpstat are for interrupts and cross calls.
> What does intrstat(1M) show?
>
> Have you run dtrace to determine the most frequent cross-callers?

As far as I understand it, we have these frequent cross calls because

1. the test was run on an x86 MP machine

2. the kernel zmod / gzip code allocates and frees four big chunks of memory
   (4 * 65544 bytes) per zio_write_compress() (gzip) call [1]

Freeing these big memory chunks generates lots of cross calls, because page
table entries for that memory are invalidated on all CPUs (cores).  Of course
this effect cannot be observed on a uniprocessor machine (one CPU / core).
And apparently it isn't the root cause for the bad interactive performance
with this test; the bad interactive performance can also be observed on
single-CPU / single-core x86 machines.

A possible optimization for MP machines: use some kind of kmem_cache for the
gzip buffers, so that these buffers could be reused between gzip compression
calls.

[1] allocations per zio_write_compress() / gzip_compress() call:

  1   6642   kobj_alloc:entry   sz 5936, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   5769   kobj_free:entry    fffffffeeb307000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2f5000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2e3000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2d1000: sz 65544
  1   5769   kobj_free:entry    fffffffed1c42000: sz 5936

This message posted from opensolaris.org
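For anyone who wants to prototype that last suggestion, a minimal sketch using
the stock kmem_cache interfaces.  The names and wrapper functions here are
illustrative only; the shipped zmod code simply allocates and frees the
buffers on every call via zcalloc()/zcfree().

/*
 * Sketch: reuse the large zlib deflate work buffers instead of doing four
 * alloc/free pairs (and the resulting TLB-shootdown cross calls) per
 * gzip_compress() call.  Hypothetical code, not the existing zmod module.
 */
#include <sys/types.h>
#include <sys/kmem.h>

#define ZGZIP_BUFSZ     65544   /* size zlib asks for on the deflate path */

static kmem_cache_t *zgzip_buf_cache;

void
zgzip_init(void)
{
        zgzip_buf_cache = kmem_cache_create("zgzip_deflate_buf", ZGZIP_BUFSZ,
            0, NULL, NULL, NULL, NULL, NULL, 0);
}

void
zgzip_fini(void)
{
        kmem_cache_destroy(zgzip_buf_cache);
}

/* Drop-in for the large allocations zcalloc() currently makes. */
void *
zgzip_buf_alloc(size_t sz)
{
        if (sz == ZGZIP_BUFSZ)
                return (kmem_cache_alloc(zgzip_buf_cache, KM_SLEEP));
        return (kmem_alloc(sz, KM_SLEEP));
}

void
zgzip_buf_free(void *buf, size_t sz)
{
        if (sz == ZGZIP_BUFSZ)
                kmem_cache_free(zgzip_buf_cache, buf);
        else
                kmem_free(buf, sz);
}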
Jürgen Keil
2007-May-07 12:02 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow down compression in
> order to limit the system load.

According to this dtrace script....

#!/usr/sbin/dtrace -s

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        @where[stack()] = count();
}

tick-5s
{
        printa(@where);
        trunc(@where);
}

... I see bursts of ~ 1000 zio_write_compress() [gzip] taskq calls enqueued
into the "spa_zio_issue" taskq by zfs`spa_sync() and its children:

  0  76337                        :tick-5s
...
              zfs`zio_next_stage+0xa1
              zfs`zio_wait_for_children+0x5d
              zfs`zio_wait_children_ready+0x20
              zfs`zio_next_stage_async+0xbb
              zfs`zio_nowait+0x11
              zfs`dbuf_sync_leaf+0x1b3
              zfs`dbuf_sync_list+0x51
              zfs`dbuf_sync_indirect+0xcd
              zfs`dbuf_sync_list+0x5e
              zfs`dbuf_sync_indirect+0xcd
              zfs`dbuf_sync_list+0x5e
              zfs`dnode_sync+0x214
              zfs`dmu_objset_sync_dnodes+0x55
              zfs`dmu_objset_sync+0x13d
              zfs`dsl_dataset_sync+0x42
              zfs`dsl_pool_sync+0xb5
              zfs`spa_sync+0x1c5
              zfs`txg_sync_thread+0x19a
              unix`thread_start+0x8
             1092

  0  76337                        :tick-5s

It seems that after such a batch of compress requests is submitted to the
"spa_zio_issue" taskq, the kernel is busy for several seconds working on
these taskq entries.  It seems that this blocks all other "taskq" activity
inside the kernel...

This dtrace script counts the number of zio_write_compress() calls
enqueued / execed by the kernel per second:

#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        this->tqe = (taskq_ent_t *)arg1;
        @enq[this->tqe->tqent_func] = count();
}

sdt:genunix::taskq-exec-end
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        this->tqe = (taskq_ent_t *)arg1;
        @exec[this->tqe->tqent_func] = count();
}

tick-1s
{
        /* printf("%Y\n", walltimestamp); */
        printf("TS(sec): %u\n", timestamp / 1000000000);
        printa("enqueue %a: %@u\n", @enq);
        printa("exec %a: %@u\n", @exec);
        trunc(@enq);
        trunc(@exec);
}

I see bursts of zio_write_compress() calls enqueued / execed, and periods of
time where no zio_write_compress() taskq calls are enqueued or execed.
10# ~jk/src/dtrace/zpool_gzip7.d
TS(sec): 7829
TS(sec): 7830
TS(sec): 7831
TS(sec): 7832
TS(sec): 7833
TS(sec): 7834
TS(sec): 7835
enqueue zfs`zio_write_compress: 1330
exec zfs`zio_write_compress: 1330
TS(sec): 7836
TS(sec): 7837
TS(sec): 7838
TS(sec): 7839
TS(sec): 7840
TS(sec): 7841
TS(sec): 7842
TS(sec): 7843
TS(sec): 7844
enqueue zfs`zio_write_compress: 1116
exec zfs`zio_write_compress: 1116
TS(sec): 7845
TS(sec): 7846
TS(sec): 7847
TS(sec): 7848
TS(sec): 7849
TS(sec): 7850
TS(sec): 7851
TS(sec): 7852
TS(sec): 7853
TS(sec): 7854
TS(sec): 7855
TS(sec): 7856
TS(sec): 7857
enqueue zfs`zio_write_compress: 932
exec zfs`zio_write_compress: 932
TS(sec): 7858
TS(sec): 7859
TS(sec): 7860
TS(sec): 7861
TS(sec): 7862
TS(sec): 7863
TS(sec): 7864
TS(sec): 7865
TS(sec): 7866
TS(sec): 7867
enqueue zfs`zio_write_compress: 5
exec zfs`zio_write_compress: 5
TS(sec): 7868
enqueue zfs`zio_write_compress: 774
exec zfs`zio_write_compress: 774
TS(sec): 7869
TS(sec): 7870
TS(sec): 7871
TS(sec): 7872
TS(sec): 7873
TS(sec): 7874
TS(sec): 7875
TS(sec): 7876
enqueue zfs`zio_write_compress: 653
exec zfs`zio_write_compress: 653
TS(sec): 7877
TS(sec): 7878
TS(sec): 7879
TS(sec): 7880
TS(sec): 7881

And a final dtrace script, which monitors scheduler activity while filling a
gzip compressed pool:

#!/usr/sbin/dtrace -qs

sched:::off-cpu,
sched:::on-cpu,
sched:::remain-cpu,
sched:::preempt
{
        /* @[probename, stack()] = count(); */
        @[probename] = count();
}

tick-1s
{
        printf("%Y", walltimestamp);
        printa(@);
        trunc(@);
}

It shows periods of time with absolutely *no* scheduling activity (I guess
this is when the "spa_zio_issue" taskq is working on such a big batch of
submitted gzip compression calls):

21# ~jk/src/dtrace/zpool_gzip9.d
2007 May  6 21:38:12   preempt 13    off-cpu 808     on-cpu 808
2007 May  6 21:38:13   preempt 1     off-cpu 446     on-cpu 446
2007 May  6 21:38:14   preempt 6     off-cpu 2308    on-cpu 2308
2007 May  6 21:38:15   preempt 1     off-cpu 429     on-cpu 429
2007 May  6 21:38:16                 off-cpu 412     on-cpu 412
2007 May  6 21:38:17                 off-cpu 711     on-cpu 711
2007 May  6 21:38:18   preempt 2     off-cpu 238     on-cpu 238
2007 May  6 21:38:19   preempt 48    off-cpu 7740    on-cpu 7740
2007 May  6 21:38:20   preempt 114   off-cpu 20823   on-cpu 20823
2007 May  6 21:38:21   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:22   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:23   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:24   preempt 181   off-cpu 32486   on-cpu 32486
2007 May  6 21:38:25   preempt 1     off-cpu 1612    on-cpu 1612
2007 May  6 21:38:26   preempt 6     off-cpu 1068    on-cpu 1068
2007 May  6 21:38:27
2007 May  6 21:38:28   preempt 56    off-cpu 9756    on-cpu 9756
2007 May  6 21:38:29   preempt 200   off-cpu 36195   on-cpu 36195
2007 May  6 21:38:30   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:31   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:32   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:33   preempt 144   off-cpu 25982   on-cpu 25982
2007 May  6 21:38:34   preempt 118   off-cpu 20705   on-cpu 20705
2007 May  6 21:38:35   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:36   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:37   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:38   preempt 231   off-cpu 42003   on-cpu 42003
2007 May  6 21:38:39   preempt 42    off-cpu 4781    on-cpu 4781
2007 May  6 21:38:40   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:41   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:42   preempt 4     off-cpu 486     on-cpu 486

This message posted from opensolaris.org
Robert Milkowski
2007-May-08 09:58 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Hello Ian,

Thursday, May 3, 2007, 10:20:20 PM, you wrote:

IC> Roch Bourbonnais wrote:
>> with recent bits ZFS compression is now handled concurrently with many
>> CPUs working on different records.
>> So this load will burn more CPUs and achieve its results
>> (compression) faster.

IC> Would changing (selecting a smaller) filesystem record size have any effect?

>> So the observed pauses should be consistent with that of a load
>> generating high system time.
>> The assumption is that compression now goes faster than when it was
>> single threaded.
>>
>> Is this undesirable ? We might seek a way to slow down compression in
>> order to limit the system load.

IC> I think you should, otherwise we have a performance throttle that scales
IC> with the number of cores!

For file servers you actually want all CPUs to be used for compression,
otherwise you get bad performance with compression and plenty of CPU left
doing nothing...

So maybe a special pool/dataset property (compression_parallelism=N)?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
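A rough sketch of the thread-count side of that idea.  The
compression_parallelism property, the zfs_compress_parallelism variable, and
the function below are all hypothetical; this is not the shipped spa.c code.

/*
 * Sketch: cap how many taskq threads can work on compressing writes at once,
 * so a gzip-heavy txg doesn't occupy every core simultaneously.
 */
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <sys/int_limits.h>
#include <sys/disp.h>
#include <sys/taskq.h>

int zfs_compress_parallelism = 0;       /* 0 means use all CPUs, as today */

taskq_t *
spa_compress_taskq_create(const char *name, int ncpus)
{
        int nthreads = ncpus;

        /* Honour the (imagined) compression_parallelism property. */
        if (zfs_compress_parallelism > 0)
                nthreads = MIN(nthreads, zfs_compress_parallelism);

        return (taskq_create(name, nthreads, maxclsyspri,
            50, INT_MAX, TASKQ_PREPOPULATE));
}

With nthreads left at ncpus a file server keeps today's behaviour; a desktop
could set the knob to 1 or 2 and leave the remaining cores free for userland.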
Adam Leventhal
2007-May-09 09:02 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
On Thu, May 03, 2007 at 11:43:49AM -0500, Wade.Stuart at fallon.com wrote:
> I think this may be a premature leap -- It is still undetermined if we are
> running up against a yet unknown bug in the kernel implementation of gzip
> used for this compression type.  From my understanding the gzip code has
> been reused from an older kernel implementation; it may be possible that
> this code has some issues with kernel stuttering when used for zfs
> compression that may have not been exposed with its original usage.  If it
> turns out that it is just a case of a high CPU trade-off for buying faster
> compression times, then the talk of a tunable may make sense (if it is even
> possible given the constraints of the gzip code in kernelspace).

The in-kernel version of zlib is the latest version (1.2.3).  It's not
surprising that we're spending all of our time in zlib if the machine is
being driven by I/O.  There are outstanding problems with compression in the
ZIO pipeline that may contribute to the bursty behavior.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Darren J Moffat
2007-May-09 10:52 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Adam Leventhal wrote:
> The in-kernel version of zlib is the latest version (1.2.3).  It's not
> surprising that we're spending all of our time in zlib if the machine is
> being driven by I/O.  There are outstanding problems with compression in
> the ZIO pipeline that may contribute to the bursty behavior.

Can you give some more info on what these problems are?  I'm specifically
interested in whether or not crypto will end up with similar issues.

Also, because of the special nature of compression in the ZIO pipeline, I had
to make a small modification[1] to the compress part of the pipeline for
crypto to run for the write stages.

[1] see:
http://src.opensolaris.org/source/xref/zfs-crypto/zfs-crypto-gate/usr/src/uts/common/fs/zfs/zio.c#1053

--
Darren J Moffat
Adam Leventhal
2007-May-09 16:50 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
> Can you give some more info on what these problems are?

I was thinking of this bug:

  6460622 zio_nowait() doesn't live up to its name

which I was surprised to find was fixed by Eric in build 59.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Bart Smaalders
2007-May-09 17:17 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Adam Leventhal wrote:
> On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
>> Can you give some more info on what these problems are?
>
> I was thinking of this bug:
>
>   6460622 zio_nowait() doesn't live up to its name
>
> which I was surprised to find was fixed by Eric in build 59.
>
> Adam

It was pointed out by Jürgen Keil that using ZFS compression submits a lot of
prio 60 tasks to the system task queues; this would clobber interactive
performance.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com         http://blogs.sun.com/barts
Jürgen Keil
2007-May-10 17:10 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Bart wrote:
> It was pointed out by Jürgen Keil that using ZFS compression
> submits a lot of prio 60 tasks to the system task queues;
> this would clobber interactive performance.

Actually the taskqs "spa_zio_issue" / "spa_zio_intr" run at prio 99
(== maxclsyspri or MAXCLSYSPRI):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

Btw: In one experiment I tried to boot the kernel under kmdb control (-kd),
patched "minclsyspri := 61", and used a breakpoint inside spa_active() to
patch the spa_zio_* taskqs to use prio 60 when importing the gzip compressed
pool (so that the gzip compressed pool was using prio 60 threads and usb and
other stuff was using prio >= 61 threads).  That didn't help interactive
performance...

This message posted from opensolaris.org
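For reference, the difference being experimented with above amounts to roughly
the following.  The thread count and the other taskq_create() arguments are
illustrative, not the actual spa.c source.

/*
 * The spa zio taskqs are created at maxclsyspri (99); the kmdb patch above
 * effectively turned that into prio 60.  Note that even 60 is still above
 * the 0-59 range used by ordinary timeshare user threads, which may be part
 * of why lowering it did not help interactivity.
 */
#include <sys/types.h>
#include <sys/int_limits.h>
#include <sys/disp.h>
#include <sys/taskq.h>

#define SPA_ZIO_NTHREADS        8       /* illustrative */

taskq_t *
spa_issue_taskq_create(const char *name, boolean_t lower_pri)
{
        pri_t pri = lower_pri ? minclsyspri : maxclsyspri;      /* 60 vs 99 */

        return (taskq_create(name, SPA_ZIO_NTHREADS, pri,
            50, INT_MAX, TASKQ_PREPOPULATE));
}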
Bill Sommerfeld
2007-May-10 18:54 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
> Btw: In one experiment I tried to boot the kernel under kmdb
> control (-kd), patched "minclsyspri := 61" and used a
> breakpoint inside spa_active() to patch the spa_zio_* taskq
> to use prio 60 when importing the gzip compressed pool
> (so that the gzip compressed pool was using prio 60 threads
> and usb and other stuff was using prio >= 61 threads).
> That didn't help interactive performance...

Oops.  Sounds like CPU-intensive compression (and encryption/decryption once
that's upon us) should ideally be handed off to worker threads that compete
on a "fair" footing with compute-intensive userspace threads, or (better yet)
are scheduled like the thread which initiated the I/O.

 - Bill
Darren J Moffat
2007-May-11 09:01 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Bill Sommerfeld wrote:
> On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
>> Btw: In one experiment I tried to boot the kernel under kmdb
>> control (-kd), patched "minclsyspri := 61" and used a
>> breakpoint inside spa_active() to patch the spa_zio_* taskq
>> to use prio 60 when importing the gzip compressed pool
>> (so that the gzip compressed pool was using prio 60 threads
>> and usb and other stuff was using prio >= 61 threads).
>> That didn't help interactive performance...
>
> Oops.  Sounds like CPU-intensive compression (and encryption/decryption
> once that's upon us) should ideally be handed off to worker threads that
> compete on a "fair" footing with compute-intensive userspace threads, or
> (better yet) are scheduled like the thread which initiated the I/O.

This will be different with encryption.  The crypto framework already tries
to do "fair" scheduling; it can be called in sync and async mode.  We use
per-provider taskqs for hardware, and for async software requests we have a
taskq per CPU that can have its scheduling priority/class set by putting the
svc://system/cryptosvc service into an appropriate project.

I haven't done any performance testing of crypto yet, so I don't know how it
will work in this case, but we do know that the current method works well for
networking.

--
Darren J Moffat
I've filed:

  6586537 async zio taskqs can block out userland commands

to track this issue.

eric

This message posted from opensolaris.org
On 11/05/2007, at 4:54 AM, Bill Sommerfeld wrote:
> On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
>> Btw: In one experiment I tried to boot the kernel under kmdb
>> control (-kd), patched "minclsyspri := 61" and used a
>> breakpoint inside spa_active() to patch the spa_zio_* taskq
>> to use prio 60 when importing the gzip compressed pool
>> (so that the gzip compressed pool was using prio 60 threads
>> and usb and other stuff was using prio >= 61 threads).
>> That didn't help interactive performance...
>
> Oops.  Sounds like CPU-intensive compression (and encryption/decryption
> once that's upon us) should ideally be handed off to worker threads that
> compete on a "fair" footing with compute-intensive userspace threads, or
> (better yet) are scheduled like the thread which initiated the I/O.

i just installed s10u6, which now has gzip compression on zfs datasets.  very
exciting, my samfs disk vsn can store a bit more now.  however, despite being
able to store more, it does lock up a lot while it does the compression.

perhaps doing the compression at a different point in the io pipeline would
be better.  at the moment it is done when zfs decides to flush the buffers to
disk, which ends up being a lot of work that needs to be done in a very short
time.  perhaps you can amortize that cost by doing it when the data from
userland makes it into the kernel.  another idea could be doing the
compression when you reach a relatively low threshold of uncompressed data in
the cache, ie, as soon as you get 1MB of data in the cache, compress it then,
rather than waiting till you have 200MB of data in the cache that needs to be
compressed RIGHT NOW.

either way, my opinion is you want to spread the compression work over a
longer period of time, rather than doing it all at once when you flush the
cache to the spindles.  i can tolerate 90% idle cpus.  i can't tolerate cpus
that are 98% idle except for every 5 seconds when they're 0% idle for 2
seconds and i can't even see keypresses on a serial console cos there's no
time for userland to run in.

dlg
On Wed, 5 Nov 2008, David Gwynne wrote:
> be done in a very short time. perhaps you can amortize that cost by
> doing it when the data from userland makes it into the kernel. another
> idea could be doing the compression when you reach a relatively low
> threshold of uncompressed data in the cache. ie, as soon as you get
> 1MB of data in the cache, compress it then, rather than waiting till
> you have 200MB of data in the cache that needs to be compressed RIGHT
> NOW.

This is counter-productive.  ZFS's lazy compression approach ends up doing a
lot less compression in the common case where files are updated multiple
times before ZFS decides to write to disk.  If your advice is followed, then
every write will involve compression, rather than one compression covering
the summation of perhaps thousands of writes.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 05/11/2008, at 3:27 AM, Bob Friesenhahn wrote:
> On Wed, 5 Nov 2008, David Gwynne wrote:
>
>> be done in a very short time. perhaps you can amortize that cost by
>> doing it when the data from userland makes it into the kernel. another
>> idea could be doing the compression when you reach a relatively low
>> threshold of uncompressed data in the cache. ie, as soon as you get
>> 1MB of data in the cache, compress it then, rather than waiting till
>> you have 200MB of data in the cache that needs to be compressed RIGHT
>> NOW.
>
> This is counter-productive.  ZFS's lazy compression approach ends up
> doing a lot less compression in the common case where files are
> updated multiple times before ZFS decides to write to disk.  If your
> advice is followed, then every write will involve compression, rather
> than the summation of perhaps thousands of writes.

when the compression happens can be an arbitrary line in the sand.  on one
side of the line you're minimising compression per write by deferring the
compression as long as possible; on the other side you want to spread the
cost of compression out over time, doing lots of little compressions more
often, to avoid locking up the machine.  this is a similar trade-off to the
one zfs makes with its cache and its flush interval.  the flush interval is
adjustable in zfs, so why can't a "compression interval/watermark" be
invented and made adjustable as well?

dlg
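For what it's worth, a sketch of how such a watermark could be expressed.
Every name below is hypothetical; nothing like this exists in the current ZFS
code, and Bob's objection still applies: blocks rewritten before the txg
closes would be deflated more than once, so the watermark trades some extra
compression work for smoother CPU usage.

/*
 * Sketch of the "compression watermark" idea: once more than
 * zfs_compress_watermark bytes of dirty, not-yet-compressed data have
 * accumulated, start handing deflate work to the pipeline immediately
 * instead of leaving it all for the next txg sync.
 */
#include <sys/types.h>
#include <sys/atomic.h>

uint64_t zfs_compress_watermark = 1ULL << 20;   /* 1MB; imagined tunable */

typedef struct compress_state {
        volatile uint64_t cs_uncompressed;      /* dirty bytes awaiting deflate */
} compress_state_t;

/* Hypothetical helper: queues deflate work for whatever has accumulated. */
extern void zio_compress_dispatch_some(compress_state_t *cs);

void
compress_note_dirty(compress_state_t *cs, uint64_t nbytes)
{
        uint64_t total = atomic_add_64_nv(&cs->cs_uncompressed, nbytes);

        /*
         * Crossing the watermark starts compressing what is already dirty
         * (the dispatch helper is assumed to subtract whatever it consumes),
         * so the txg sync only has to deflate the remainder and the CPU cost
         * is spread over the whole interval rather than one burst.
         */
        if (total >= zfs_compress_watermark)
                zio_compress_dispatch_some(cs);
}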
Bob Friesenhahn wrote:
> On Wed, 5 Nov 2008, David Gwynne wrote:
>
>> be done in a very short time. perhaps you can amortize that cost by
>> doing it when the data from userland makes it into the kernel. another
>> idea could be doing the compression when you reach a relatively low
>> threshold of uncompressed data in the cache. ie, as soon as you get
>> 1MB of data in the cache, compress it then, rather than waiting till
>> you have 200MB of data in the cache that needs to be compressed RIGHT
>> NOW.
>
> This is counter-productive.  ZFS's lazy compression approach ends up
> doing a lot less compression in the common case where files are
> updated multiple times before ZFS decides to write to disk.  If your
> advice is followed, then every write will involve compression, rather
> than the summation of perhaps thousands of writes.

But gzip has a significant impact when doing a zfs receive, and it would be
interesting to see how an amortised compression scheme would work in that
case.  Currently, writing to a filesystem with gzip compression takes more
than twice as long as writing to one with lzjb compression on a quiet x4540.
There isn't any noticeable difference between lzjb and no compression.

--
Ian.
On 05/11/2008, at 2:22 PM, Ian Collins wrote:
> But gzip has a significant impact when doing a zfs receive, and it would be
> interesting to see how an amortised compression scheme would work in that
> case.  Currently, writing to a filesystem with gzip compression takes more
> than twice as long as writing to one with lzjb compression on a quiet
> x4540.  There isn't any noticeable difference between lzjb and no
> compression.

i don't think lzjb needs big memory allocations like gzip does.  those memory
allocs cause a ton of xcals on a system, which can't be good for
interactivity.

dlg