I just had a quick play with gzip compression on a filesystem and the result was the machine grinding to a halt while copying some large (.wav) files to it from another filesystem in the same pool. The system became very unresponsive, taking several seconds to echo keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty of grunt for this. Comments? Ian
Ian Collins wrote:
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.
>
> Comments?
>
> Ian

How big were the files, what OS build are you running and how
much memory on the system?  Were you copying in parallel?

- Bart

--
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com	http://blogs.sun.com/barts
Bart Smaalders wrote:
> Ian Collins wrote:
>> I just had a quick play with gzip compression on a filesystem and the
>> result was the machine grinding to a halt while copying some large
>> (.wav) files to it from another filesystem in the same pool.
>>
>> The system became very unresponsive, taking several seconds to echo
>> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
>> of grunt for this.
>>
>> Comments?
>>
>> Ian
>
> How big were the files, what OS build are you running and how
> much memory on the system?  Were you copying in parallel?
>
quadfx> uname -a
SunOS quadfx 5.11 snv_62 i86pc i386 i86pc

quadfx> psrinfo -v
Status of virtual processor 0 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:18:57.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 05/03/2007 14:35:05
  on-line since 04/29/2007 22:19:04.
  The i386 processor operates at 3000 MHz,
        and has an i387 compatible floating point processor.

The system has 8MB of RAM and I was using 'cp -r' to copy the directory.

The files are between 15 and 50MB.  It's worth pointing out that .wav
files only compress by a few percent.

Ian
On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
>
> The system has 8MB of RAM and I was using 'cp -r' to copy the directory.

Hm, that would be a record-breaking system. Did you mean 8 GB?

--
Regards,
Cyril
Cyril Plisko wrote:
> On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
>>
>> The system has 8MB of RAM and I was using 'cp -r' to copy the directory.
>
> Hm, that would be a record-breaking system. Did you mean 8 GB?
>
Oops.... Yes.
On May 2, 2007, at 10:36 PM, Ian Collins wrote:
> The files are between 15 and 50MB.  It's worth pointing out that .wav
> files only compress by a few percent.

Not entirely related to your maxed CPU problem, but....

gzip on PCM audio isn't, as you point out, going to earn you much of a
compression ratio. It's also very, very slow on this kind of data. If
you're looking to compress wav or other PCM-based audio formats for
storage space reasons, use FLAC. It's guaranteed that you'll get a far
better compression ratio and a quicker result for your troubles than you
will with gzip.

There are many technical reasons for this, but generally, FLAC knows
about audio down to the sample and is geared for the properties of PCM
audio. gzip just sees it as a generic data blob like any other, which
contributes to its inefficiencies in this case.

The downside is that, well, it's not an option at the ZFS level, but you
don't necessarily have to pre-decompress your FLAC-compressed WAV files
in order to listen to them :)

Hmm... a FLAC-based compression mech in ZFS for efficient (and lossless)
PCM audio storage...

/dale
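For anyone who wants to try Dale's suggestion, a rough comparison looks
like this -- the file names are just placeholders, and the exact savings
depend on the material:

  % ptime gzip -9 -c track01.wav > track01.wav.gz    # generic byte-stream compressor
  % ptime flac --best -o track01.flac track01.wav    # audio-aware, lossless
  % ls -l track01.wav track01.wav.gz track01.flac

flac(1) typically shrinks PCM music to somewhere between half and two
thirds of its original size, where gzip usually saves only a few percent,
as Ian saw.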
Dale Ghent wrote:
> On May 2, 2007, at 10:36 PM, Ian Collins wrote:
>
>> The files are between 15 and 50MB.  It's worth pointing out that .wav
>> files only compress by a few percent.
>
> Not entirely related to your maxed CPU problem, but....
>
I don't think it was a maxed CPU problem; only one core was loaded, and
the prstat numbers I could get (the reporting period was erratic) didn't
show anything nasty.

> gzip on PCM audio isn't, as you point out, going to earn you much of a
> compression ratio. It's also very, very slow on this kind of data. If
> you're looking to compress wav or other PCM-based audio formats for
> storage space reasons, use FLAC.

I was looking at some mixed content, just an experiment.

Ian
Ian,

On 5/3/07, Ian Collins <ian at ianshome.com> wrote:
> I don't think it was a maxed CPU problem; only one core was loaded, and
> the prstat numbers I could get (the reporting period was erratic) didn't
> show anything nasty.

Do you have the output of 'mpstat 5'?

--
Just me,
Wire ...
Blog: <prstat.blogspot.com>
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.

I've observed the same behavior.

With my test I've used a zpool created on a 1GB file (on a UFS filesystem):

# mkfile 1G /var/tmp/vdev_for_gzip_pool
# zpool create gzip_pool /var/tmp/vdev_for_gzip_pool
# zfs set compression=gzip gzip_pool
# chown jk /gzip_pool

Now, when I run this command...

% dd bs=128k if=/dev/urandom of=/gzip_pool/junk

... the mouse cursor sometimes is frozen for two (or more) seconds. Same
with keyboard input. This is on an amd64 x2 box, 4GB memory, and a USB
keyboard and USB mouse.

Lots of system cpu time is used while the gzip compressed pool is filled:

% mpstat 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0   47   1  122  646  316  317   13   24    8    2  1024   3   3   0  94
  1   41   1  159  334  101  279   11   24    8    2   900   2   3   0  94
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    1   0 6860 7263  282   73    2    2   78    1    64   0  70   0  30
  1    0   0 6866 6870   10    4    9    1   85    0   121   0 100   0   0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    6  576  301  465    3   19   18  146   126   1   5   0  95
  1    0   0   36 1471 1276  410   29   20   24  115   335   0  59   0  41
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 5404 5823  309  113    2    2   57    1   420   0  56   0  44
  1    0   0 5409 5431   13    5   12    1   80    1   179   0 100   0   0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    1  529  300  281   17   13   10  103   274   0  64   0  36
  1    0   0    9 1348 1169  452    8   23   14  105   163   0   5   0  95
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 6186 6607  282   62    8    9   55   11   123   0  88   0  12
  1    0   0 6196 6259   53   78    4    3   80   12   132   0  75   0  25

A kernel profile seems to show that the kernel is busy with gzip'ing
(and busy with computing SHA1 hashes?):

# lockstat -kIW -D 20 sleep 20

Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 1802  46%  46% 0.00     1931 cpu[0]          mach_cpu_idle
  517  13%  60% 0.00     6178 cpu[1]          SHA1Transform
  482  12%  72% 0.00     1094 cpu[0]          deflate_slow
  328   8%  81% 0.00     1111 cpu[1]          longest_match
  104   3%  83% 0.00      940 cpu[0]          fill_window
   98   3%  86% 0.00    47357 cpu[1]          bcopy
   65   2%  87% 0.00     5438 cpu[1]          SHA1Update
   63   2%  89% 0.00      834 cpu[1]          bzero
   50   1%  90% 0.00     1042 cpu[0]          pqdownheap
   44   1%  92% 0.00      676 cpu[1]          Encode
   32   1%  92% 0.00     1136 cpu[0]          copy_block
   24   1%  93% 0.00     1214 cpu[1]          do_copy_fault_nta
   23   1%  94% 0.00   401205 cpu[0]          do_splx
   23   1%  94% 0.00      644 cpu[1]          hmac_encr
   22   1%  95% 0.00     1058 cpu[1]          z_adler32
   19   0%  95% 0.00     1208 cpu[0]+10       todpc_rtcget
   16   0%  96% 0.00      752 cpu[0]          SHA1Final
   14   0%  96% 0.00     1186 cpu[1]          mutex_enter
   12   0%  96% 0.00      642 cpu[1]          kcopy
   11   0%  97% 0.00      948 cpu[0]          page_create_va
-------------------------------------------------------------------------------

This message posted from opensolaris.org
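As a sanity check that the urandom data really isn't compressing (so the
CPU time being burned is pure overhead), the pool's compressratio
property can be queried; the value shown here is what you would expect
for incompressible input, not a figure taken from Jürgen's run:

  % zfs get compressratio gzip_pool
  NAME       PROPERTY       VALUE  SOURCE
  gzip_pool  compressratio  1.00x  -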
Darren J Moffat
2007-May-03 10:50 UTC
[zfs-discuss] Re: gzip compression throttles system?
Jürgen Keil wrote:
> A kernel profile seems to show that the kernel is busy with gzip'ing
> (and busy with computing SHA1 hashes?):
>
> # lockstat -kIW -D 20 sleep 20
>
> Profiling interrupt: 3882 events in 20.021 seconds (194 events/sec)
>
> Count indv cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 1802 46% 46% 0.00 1931 cpu[0] mach_cpu_idle
> 517 13% 60% 0.00 6178 cpu[1] SHA1Transform
> 65 2% 87% 0.00 5438 cpu[1] SHA1Update
> 16 0% 96% 0.00 752 cpu[0] SHA1Final

The reason you are busy computing SHA1 hashes is that you are using
/dev/urandom. The implementation of drv/random uses SHA1 for mixing;
strictly speaking, it is the swrand provider that does that part.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/crypto/io/swrand.c

--
Darren J Moffat
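A quick way to separate the two costs is to pull from /dev/urandom with
ZFS out of the picture entirely; if swrand's SHA1 mixing is a large part
of the load, it will show up here too (a rough check, not a benchmark --
the count is arbitrary):

  % ptime dd bs=128k count=1000 if=/dev/urandom of=/dev/null
  # lockstat -kIW -D 10 sleep 10      # in another shell, while the dd runs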
Jürgen Keil
2007-May-03 11:53 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
> The reason you are busy computing SHA1 hashes is that you are using
> /dev/urandom. The implementation of drv/random uses SHA1 for mixing;
> strictly speaking, it is the swrand provider that does that part.

Ahh, ok.

So, instead of using dd reading from /dev/urandom all the time, I've now
used this quick C program to write one /dev/urandom block over and over
to the gzip compressed zpool:

================================
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int fd;
	char buf[128*1024];

	/* fill one 128K buffer with random bytes, once */
	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0) {
		perror("open /dev/urandom");
		exit(1);
	}
	if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
		perror("fill buf from /dev/urandom");
		exit(1);
	}
	close(fd);

	/* write the same buffer over and over until the pool is full */
	fd = open(argv[1], O_WRONLY|O_CREAT, 0666);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	for (;;) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			break;
		}
	}
	close(fd);
	exit(0);
}
================================

Avoiding the reads from /dev/urandom makes the effect even more
noticeable; the machine now "freezes" for 10+ seconds.

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3109 3616  316  196    5   17   48   45   245   0  85   0  15
  1    0   0 3127 3797  592  217    4   17   63   46   176   0  84   0  15
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3051 3529  277  201    2   14   25   48   216   0  83   0  17
  1    0   0 3065 3739  606  195    2   14   37   47   153   0  82   0  17
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 3011 3538  316  242    3   26   16   52   202   0  81   0  19
  1    0   0 3019 3698  578  269    4   25   23   56   309   0  83   0  17

# lockstat -kIW -D 20 sleep 30

Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
 2068  34%  34% 0.00     1767 cpu[0]          deflate_slow
 1506  25%  59% 0.00     1721 cpu[1]          longest_match
 1017  17%  76% 0.00     1833 cpu[1]          mach_cpu_idle
  454   7%  83% 0.00     1539 cpu[0]          fill_window
  215   4%  87% 0.00     1788 cpu[1]          pqdownheap
  152   2%  89% 0.00     1691 cpu[0]          copy_block
   89   1%  90% 0.00     1839 cpu[1]          z_adler32
   77   1%  92% 0.00    36067 cpu[1]          do_splx
   64   1%  93% 0.00     2090 cpu[0]          bzero
   62   1%  94% 0.00     2082 cpu[0]          do_copy_fault_nta
   48   1%  95% 0.00     1976 cpu[0]          bcopy
   41   1%  95% 0.00    62913 cpu[0]          mutex_enter
   27   0%  96% 0.00     1862 cpu[1]          build_tree
   19   0%  96% 0.00     1771 cpu[1]          gen_bitlen
   17   0%  96% 0.00     1744 cpu[0]          bi_reverse
   15   0%  97% 0.00     1783 cpu[0]          page_create_va
   15   0%  97% 0.00     1406 cpu[1]          fletcher_2_native
   14   0%  97% 0.00     1778 cpu[1]          gen_codes
   11   0%  97% 0.00      912 cpu[1]+6        ddi_mem_put8
    5   0%  97% 0.00     3854 cpu[1]          fsflush_do_pages
-------------------------------------------------------------------------------

It seems the same problem can be observed with "lzjb" compression, but
the pauses with lzjb are much shorter and the kernel consumes less system
cpu time with "lzjb" (which is expected, I think).

This message posted from opensolaris.org
Frank Hofmann
2007-May-03 12:00 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
I''m not quite sure what this test should show ? Compressing random data is the perfect way to generate heat. After all, compression working relies on input entropy being low. But good random generators are characterized by the opposite - output entropy being high. Even a good compressor, if operated on a good random generator''s output, will only end up burning cycles, but not reducing the data size. Hence, is the request here for the compressor module to ''adapt'', kind of first-pass check the input data whether it''s sufficiently low-entropy to warrant a compression attempt ? If not, then what ? FrankH. On Thu, 3 May 2007, J?rgen Keil wrote:>> The reason you are busy computing SHA1 hashes is you are using >> /dev/urandom. The implementation of drv/random uses >> SHA1 for mixing, >> actually strictly speaking it is the swrand provider that does that part. > > Ahh, ok. > > So, instead of using dd reading from /dev/urandom all the time, > I''ve now used this quick C program to write one /dev/urandom block > over and over to the gzip compressed zpool: > > ================================> #include <stdio.h> > #include <stdlib.h> > #include <fcntl.h> > > int > main(int argc, char **argv) > { > int fd; > char buf[128*1024]; > > fd = open("/dev/urandom", O_RDONLY); > if (fd < 0) { > perror("open /dev/urandom"); > exit(1); > } > if (read(fd, buf, sizeof(buf)) != sizeof(buf)) { > perror("fill buf from /dev/urandom"); > exit(1); > } > close(fd); > fd = open(argv[1], O_WRONLY|O_CREAT, 0666); > if (fd < 0) { > perror(argv[1]); > exit(1); > } > for (;;) { > if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { > break; > } > } > close(fd); > exit(0); > } > ================================> > > Avoiding the reads from /dev/urandom makes the effect even > more noticeable, the machine now "freezes" for 10+ seconds. 
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15 > 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15 > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17 > 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17 > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19 > 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17 > > # lockstat -kIW -D 20 sleep 30 > > Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec) > > Count indv cuml rcnt nsec Hottest CPU+PIL Caller > ------------------------------------------------------------------------------- > 2068 34% 34% 0.00 1767 cpu[0] deflate_slow > 1506 25% 59% 0.00 1721 cpu[1] longest_match > 1017 17% 76% 0.00 1833 cpu[1] mach_cpu_idle > 454 7% 83% 0.00 1539 cpu[0] fill_window > 215 4% 87% 0.00 1788 cpu[1] pqdownheap > 152 2% 89% 0.00 1691 cpu[0] copy_block > 89 1% 90% 0.00 1839 cpu[1] z_adler32 > 77 1% 92% 0.00 36067 cpu[1] do_splx > 64 1% 93% 0.00 2090 cpu[0] bzero > 62 1% 94% 0.00 2082 cpu[0] do_copy_fault_nta > 48 1% 95% 0.00 1976 cpu[0] bcopy > 41 1% 95% 0.00 62913 cpu[0] mutex_enter > 27 0% 96% 0.00 1862 cpu[1] build_tree > 19 0% 96% 0.00 1771 cpu[1] gen_bitlen > 17 0% 96% 0.00 1744 cpu[0] bi_reverse > 15 0% 97% 0.00 1783 cpu[0] page_create_va > 15 0% 97% 0.00 1406 cpu[1] fletcher_2_native > 14 0% 97% 0.00 1778 cpu[1] gen_codes > 11 0% 97% 0.00 912 cpu[1]+6 ddi_mem_put8 > 5 0% 97% 0.00 3854 cpu[1] fsflush_do_pages > ------------------------------------------------------------------------------- > > > It seems the same problem can be observed with "lzjb" compression, > but the pauses with lzjb are much shorter and the kernel consumes > less system cpu time with "lzjb" (which is expected, I think). > > > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
> I'm not quite sure what this test should show ?

I didn't try the test myself... but I think what it shows is a possible
problem: turning on compression can hang a machine.

Rayson

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output
> entropy being high.
> Even a good compressor, if operated on a good random generator's output,
> will only end up burning cycles, but not reducing the data size.
>
> Hence, is the request here for the compressor module to 'adapt', kind of
> first-pass check the input data whether it's sufficiently low-entropy to
> warrant a compression attempt ?
>
> If not, then what ?
>
> FrankH.
>
John Malone (At Home)
2007-May-03 13:21 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
I've also noticed brief hangs on my f4k laptop, build 63 with / on a ZFS
pool. I was tarring up /opt and putting it on a separate ZFS pool. This
second pool had compression set to gzip, and the machine would freeze for
long periods of time. Setting compression to gzip-9 was hardest on the
machine, while setting it to gzip (no compression level specified)
appeared better but still had hangs. I don't have any hard numbers,
though, but I can provide some if needed.

All the pools are on the one physical laptop hard drive. When I was able
to get some keystrokes in, the load was at 18 at one point. Killing the
tar process freed things up straightaway.

john

Rayson Ho wrote:
> On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
>> I'm not quite sure what this test should show ?
>
> I didn't try the test myself... but I think what it shows is a possible
> problem: turning on compression can hang a machine.
>
> Rayson
>
>> Compressing random data is the perfect way to generate heat.
>> After all, compression working relies on input entropy being low.
>> But good random generators are characterized by the opposite - output
>> entropy being high.
>> Even a good compressor, if operated on a good random generator's output,
>> will only end up burning cycles, but not reducing the data size.
>>
>> Hence, is the request here for the compressor module to 'adapt', kind of
>> first-pass check the input data whether it's sufficiently low-entropy to
>> warrant a compression attempt ?
>>
>> If not, then what ?
>>
>> FrankH.
>>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
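If hard numbers are wanted next time, the tools already used earlier in
the thread are enough; sampling while the tar runs into the gzip pool
would show whether it is the same mostly-system-time profile (the
5-second interval is just a suggestion):

  % mpstat 5
  % prstat -mL 5

prstat's SYS and LAT microstate columns in particular should show where
the stalled interactive processes are spending their time.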
Roch Bourbonnais
2007-May-03 16:35 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
with recent bits ZFS compression is now handled concurrently, with many
CPUs working on different records.
So this load will burn more CPUs and achieve its results (compression)
faster.

So the observed pauses should be consistent with that of a load
generating high system time.
The assumption is that compression now goes faster than when it was
single threaded.

Is this undesirable? We might seek a way to slow down compression in
order to limit the system load.

-r

On 3 May 07, at 14:21, Rayson Ho wrote:
> On 5/3/07, Frank Hofmann <Frank.Hofmann at sun.com> wrote:
>> I'm not quite sure what this test should show ?
>
> I didn't try the test myself... but I think what it shows is a possible
> problem: turning on compression can hang a machine.
>
> Rayson
>
>> Compressing random data is the perfect way to generate heat.
>> After all, compression working relies on input entropy being low.
>> But good random generators are characterized by the opposite - output
>> entropy being high.
>> Even a good compressor, if operated on a good random generator's
>> output,
>> will only end up burning cycles, but not reducing the data size.
>>
>> Hence, is the request here for the compressor module to 'adapt',
>> kind of
>> first-pass check the input data whether it's sufficiently low-
>> entropy to
>> warrant a compression attempt ?
>>
>> If not, then what ?
>>
>> FrankH.
>>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Wade.Stuart at fallon.com
2007-May-03 16:43 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
zfs-discuss-bounces at opensolaris.org wrote on 05/03/2007 11:35:24 AM:

>
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.

I think this may be a premature leap -- it is still undetermined whether
we are running up against an as-yet-unknown bug in the kernel
implementation of gzip used for this compression type. From my
understanding, the gzip code has been reused from an older kernel
implementation; it may be that this code has some issues with kernel
stuttering when used for ZFS compression that were not exposed by its
original usage.

If it turns out that it is just a case of a high CPU trade-off buying
faster compression times, then the talk of a tunable may make sense (if
that is even possible given the constraints of the gzip code in kernel
space).

-Wade
Jürgen Keil
2007-May-03 17:04 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> I'm not quite sure what this test should show ?

For me, the test shows how writing to a gzip compressed pool completely
kills interactive desktop performance. At least when using a USB keyboard
and mouse. (I've not yet tested with a ps/2 keyboard & mouse, or a SPARC
box.)

> Compressing random data is the perfect way to generate heat.
> After all, compression working relies on input entropy being low.
> But good random generators are characterized by the opposite - output
> entropy being high. Even a good compressor, if operated on a good random
> generator's output, will only end up burning cycles, but not reducing the
> data size.

Whatever I write to the gzip compressed pool (128K of /dev/urandom random
data, 128K of a buffer filled completely with **** characters, or the
first 128K from /etc/termcap), the Xorg / Gnome desktop becomes
completely unusable while writing to such a gzip compressed zpool / zfs.

With an "lzjb" compressed zpool / zfs the system remains more or less
usable...

> Hence, is the request here for the compressor module
> to 'adapt', kind of first-pass check the input data whether it's
> sufficiently low-entropy to warrant a compression attempt ?
>
> If not, then what ?

I'm not yet sure what the problem is. But it sure would be nice if a gzip
compressed zpool / zfs wouldn't kill interactive desktop performance as
it does now.

This message posted from opensolaris.org
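Since the stalls track how expensive the compressor is, one easy sweep is
to try the other gzip levels on the same test pool; John's laptop report
already suggests gzip-9 behaves worse than plain gzip. The property only
affects newly written blocks, so the filler program has to be re-run
after each change:

  # zfs set compression=gzip-1 gzip_pool     # cheapest gzip level
  # zfs set compression=gzip-9 gzip_pool     # most expensive, for contrast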
Ian Collins
2007-May-03 20:20 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote:
>
> with recent bits ZFS compression is now handled concurrently with many
> CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
Would changing (selecting a smaller) filesystem record size have any effect?

> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow down compression in
> order to limit the system load.
>
I think you should, otherwise we have a performance throttle that scales
with the number of cores!

Ian
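Ian's record-size question is easy to test directly on the scratch pool
from Jürgen's recipe (whether it changes anything is exactly what is in
doubt; 32k is just an example value):

  # zfs set recordsize=32k gzip_pool

then re-run the filler against a fresh file. Each zio then hands the
compressor 32K records instead of 128K ones, so the individual deflate
calls get shorter but more numerous.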
johansen-osdev at sun.com
2007-May-03 20:21 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
A couple more questions here.

[mpstat]
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3109 3616 316 196 5 17 48 45 245 0 85 0 15
> 1 0 0 3127 3797 592 217 4 17 63 46 176 0 84 0 15
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3051 3529 277 201 2 14 25 48 216 0 83 0 17
> 1 0 0 3065 3739 606 195 2 14 37 47 153 0 82 0 17
> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> 0 0 0 3011 3538 316 242 3 26 16 52 202 0 81 0 19
> 1 0 0 3019 3698 578 269 4 25 23 56 309 0 83 0 17
>
> # lockstat -kIW -D 20 sleep 30
>
> Profiling interrupt: 6080 events in 31.341 seconds (194 events/sec)
>
> Count indv cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 2068 34% 34% 0.00 1767 cpu[0] deflate_slow
> 1506 25% 59% 0.00 1721 cpu[1] longest_match
> 1017 17% 76% 0.00 1833 cpu[1] mach_cpu_idle
> 454 7% 83% 0.00 1539 cpu[0] fill_window
> 215 4% 87% 0.00 1788 cpu[1] pqdownheap
<snip>

What do you have zfs compression set to?  The gzip level is tunable,
according to zfs set, anyway:

PROPERTY      EDIT  INHERIT   VALUES
compression    YES      YES   on | off | lzjb | gzip | gzip-[1-9]

You still have idle time in this lockstat (and mpstat).

What do you get for a lockstat -A -D 20 sleep 30?

Do you see anyone with long lock hold times, long sleeps, or excessive
spinning?

The largest numbers from mpstat are for interrupts and cross calls.
What does intrstat(1M) show?

Have you run dtrace to determine the most frequent cross-callers?

#!/usr/sbin/dtrace -s

sysinfo:::xcalls
{
	@a[stack(30)] = count();
}

END
{
	trunc(@a, 30);
}

is an easy way to do this.

-j
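For what it's worth, one way to drive that script (saving it as xcalls.d
is just a naming assumption) is to start it right before the write load
and interrupt it after half a minute or so; the END clause then prints
the 30 hottest cross-call stacks. intrstat(1M) just takes an interval:

  # dtrace -s xcalls.d -o xcalls.out     # ^C after ~30s of writing
  # intrstat 5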
Erblichs wrote:
> So, my first order would be to take 1GB or 10GB .wav files
> AND time both the kernel implementation of Gzip and the
> user application.  Approx the same times MAY indicate
> that the kernel implementation gzip funcs should
> be treated maybe more as interactive scheduling
> threads and that it is too high and blocks other
> threads or processes from executing.

If you just run gzip(1) against the files, you are operating on the whole
file, so you only incur startup costs once and are thus doing quite a
different compression from operating at the block level. A fairer
comparison would be to build a userland program that compresses and then
writes to disk in ZFS blocksize chunks; that way you are compressing the
same sizes of data and paying the startup cost every time, just like zio
has to do.

--
Darren J Moffat
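A crude stand-in for such a program, if nobody wants to write one, is to
chop a sample file into 128K pieces and gzip each piece separately, then
compare that against one streaming pass over the whole file. The paths
and chunk directory below are made up, and this only approximates what
zio does (gzip also adds a small per-file header to every chunk):

  % mkdir /tmp/chunks
  % split -b 131072 big.wav /tmp/chunks/c
  % ptime sh -c 'gzip -6 /tmp/chunks/c*'        # per-128K-block compression
  % ptime gzip -6 -c big.wav > /dev/null        # whole file, one pass

Roughly similar wall-clock times would suggest the per-block startup
overhead isn't the story and the cost really is the deflate work itself.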
Darren Moffat,

Yes and no. An earlier statement within this discussion was whether gzip
is appropriate for .wav files. This just gets a relative time to
compress, and relative sizes of the files after the compression.

My assumption is that gzip will run as a user app in one environment. The
normal r/w sys calls then take a user buffer. So, it would be hard to
believe that the .wav file won't be read one user buffer at a time. Yes,
it could be mmap'ed, but then it would have to be unmapped. Too many sys
calls, I think, for the app. Sorry, haven't looked at it for a while.

Overall, I am just trying to guess at the read-ahead delay versus the
user buffer versus the internal FS. The internal FS should take it
basically one FS block at a time (or do multiple blocks in parallel),
while the user app takes it anywhere from one buffer to one page size,
8k, at a time. So, because it reads one buffer at a time in a loop, with
a context switch from kernel to user each time, I would expect the gzip
app to be sloooower.

So, my first step is to keep it simple (KISS) and tell the group "what
happens if" we do this simple comparison. How many bytes/sec are
compressed? Are they approximately the same speed? Do you end up with the
same size file?

Mitchell Erblich
------------------

Darren J Moffat wrote:
>
> Erblichs wrote:
> > So, my first order would be to take 1GB or 10GB .wav files
> > AND time both the kernel implementation of Gzip and the
> > user application.  Approx the same times MAY indicate
> > that the kernel implementation gzip funcs should
> > be treated maybe more as interactive scheduling
> > threads and that it is too high and blocks other
> > threads or processes from executing.
>
> If you just run gzip(1) against the files, you are operating on the whole
> file, so you only incur startup costs once and are thus doing quite a
> different compression from operating at the block level. A fairer
> comparison would be to build a userland program that compresses and then
> writes to disk in ZFS blocksize chunks; that way you are compressing the
> same sizes of data and paying the startup cost every time, just like zio
> has to do.
>
> --
> Darren J Moffat
Jürgen Keil
2007-May-04 10:32 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here....> What do you have zfs compresison set to? The gzip level is > tunable, according to zfs set, anyway: > > PROPERTY EDIT INHERIT VALUES > compression YES YES on | off | lzjb | gzip | gzip-[1-9]I''ve used the "default" gzip compression level, that is I used zfs set compression=gzip gzip_pool> You still have idle time in this lockstat (and mpstat). > > What do you get for a lockstat -A -D 20 sleep 30?# lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk lockstat: warning: 723388 aggregation drops on CPU 0 lockstat: warning: 239335 aggregation drops on CPU 1 lockstat: warning: 62366 aggregation drops on CPU 0 lockstat: warning: 51856 aggregation drops on CPU 1 lockstat: warning: 45187 aggregation drops on CPU 0 lockstat: warning: 46536 aggregation drops on CPU 1 lockstat: warning: 687832 aggregation drops on CPU 0 lockstat: warning: 575675 aggregation drops on CPU 1 lockstat: warning: 46504 aggregation drops on CPU 0 lockstat: warning: 40874 aggregation drops on CPU 1 lockstat: warning: 45571 aggregation drops on CPU 0 lockstat: warning: 33422 aggregation drops on CPU 1 lockstat: warning: 501063 aggregation drops on CPU 0 lockstat: warning: 361041 aggregation drops on CPU 1 lockstat: warning: 651 aggregation drops on CPU 0 lockstat: warning: 7011 aggregation drops on CPU 1 lockstat: warning: 61600 aggregation drops on CPU 0 lockstat: warning: 19386 aggregation drops on CPU 1 lockstat: warning: 566156 aggregation drops on CPU 0 lockstat: warning: 105502 aggregation drops on CPU 1 lockstat: warning: 25362 aggregation drops on CPU 0 lockstat: warning: 8700 aggregation drops on CPU 1 lockstat: warning: 585002 aggregation drops on CPU 0 lockstat: warning: 645299 aggregation drops on CPU 1 lockstat: warning: 237841 aggregation drops on CPU 0 lockstat: warning: 20931 aggregation drops on CPU 1 lockstat: warning: 320102 aggregation drops on CPU 0 lockstat: warning: 435898 aggregation drops on CPU 1 lockstat: warning: 115 dynamic variable drops with non-empty dirty list lockstat: warning: 385192 aggregation drops on CPU 0 lockstat: warning: 81833 aggregation drops on CPU 1 lockstat: warning: 259105 aggregation drops on CPU 0 lockstat: warning: 255812 aggregation drops on CPU 1 lockstat: warning: 486712 aggregation drops on CPU 0 lockstat: warning: 61607 aggregation drops on CPU 1 lockstat: warning: 1865 dynamic variable drops with non-empty dirty list lockstat: warning: 250425 aggregation drops on CPU 0 lockstat: warning: 171415 aggregation drops on CPU 1 lockstat: warning: 166277 aggregation drops on CPU 0 lockstat: warning: 74819 aggregation drops on CPU 1 lockstat: warning: 39342 aggregation drops on CPU 0 lockstat: warning: 3556 aggregation drops on CPU 1 lockstat: warning: ran out of data records (use -n for more) Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 1726 37% 37% 0.00 2 vph_mutex+0x17e8 pvn_write_done+0x10c 1518 32% 69% 0.00 1 vph_mutex+0x17e8 hat_page_setattr+0x70 264 6% 75% 0.00 2 vph_mutex+0x2000 page_hashin+0xad 194 4% 79% 0.00 4 0xfffffffed2ee0a88 cv_wait+0x69 106 2% 81% 0.00 2 vph_mutex+0x2000 page_hashout+0xdd 91 2% 83% 0.00 4 0xfffffffed2ee0a88 taskq_dispatch+0x2c9 83 2% 85% 0.00 4 0xfffffffed2ee0a88 taskq_thread+0x1cb 83 2% 86% 0.00 1 0xfffffffec17a56b0 ufs_iodone+0x3d 47 1% 87% 0.00 4 0xfffffffec1e4ce98 vdev_queue_io+0x85 43 1% 88% 0.00 6 0xfffffffec139a2c0 trap+0xf66 38 1% 89% 0.00 6 0xfffffffecb5f8cd0 
cv_wait+0x69 37 1% 90% 0.00 4 0xfffffffec143ee90 dmult_deque+0x36 26 1% 91% 0.00 2 htable_mutex+0x108 htable_release+0x79 26 1% 91% 0.00 1 0xfffffffec17a56b0 ufs_putpage+0xa4 18 0% 91% 0.00 4 0xfffffffec00dca48 ghd_intr+0xa8 17 0% 92% 0.00 2 0xfffffffec00dca48 ghd_waitq_delete+0x35 12 0% 92% 0.00 2 htable_mutex+0x248 htable_release+0x79 11 0% 92% 0.00 8 0xfffffffec1e4ce98 vdev_queue_io_done+0x3b 10 0% 93% 0.00 3 0xfffffffec00dca48 ghd_transport+0x71 10 0% 93% 0.00 2 0xffffff00077dc138 page_get_mnode_freelist+0xdb ------------------------------------------------------------------------------- Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 78 47% 47% 0.00 31623 vph_mutex+0x17e8 pvn_write_done+0x10c 25 15% 62% 0.00 97897 0xfffffffed2ee0a88 cv_wait+0x69 13 8% 69% 0.00 70426 0xfffffffed2ee0a88 taskq_dispatch+0x2c9 10 6% 75% 0.00 30559 0xfffffffec17a56b0 ufs_iodone+0x3d 9 5% 81% 0.00 34569 0xfffffffec143ee90 dmult_deque+0x36 4 2% 83% 0.00 30579 0xfffffffec00dca48 ghd_waitq_delete+0x35 3 2% 85% 0.00 117726 0xfffffffec14365f0 ohci_hcdi_pipe_bulk_xfer+0x50 2 1% 86% 0.00 32324 pse_mutex+0x1a80 page_unlock+0x3b 1 1% 87% 0.00 381871279 ph_mutex+0xac0 page_create_va+0x334 1 1% 87% 0.00 34696 pse_mutex+0x1cc0 page_unlock+0x3b 1 1% 88% 0.00 29577 pse_mutex+0x1e80 page_unlock+0x3b 1 1% 89% 0.00 29324 pse_mutex+0x1580 page_unlock+0x3b 1 1% 89% 0.00 29029 pse_mutex+0x1400 page_unlock+0x3b 1 1% 90% 0.00 29806 pse_mutex+0x1700 page_unlock+0x3b 1 1% 90% 0.00 30873 pse_mutex+0x8c0 page_unlock+0x3b 1 1% 91% 0.00 34591 pse_mutex+0x700 page_unlock+0x3b 1 1% 92% 0.00 32196 pse_mutex+0x140 page_unlock+0x3b 1 1% 92% 0.00 31986 pse_mutex+0x440 page_unlock+0x3b 1 1% 93% 0.00 27500 pse_mutex+0xd00 page_unlock+0x3b 1 1% 93% 0.00 36400 pse_mutex+0xb00 page_unlock+0x3b ------------------------------------------------------------------------------- Spin lock spin: 252 events in 64.812 seconds (4 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 97 38% 38% 0.00 4 cpu0_disp disp_lock_enter+0x31 44 17% 56% 0.00 3 cpu0_disp disp_lock_enter_high+0x11 44 17% 73% 0.00 3 0xfffffffec1a82708 disp_lock_enter+0x31 36 14% 88% 0.00 3 xc_mbox_lock+0x11 mutex_vector_enter+0x4ee 16 6% 94% 0.00 4 0xfffffffec1a82708 disp_lock_enter_high+0x11 14 6% 100% 0.00 5 turnstile_table+0xd38 disp_lock_enter+0x31 1 0% 100% 0.00 4 turnstile_table+0xad8 disp_lock_enter+0x31 ------------------------------------------------------------------------------- R/W writer blocked by writer: 8281 events in 64.812 seconds (128 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 8281 100% 100% 0.00 20609451 0xfffffffec17a56a0 ufs_rwlock+0xfd ------------------------------------------------------------------------------- Adaptive mutex hold: 14476537 events in 64.812 seconds (223362 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 289666 2% 2% 0.00 1242 vph_mutex+0x2000 page_hashout+0xed 274484 2% 4% 0.00 1295 vph_mutex+0x2000 page_hashin+0xc6 256452 2% 6% 0.00 1359 vph_mutex+0x17e8 hat_page_setattr+0xbf 255344 2% 7% 0.00 1245 vph_mutex+0x17e8 pvn_write_done+0x14b 213470 1% 9% 0.00 1181 pcf+0x40 page_free+0x1ba 146344 1% 10% 0.00 4043 0xfffffffec14742a0 getblk_common+0x1e1 146344 1% 11% 0.00 1238 
0xfffffffec14742a0 brelse+0x15d 131168 1% 12% 0.00 1274 0xfffffffec142fdf0 deltamap_add+0x1f5 106877 1% 13% 0.00 1277 0xfffffffec17a56b0 ufs_putpage+0xf8 64537 0% 13% 0.00 1283 0xfffffffec0208cc8 free_vpmap+0x85 64508 0% 13% 0.00 1293 0xfffffffec0208d60 free_vpmap+0x85 61387 0% 14% 0.00 4756 0xfffffffec0208d08 get_free_vpmap+0x357 61261 0% 14% 0.00 4640 0xfffffffec0208da0 get_free_vpmap+0x357 59765 0% 15% 0.00 1294 0xfffffffec0208da0 free_vpmap+0x85 59731 0% 15% 0.00 1251 0xfffffffec0208d08 free_vpmap+0x85 57656 0% 16% 0.00 4686 0xfffffffec0208cc8 get_free_vpmap+0x357 57655 0% 16% 0.00 4555 0xfffffffec0208d60 get_free_vpmap+0x357 49703 0% 16% 0.00 1270 0xffffff00077dd718 page_ctr_sub+0x80 48208 0% 17% 0.00 1273 0xffffff00077dd720 page_ctr_sub+0x80 47333 0% 17% 0.00 1276 0xffffff00077dd710 page_ctr_sub+0x80 ------------------------------------------------------------------------------- Spin lock hold: 1092671 events in 64.812 seconds (16859 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 508900 47% 47% 0.00 2965 xc_mbox_lock+0x11 mutex_vector_exit+0xf1 292686 27% 73% 0.00 1353 sleepq_head+0x12f8 disp_lock_exit+0x56 32656 3% 76% 0.00 1537 pcicfg_mutex+0x1 mutex_vector_exit+0xf1 24638 2% 79% 0.00 1555 cpu0_disp disp_lock_exit_high+0x34 21397 2% 81% 0.00 1580 0xfffffffec1a82708 disp_lock_exit_high+0x34 13948 1% 82% 0.00 1787 cpu[0]+0xe8 disp_lock_exit_high+0x34 12960 1% 83% 0.00 1742 hres_lock dtrace_hres_tick+0x55 12763 1% 84% 0.00 1943 cpu[1]+0xe8 disp_lock_exit_high+0x34 8280 1% 85% 0.00 6327 turnstile_table+0xd38 disp_lock_exit+0x56 8268 1% 86% 0.00 4912 turnstile_table+0xd38 disp_lock_exit_high+0x34 5670 1% 86% 0.00 5425 sleepq_head+0x1ab8 disp_lock_exit+0x56 5237 0% 87% 0.00 2626 cpu0_disp disp_lock_exit_nopreempt+0x44 4854 0% 87% 0.00 1204 sleepq_head+0x1eb8 disp_lock_exit+0x56 4843 0% 88% 0.00 1223 sleepq_head+0x1a08 disp_lock_exit+0x56 4671 0% 88% 0.00 1382 sleepq_head+0xf58 disp_lock_exit+0x56 4599 0% 88% 0.00 1272 sleepq_head+0x1f98 disp_lock_exit+0x56 4308 0% 89% 0.00 3844 sleepq_head+0x1ab8 disp_lock_exit_nopreempt+0x44 4138 0% 89% 0.00 1405 cpu0_disp disp_lock_exit+0x56 4130 0% 90% 0.00 1353 sleepq_head+0x17c8 disp_lock_exit+0x56 4111 0% 90% 0.00 1526 0xfffffffec1a82708 disp_lock_exit+0x56 ------------------------------------------------------------------------------- R/W writer hold: 164764 events in 64.812 seconds (2542 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 125018 76% 76% 0.00 24819 0xfffffffec17a56a8 wrip+0x489 12533 8% 83% 0.00 7098 0xfffffffec17a56a8 ufs_write+0x54b 12532 8% 91% 0.00 1414263 0xfffffffec17a56a0 ufs_rwunlock+0x24 4729 3% 94% 0.00 1444 0xffffff01efb81d08 dnode_new_blkid+0x19d 3630 2% 96% 0.00 2009 0xfffffffec17a56a8 wrip+0x3cd 1648 1% 97% 0.00 31342 0xfffffffed09c3490 as_map_locked+0x305 1647 1% 98% 0.00 2829 0xfffffffec2b2bc88 segvn_extend_prev+0x149 438 0% 98% 0.00 40571 0xfffffffed09c33b0 as_unmap+0x25c 196 0% 99% 0.00 1468 0xfffffffed1ebc7d8 dnode_new_blkid+0x19d 89 0% 99% 0.00 1551 0xfffffffecba72700 spa_sync+0x4cc 89 0% 99% 0.00 1449 0xfffffffecb5f8cd8 txg_sync_thread+0x1c6 86 0% 99% 0.00 8430 0xfffffffecb5f8cd8 txg_sync_thread+0x186 61 0% 99% 0.00 910403 0xfffffffec142fef8 ldl_waito+0x6d 61 0% 99% 0.00 4446 0xfffffffec142fef8 get_write_bp+0xac 61 0% 99% 0.00 7958 0xfffffffec142fef8 inval_range+0xbd 53 0% 99% 0.00 1454 0xfffffffed37a2560 dnode_new_blkid+0x19d 40 0% 99% 0.00 1469 
0xfffffffed3e58ad8 dnode_new_blkid+0x19d 34 0% 99% 0.00 1549 0xfffffffed379ea78 dnode_new_blkid+0x19d 32 0% 99% 0.00 16405 0xfffffffec7bec560 zfs_getpage+0x2ff 24 0% 99% 0.00 1497 0xfffffffed37a32d0 dnode_new_blkid+0x19d ------------------------------------------------------------------------------- R/W reader hold: 290702 events in 64.812 seconds (4485 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 125018 43% 43% 0.00 28827 0xfffffffec1435648 wrip+0x49e 23000 8% 51% 0.00 188307 0xfffffffec17a56a8 ufs_putpages+0x340 22368 8% 59% 0.00 8206227 0xfffffffed2ee0a90 taskq_thread+0x1c3 14403 5% 64% 0.00 1508 0xfffffffed37a3518 dmu_zfetch_find+0x3c1 14403 5% 69% 0.00 6107 0xfffffffed37a32d0 dnode_hold_impl+0xbc 14403 5% 73% 0.00 6606 0xfffffffed37a32d0 dbuf_read+0x250 12533 4% 78% 0.00 10450 0xfffffffec1435648 ufs_write+0x554 6722 2% 80% 0.00 75280 0xfffffffed09c3490 as_fault+0x658 6704 2% 82% 0.00 70784 0xfffffffed1e95958 segvn_fault+0xbfc 6704 2% 85% 0.00 66973 0xfffffffec2b2bc88 segvn_fault+0xbf4 4766 2% 86% 0.00 6106 0xffffff01efb81d08 dbuf_dirty+0x39c 4734 2% 88% 0.00 3757 0xffffff01efb81d08 dbuf_read+0x250 4729 2% 90% 0.00 180997 0xffffffff76afd928 zfs_write+0x3f5 4728 2% 91% 0.00 21145 0xffffff01efb81d08 dmu_buf_hold_array_by_dnode+0x1eb 3630 1% 92% 0.00 5639 0xfffffffec1435648 wrip+0x3e2 3576 1% 94% 0.00 25200843 0xfffffffed2ee0b68 taskq_thread+0x1c3 2764 1% 95% 0.00 5248 0xfffffffed37a3a68 dbuf_read+0x250 2519 1% 96% 0.00 4934 0xfffffffec17a56a8 ufs_getpage+0x8d5 1379 0% 96% 0.00 68439 0xfffffffed09c33b0 as_fault+0x658 1229 0% 96% 0.00 56389 0xffffffff76b28d88 segvn_fault+0xbf4 -------------------------------------------------------------------------------> Do you see anyone with long lock hold times, long > sleeps, or excessive spinning?Not sure if the above lockstat shows anything excessive.> The largest numbers from mpstat are for interrupts > and cross calls. 
> What does intrstat(1M) show?The first two are for the idle machine; starting with #3 filling of the gzip compressed zpool starts: device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 1 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 nge#0 | 0 0.0 0 0.0 nvidia#0 | 60 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 158 0.0 nvidia#0 | 61 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 158 0.1 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 2 0.0 0 0.0 audiohd#0 | 4 0.0 0 0.0 ehci#0 | 4 0.0 0 0.0 nge#0 | 0 0.0 3547 0.3 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 35 0.1 pci-ide#1 | 0 0.0 3547 2.8 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 1 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 nge#0 | 0 0.0 2861 0.2 nvidia#0 | 60 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 2861 2.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 380 0.0 nvidia#0 | 61 0.3 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 380 0.3 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 3302 0.2 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 30 0.1 pci-ide#1 | 0 0.0 3302 2.6 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 3165 0.3 nvidia#0 | 61 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 3165 2.5 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 3 0.0 0 0.0 ehci#0 | 3 0.0 0 0.0 nge#0 | 0 0.0 304 0.0 nvidia#0 | 61 0.3 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 304 0.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 2 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 3878 0.3 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 49 0.1 pci-ide#1 | 0 0.0 3878 3.0 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 2 0.0 0 0.0 ehci#0 | 2 0.0 0 0.0 nge#0 | 0 0.0 2679 0.2 nvidia#0 | 61 0.1 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 2679 2.1 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 1 0.0 0 0.0 ata#1 | 1 0.0 0 0.0 audiohd#0 | 3 0.0 0 0.0 ehci#0 | 3 0.0 0 0.0 nge#0 | 0 0.0 312 0.0 nvidia#0 | 60 0.2 0 0.0 ohci#0 | 0 0.0 0 0.0 pci-ide#1 | 0 0.0 312 0.2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ ata#0 | 0 0.0 0 0.0 ata#1 | 0 0.0 0 0.0 audiohd#0 | 0 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 nge#0 | 0 0.0 6403 0.5 nvidia#0 | 103 0.5 0 0.0 ohci#0 | 0 0.0 413 1.2 pci-ide#1 | 0 0.0 6403 5.1> Have you run dtrace to determine the most frequent > cross-callers? 
> > #!/usr/sbin/dtrace -s > > sysinfo:::xcalls > { > @a[stack(30)] = count(); > > trunc(@a, 30); > } > > is an easy way to do this.Seems to taskq_threads, running zfs'' gzip compression: dtrace: script ''/home/jk/src/dtrace/xcalls.d'' matched 3 probes ^C CPU ID FUNCTION:NAME 1 2 :END unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x1df unix`hat_unload+0x41 genunix`segkp_release_internal+0xb2 genunix`segkp_release+0xb6 genunix`schedctl_freepage+0x30 genunix`schedctl_proc_cleanup+0x5c genunix`exec_args+0x1c2 elfexec`elf32exec+0x410 genunix`gexec+0x36d genunix`exec_common+0x417 genunix`exece+0x1b unix`sys_syscall32+0x101 1 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f dtrace`dtrace_state_go+0x3ec dtrace`dtrace_ioctl+0x79c genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall+0x17b 1 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x1df unix`hat_unload+0x41 swrand`physmem_ent_gen+0x190 swrand`rnd_handler+0x29 genunix`callout_execute+0xb1 genunix`taskq_thread+0x1a7 unix`thread_start+0x8 24 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 genunix`segkp_release_internal+0xb2 genunix`segkp_release+0xb6 genunix`thread_free+0x238 genunix`thread_reap_list+0x1f genunix`thread_reaper+0x130 unix`thread_start+0x8 45 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 kstat`read_kstat_data+0x484 kstat`kstat_ioctl+0x4a genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall32+0x101 76 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d genunix`ddi_dma_mem_free+0x26 ehci`ehci_free_tw+0x45 ehci`ehci_deallocate_tw+0xb6 ehci`ehci_traverse_active_qtd_list+0xdf ehci`ehci_intr+0x162 unix`av_dispatch_autovect+0x78 unix`dispatch_hardint+0x2f unix`switch_sp_and_call+0x13 120 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f unix`dtrace_sync+0x18 dtrace`dtrace_state_deadman+0x18 genunix`cyclic_softint+0xc9 unix`cbe_low_level+0x17 unix`av_dispatch_softvect+0x5f unix`dispatch_softint+0x34 unix`switch_sp_and_call+0x13 149 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d rootnex`rootnex_teardown_copybuf+0x20 rootnex`rootnex_dma_unbindhdl+0xbe genunix`ddi_dma_unbind_handle+0x22 ata`ghd_dmafree_attr+0x29 ata`ata_disk_memfree+0x20 gda`gda_free+0x39 dadk`dadk_iodone+0xbf dadk`dadk_pktcb+0xa5 ata`ata_disk_complete+0x115 ata`ata_hba_complete+0x34 ata`ghd_doneq_process+0x85 unix`av_dispatch_softvect+0x5f unix`dispatch_softint+0x34 unix`switch_sp_and_call+0x13 290 unix`xc_do_call+0xfd unix`xc_sync+0x2b unix`dtrace_xcall+0x6f dtrace`dtrace_ioctl+0x47f genunix`cdev_ioctl+0x48 specfs`spec_ioctl+0x86 
genunix`fop_ioctl+0x37 genunix`ioctl+0x16b unix`sys_syscall+0x17b 300 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 unix`kfreea+0x5a unix`i_ddi_mem_free+0x5d genunix`ddi_dma_mem_free+0x26 ohci`ohci_free_tw+0x59 ohci`ohci_deallocate_tw_resources+0xa4 ohci`ohci_traverse_done_list+0xfc ohci`ohci_intr+0x1d0 unix`av_dispatch_autovect+0x78 unix`dispatch_hardint+0x2f unix`switch_sp_and_call+0x13 1020 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0x74 unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0x91 unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0xae unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 unix`xc_do_call+0xfd unix`xc_wait_sync+0x2b unix`x86pte_inval+0x139 unix`hat_pte_unmap+0xfc unix`hat_unload_callback+0x148 unix`hat_unload+0x41 unix`segkmem_free_vn+0x6a unix`segkmem_free+0x20 genunix`vmem_xfree+0x10c genunix`vmem_free+0x25 genunix`kmem_free+0x47 unix`kobj_free+0x23 unix`zcfree+0x36 unix`z_deflateEnd+0xcb unix`z_compress_level+0x9d zfs`gzip_compress+0x4b zfs`zio_compress_data+0xbc zfs`zio_write_compress+0x8d genunix`taskq_thread+0x1a7 unix`thread_start+0x8 133773 This message posted from opensolaris.org
Jürgen Keil
2007-May-04 12:06 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote:
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.

Is this done using the taskqs created in spa_activate()?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

These threads seem to be running the gzip compression code, and are
apparently started with a priority of maxclsyspri == 99.

> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow
> down compression in order to limit the system load.

Hmm, I see that the USB device drivers are also using taskqs; see file
usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c, function
usba_init_pipe_handle(). The USB device driver is using a priority of
minclsyspri == 60 (or "maxclsyspri - 5" == 94, in the case of isochronous
usb pipes):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427

Could this be a problem? That is, when the zfs taskq is filled with lots
of compression requests, is there no time left to run USB taskqs that
have a lower priority than zfs?

This message posted from opensolaris.org
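One way to check that theory directly (untested here, but gzip_compress
shows up in the kernel stacks earlier in the thread, so the fbt probe
should exist) is to record the scheduling priority of whichever threads
are actually running the compressor while the pool is being filled:

  # dtrace -n 'fbt:zfs:gzip_compress:entry { @pri[curthread->t_pri] = count(); }'

If the counts all land at priority 99, the compression taskq threads
really are running above the USB taskq threads at 60.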
Ian Collins writes:
 > Roch Bourbonnais wrote:
 > >
 > > with recent bits ZFS compression is now handled concurrently with many
 > > CPUs working on different records.
 > > So this load will burn more CPUs and achieve its results
 > > (compression) faster.
 > >
 > Would changing (selecting a smaller) filesystem record size have any effect?
 >

If the problem is that we just have a high kernel load compressing
blocks, then probably not. If anything, small records might be a tad less
efficient (thus needing more CPU).

 > > So the observed pauses should be consistent with that of a load
 > > generating high system time.
 > > The assumption is that compression now goes faster than when it was
 > > single threaded.
 > >
 > > Is this undesirable ? We might seek a way to slow down compression in
 > > order to limit the system load.
 > >
 > I think you should, otherwise we have a performance throttle that scales
 > with the number of cores!
 >

Again I wonder to what extent the issue becomes painful due to lack of
write throttling. Once we have that in, we should revisit this.

-r

 > Ian
 >
Jürgen Keil
2007-May-04 18:08 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here....
>
> You still have idle time in this lockstat (and mpstat).
>
> What do you get for a lockstat -A -D 20 sleep 30?
>
> Do you see anyone with long lock hold times, long
> sleeps, or excessive spinning?

Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5" commands
while writing to the gzip compressed zpool, and noticed these high mutex block
times:

Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    5  62%  62% 0.00 317300109 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 536870912 nsec bucket, count 5
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12%  75% 0.00 260247717 ph_mutex+0x1a40  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12%  88% 0.00 348135263 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 536870912 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1  12% 100% 0.00 109836587 ph_mutex+0x1380  page_find+0x82
      Time distribution: 134217728 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0x91 z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------

Or this one, a few seconds later:

Adaptive mutex block: 25 events in 5.208 seconds (5 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  12% 0.00 643510908 ph_mutex+0x13c0  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x18a z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  24% 0.00 210239890 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    3  12%  36% 0.00 179186654 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 3
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  44% 0.00 649290823 ph_mutex+0x1680  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  52% 0.00 220946732 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    2   8%  60% 0.00 199293221 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 2
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  64% 0.00 764285200 ph_mutex+0x1f00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  68% 0.00 761611341 ph_mutex+0x1f00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  72% 0.00 762076326 ph_mutex+0x1b00  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  76% 0.00 762026717 ph_mutex+0x1b00  page_find+0x82
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0x74 z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  80% 0.00 681787674 ph_mutex+0x1680  page_create_va+0x334
      Time distribution: 1073741824 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  84% 0.00 154470424 ph_mutex+0xd80   page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  88% 0.00  61512189 ph_mutex+0x1380  page_find+0x82
      Time distribution: 67108864 nsec bucket, count 1
      Stack: segkmem_free_vn+0x81 segkmem_free+0x20 vmem_xfree+0x10c
             vmem_free+0x25 kmem_free+0x47 kobj_free+0x23 zcfree+0x36
             z_deflateEnd+0xcb z_compress_level+0x9d gzip_compress+0x4b
             zio_compress_data+0xbc zio_write_compress+0x8d taskq_thread+0x1a7
             thread_start+0x8
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  92% 0.00 200988978 ph_mutex+0x1380  page_create_va+0x334
      Time distribution: 268435456 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1a1 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4%  96% 0.00  11823303 ph_mutex+0x340   page_create_va+0x334
      Time distribution: 16777216 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1b8 z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------
Count indv cuml rcnt      nsec Lock             Caller
    1   4% 100% 0.00  15941350 ph_mutex+0x340   page_create_va+0x334
      Time distribution: 16777216 nsec bucket, count 1
      Stack: segkmem_page_create+0x89 segkmem_xalloc+0xbc segkmem_alloc_vn+0xcd
             segkmem_alloc+0x20 vmem_xalloc+0x4fc vmem_alloc+0x159
             kmem_alloc+0x4f kobj_alloc+0x7e kobj_zalloc+0x1c zcalloc+0x2d
             z_deflateInit2_+0x1de z_deflateInit_+0x32 z_compress_level+0x77
             gzip_compress+0x4b zio_compress_data+0xbc
-------------------------------------------------------------------------------

This message posted from opensolaris.org
Ian Collins,

My two free cents..

If gzip were running in application space, most gzip implementations support
(perhaps with a recompile) a less expensive "deflation" level that would
consume fewer CPU cycles.

Secondly, if the file objects are being written locally, the writes to disk
are being done asynchronously and shouldn't really delay other processes or
slow down the system.

So, my first step would be to take 1GB or 10GB .wav files AND time both the
kernel implementation of gzip and the user application.  Approximately equal
times MAY indicate that the kernel gzip functions should be treated more like
interactive scheduling threads, and that their priority is too high and blocks
other threads or processes from executing.

Mitchell Erblich
Sr Software Engineer
----------------

Ian Collins wrote:
>
> I just had a quick play with gzip compression on a filesystem and the
> result was the machine grinding to a halt while copying some large
> (.wav) files to it from another filesystem in the same pool.
>
> The system became very unresponsive, taking several seconds to echo
> keystrokes. The box is a maxed out AMD QuadFX, so it should have plenty
> of grunt for this.
>
> Comments?
>
> Ian
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
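For a quick feel for what deflate costs per record at different levels, a small
user-space sketch along the lines Mitchell suggests.  This is illustrative
only: the 128K record size, the file name, and the levels are arbitrary, and
the in-kernel zmod path is not identical to calling zlib from a program.
Build with "cc -o deflatecost deflatecost.c -lz".

/*
 * Rough user-space estimate of the CPU cost of deflate at several levels
 * on one record's worth of .wav data.  Sketch only; not the kernel path.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

#define RECSZ   (128 * 1024)    /* one ZFS-sized record of audio data */
#define ITERS   50              /* repeat to get a measurable interval */

int
main(int argc, char **argv)
{
        static unsigned char in[RECSZ], out[RECSZ + RECSZ / 2];
        int levels[] = { 1, 6, 9 };     /* gzip-1, default, gzip-9 */
        FILE *f = fopen(argc > 1 ? argv[1] : "test.wav", "r");
        size_t n;
        int i, j;

        if (f == NULL || (n = fread(in, 1, RECSZ, f)) == 0) {
                perror("read");
                return (1);
        }
        fclose(f);

        for (i = 0; i < 3; i++) {
                uLongf outlen = sizeof (out);
                clock_t t0 = clock();

                for (j = 0; j < ITERS; j++) {
                        outlen = sizeof (out);
                        (void) compress2(out, &outlen, in, n, levels[i]);
                }
                printf("level %d: %lu -> %lu bytes, %.2f ms CPU per record\n",
                    levels[i], (unsigned long)n, (unsigned long)outlen,
                    (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC / ITERS);
        }
        return (0);
}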
Jürgen Keil
2007-May-07 10:16 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here.
>
> [mpstat]
>
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3109  3616  316  196    5   17   48   45   245    0  85   0  15
> >   1    0   0 3127  3797  592  217    4   17   63   46   176    0  84   0  15
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3051  3529  277  201    2   14   25   48   216    0  83   0  17
> >   1    0   0 3065  3739  606  195    2   14   37   47   153    0  82   0  17
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 3011  3538  316  242    3   26   16   52   202    0  81   0  19
> >   1    0   0 3019  3698  578  269    4   25   23   56   309    0  83   0  17
> ...
> The largest numbers from mpstat are for interrupts and cross calls.
> What does intrstat(1M) show?
>
> Have you run dtrace to determine the most frequent cross-callers?

As far as I understand it, we have these frequent cross calls because

1. the test was run on an x86 MP machine

2. the kernel zmod / gzip code allocates and frees four big chunks of memory
   (4 * 65544 bytes) per zio_write_compress() (gzip) call [1]

Freeing these big memory chunks generates lots of cross calls, because page
table entries for that memory are invalidated on all CPUs (cores).  Of course
this effect cannot be observed on a uniprocessor machine (one CPU / core).
And apparently it isn't the root cause for the bad interactive performance
with this test; the bad interactive performance can also be observed on
single-CPU / single-core x86 machines.

A possible optimization for MP machines: use some kind of kmem_cache for the
gzip buffers, so that these buffers could be reused between gzip compression
calls.

[1] allocations per zio_write_compress() / gzip_compress() call:

  1   6642   kobj_alloc:entry   sz 5936, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   6642   kobj_alloc:entry   sz 65544, fl 1001
  1   5769   kobj_free:entry    fffffffeeb307000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2f5000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2e3000: sz 65544
  1   5769   kobj_free:entry    fffffffeeb2d1000: sz 65544
  1   5769   kobj_free:entry    fffffffed1c42000: sz 5936

This message posted from opensolaris.org
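For anyone who wants to prototype that last suggestion, a minimal sketch using
the stock kmem_cache interfaces.  The names and wrapper functions here are
illustrative only; the shipped zmod code simply allocates and frees the
buffers on every call via zcalloc()/zcfree().

/*
 * Sketch: reuse the large zlib deflate work buffers instead of doing four
 * alloc/free pairs (and the resulting TLB-shootdown cross calls) per
 * gzip_compress() call.  Hypothetical code, not the existing zmod module.
 */
#include <sys/types.h>
#include <sys/kmem.h>

#define ZGZIP_BUFSZ     65544   /* size zlib asks for on the deflate path */

static kmem_cache_t *zgzip_buf_cache;

void
zgzip_init(void)
{
        zgzip_buf_cache = kmem_cache_create("zgzip_deflate_buf", ZGZIP_BUFSZ,
            0, NULL, NULL, NULL, NULL, NULL, 0);
}

void
zgzip_fini(void)
{
        kmem_cache_destroy(zgzip_buf_cache);
}

/* Drop-in for the large allocations zcalloc() currently makes. */
void *
zgzip_buf_alloc(size_t sz)
{
        if (sz == ZGZIP_BUFSZ)
                return (kmem_cache_alloc(zgzip_buf_cache, KM_SLEEP));
        return (kmem_alloc(sz, KM_SLEEP));
}

void
zgzip_buf_free(void *buf, size_t sz)
{
        if (sz == ZGZIP_BUFSZ)
                kmem_cache_free(zgzip_buf_cache, buf);
        else
                kmem_free(buf, sz);
}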
Jürgen Keil
2007-May-07 12:02 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
>
> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable ? We might seek a way to slow down compression in
> order to limit the system load.

According to this dtrace script....

#!/usr/sbin/dtrace -s

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        @where[stack()] = count();
}

tick-5s
{
        printa(@where);
        trunc(@where);
}

... I see bursts of ~ 1000 zio_write_compress() [gzip] taskq calls enqueued
into the "spa_zio_issue" taskq by zfs`spa_sync() and its children:

  0  76337                        :tick-5s
...
              zfs`zio_next_stage+0xa1
              zfs`zio_wait_for_children+0x5d
              zfs`zio_wait_children_ready+0x20
              zfs`zio_next_stage_async+0xbb
              zfs`zio_nowait+0x11
              zfs`dbuf_sync_leaf+0x1b3
              zfs`dbuf_sync_list+0x51
              zfs`dbuf_sync_indirect+0xcd
              zfs`dbuf_sync_list+0x5e
              zfs`dbuf_sync_indirect+0xcd
              zfs`dbuf_sync_list+0x5e
              zfs`dnode_sync+0x214
              zfs`dmu_objset_sync_dnodes+0x55
              zfs`dmu_objset_sync+0x13d
              zfs`dsl_dataset_sync+0x42
              zfs`dsl_pool_sync+0xb5
              zfs`spa_sync+0x1c5
              zfs`txg_sync_thread+0x19a
              unix`thread_start+0x8
             1092

  0  76337                        :tick-5s

It seems that after such a batch of compress requests is submitted to the
"spa_zio_issue" taskq, the kernel is busy for several seconds working on
these taskq entries.  It seems that this blocks all other "taskq" activity
inside the kernel...

This dtrace script counts the number of zio_write_compress() calls
enqueued / execed by the kernel per second:

#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        this->tqe = (taskq_ent_t *)arg1;
        @enq[this->tqe->tqent_func] = count();
}

sdt:genunix::taskq-exec-end
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
        this->tqe = (taskq_ent_t *)arg1;
        @exec[this->tqe->tqent_func] = count();
}

tick-1s
{
        /* printf("%Y\n", walltimestamp); */
        printf("TS(sec): %u\n", timestamp / 1000000000);
        printa("enqueue %a: %@u\n", @enq);
        printa("exec %a: %@u\n", @exec);
        trunc(@enq);
        trunc(@exec);
}

I see bursts of zio_write_compress() calls enqueued / execed, and periods of
time where no zio_write_compress() taskq calls are enqueued or execed.
10# ~jk/src/dtrace/zpool_gzip7.d
TS(sec): 7829
TS(sec): 7830
TS(sec): 7831
TS(sec): 7832
TS(sec): 7833
TS(sec): 7834
TS(sec): 7835
enqueue zfs`zio_write_compress: 1330
exec zfs`zio_write_compress: 1330
TS(sec): 7836
TS(sec): 7837
TS(sec): 7838
TS(sec): 7839
TS(sec): 7840
TS(sec): 7841
TS(sec): 7842
TS(sec): 7843
TS(sec): 7844
enqueue zfs`zio_write_compress: 1116
exec zfs`zio_write_compress: 1116
TS(sec): 7845
TS(sec): 7846
TS(sec): 7847
TS(sec): 7848
TS(sec): 7849
TS(sec): 7850
TS(sec): 7851
TS(sec): 7852
TS(sec): 7853
TS(sec): 7854
TS(sec): 7855
TS(sec): 7856
TS(sec): 7857
enqueue zfs`zio_write_compress: 932
exec zfs`zio_write_compress: 932
TS(sec): 7858
TS(sec): 7859
TS(sec): 7860
TS(sec): 7861
TS(sec): 7862
TS(sec): 7863
TS(sec): 7864
TS(sec): 7865
TS(sec): 7866
TS(sec): 7867
enqueue zfs`zio_write_compress: 5
exec zfs`zio_write_compress: 5
TS(sec): 7868
enqueue zfs`zio_write_compress: 774
exec zfs`zio_write_compress: 774
TS(sec): 7869
TS(sec): 7870
TS(sec): 7871
TS(sec): 7872
TS(sec): 7873
TS(sec): 7874
TS(sec): 7875
TS(sec): 7876
enqueue zfs`zio_write_compress: 653
exec zfs`zio_write_compress: 653
TS(sec): 7877
TS(sec): 7878
TS(sec): 7879
TS(sec): 7880
TS(sec): 7881

And a final dtrace script, which monitors scheduler activity while filling a
gzip compressed pool:

#!/usr/sbin/dtrace -qs

sched:::off-cpu,
sched:::on-cpu,
sched:::remain-cpu,
sched:::preempt
{
        /* @[probename, stack()] = count(); */
        @[probename] = count();
}

tick-1s
{
        printf("%Y", walltimestamp);
        printa(@);
        trunc(@);
}

It shows periods of time with absolutely *no* scheduling activity (I guess
this is when the "spa_zio_issue" taskq is working on such a big batch of
submitted gzip compression calls):

21# ~jk/src/dtrace/zpool_gzip9.d
2007 May  6 21:38:12   preempt 13    off-cpu 808     on-cpu 808
2007 May  6 21:38:13   preempt 1     off-cpu 446     on-cpu 446
2007 May  6 21:38:14   preempt 6     off-cpu 2308    on-cpu 2308
2007 May  6 21:38:15   preempt 1     off-cpu 429     on-cpu 429
2007 May  6 21:38:16                 off-cpu 412     on-cpu 412
2007 May  6 21:38:17                 off-cpu 711     on-cpu 711
2007 May  6 21:38:18   preempt 2     off-cpu 238     on-cpu 238
2007 May  6 21:38:19   preempt 48    off-cpu 7740    on-cpu 7740
2007 May  6 21:38:20   preempt 114   off-cpu 20823   on-cpu 20823
2007 May  6 21:38:21   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:22   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:23   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:24   preempt 181   off-cpu 32486   on-cpu 32486
2007 May  6 21:38:25   preempt 1     off-cpu 1612    on-cpu 1612
2007 May  6 21:38:26   preempt 6     off-cpu 1068    on-cpu 1068
2007 May  6 21:38:27
2007 May  6 21:38:28   preempt 56    off-cpu 9756    on-cpu 9756
2007 May  6 21:38:29   preempt 200   off-cpu 36195   on-cpu 36195
2007 May  6 21:38:30   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:31   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:32   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:33   preempt 144   off-cpu 25982   on-cpu 25982
2007 May  6 21:38:34   preempt 118   off-cpu 20705   on-cpu 20705
2007 May  6 21:38:35   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:36   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:37   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:38   preempt 231   off-cpu 42003   on-cpu 42003
2007 May  6 21:38:39   preempt 42    off-cpu 4781    on-cpu 4781
2007 May  6 21:38:40   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:41   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2007 May  6 21:38:42   preempt 4     off-cpu 486     on-cpu 486

This message posted from opensolaris.org
Robert Milkowski
2007-May-08 09:58 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Hello Ian,

Thursday, May 3, 2007, 10:20:20 PM, you wrote:

IC> Roch Bourbonnais wrote:
>> with recent bits ZFS compression is now handled concurrently with many
>> CPUs working on different records.
>> So this load will burn more CPUs and achieve its results
>> (compression) faster.

IC> Would changing (selecting a smaller) filesystem record size have any effect?

>> So the observed pauses should be consistent with that of a load
>> generating high system time.
>> The assumption is that compression now goes faster than when it was
>> single threaded.
>>
>> Is this undesirable ? We might seek a way to slow down compression in
>> order to limit the system load.

IC> I think you should, otherwise we have a performance throttle that scales
IC> with the number of cores!

For file servers you actually want all CPUs to be used for compression,
otherwise you get bad performance with compression and plenty of CPU left
doing nothing...

So maybe a special pool/dataset property (compression_parallelism=N)?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
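A rough sketch of the thread-count side of that idea.  The
compression_parallelism property, the zfs_compress_parallelism variable, and
the function below are all hypothetical; this is not the shipped spa.c code.

/*
 * Sketch: cap how many taskq threads can work on compressing writes at once,
 * so a gzip-heavy txg doesn't occupy every core simultaneously.
 */
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <sys/int_limits.h>
#include <sys/disp.h>
#include <sys/taskq.h>

int zfs_compress_parallelism = 0;       /* 0 means use all CPUs, as today */

taskq_t *
spa_compress_taskq_create(const char *name, int ncpus)
{
        int nthreads = ncpus;

        /* Honour the (imagined) compression_parallelism property. */
        if (zfs_compress_parallelism > 0)
                nthreads = MIN(nthreads, zfs_compress_parallelism);

        return (taskq_create(name, nthreads, maxclsyspri,
            50, INT_MAX, TASKQ_PREPOPULATE));
}

With nthreads left at ncpus a file server keeps today's behaviour; a desktop
could set the knob to 1 or 2 and leave the remaining cores free for userland.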
Adam Leventhal
2007-May-09 09:02 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
On Thu, May 03, 2007 at 11:43:49AM -0500, Wade.Stuart at fallon.com wrote:
> I think this may be a premature leap -- It is still undetermined if we are
> running up against a yet unknown bug in the kernel implementation of gzip
> used for this compression type.  From my understanding the gzip code has
> been reused from an older kernel implementation; it may be possible that
> this code has some issues with kernel stuttering when used for zfs
> compression that may have not been exposed with its original usage.  If it
> turns out that it is just a case of a high CPU trade-off for buying faster
> compression times, then the talk of a tunable may make sense (if it is even
> possible given the constraints of the gzip code in kernelspace).

The in-kernel version of zlib is the latest version (1.2.3).  It's not
surprising that we're spending all of our time in zlib if the machine is
being driven by I/O.  There are outstanding problems with compression in the
ZIO pipeline that may contribute to the bursty behavior.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Darren J Moffat
2007-May-09 10:52 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Adam Leventhal wrote:
> The in-kernel version of zlib is the latest version (1.2.3).  It's not
> surprising that we're spending all of our time in zlib if the machine is
> being driven by I/O.  There are outstanding problems with compression in
> the ZIO pipeline that may contribute to the bursty behavior.

Can you give some more info on what these problems are?  I'm specifically
interested in whether or not crypto will end up with similar issues.

Also, because of the special nature of compression in the ZIO pipeline, I had
to make a small modification[1] to the compress part of the pipeline for
crypto to run for the write stages.

[1] see:
http://src.opensolaris.org/source/xref/zfs-crypto/zfs-crypto-gate/usr/src/uts/common/fs/zfs/zio.c#1053

--
Darren J Moffat
Adam Leventhal
2007-May-09 16:50 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
> Can you give some more info on what these problems are?

I was thinking of this bug:

  6460622 zio_nowait() doesn't live up to its name

which I was surprised to find was fixed by Eric in build 59.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Bart Smaalders
2007-May-09 17:17 UTC
[zfs-discuss] Re: Re: gzip compression throttles system?
Adam Leventhal wrote:
> On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
>> Can you give some more info on what these problems are?
>
> I was thinking of this bug:
>
>   6460622 zio_nowait() doesn't live up to its name
>
> which I was surprised to find was fixed by Eric in build 59.
>
> Adam

It was pointed out by Jürgen Keil that using ZFS compression submits a lot of
prio 60 tasks to the system task queues; this would clobber interactive
performance.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com         http://blogs.sun.com/barts
Jürgen Keil
2007-May-10 17:10 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Bart wrote:
> It was pointed out by Jürgen Keil that using ZFS compression
> submits a lot of prio 60 tasks to the system task queues;
> this would clobber interactive performance.

Actually the taskqs "spa_zio_issue" / "spa_zio_intr" run at prio 99
(== maxclsyspri or MAXCLSYSPRI):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

Btw: In one experiment I tried to boot the kernel under kmdb control (-kd),
patched "minclsyspri := 61", and used a breakpoint inside spa_active() to
patch the spa_zio_* taskqs to use prio 60 when importing the gzip compressed
pool (so that the gzip compressed pool was using prio 60 threads and usb and
other stuff was using prio >= 61 threads).  That didn't help interactive
performance...

This message posted from opensolaris.org
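For reference, the difference being experimented with above amounts to roughly
the following.  The thread count and the other taskq_create() arguments are
illustrative, not the actual spa.c source.

/*
 * The spa zio taskqs are created at maxclsyspri (99); the kmdb patch above
 * effectively turned that into prio 60.  Note that even 60 is still above
 * the 0-59 range used by ordinary timeshare user threads, which may be part
 * of why lowering it did not help interactivity.
 */
#include <sys/types.h>
#include <sys/int_limits.h>
#include <sys/disp.h>
#include <sys/taskq.h>

#define SPA_ZIO_NTHREADS        8       /* illustrative */

taskq_t *
spa_issue_taskq_create(const char *name, boolean_t lower_pri)
{
        pri_t pri = lower_pri ? minclsyspri : maxclsyspri;      /* 60 vs 99 */

        return (taskq_create(name, SPA_ZIO_NTHREADS, pri,
            50, INT_MAX, TASKQ_PREPOPULATE));
}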
Bill Sommerfeld
2007-May-10 18:54 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
> Btw: In one experiment I tried to boot the kernel under kmdb
> control (-kd), patched "minclsyspri := 61" and used a
> breakpoint inside spa_active() to patch the spa_zio_* taskq
> to use prio 60 when importing the gzip compressed pool
> (so that the gzip compressed pool was using prio 60 threads
> and usb and other stuff was using prio >= 61 threads).
> That didn't help interactive performance...

Oops.  Sounds like CPU-intensive compression (and encryption/decryption once
that's upon us) should ideally be handed off to worker threads that compete
on a "fair" footing with compute-intensive userspace threads, or (better yet)
are scheduled like the thread which initiated the I/O.

 - Bill
Darren J Moffat
2007-May-11 09:01 UTC
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Bill Sommerfeld wrote:
> On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
>> Btw: In one experiment I tried to boot the kernel under kmdb
>> control (-kd), patched "minclsyspri := 61" and used a
>> breakpoint inside spa_active() to patch the spa_zio_* taskq
>> to use prio 60 when importing the gzip compressed pool
>> (so that the gzip compressed pool was using prio 60 threads
>> and usb and other stuff was using prio >= 61 threads).
>> That didn't help interactive performance...
>
> Oops.  Sounds like CPU-intensive compression (and encryption/decryption
> once that's upon us) should ideally be handed off to worker threads that
> compete on a "fair" footing with compute-intensive userspace threads, or
> (better yet) are scheduled like the thread which initiated the I/O.

This will be different with encryption.  The crypto framework already tries
to do "fair" scheduling; it can be called in sync and async mode.  We use
per-provider taskqs for hardware, and for async software requests we have a
taskq per CPU that can have its scheduling priority/class set by putting the
svc://system/cryptosvc service into an appropriate project.

I haven't done any performance testing of crypto yet, so I don't know how it
will work in this case, but we do know that the current method works well for
networking.

--
Darren J Moffat
I've filed:

  6586537 async zio taskqs can block out userland commands

to track this issue.

eric

This message posted from opensolaris.org
On 11/05/2007, at 4:54 AM, Bill Sommerfeld wrote:
> On Thu, 2007-05-10 at 10:10 -0700, Jürgen Keil wrote:
>> Btw: In one experiment I tried to boot the kernel under kmdb
>> control (-kd), patched "minclsyspri := 61" and used a
>> breakpoint inside spa_active() to patch the spa_zio_* taskq
>> to use prio 60 when importing the gzip compressed pool
>> (so that the gzip compressed pool was using prio 60 threads
>> and usb and other stuff was using prio >= 61 threads).
>> That didn't help interactive performance...
>
> Oops.  Sounds like CPU-intensive compression (and encryption/decryption
> once that's upon us) should ideally be handed off to worker threads that
> compete on a "fair" footing with compute-intensive userspace threads, or
> (better yet) are scheduled like the thread which initiated the I/O.

i just installed s10u6, which now has gzip compression on zfs datasets.  very
exciting, my samfs disk vsn can store a bit more now.  however, despite being
able to store more, it does lock up a lot while it does the compression.

perhaps doing the compression at a different point in the io pipeline would
be better.  at the moment it is done when zfs decides to flush the buffers to
disk, which ends up being a lot of work that needs to be done in a very short
time.  perhaps you can amortize that cost by doing it when the data from
userland makes it into the kernel.  another idea could be doing the
compression when you reach a relatively low threshold of uncompressed data in
the cache, ie, as soon as you get 1MB of data in the cache, compress it then,
rather than waiting till you have 200MB of data in the cache that needs to be
compressed RIGHT NOW.

either way, my opinion is you want to spread the compression work over a
longer period of time, rather than doing it all at once when you flush the
cache to the spindles.  i can tolerate 90% idle cpus.  i can't tolerate cpus
that are 98% idle except for every 5 seconds when they're 0% idle for 2
seconds and i can't even see keypresses on a serial console cos there's no
time for userland to run in.

dlg
On Wed, 5 Nov 2008, David Gwynne wrote:
> be done in a very short time. perhaps you can amortize that cost by
> doing it when the data from userland makes it into the kernel. another
> idea could be doing the compression when you reach a relatively low
> threshold of uncompressed data in the cache. ie, as soon as you get
> 1MB of data in the cache, compress it then, rather than waiting till
> you have 200MB of data in the cache that needs to be compressed RIGHT
> NOW.

This is counter-productive.  ZFS's lazy compression approach ends up doing a
lot less compression in the common case where files are updated multiple
times before ZFS decides to write to disk.  If your advice is followed, then
every write will involve compression, rather than one compression covering
the summation of perhaps thousands of writes.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 05/11/2008, at 3:27 AM, Bob Friesenhahn wrote:
> On Wed, 5 Nov 2008, David Gwynne wrote:
>
>> be done in a very short time. perhaps you can amortize that cost by
>> doing it when the data from userland makes it into the kernel. another
>> idea could be doing the compression when you reach a relatively low
>> threshold of uncompressed data in the cache. ie, as soon as you get
>> 1MB of data in the cache, compress it then, rather than waiting till
>> you have 200MB of data in the cache that needs to be compressed RIGHT
>> NOW.
>
> This is counter-productive.  ZFS's lazy compression approach ends up
> doing a lot less compression in the common case where files are
> updated multiple times before ZFS decides to write to disk.  If your
> advice is followed, then every write will involve compression, rather
> than the summation of perhaps thousands of writes.

when the compression happens can be an arbitrary line in the sand.  on one
side of the line you're minimising compression per write by deferring the
compression as long as possible; on the other side you want to spread the
cost of compression out over time, doing lots of little compressions more
often, to avoid locking up the machine.  this is a similar trade-off to the
one zfs makes with its cache and its flush interval.  the flush interval is
adjustable in zfs, so why can't a "compression interval/watermark" be
invented and made adjustable as well?

dlg
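For what it's worth, a sketch of how such a watermark could be expressed.
Every name below is hypothetical; nothing like this exists in the current ZFS
code, and Bob's objection still applies: blocks rewritten before the txg
closes would be deflated more than once, so the watermark trades some extra
compression work for smoother CPU usage.

/*
 * Sketch of the "compression watermark" idea: once more than
 * zfs_compress_watermark bytes of dirty, not-yet-compressed data have
 * accumulated, start handing deflate work to the pipeline immediately
 * instead of leaving it all for the next txg sync.
 */
#include <sys/types.h>
#include <sys/atomic.h>

uint64_t zfs_compress_watermark = 1ULL << 20;   /* 1MB; imagined tunable */

typedef struct compress_state {
        volatile uint64_t cs_uncompressed;      /* dirty bytes awaiting deflate */
} compress_state_t;

/* Hypothetical helper: queues deflate work for whatever has accumulated. */
extern void zio_compress_dispatch_some(compress_state_t *cs);

void
compress_note_dirty(compress_state_t *cs, uint64_t nbytes)
{
        uint64_t total = atomic_add_64_nv(&cs->cs_uncompressed, nbytes);

        /*
         * Crossing the watermark starts compressing what is already dirty
         * (the dispatch helper is assumed to subtract whatever it consumes),
         * so the txg sync only has to deflate the remainder and the CPU cost
         * is spread over the whole interval rather than one burst.
         */
        if (total >= zfs_compress_watermark)
                zio_compress_dispatch_some(cs);
}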
Bob Friesenhahn wrote:
> On Wed, 5 Nov 2008, David Gwynne wrote:
>
>> be done in a very short time. perhaps you can amortize that cost by
>> doing it when the data from userland makes it into the kernel. another
>> idea could be doing the compression when you reach a relatively low
>> threshold of uncompressed data in the cache. ie, as soon as you get
>> 1MB of data in the cache, compress it then, rather than waiting till
>> you have 200MB of data in the cache that needs to be compressed RIGHT
>> NOW.
>
> This is counter-productive.  ZFS's lazy compression approach ends up
> doing a lot less compression in the common case where files are
> updated multiple times before ZFS decides to write to disk.  If your
> advice is followed, then every write will involve compression, rather
> than the summation of perhaps thousands of writes.

But gzip has a significant impact when doing a zfs receive, and it would be
interesting to see how an amortised compression scheme would work in that
case.  Currently, writing to a filesystem with gzip compression takes more
than twice as long as writing to one with lzjb compression on a quiet x4540.
There isn't any noticeable difference between lzjb and no compression.

--
Ian.
On 05/11/2008, at 2:22 PM, Ian Collins wrote:
> But gzip has a significant impact when doing a zfs receive, and it would be
> interesting to see how an amortised compression scheme would work in that
> case.  Currently, writing to a filesystem with gzip compression takes more
> than twice as long as writing to one with lzjb compression on a quiet
> x4540.  There isn't any noticeable difference between lzjb and no
> compression.

i don't think lzjb needs big memory allocations like gzip does.  those memory
allocs cause a ton of xcals on a system, which can't be good for
interactivity.

dlg