Ray Clark
2008-Nov-29 16:15 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
I am [trying to] perform a test prior to moving my data to Solaris and ZFS. Things are going very poorly. Please suggest what I might do to understand what is going on, file a meaningful bug report, fix it, whatever!

Both to learn what the compression ratio could be, and to induce a heavy load to expose issues, I am running with compression=gzip-9.

I have two machines, both identical 800MHz P3 with 768MB memory. The disk complement and OS is different. My current host is SUSE Linux 10.2 (2.6.18 kernel) running two 120GB drives under LVM. My test machine is 2008.11 B2 with two 200GB drives on the motherboard secondary IDE, zfs mirroring them, NFS exported.

My "test" is to simply run "cp -rp * /testhome" on the Linux machine, where /testhome is the NFS-mounted zfs file system on the Solaris system.

It starts out with "reasonable" throughput. Although the heavy load makes the Solaris system pretty jerky and unresponsive, it does work. The Linux system is a little jerky and unresponsive, I assume due to waiting for sluggish network responses.

After about 12 hours, the throughput has slowed to a crawl. The Solaris machine takes a minute or more to respond to every character typed and mouse click. The Linux machine is no longer jerky, which makes sense since it has to wait a lot for Solaris. Stuff is flowing, but throughput is in the range of 100K bytes/second.

The Linux machine (available for tests) "gzip -9"ing a few multi-GB files seems to get 3MB/sec +/- 5% pretty consistently. Being the exact same CPU, RAM (including brand and model), chipset, etc., I would expect similar throughput from ZFS. This is in the right ballpark of what I saw when the copy first started. In an hour or two it moved about 17GB.

I am also running a "vmstat" and a "top" to a log file. Top reports total swap size as 512MB, 510 available. vmstat for the first few hours reported something reasonable (it never seems to agree with top), but now is reporting around 570~580MB, and for a while was reporting well over 600MB free swap out of the 512MB total!

I have gotten past a top memory leak (opensolaris.com bug 5482) and so am now running top for only one iteration, in a shell for loop with a sleep, instead of letting it repeat. This was to be my test run to see it work.

What information can I capture, and how can I capture it, to figure this out?

My goal is to gain confidence in this system. The idea is that Solaris and ZFS should be more reliable than Linux and LVM. Although I have never lost data due to Linux problems, I have lost it due to disk failure, and zfs should cover that!

Thank you ahead for any ideas or suggestions.
-- This message posted from opensolaris.org
Karl Hakimian
2008-Nov-29 16:29 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
We looked at using gzip compression with zfs for a backup solution. The short answer on our conclusion is that it will not work. The gzip -9 compression just can't sustain a large throughput when copying gigs of files. We ended up doing non-gzip compression and taking the hit on compression ratio (1.2x or so instead of almost 3x). In the meantime, we are working on integrating our gzip board into zfs so that we can try again with hardware compression.
-- This message posted from opensolaris.org
> I am [trying to] perform a test prior to moving my data to solaris and zfs. Things are going very poorly. Please suggest what I might do to understand what is going on, report a meaningful bug report, fix it, whatever!
> [snip]
> Thank you ahead for any ideas or suggestions.

Solaris reports "virtual memory" as the sum of physical memory and page file - so this is where your strange vmstat output comes from.

Running ZFS stress tests on a system with only 768MB of memory is not a good idea, since ZFS uses large amounts of memory for its cache. You can limit the size of the ARC (Adaptive Replacement Cache) using the details here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache

Try limiting the ARC size and then run the test again - if this works, then memory contention is the cause of the slowdown.

Also, NFS to ZFS filesystems will run slowly under certain conditions - including with the default configuration. See this link for more information:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

Cheers

Andrew.
-- This message posted from opensolaris.org
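For reference, the ARC cap described in that guide is, on builds of this vintage, a single /etc/system tunable; the 256MB figure below is only an illustration and would need to be sized to the machine:

    # /etc/system: cap the ZFS ARC at 256MB (value in bytes); takes effect after a reboot
    set zfs:zfs_arc_max = 0x10000000

After rebooting, kstat -n arcstats (the "size" and "c_max" fields) shows whether the cap is being honored.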
Richard Elling
2008-Nov-29 17:00 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Ray Clark wrote:
> I am [trying to] perform a test prior to moving my data to solaris and zfs. Things are going very poorly. Please suggest what I might do to understand what is going on, report a meaningful bug report, fix it, whatever!
> [snip]
> Thank you ahead for any ideas or suggestions.

800 MHz P3 + 768 MBytes of RAM + IDE + ZFS + gzip-9 + NFS = pain

I'm not sure I could dream of a worse combination for performance. I'm actually surprised it takes 12 hours to crater -- probably because the client is also quite slow.

arcstat will show the ARC usage, which should be increasing until the limit is reached. If you compare iostat to network use (e.g. nicstat, or iostat (does the Linux version of iostat track NFS I/O?)) then you should see a mismatch, which can likely be attributed to the time required to gzip-9 and commit to disk. When there is plenty of free RAM, or the ARC is full of flushable data, then performance might be ok. But the ARC can also contain writable (unflushable) data which cannot be quickly drained because of IDE + gzip-9 + 800 MHz P3.

Look for a memory shortfall, which we would normally expect under such conditions, and which is probably best observed via the scan rate column in vmstat. You could change any one of the variables and get much better performance.
-- richard
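A minimal way to watch the things Richard mentions (ARC growth, scan rate, and what is actually reaching the disks) while the copy runs; the pool name "tank" here is only a placeholder:

    # current ARC size and target, from the kernel statistics
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
    # free memory and scan rate ("sr" column), sampled every 5 seconds
    vmstat 5
    # per-vdev I/O actually hitting the mirror, sampled every 5 seconds
    zpool iostat -v tank 5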
Ray Clark
2008-Nov-29 17:06 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Please help me understand what you mean. There is a big difference between being unacceptably slow and not working correctly, or between being unacceptably slow and having an implementation problem that causes it to eventually stop. I expect it to be slow, but I expect it to work. Are you saying that you found that it did not function correctly, or that it was too slow for your purposes? Thanks for your insights! (3x would be awesome). -- This message posted from opensolaris.org
On Sat, Nov 29, 2008 at 11:06 AM, Ray Clark <webclark at rochester.rr.com> wrote:
> Please help me understand what you mean. There is a big difference between
> being unacceptably slow and not working correctly, or between being
> unacceptably slow and having an implementation problem that causes it to
> eventually stop. I expect it to be slow, but I expect it to work. Are you
> saying that you found that it did not function correctly, or that it was too
> slow for your purposes? Thanks for your insights! (3x would be awesome).

I expect it will go SO SLOW that some function somewhere is eventually going to fail/timeout. That system is barely usable WITHOUT compression. I hope at the very least you're disabling every single unnecessary service before doing any testing, especially the GUI.

ZFS uses ram, and plenty of it. That's the nature of COW. Enabling realtime compression with an 800MHz P3? Kiss any performance, however poor it was, goodbye.

--Tim
Karl Hakimian
2008-Nov-29 17:30 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
For us, the machine became increasingly unresponsive until it was indistinguishable from a complete lockup. A top that was running showed triple-digit loads when it occasionally updated. We had to hit the reset button to get the machine back. While this might simply be unacceptably slow, I was not willing to wait long enough to find out. I did let it run for a few days, so I don't think I jumped the gun.
-- This message posted from opensolaris.org
Karl Hakimian
2008-Nov-29 17:32 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
The machine we tested with was a reasonably fast amd64 with 4GB of memory. I don't know if I had disabled the graphical login yet, but it was probably not even logged in via the console. Nothing else was running.
-- This message posted from opensolaris.org
Mario Goebbels
2008-Nov-29 17:33 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> I expect it will go SO SLOW that some function somewhere is eventually
> going to fail/timeout. That system is barely usable WITHOUT
> compression. I hope at the very least you're disabling every single
> unnecessary service before doing any testing, especially the GUI.
>
> ZFS uses ram, and plenty of it. That's the nature of COW. Enabling
> realtime compression with an 800MHz P3? Kiss any performance, however
> poor it was, goodbye.

Regardless of that, gzip is still heavy on the system. Un-bz2ing a 30MB package (e.g. VirtualBox) in my packages ZFS filesystem with gzip compression affects interactivity quite a lot (i.e. 1 sec UI freezes on transaction commit). This is on an Intel Core 2 Quad! I do not notice these effects using lzjb.

People were clamoring for lzo support more than a year ago. That algorithm gets a decent compression rate not far from gzip and has a very light footprint similar to lzjb. I think someone even wanted to start a project for porting alternative compression methods to ZFS, focusing on BWT though, but nothing came of that (at least publicly).

Regards,
-mg
Ray Clark
2008-Nov-29 17:33 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Andrewk8, Thanks for the information. I have some questions.

[1] You said "zfs uses large amounts of memory for its cache". If I understand correctly, it is not that it uses large amounts, it is that it uses all memory available. If this is an accurate picture, then it should be just as happy with 128MB as it is with 4GB. The result would simply be less of a cache/buffer between clients and the physical disk. It also seems like any congestion should show up fairly soon, not gradually over 12 hours! Certainly limiting the ARC cache is something I will try, but it does not make sense to me. Can you help me along?

[2] Regarding zfs vs. nfs, the reference talks about unneeded cache flushes dragging down throughput to NVRAM-buffered disks. The flushes were designed for physical rotating disks. I am using physical, rotating disks, so it seems like the changes that they suggest for NVRAM-buffered disks would not be appropriate for me, and that the default behavior designed for physical rotating disks would be what I want. What am I missing?

[3] I also get ~4MB/second throughput NFS to disk with compression disabled, and 3MB/sec with gzip-9 for the first hour or two. This is nothing to brag about, and I had planned eventually to look into making it faster, but this pales compared to the 100KB/second it has degraded to over 12 hours. Were your comments aimed at helping me get faster NFS throughput, or at addressing the immediate gross problem?

Thanks again for taking the time to help.
--Ray
-- This message posted from opensolaris.org
Bob Friesenhahn
2008-Nov-29 17:56 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, 29 Nov 2008, Ray Clark wrote:
>
> [1] You said "zfs uses large amounts of memory for its cache". If I
> understand correctly, it is not that it uses large amounts, it is
> that it uses all memory available. If this is an accurate picture,
> then it should be just as happy with 128MB as it is with 4GB. The
> result would simply be less of a cache/buffer between clients and
> the physical disk. It also seems like any congestion should show up

Memory is about 10,000 times faster than disk. Why should it be just as happy with vastly less memory?

> [2] Regarding zfs vs. nfs, the reference talks about unneeded cache
> flushes dragging down throughput to NVRAM-buffered disks. The
> flushes were designed for physical rotating disks. I am using
> physical, rotating disks, so it seems like the changes that they
> suggest for NVRAM-buffered disks would not be appropriate for me,
> and that the default behavior designed for physical rotating disks
> would be what I want. What am I missing?

Most NVRAM-buffered disks do use caching. The question is how reliably unflushed data will be stored after power loss. FLASH devices definitely use a cache buffer since writes to FLASH are actually pretty slow (often slower than rotating media) and the FLASH blocksize is typically larger than the write blocksize.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Ray Clark
2008-Nov-29 18:02 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
tcook, You bring up a good point. Exponentially slow is very different from crashed, though they may have the same net effect. Also, other factors like timeouts would come into play.

Regarding services, I am new to administering "modern" Solaris, and that is on my learning curve. My immediate need is simply a dumb file server. 3 or 4 MB/sec would be adequate for my needs (marginal and at times annoying, but adequate). If you expect it to be slow, it does work quite nicely without compression. I have to use what I have. In the meantime, perhaps my stress tests will also serve to expose issues.

Regarding the GUI, I don't know how to disable it. There are no virtual consoles, and unlike older versions of SunOS and Solaris, it comes up in XDM and there is no [apparent] way to get a shell without running gnome. I am sure that there is, but again, I come from the BSD/SunOS/Linux line, and have not learned the ins and outs of Nevada/Indiana yet. I had hoped to put up a simple installation serving up disks and learn the details later. There are several 60~90MB gnome apps evidently pre-loaded - even a 45MB clock! Wow.

Interestingly, the "size" fields under "top" add up to 950GB without getting to the bottom of the list, yet it shows NO swap being used, and 150MB free out of 768 of RAM! So how can the size of the existing processes exceed the size of the virtual memory in use by a factor of 2, and the size of total virtual memory by a factor of 1.5? This is not the resident size - this is the total size!

News Flash! It has come out of it, and is moving along now at 2 MB/sec. GUI is responsive with an occasional stutter. It was going through a directory structure full of .mp3 and .flac files. Perhaps the gzip algorithm gets hung up in the data patterns they create.
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 18:07 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Hakimian, So you had a similar experience to what I had with an 800 MHz P3 and 768MB, all the way down to totally unresponsive. Probably 5 or 6 x the CPU speed (assuming single core) and 5 x the memory. This can only be a real design problem or bug, not just expected performance. Is there anyone from Sun who can advise me how to file this given the diffuse information? -- This message posted from opensolaris.org
On Sat, Nov 29, 2008 at 12:02 PM, Ray Clark <webclark at rochester.rr.com> wrote:
> tcook,
>
> You bring up a good point. Exponentially slow is very different from
> crashed, though they may have the same net effect.
> [snip]
> News Flash! It has come out of it, and is moving along now at 2 MB/sec.
> GUI is responsive with an occasional stutter. It was going through a
> directory structure full of .mp3 and .flac files. Perhaps the gzip
> algorithm gets hung up in the data patterns they create.

Assuming you're running OpenSolaris:

pfexec svcadm disable gdm
Karl Hakimian
2008-Nov-29 18:19 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> Regarding the GUI, I don't know how to disable it.
> There are no virtual consoles, and unlike older
> versions of SunOS and Solaris, it comes up in XDM
> and there is no [apparent] way to get a shell
> without running gnome. I am sure that there is, but
> again, I come from the BSD/SunOS/Linux line, and
> have not learned the ins and outs of Nevada/Indiana
> yet.

While we are veering a little off topic here, you can disable gdm (for Indiana) or cde-login (for Nevada) to get to console mode. In Nevada, you can also select console login from the GUI and it will shut the GUI down for you to log in at a text console.

> News Flash! It has come out of it, and is moving
> along now at 2 MB/sec. GUI is responsive with an
> occasional stutter. It was going through a directory
> structure full of .mp3 and .flac files. Perhaps
> the gzip algorithm gets hung up in the data patterns
> they create.

For a few more details, the test I was doing was attempting to copy around 1TB of data via ssh and zfs send | zfs receive. My data was all pretty compressible - no jpegs, mp3s, etc. Lots of source and log files.
-- This message posted from opensolaris.org
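For anyone unfamiliar with the kind of test Karl describes, a zfs send piped over ssh into a receive on the target looks roughly like the sketch below; the host, pool, and snapshot names are made up:

    # snapshot the source dataset, then stream it to the backup host
    zfs snapshot srcpool/data@backup1
    zfs send srcpool/data@backup1 | ssh backuphost zfs receive destpool/data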
Karl Hakimian
2008-Nov-29 18:23 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> So you had a similar experience to what I had with an > 800 MHz P3 and 768MB, all the way down to totally > unresponsive. Probably 5 or 6 x the CPU speed > (assuming single core) and 5 x the memory. This can > only be a real design problem or bug, not just > expected performance.Our test machine was actually a dual core. I was pretty surprised at the results. Keep in mind, we did our tests around a year ago and things have changed. We have not re-visited gzip software compression since, so I do not know how things behave today. -- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 18:24 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Servo / mg, I *have* noticed these effects on my system with lzjb, but they are minor. Things are a little grainy, not smooth. Eliminating the algorithm that exposes the shortfall in how the compression is integrated into the system does not change the shortfall (See opensolaris.com bug 5483). My low-end system resulted in my stress test being extra stressful. Perhaps that is a good thing for exposing problems (Although frustrating for me)! What I do not understand is why things get better and worse by orders of magnitude vs. being a relatively steady drain. -- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 18:30 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
relling, Thank you, you gave me several things to look at. The one thing that sticks out for me is that I don't see why you listed IDE. Compared to all of the other factors, it is not the bottleneck by a long shot, even if it is a slow transfer rate (33MB/sec) by today's standards. What don't I know?
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 18:37 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
bfriesen, Ultimately stuff flows in and stuff flows out. Data is not reused, so a cache does not do anything for us. As a buffer, it is simply a rubber band, a FIFO. So if the client wrote something real quick, it would complete quickly. But if it is writing an unlimited amount of data (like 200GB) without reading anything, it all simply flows through the buffer. Whether the buffer is 128MB or 4GB, once the buffer is full the client will have to wait until something flows out to the disk. So the system runs at the speed of the slowest component. If accesses are done only once, caches don't help. A buffer helps only to smooth out localized chunkiness.

Regarding the NVRAM discussion, what does this have to do with my situation with rotating magnetic disks with tiny 8MB embedded volatile caches? The behavior of disks or storage subsystems with NVRAM is not pertinent to my situation! Or do I have something backwards?
-- This message posted from opensolaris.org
On Sat, Nov 29, 2008 at 12:30 PM, Ray Clark <webclark at rochester.rr.com> wrote:
> relling, Thank you, you gave me several things to look at. The one thing
> that sticks out for me is that I don't see why you listed IDE. Compared to
> all of the other factors, it is not the bottleneck by a long shot, even if it
> is a slow transfer rate (33MB/sec) by today's standards. What don't I know?

Slow transfers over NFS != slow transfers to and from the disk. Have you done a zpool iostat to see what kind of traffic is actually going to and from disk? If you've got both drives hanging off a single IDE bus, that can further hurt performance.

--Tim
Miles Nordin
2008-Nov-29 18:44 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
>>>>> "t" == Tim <tim at tcsac.net> writes: >>>>> "a" == andrew <andrum04 at gmail.com> writes: >>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:t> ZFS uses ram, and plenty of it. That''s the nature of COW. a> Running ZFS stress tests on a system with only a> 768MB of memory is not a good idea since ZFS uses large a> amounts of memory for its cache. He''s watching for memory pressure with top and vmstat and not seeing any. re> 800 MHz P3 + 768 MBytes of RAM + IDE + ZFS + gzip-9 + NFS re> pain I''m not sure I could dream of a worse combination for re> performance. I''m actually surprised it takes 12 hours to re> crater 3MB/sec is indeed the expected lousy performance. He can''t even keep fastethernet full. But he''s not complaining about that part of the test! He''s complaining about later when it goes down to 100kB/s. Ray, is there anything in dmesg or ''zpool status''? Maybe one of the disks is going bad? but since Karl also says ``for us, the machine became increasing unresponsive until it was indistinguishable from a complete lockup,'''' yeah it sounds like a zfs-gzip bug to me, too. If you agree maybe one of you should file it? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081129/6f016394/attachment.bin>
Bob Friesenhahn
2008-Nov-29 19:02 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, 29 Nov 2008, Ray Clark wrote:
> through the buffer. Whether the buffer is 128MB or 4GB, once the
> buffer is full the client will have to wait until something flows
> out to the disk. So the system runs at the speed of the slowest
> component. If accesses are done only once, caches don't help. A
> buffer helps only to smooth out localized chunkiness.

You are wrong in assuming that a "write only" situation does not benefit immensely from caching. ZFS tries to write data 128K at a time. Without caching, the user data aggregation into 128K does not work well when NFS provides only 8-32K at a time. ZFS likes to buffer up a number of blocks (if allowed) and then write them to disk in optimum order. Also, the filesystem metadata and structures need to be cached, or else ZFS needs to continually go to disk in order to re-obtain information that it otherwise would already have in RAM. Since disk access is 10,000 times slower than RAM, having to go to the disk even one more time is a *huge* performance loss.

Since you have very little RAM, it is quite likely that your kernel data memory is becoming "fragmented" so that acquiring and freeing memory is less optimum than normal. Regardless, ZFS is a very memory-oriented filesystem implementation which requires more RAM than most other filesystems. Good old UFS uses less RAM.

> Regarding the NVRAM discussion, what does this have to do with my
> situation with rotating magnetic disks with tiny 8MB embedded
> volatile caches? The behavior of disks or storage subsystems with
> NVRAM is not pertinent to my situation! Or do I have something
> backwards?

I am not sure why you brought up NVRAM if it was not pertinent to your situation. :-)

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
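One way to see the mismatch Bob describes is to compare the write size the Linux client negotiated for the NFS mount against the record size ZFS is aggregating to on the server; the mount point and dataset names below are placeholders:

    # on the Linux client: the negotiated wsize appears in the mount options
    grep testhome /proc/mounts
    # on the Solaris server: the block size and compression setting for the dataset
    zfs get recordsize,compression tank/testhome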
Ray Clark
2008-Nov-29 19:17 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
zpool status -v says "No known data errors" for both the root rpool (separate non-mirrored 80GB drive) and my pool (mirrored 200GB drives).

It is getting very marginal (sluggish/unresponsive) again. Interestingly, top shows 20~30% CPU idle with most of the remainder in the kernel. I wonder if everything is counted? Linux top definitely does not show everything... I suspected at one point that it did not count time spent in interrupt servicing. Does Solaris? (Off topic, I guess.) Free memory is running 150~175MB.
-- This message posted from opensolaris.org
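On Solaris, interrupt activity is visible per CPU via mpstat, and intrstat estimates the time actually consumed by device interrupt handlers, so it is possible to check whether the "missing" CPU time is going there; for example:

    # one line per CPU every 5 seconds: usr/sys/idl plus interrupt counts (intr/ithr)
    mpstat 5
    # approximate time spent in interrupt handlers, per device
    intrstat 5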
Ray Clark
2008-Nov-29 19:29 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
bfriesen, Andrew brought up NVRAM by referring me to the following link:

Also, NFS to ZFS filesystems will run slowly under certain conditions - including with the default configuration. See this link for more information: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

This section discusses exclusively how ZFS cache flushes, which can be triggered by NFS requests or policies, interact with NVRAM unproductively, and how the flushes can be controlled to improve performance. Since the NVRAM is non-volatile, the flushes are not necessary to preserve data integrity anyway. Not worth tracing the chain to see how you and I got tangled in this. One of us made an inappropriate association or didn't follow a sub-thread. Sorry for the confusion.

-----------

Regarding the cache, right now there is 150MB of free memory not being used by ANYBODY, so I don't think there is a shortage of memory for the ZFS cache... and 150MB >> 128K, or even a whole slew of 128K blocks. Also, the yellow light that blinks when the disk is accessed is off 90% of the time, minimum. When it was almost frozen, the disk almost never blinked (one real quick one every minute or two!). Nothing is accessing the disk to re-obtain anything! Otherwise, yes, you would have a good point about re-fetching various file structure stuff. (Good thought.)

Fragmentation of kernel memory would be a good one. Wouldn't it get fragmented after 6 months or so of everyday use anyway? It must de-frag itself somehow. You bring up an excellent observation. When it was super-slow, free RAM was down to 15MB, although that still seems large compared to 32K or 128K blocks. Remember, the system is not doing ANYTHING else.
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 19:33 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
tcook, zpool iostat shows 1.15MB/sec. Is this averaged since boot, or a recent running average? The two drives ARE on a single IDE cable, however again, with a 33MB/sec cable rate and 8 or 16MB cache in the disk, 3 or 4 MB/sec should be able to time-share the cable without a significant impact on throughput. -- This message posted from opensolaris.org
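One note on zpool iostat: the first report it prints is an average over the time since the pool was imported (effectively since boot), and only subsequent reports reflect current activity, so an interval argument is needed to see live throughput. The pool name below is a placeholder:

    # first line = average since import/boot; following lines = 5-second samples
    zpool iostat tank 5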
Mattias Pantzare
2008-Nov-29 20:02 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> Interestingly, the "size" fields under "top" add up to 950GB without getting to the bottom of the list, yet it > shows NO swap being used, and 150MB free out of 768 of RAM! So how can the size of the existing processes > exceed the size of the virtual memory in use by a factor of 2, and the size of total virtual memory by a factor of 1.5? > This is not the resident size - this is the total size!Size is how much address space the process has allocated. Part of that is executables and shared libraries (they are backed by the file, not by swap). A large portion of that is shared, the same memory is used by many processes. Processes can also allocate shared memory by other means. Memory is not a big problem for ZFS, address space is. You may have to give the kernel more address space on 32-bit CPUs. eeprom kernelbase=0x80000000 This will reduce the usable address space of user processes though.
On Sat, Nov 29, 2008 at 1:33 PM, Ray Clark <webclark at rochester.rr.com> wrote:
> tcook, zpool iostat shows 1.15MB/sec. Is this averaged since boot, or a
> recent running average?
>
> The two drives ARE on a single IDE cable, however again, with a 33MB/sec
> cable rate and 8 or 16MB cache in the disk, 3 or 4 MB/sec should be able to
> time-share the cable without a significant impact on throughput.

The *rated* speed was theoretical, and you couldn't ever achieve anything remotely close to it. Sticking a second drive out there makes it even worse. I'd at the very least dedicate a channel to each disk and just disconnect the cdrom drive if you have one in the system, or spend $2 on eBay for a PCI add-on controller.

--Tim
Bob Friesenhahn
2008-Nov-29 20:37 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, 29 Nov 2008, Ray Clark wrote:
> Regarding the cache, right now there is 150MB of free memory not
> being used by ANYBODY, so I don't think there is a shortage of
> memory for the ZFS cache... and 150MB >> 128K, or even a whole slew

To be more clear, memory which is claimed to be free is often actually still used for caching. Even if the virtual memory system has not mapped a VM page to a process, if a minor page fault occurs (due to an access), the data in that seemingly "unused" page may still be immediately switched in and used, because the VM system tracks where the current content of that page came from. This is primarily the case for memory-mapped regions such as ordinary files, shared libraries, executable text, or even a video frame buffer. This is pretty much normal operation, since when new processes are started, the VM maps the existing pages that the new process requires into its address space.

It is pretty common for Unix systems to lie about free memory and use that free memory for the filesystem cache, with the expectation that this "free" memory can be freed up for use fast enough that no one really notices.

If the critical "working set" of VM pages is larger than available memory, then the system will become exceedingly slow. This is indicated by a substantial amount of major page fault activity. Since disk is 10,000 times slower than RAM, major page faults can really slow things down dramatically. Imagine what happens if ZFS or an often-accessed part of the kernel is not able to fit in available RAM.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
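The major-fault activity Bob describes shows up in vmstat's paging output; a simple way to watch for it while the copy runs:

    # scan rate plus paging broken out by executable, anonymous, and filesystem pages
    vmstat -p 5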
On Sat, Nov 29, 2008 at 2:37 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> To be more clear, memory which is claimed to be free is often actually
> still used for caching.
> [snip]
> If the critical "working set" of VM pages is larger than available
> memory, then the system will become exceedingly slow. This is
> indicated by a substantial amount of major page fault activity.

So as a follow-on to this, I guess my question is: can you shut down the Linux box and throw the ram from it into this box and see what kind of performance you are getting? I believe you'll see far, far better results with 1.5G in the system.

--Tim
Mattias Pantzare
2008-Nov-29 20:47 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> If the critical "working set" of VM pages is larger than available
> memory, then the system will become exceedingly slow. This is
> indicated by a substantial amount of major page fault activity.
> Since disk is 10,000 times slower than RAM, major page faults can
> really slow things down dramatically. Imagine what happens if ZFS or
> an often-accessed part of the kernel is not able to fit in available
> RAM.

ZFS and most of the kernel is locked in physical memory. Swap is never used for ZFS. In this case (NFS), everything is done in the kernel. The "working set" cannot be larger than available memory.
From personal experience, 3-6MB/s is about what you should expect for NFS if you're not using any kind of NVRAM write cache. With write cache, it's easy to pretty much saturate 100MB/s ethernet.

And as others have said, ZFS needs RAM, and plenty of it. I'd have thought 2GB would be a sensible minimum. For our ZFS server we bought 8GB, and that was only ?200 for full ECC Registered memory, and we're not even using compression here.

I'm no Solaris expert, but your symptoms sound like a classic case of running low on memory.
-- This message posted from opensolaris.org
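The write cache being referred to here is usually either a battery-backed RAID controller or a dedicated ZFS intent-log device; on builds that support separate log devices, adding one is a single command (the pool and device names below are hypothetical):

    # add a fast non-volatile device as a dedicated ZIL (slog) for synchronous NFS writes
    zpool add tank log c2t0d0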
Ray Clark
2008-Nov-29 21:19 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Pantzer5: Thanks for the "top" "size" explanation. Re: eeprom kernelbase=0x80000000 So this makes the kernel load at the 2G mark? What is the default, something like C00... for 3G? Are PCI and AGP space in there too, such that kernel space is 4G - (kernelbase + PCI_Size + AGP_Size) ? (Shot in the dark)? -- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 21:26 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
I get 15MB/sec to/from (I don't remember which) Linux LVM to a USB disk. It does seem to saturate there. I assume due to interrupt service time between transfers. I appreciate the contention for the IDE, but in a 3MB/sec system, I don't think that it is my bottleneck, much less in a 100KByte/second system. Do you disagree?

I *have* a PCI add-on card, which is unplugged to make the system dead-simple until I figure out why it does not function!

As a side note, most such controllers report themselves as a RAID card or some such, and Solaris will refuse to talk to them! The only one I could find that would work was an IT8212 with an out-of-production flash chip that ITE supported an alternate BIOS for. I went through 3 or 4 different ones before finding it! You seem to say it is easy to buy a PCI add-in card and have it work under Solaris - what card are you thinking of, and where did you find it?
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 21:36 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Thanks for the info about "Free Memory". That also links to another sub-thread regarding kernel memory space. If disk files are mapped into memory space, that would be a reason that the kernel could make use of an address space larger than virtual memory (RAM + swap).

Regarding showing stuff as Free when it is tracked and may be used, I would assume, though, that it would be abandoned if the memory is needed. Wouldn't the fact that it was sitting "Free" indicate that nothing needed memory? I also understand "working set" as a page replacement algorithm, but that would make the disk light blink! These are all good things, I just don't see how they apply to the current situation, at least given the apparent information from vmstat and top!
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-29 21:47 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
If I shut down the Linux box, I won't have a host to send stuff to the Solaris box!

Also, the Solaris box can only support 1024MB. I did have 1024MB in it at one time and had essentially the same performance. I might note that I had the same problem with 1024MB, albeit with top eating memory (opensolaris.com bug 5482) (up to 417MB at the highest observation). No wonder it crashed. Anyway, 1024MB is not "far, far better"; it turns out there was no noticeable difference when I dropped to 768.

Also note that Hakimian had identical symptoms with a dual-core 64-bit AMD and 4GB of RAM.
-- This message posted from opensolaris.org
Mattias Pantzare
2008-Nov-29 21:56 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, Nov 29, 2008 at 22:19, Ray Clark <webclark at rochester.rr.com> wrote:
> Pantzer5: Thanks for the "top" "size" explanation.
>
> Re: eeprom kernelbase=0x80000000
> So this makes the kernel load at the 2G mark? What is the default, something like C00... for 3G?

Yes on both questions (I have not checked the hex conversions). This might not be your problem, but it is easy to test. My symptom was that zpool scrub made the computer go slower and slower and finally just stop. But this was a long time ago, so this might not be a problem today.

> Are PCI and AGP space in there too, such that kernel space is 4G - (kernelbase + PCI_Size + AGP_Size)? (Shot in the dark)?

No. This is virtual memory.

The big difference in memory usage between UFS and ZFS is that ZFS will have all the data it caches mapped in the kernel address space. UFS leaves data unmapped.
Bob Friesenhahn
2008-Nov-29 23:04 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, 29 Nov 2008, Mattias Pantzare wrote:
>
> The big difference in memory usage between UFS and ZFS is that ZFS
> will have all the data it caches mapped in the kernel address space. UFS
> leaves data unmapped.

Another big difference I have heard about is that Solaris 10 on x86 only uses something like 64MB of filesystem caching by default for UFS. This is different than SPARC, where the caching is allowed to grow. I am not sure if OpenSolaris maintains this arbitrary limit for x86.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Sat, Nov 29, 2008 at 3:26 PM, Ray Clark <webclark at rochester.rr.com> wrote:
> I get 15MB/sec to/from (I don't remember which) Linux LVM to a USB disk.
> [snip]
> You seem to say it is easy to buy a PCI add-in card and have
> it work under Solaris - what card are you thinking of, and where did you
> find it?

Every one of the Promise IDE (non-RAID) cards works just fine. Worst-case scenario, you have to add the device id for the driver to load properly.

--Tim
On Sat, Nov 29, 2008 at 3:47 PM, Ray Clark <webclark at rochester.rr.com> wrote:
> If I shut down the Linux box, I won't have a host to send stuff to the
> Solaris box!
>
> Also, the Solaris box can only support 1024MB. I did have 1024MB in it at
> one time and had essentially the same performance.
> [snip]
> Also note that Hakimian had identical symptoms with a dual-core 64-bit AMD
> and 4GB of RAM.

He never said they were identical symptoms; he said he had a somewhat similar experience. Different kernel, different build of Solaris, different CPUs. You can't even attempt to say you were hitting the same issue with as little information as has been provided. I've got gzip-9 running on a dataset right now with a nearly identical setup to what he had, without issue. I'd say let's stop jumping to conclusions.

As for your 1024 not making a difference, did you turn off the GUI in that instance and all unnecessary services? Search the discussion lists and you'll find plenty of people who had "no difference" increasing ram until they crossed the threshold of giving zfs what it needs vs. not for their workload. Claiming it's "only 3MB/sec" and downplaying all the bad design decisions you've made so far isn't helping the situation at all.

--Tim
Mattias Pantzare
2008-Nov-29 23:44 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, Nov 30, 2008 at 00:04, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Sat, 29 Nov 2008, Mattias Pantzare wrote:
>>
>> The big difference in memory usage between UFS and ZFS is that ZFS
>> will have all the data it caches mapped in the kernel address space. UFS
>> leaves data unmapped.
>
> Another big difference I have heard about is that Solaris 10 on x86 only
> uses something like 64MB of filesystem caching by default for UFS. This is
> different than SPARC, where the caching is allowed to grow. I am not sure if
> OpenSolaris maintains this arbitrary limit for x86.

That is not true. I doubt that any Solaris version had that type of limit.
Karl Hakimian
2008-Nov-29 23:49 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> I've got gzip-9 running on a dataset right now
> with a nearly identical setup to what he had, without
> issue. I'd say let's stop jumping to
> conclusions.

I'm curious to know whether you have ever tried dumping many gigs (hundreds at one time) to your setup. Mine seemed fine when writing some files. It never worked when trying to dump large amounts of data. If you have successfully dumped many gigs, I'm very interested to know more about your setup and how it differs from what I was testing with.
-- This message posted from opensolaris.org
Bob Friesenhahn
2008-Nov-30 00:10 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, 30 Nov 2008, Mattias Pantzare wrote:
>>
>> Another big difference I have heard about is that Solaris 10 on x86 only
>> uses something like 64MB of filesystem caching by default for UFS. This is
>> different than SPARC, where the caching is allowed to grow. I am not sure if
>> OpenSolaris maintains this arbitrary limit for x86.
>
> That is not true. I doubt that any Solaris version had that type of limit.

That is what I heard Jim Mauro tell us. I recall feeling a bit disturbed when I heard it. If it is true, perhaps it applies only to x86 32 bits, which has obvious memory restrictions. I recall that he showed this parameter via DTrace. However, on my Solaris 10U5 AMD64 system I see this limit:

429293568 maximum memory allowed in buffer cache (bufhwm)

which seems much higher than 64MB. The "Solaris Tuning And Tools" book says that by default the buffer cache is allowed to grow to 2% of physical memory. Obtain the value via:

sysdef | grep bufhwm

My 32-bit Belenix system running under VirtualBox with 2GB allocated to the VM reports a value of 41,762,816.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Ray Clark
2008-Nov-30 00:31 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Tim, I don't think we would really disagree if we were in the same room. I think in the process of the threaded communication a few things got overlooked, or the wrong thing attributed.

You are right that there are many differences. Some of them are:

- Tests done a year ago; I expect the kernel has had many changes.
- He was moving data via ssh from zfs send into zfs receive, as opposed to my file operations over NFS.
- My problem seems to occur on incompressible data. His was all very compressible.
- He had roughly 5x the CPU speed (x2 cores) and 5x the memory.

Yes, I jumped on what I saw as common symptoms, in Hakimian's words: "becoming increasingly unresponsive until it was indistinguishable from a complete lockup". This is similar to my description of "After about 12 hours, the throughput has slowed to a crawl. The Solaris machine takes a minute or more to respond to every character typed..." and "disk throughput is in the range of 100K bytes/second".

I was the one who judged these symptoms to be essentially identical; I did not say that Hakimian made that statement. I also pointed out that he was seeing these "identical" symptoms in a very different environment, which would be your point.

Regarding my 768 vs. 1024, there were no changes other than the change in memory. So whatever else is true, the system had 33% more memory to work with, minimum. Given that probably a few hundred meg is needed for a just-booted, idle system, the effective percentage increase in memory for zfs to work with is in reality higher. I may not have given it 4GB, but I gave it substantially more than it had. It should behave substantially differently if memory is the limiting factor. Just because memory is thin does not make it the limiting factor. I believe the indications by top and vmstat that there is free memory (available to be reallocated) that nothing is gobbling up also suggest that memory is not the limiting factor.

Regarding my design decisions, I did not make bad design decisions. I have what I have. I know it is substandard.

Also, you seem to be reacting as though I was complaining about the 3MB/sec throughput. I believe I stated that I understand that there are many sub-optimal aspects of this system. However, I don't believe any of them explain it running fine for a few hours, then slowing down by a factor of 30 for a few hours, then going back up. I am trying to understand and resolve the dysfunctional behavior, not the poor but plausible throughput. In any system there are many possible bottlenecks, most of which are probably suboptimal, but it is not productive to focus on the 15MB/sec links in the chain when you have a 100KB/sec problem. Increasing the 15MB/sec to 66 or 132MB/sec is just not going to have a large effect!

I think/hope I have reconciled our apparent differences. If not, so be it. I do appreciate your suggestions and insights, and they are not lost on me.

--Ray
-- This message posted from opensolaris.org
On Sat, Nov 29, 2008 at 6:31 PM, Ray Clark <webclark at rochester.rr.com>wrote:> Tim, > > I don''t think we would really disagree if we were in the same room. I > think in the process of the threaded communication that a few things got > overlooked, or the wrong thing attributed. > > You are right that there are many differences. Some of them are: > > - Tests done a year ago, I expect the kernel has had many changes. > - He was moving data via ssh from zfs sed into zfs receive as opposed to my > file operations over NFS. > - My problem seems to occur on incompressible data. His was all very > compressible. > - He had 5x the CPU x2 and 5x the memory. > > Yes, I jumped on what I saw as common symptoms, in hakimian''s words: > "becoming increasing unresponsive until it was indistinguishable from a > complete lockup". This is similar to my description of "After about 12 > hours, the throughput has slowed to a crawl. The Solaris machine takes a > minute or more to respond to every character typed..." and "disk throughput > is in the range of 100K bytes/second". > > I was the one who judged these symptoms to be essentially identical, I did > not say that Hakimian made that statement. I also pointed out that he was > seeing these "identical" symptoms in a very different environment, which > would be your point. > > Regarding my 768 vs. 1024, there were no changes other than the change in > memory. So whatever else is true, the system had 33% more memory to work > with minimum. Given that probably a few hundred Meg is needed for a just > booted, idle system, the effective percentage increase in memory for zfs to > work with is in reality higher. I may not have given in 4GB, but I gave it > substantially more than it had. It should behave substantially differently > if memory is the limiting factor. Just because memory is thin does not make > it the limiting factor. I believe the indications by top and vmstat that > there is free memory (available to be reallocated) that nothing is gobbling > up also suggests that memory is not the limiting factor. > > Regarding my design decisions, I did not make bad design decisions. I have > what I have. I know it is substandard. > > Also you seem to be reacting as though I was complaining about the 3MB/Sec > throughput. I believe I stated that I understand that there are many > sub-optimal aspects of this system. However I don''t believe any of them > explain it running fine for a few hours, then slowing down by a factor of > 30, for a few hours, then going back up. I am trying to understand and > resolve the dysfunctional behavior, not the poor but plausible throughput. > In any system there are many possible bottlenecks, most of which are > probably suboptimal, but it is not productive to focus on the 15MB/Sec links > in the chain when you have a 100KB/Sec problem. Increasing the 15MB/Sec to > 66 or 132MB/Sec is just not going to have a large effect! > > I think/hope I have reconciled our apparent differences. If not, so be it. > I do appreciate your suggestions and insights, and they are not lost on me. > > --Ray > -- >My point is you''re not looking at the bigger picture. "Well this small portion is working some of the time so it''s ok, and this small portion is working some of the time so it''s ok, but when I throw it all together something isn''t quite right so it must be the software." 
Case in point on the memory front:
http://www.opensolaris.org/jive/thread.jspa?messageID=309878

--Tim
Ray Clark
2008-Nov-30 00:43 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Now that it has come out of its slump, I can watch what it is working on vs. response. Whenever it is going through a folder with a lot of incompressible stuff, it gets worse. .mp3 and .flac are horrible. .iso images and .gz and .zip files are bad. It is sinking again, but still works. It depends on the data.

In hindsight, and with the help of this thread, I think I understand. Yes, it is a hypothesis, not fact. Bug 5483 and the reference in there to bug 6586537 explain how the zfs compression task blocks out userland tasks (and probably all other kernel tasks) by running at the highest kernel priority. This is a fact, I take it. The hypothesis part would be that certain data characteristics (probably higher entropy) result in very tedious, laborious behavior by the gzip algorithm, or at least the implementation in zfs. So NOTHING else runs unless the gzip algorithm has nothing to do, and it takes FOREVER to do its thing on certain types of data.

All of the free memory discussions will help me to understand the system and how to get more information, but I don't see any of the evidence suggesting that lack of RAM was the reason for throughput to drop to 100KB/Sec. No doubt if I address all of these things I can get throughput up from the 3~4MB/Sec that I was seeing with compression disabled.

My plan right now is to let it finish (it has somewhere around 50GB to go) just to see it do so without crashing. I may then do a "diff -r" to see if the decompression has the same behavior (glutton for punishment). Then I will forget compression and do the exercise without. Not sure how I will finally be comfortable to commit all my bits!

This understanding gives me hope that the system will be robust, that my heavy load is not exposing a critical section of code. Rather it is a problem that causes dysfunctional though still correct behavior. And I know how to avoid it.

If you have more comments, or especially if you think I reached the wrong conclusion, please do post it. I will post my continuing results.

Thank you ALL for giving me so much attention and help. It is good to not be alone!
-- This message posted from opensolaris.org
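A quick userland check of the entropy hypothesis is possible on the Linux box, independent of ZFS (a rough sketch only; it measures gzip itself rather than the in-kernel pipeline, and the /tmp paths and sizes are arbitrary):

    # build one incompressible and one highly compressible test file (~256MB each)
    dd if=/dev/urandom of=/tmp/random.bin bs=1M count=256
    dd if=/dev/zero    of=/tmp/zeros.bin  bs=1M count=256

    # time gzip -9 over each; if high-entropy input really is the trigger,
    # the random file should show a much lower MB/sec figure
    time gzip -9 -c /tmp/random.bin > /dev/null
    time gzip -9 -c /tmp/zeros.bin  > /dev/null

That would at least separate "gzip-9 is pathologically slow on this data" from "something in the ZFS/NFS path is stalling".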
Mattias Pantzare
2008-Nov-30 00:44 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, Nov 30, 2008 at 01:10, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> On Sun, 30 Nov 2008, Mattias Pantzare wrote:
>>>
>>> Another big difference I have heard about is that Solaris 10 on x86 only
>>> uses something like 64MB of filesystem caching by default for UFS. This is
>>> different than SPARC where the caching is allowed to grow. I am not sure if
>>> OpenSolaris maintains this arbitrary limit for x86.
>>
>> That is not true. I doubt that any Solaris version had that type of limit.
>
> That is what I heard Jim Mauro tell us. I recall feeling a bit disturbed
> when I heard it. If it is true, perhaps it applies only to x86 32 bits,
> which has obvious memory restrictions. I recall that he showed this
> parameter via DTrace. However on my Solaris 10U5 AMD64 system I see this
> limit:
>
> 429293568 maximum memory allowed in buffer cache (bufhwm)
>
> which seems much higher than 64MB. The "Solaris Tuning And Tools" book says
> that by default the buffer cache is allowed to grow to 2% of physical
> memory.
>
> Obtain the value via
>
> sysdef | grep bufhwm
>
> My 32-bit Belenix system running under VirtualBox with 2GB allocated to the
> VM reports a value of 41,762,816.

That is only a small part of the cache, used for file system metadata. File data caching is integrated into the normal memory management.

http://docs.sun.com/app/docs/doc/817-0404/chapter2-37?a=view
Jeff Bonwick
2008-Nov-30 01:03 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> If you have more comments, or especially if you think I reached the wrong
> conclusion, please do post it. I will post my continuing results.

I think your conclusions are correct. The main thing you're seeing is the combination of gzip-9 being incredibly CPU-intensive with our I/O pipeline allowing too much of it to be scheduled in parallel. The latter is a bug we will fix; the former is the nature of the gzip algorithm.

One other thing you may encounter from time to time is slowdowns due to kernel VA fragmentation. The CPU you're using is 32-bit, so you're running a 32-bit kernel, which has very little KVA. This tends to be more of a problem with big-memory machines, however -- e.g. a system with 8GB running a 32-bit kernel. With 768MB, you'll probably be OK, but it's something to be aware of on any 32-bit system. You can tell if this is affecting you by looking for kernel threads stuck waiting to allocate a virtual address:

# echo '::walk thread | ::findstack -v' | mdb -k | grep vmem_xalloc

Jeff
Ian Collins
2008-Nov-30 01:12 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Ray Clark wrote:
> Now that it has come out of its slump, I can watch what it is working on
> vs. response. Whenever it is going through a folder with a lot of
> incompressible stuff, it gets worse. .mp3 and .flac are horrible. .iso
> images and .gz and .zip files are bad. It is sinking again, but still
> works. It depends on the data.
>
What did you expect? A 3GHz Opteron core takes about a minute to attempt to compress a 1GB .mkv file. So your P3 would probably take between 5 and 10 minutes. Now move that to the kernel and your system will crawl.

High gzip compression levels are only really feasible on fast multi-core systems (the compression is threaded).

-- 
Ian.
Ray Clark
2008-Nov-30 01:30 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Tim, I am trying to look at the whole picture. I don't see any unwarranted assumptions, although I know so little about Solaris, and I extrapolated all over the place based on general knowledge, sort of draping it around and over what you all said.

I see quite a few misconceptions in the thread you pointed me to based on a lack of understanding of modern systems, both clear ones and questionable ones. I suppose I probably have my share of them in here. Please refute my defenses as appropriate.

--Ray
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 01:35 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Jeff, Thank you for weighing in, as well as for the additional insight. It is good to have confidence that I am on the right track.

I like your system ... a lot. There is work to do for it to be as slick as a recent Linux distribution, but you are working on a solid core and just need some touch-up work. Thanks. Hang in there.

--Ray
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 02:29 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Ref relling's 12:00 post:
My system does not have arcstat or nicstat. But it is the B2 distribution. Would I expect these to be in the final distribution, or where do these come from?
Thanks.
--Ray
-- This message posted from opensolaris.org
On Sat, Nov 29, 2008 at 8:29 PM, Ray Clark <webclark at rochester.rr.com> wrote:

> Ref relling's 12:00 post:
> My system does not have arcstat or nicstat. But it is the B2 distribution.
> Would I expect these to be in the final distribution, or where do these
> come from?
> Thanks.
> --Ray

I don't believe either is bundled. Search Google for arcstat.pl and nicstat.pl.

--Tim
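In the meantime, the counters arcstat.pl reads are exposed through the bundled kstat command, so the ARC can be watched without installing anything; a minimal sketch (statistic names taken from the zfs:0:arcstats kstat group):

    # current ARC size, target size, and upper limit, in bytes
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

    # re-sample the ARC size every 5 seconds while the copy runs
    kstat -p -T d zfs:0:arcstats:size 5

If size stays pinned near c_max while the machine is crawling, memory pressure looks more plausible; if it sits well below, that points elsewhere.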
Jim Mauro
2008-Nov-30 02:43 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
For the record, what I said was that on x64, the default size of the UFS segmap segment, which is the L1 cache for UFS reads and writes, is 64MB. Cached pages will be moved to a cache list in memory if segmap fills up.

The message I was trying to convey is that the default size of the UFS segmap on x64 (64MB) is generally too small if UFS file IO is a component of your workload (and 64MB is the default for 64-bit x64).

Check out http://www.solarisinternals.com/wiki/index.php/Segmap_tuning for increasing the size. Note: please do NOT use the /etc/system method of increasing segmapsize on x64 - it will panic your system.

None of this has anything to do with ZFS, which uses a completely different mechanism for caching (the ZFS ARC).

Thanks,
/jim

> That is what I heard Jim Mauro tell us. I recall feeling a bit
> disturbed when I heard it. If it is true, perhaps it applies only to
> x86 32 bits, which has obvious memory restrictions. I recall that he
> showed this parameter via DTrace. However on my Solaris 10U5 AMD64
> system I see this limit:
>
> 429293568 maximum memory allowed in buffer cache (bufhwm)
>
> which seems much higher than 64MB. The "Solaris Tuning And Tools"
> book says that by default the buffer cache is allowed to grow to 2% of
> physical memory.
>
> Obtain the value via
>
> sysdef | grep bufhwm
>
> My 32-bit Belenix system running under VirtualBox with 2GB allocated
> to the VM reports a value of 41,762,816.
>
> Bob
Bob Friesenhahn
2008-Nov-30 02:59 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, 30 Nov 2008, Ian Collins wrote:
> What did you expect? A 3GHz Opteron core takes about a minute to
> attempt to compress a 1GB .mkv file. So your P3 would probably take
> between 5 and 10 minutes. Now move that to the kernel and your system
> will crawl. High gzip compression levels are only really feasible on fast
> multi-core systems (the compression is threaded).

The gzip manual page says that the default compression level for gzip is -6. Experimentation will show that the compression ratio does not increase much at -9, so it is not worth it when you are short on time or CPU.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
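For anyone who wants that curve for their own data, a rough sketch of the experiment from the shell (the sample path is a placeholder; substitute a file that looks like your real workload):

    SAMPLE=/path/to/representative/file    # placeholder - pick a real file
    for level in 1 6 9; do
        echo "--- gzip -$level ---"
        time gzip -"$level" -c "$SAMPLE" | wc -c   # compressed size in bytes
    done
    ls -l "$SAMPLE"                                # original size, for the ratio

Comparing the wall-clock times against the byte counts usually makes the -6 vs. -9 trade-off obvious.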
Chris Ridd
2008-Nov-30 08:07 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On 30 Nov 2008, at 02:59, Bob Friesenhahn wrote:

> On Sun, 30 Nov 2008, Ian Collins wrote:
>
>> What did you expect? A 3GHz Opteron core takes about a minute to
>> attempt to compress a 1GB .mkv file. So your P3 would probably take
>> between 5 and 10 minutes. Now move that to the kernel and your system
>> will crawl. High gzip compression levels are only really feasible on fast
>> multi-core systems (the compression is threaded).
>
> The gzip manual page says that the default compression level for gzip
> is -6. Experimentation will show that the compression ratio does not
> increase much at -9, so it is not worth it when you are short on time
> or CPU.

Would it also help if the blocksize were reduced from the default (128K?) in the filesystem with gzip compression?

It feels like it might - there'd be more (and smaller) blocks being compressed, so more chance of other things being able to happen in between blocks.

I stress this is a WAG, but it is an easy variable to alter.

Cheers,

Chris
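For anyone who wants to try that, the knob is the per-dataset recordsize property; a sketch, with 'tank/testhome' standing in for the actual pool/filesystem name (note that recordsize only applies to files written after the change, not to data already on disk):

    zfs set recordsize=32K tank/testhome
    zfs get recordsize,compression tank/testhome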
Ian Collins
2008-Nov-30 08:25 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Chris Ridd wrote:
> On 30 Nov 2008, at 02:59, Bob Friesenhahn wrote:
>
>> On Sun, 30 Nov 2008, Ian Collins wrote:
>>
>>> What did you expect? A 3GHz Opteron core takes about a minute to
>>> attempt to compress a 1GB .mkv file. So your P3 would probably take
>>> between 5 and 10 minutes. Now move that to the kernel and your system
>>> will crawl. High gzip compression levels are only really feasible on fast
>>> multi-core systems (the compression is threaded).
>>
>> The gzip manual page says that the default compression level for gzip
>> is -6. Experimentation will show that the compression ratio does not
>> increase much at -9, so it is not worth it when you are short on time
>> or CPU.
>
> Would it also help if the blocksize were reduced from the default
> (128K?) in the filesystem with gzip compression?
>
> It feels like it might - there'd be more (and smaller) blocks being
> compressed, so more chance of other things being able to happen in
> between blocks.
>
Maybe not: there'd be more starting and stopping going on. I'd expect there's an overhead in doing that, rather than compressing more data in one go.

gzip compression works a lot better now the compression is threaded. It's a shame userland gzip isn't!

-- 
Ian.
Ray Clark
2008-Nov-30 09:14 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Andewk8 at 11:39 on 11/29 said:

Solaris reports "virtual memory" as the sum of physical memory and page file - so this is where your strange vmstat output comes from. Running ZFS stress tests on a system with only 768MB of memory is not a good idea since ZFS uses large amounts of memory for its cache.

---
vmstat is now jumping around between 493xxx and 527xxx free while top is reporting a solid, unchanging "509M" "free swap" (both at 1-second updates). That explanation would only account for a "swap" figure that, under the "virtual memory" definition, is larger than a "free swap" figure that counts only swap. Here we have had it larger, and now it is smaller! For whatever it is worth, top continues to display a rock-solid "509MB" free, never changing, for "free swap".

Anybody know?
-- This message posted from opensolaris.org
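One way to see what each tool is actually counting is to compare them against the native swap command; a sketch (read-only, nothing is tuned here):

    swap -l     # physical swap devices: size and free, in 512-byte blocks
    swap -s     # "virtual swap" accounting: reservations against RAM + swap combined
    vmstat 5    # the swap column here reports available virtual swap, in KB

If the swap -s numbers move while swap -l stays flat, the jumpy vmstat figure is reservation accounting rather than real paging.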
Ray Clark
2008-Nov-30 09:19 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
I think Chris has the right idea. This would give more little opportunities for user processes to get a word in edgewise. Since the blocks are *obviously* taking a LONG time, this would not be a big efficiency hit in the bogged-down condition. It would however increase overhead in the well-behaved case.

I think the real answer is making the compression thread's priority lower and dynamic. I like the "80% cap unless there are no competing processes, in which case 100%" suggestion. If the compression thread backs up, I would *assume* that there is a queue that would back up and block the process adding to it, regulating the whole process. The whole machine would get 5x slower than normal, but everything would continue to work.

--Ray
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 09:21 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Re: gzip compression works a lot better now the compression is threaded.
It's a shame userland gzip isn't!

---
What does "now than" mean?

I assume you mean the zfs / kernel gzip (right?) at some point became threaded. Is this in the past, or in a kernel post 2008.11 B2?

--Ray
-- This message posted from opensolaris.org
Ian Collins
2008-Nov-30 09:24 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Ray Clark wrote:

[some context would be nice for mail list subscribers]

> I think Chris has the right idea. This would give more little
> opportunities for user processes to get a word in edgewise. Since the
> blocks are *obviously* taking a LONG time, this would not be a big
> efficiency hit in the bogged-down condition.

I still think you are expecting too much of a P3 system with limited RAM. I chose not to use gzip (default compression) on a max'd out x4540 because it slowed down zfs receive too much.

-- 
Ian.
Ray Clark
2008-Nov-30 09:25 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Re: Experimentation will show that the compression ratio does not increase much at -9, so it is not worth it when you are short on time or CPU.

---
Yes, and a large part of my experiment was to understand the cost (time) vs. compression ratio curve. lzjb only gave me 7%, which to me is not worth goofing with. I was curious what gzip-9 would do, and how it impacted performance. I guess I have one of those points, but my graph paper is not large enough!

--Ray
-- This message posted from opensolaris.org
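For the record, ZFS tracks the achieved ratio per dataset, so the points on that curve can be read back directly; a sketch (the dataset names are placeholders for whatever was used in each run):

    zfs get compressratio,compression tank/testhome   # e.g. 1.07x for the lzjb run
    zfs get -r compressratio tank                      # compare all datasets in the pool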
Ian Collins
2008-Nov-30 09:27 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Ray Clark wrote:
> Re: gzip compression works a lot better now the compression is threaded.
> It's a shame userland gzip isn't!
>
> ---
> What does "now than" mean?

I didn't type that.

> I assume you mean the zfs / kernel gzip (right?) at some point became
> threaded. Is this in the past, or in a kernel post 2008.11 B2?

Some time in the past year (I started a thread "gzip compression throttles system?" back in March last year).

-- 
Ian.
Ray Clark
2008-Nov-30 09:28 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Re: I don't believe either is bundled. Search Google for arcstat.pl and nicstat.pl.

--
Thanks. On a related note, there are at least 2 or 3 places claiming to provide Solaris 10 packages. How do I know which ones are "safe", both in terms of quality and in terms of adhering to a consistent structure? I would prefer to do without rather than mess things up. The last thing I need is trouble!

--Ray
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 09:49 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
I just want to interject that, if memory serves me correctly, SPARC has been 64-bit for 10~15 years, and so had LOTS of address space to map stuff. x86 brought a new restriction.

Regarding the practice of mapping files etc. into virtual memory that does not exist, now I understand why a 32-bit address space is viewed as restrictive. This is a powerful technique. I would be interested in understanding how it is done, though... it somehow ties a file reference (inode? name?) to an address range. I assume that when the range is accessed (since it does not exist) a page fault is generated to fulfill the request, which then (for this to make sense) must have a short-circuit map to the disk blocks. I assume that would go through some disk cache in case they are in memory somewhere, else generate an IO request to disk... but what if the file was written, and so moved?

Where would I read more about what is REALLY going on and how it works?

Thanks,
--Ray
-- This message posted from opensolaris.org
I'd agree there, Ian, but this still sounds like a good idea to me. I don't like any system becoming unresponsive, and this sounds like a good general-purpose tweak to help Solaris cope with worst-case scenarios.
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 19:46 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> > I think Chris has the right idea. This would give more little
> > opportunities for user processes to get a word in edgewise. Since the
> > blocks are *obviously* taking a LONG time, this would not be a big
> > efficiency hit in the bogged-down condition.
>
> I still think you are expecting too much of a P3 system with limited
> RAM. I chose not to use gzip (default compression) on a max'd out x4540
> because it slowed down zfs receive too much.

---
This is not about getting my P3 to do gzip-9 at 100Mbit wire speeds. I know that is not going to happen. This is about not having kernel threads completely lock out user (and other kernel) processes for undesirable lengths of time. It is about improving Solaris. It is about having more appropriate CPU sharing between all of the threads in the system, kernel and user. This is the root cause of the pathological behavior I stumbled on.

To clarify: (1) This started as an experiment to see what compression ratio would result, (2) to see what the performance hit would be, and (3) to stress the system severely to expose problems such as improperly protected critical sections of code, race conditions, etc., to give myself confidence in using 2008.11. I did not expect to find that it performed well. I did not expect to decide to use gzip-9 on this machine.

The experiment / exercise turned into a concern regarding the reliability of Solaris and ZFS as a platform, based on the gradual degradation to 100KB/Sec and a completely unresponsive console (I understated it; at times it took 10-20 minutes to respond). That triggered this thread. This thread is NOT about throughput of a gzip-9 zfs system. It is about a Solaris ZFS system becoming completely, 99.999% unresponsive, indistinguishable from crashed. No doubt I will put some effort into seeing if I can boost throughput a little, but right now my primary concern is that it WORKS.

This discussion has served to enable me to go away with confidence in Solaris and ZFS despite the pathological behavior of the gzip-9 algorithm and its interaction with the ZFS thread scheduling. The copy completed successfully last night. (1) It still functions correctly even with the problems, and I will not lose data. It is NOT a code correctness problem that could, under the right conditions and random chance, result in data loss even without gzip. (2) I can completely avoid it by not doing compression, especially gzip-9 compression.

It is also comforting to know that the pathological behavior will be eliminated by an improvement in zfs thread scheduling. This will leave only the intrinsic poor performance of gzip-9.

I do expect (though many, I gather, will disagree) that I will have a reliable, predictable, serviceable if low-performance Solaris/ZFS file server based on an 800MHz P3 with 768MB of memory, without compression. I can deal with slow; I can't deal with crashed or data loss. I don't think that is an unreasonable expectation.

The discussion of how to improve the zfs kernel thread's scheduling I believe has value regardless of gzip-9. It is a latent problem, a poor design the way it is now. Jeff has said that it will be fixed. The dead-idle system running gnome is a little jerky vs. smooth as silk, I expect due to the same root cause. This will be good to fix, as it gives a pretty bad impression of Solaris when Linux can run silky-smooth and responsive.
-- This message posted from opensolaris.org
Ray Clark
2008-Nov-30 20:07 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Unless something new develops, from my perspective this thread has served its purpose. I don't want to drag it out and waste people's time. If any "late readers" have insights that should be captured for the archive, please add them.

Thank you all VERY much for the discussion, insights, suggestions, and especially the responsiveness. As you might gather from my several mentions of Linux, I have been using Linux for almost 10 years now and on balance am very happy. But when there is a problem, there are usually very few if any responses, they come over several days or longer, and only very rarely is any light shed.

The suggestions regarding running a file server on a low-end machine will be taken to heart also. I like this machine because it has ECC memory, but in time I expect to break down and use a faster machine without ECC. Too bad Intel does not provide ECC in their desktop-grade chipsets any more.

With best regards,
Ray
-- This message posted from opensolaris.org
Miles Nordin
2008-Nov-30 21:21 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
>>>>> "t" == Tim <tim at tcsac.net> writes:
>>>>> "ic" == Ian Collins <ian at ianshome.com> writes:
>>>>> "mp" == Mattias Pantzare <pantzer at ludd.ltu.se> writes:

     t> My point is you're not looking at the bigger picture.

And you're not hearing this:

 * 100KByte/s
 * sudden slowdown 30x after 12 hours
 * no memory pressure in vmstat or top

I'm not so interested in knowing what's for sale on newegg this month. If you've experience with ZFS systems with limited ARC space taking minutes to respond to keypresses and, after 12 hours of ``normal'' albeit expectedly slow performance, suddenly slowing down 30x to slower-than-DSL write speeds and minutes to respond to a keypress, and the problems were fixed by adding memory or changing to more fashionable disk interfaces, please share them, because it would be new to me, and would need some explaining, too.

    mp> My symptom was that zpool scrub made the computer go slower
    mp> and slower and finally just stop. But this was a long time ago
    mp> so this might not be a problem today.

I experienced that, too, on a 1GB RAM SPARC system, so maybe I lied by saying ``new to me'', though it was fixed by upgrading, not adding more RAM. Maybe this problem was 6355416.

Ray, you might see if it's similar to Karl's bug by repeating Karl's test: run the same setup without gzip and see if that makes it work fine for days with no sudden slowdown. That should eliminate bugs in the PATA and network drivers, too. Or have you already tried that, and I'm not listening either?

    ic> What did you expect? A 3GHz Opteron core takes about a
    ic> minute to attempt to compress a 1GB .mkv file. So your P3
    ic> would probably take between 5 and 10 minutes.

10min for 1GB is 1.6MByte/s, 16x faster than what he's seeing. You're in the ballpark for his ``working properly'' figures---1.6MByte/s and 3MByte/s are similar---but you still haven't explained the slowdown to 100kByte/s. Is gzip really 30x slower with incompressible data? Is -9 making it behave pathologically on incompressible data? Neither has been my experience with userspace gzip---in fact, in the quick test I just did, 'gzip -9' is *faster* than 'gzip' with /dev/urandom data (but slower with compressible data).
Karl Hakimian
2008-Nov-30 22:38 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> This is about not having kernel threads completely lock out user (and
> other kernel) processes for undesirable lengths of time. It is about
> improving Solaris. It is about having more appropriate CPU sharing
> between all of the threads in the system, kernel and user. This is the
> root cause of the pathological behavior I stumbled on.

Well said.

When I ran into this problem a year ago (on a machine that should have been up to the task) I just chalked it up to gzip being a fairly new feature with bugs to work out. I figured I'd try again later. I don't think I knew about these forums at the time, or I might have posted.

I think anyone with any kind of understanding of what gzip is would expect a slowdown when using it (especially gzip-9). What was seen was beyond what was expected. Trying to tune the system to address the issue did not occur to me any more than trying to get a 600lb guy to do wind sprints to prepare him for a marathon later in the week. I figured a larger underlying problem needed to be addressed first.

I'm looking forward to seeing what fixes come out of this discussion.
-- This message posted from opensolaris.org
Ray Clark
2008-Dec-01 01:53 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
This has been an evolution; see defect.opensolaris.com's bug 5482, a memory leak with top. I have not run anything but gzip-9 since fixing that by only running top in one-time mode. I will start the same copy with compress=off in about 45 minutes (got to go do an errand now). Glad to run tests.

--Ray
-- This message posted from opensolaris.org
Tim writes:
> On Sat, Nov 29, 2008 at 11:06 AM, Ray Clark <webclark at rochester.rr.com> wrote:
>
> > Please help me understand what you mean. There is a big difference between
> > being unacceptably slow and not working correctly, or between being
> > unacceptably slow and having an implementation problem that causes it to
> > eventually stop. I expect it to be slow, but I expect it to work. Are you
> > saying that you found that it did not function correctly, or that it was too
> > slow for your purposes? Thanks for your insights! (3x would be awesome).
> > --
>
> I expect it will go SO SLOW, that some function somewhere is eventually
> going to fail/timeout. That system is barely usable WITHOUT compression. I
> hope at the very least you're disabling every single unnecessary service
> before doing any testing, especially the GUI.
>
> ZFS uses ram, and plenty of it. That's the nature of COW. Enabling
> realtime compression with an 800mhz p3? Kiss any performance, however poor
> it was, goodbye.
>
> --Tim

Hi Tim,

Let me hijack this thread to comment on the RAM usage. It's a misconception to blame RAM usage on COW. As has been stated in this thread, ZFS needs address space in the kernel in order to maintain its cache, but the cache is designed to grow and shrink according to memory demand. The amount of memory that ZFS really _needs_ is the amount of dirty data per transaction group. Today the code is in place to limit that to 10 seconds worth of I/O. So this should be very reasonable usage in most cases.

-r
Karl Rossing
2008-Dec-01 15:50 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Could zfs be configured to use gzip-9 to compress small files, or when the system is idle? When the system is busy or is handling a large file, use lzjb. Busy/idle and large/small files would need to be defined somewhere.

Alternatively, write the file out using lzjb if the system is busy and go back and gzip-9 it when the system is idle or less busy.

I'm not familiar with fs design. There are probably compelling technical and compliance reasons not to do any of my suggestions.

Karl
Bob Friesenhahn
2008-Dec-01 18:03 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Mon, 1 Dec 2008, Karl Rossing wrote:
>
> I'm not familiar with fs design. There are probably compelling technical
> and compliance reasons not to do any of my suggestions.

Due to the ZFS COW design, each re-compression requires allocation of a new block. This has implications when snapshots and clones are involved. There could be a huge amount of wasted disk space, or else all the snapshots/clones would need to be updated somehow to use the new "existing" blocks.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Ray Clark
2008-Dec-01 23:26 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
It completed copying 191,xxx MB without issue in 17 hours and 40 minutes, an average transfer rate of 3.0MB/Sec. During the copy (at least the first hour or so, and an hour in the middle), the machine was reasonably responsive. It was jerky to a greater or lesser extent, but nothing like even the best times with gzip-9. Not sure how to convey it. The machine was usable.

It was stopped by running out of disk space. The source was about 1GB larger than the target zfs file system. (When I started this exercise I had an IT8212 PCI PATA card in the system for another pair of drives for the pool, and took it out to eliminate a potential cause of my troubles.)

Interestingly, before I started I had to reboot, as there was a "trashapplett" eating 100% of the CPU, 60% user, 40% system. Note that I have not made, much less deleted, any files with gnome, nor put any in my home directory. I don't even know how to do these things, as I am a KDE man. All I have done is futz with this zfs in a separate pool and type at terminal windows. Can't imagine what trashapplett was doing with 100% of the CPU for an extended time without any files to manage!

Something I have not mentioned is that the fourth memory socket was worn out a few years ago testing memory; this is why I only have 768 installed (the bottom 3 have not been abused and are fine). My next move is to trade the motherboard for one in good shape so I can put in all 1024MB, plug in the IT8212 with a couple of 160GB disks to get my pool up to 360GB, and install RC2... But it looks like 2008.11 has been released! The mirrors still have 2008.05, but the main link goes to osol-0811.iso! Is that final, not an RC?

I will be beating on it to gain confidence and learn about Solaris. If anyone wants me to run any other tests, let me know.

Thanks (again) for all of your help.
-- This message posted from opensolaris.org
Ray Clark
2008-Dec-01 23:49 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Re pantzer5's suggestion:

Memory is not a big problem for ZFS, address space is. You may have to give the kernel more address space on 32-bit CPUs.

eeprom kernelbase=0x80000000

This will reduce the usable address space of user processes though.

---
Would you please verify that I understand correctly. I am extrapolating here based on general knowledge:

During a running user process, the process has the entire lower part of the address space below the kernel. The kernel is loaded at kernelbase, and has from there to the top (2**32-1) to use for its purposes. Evidently it is relocatable or position independent.

The positioning of kernelbase really has nothing to do with how much physical RAM I have, since the user memory and perhaps some of the kernel memory is virtual (paged). So the fact that I have 768MB does not enter into this decision directly (it does indirectly, per Jeff's note implying that kernel structures need to be larger with larger RAM; makes sense - more to keep track of, more page tables).

By default kernelbase is set at 3G, so presumably the kernel needs a minimum of 1G of space. Every userland process gets the full virtual space from 0 to kernelbase-1. So unless I am going to run a process that needs more than 1G, there is no advantage in setting kernelbase to something larger than 1G, etc., even if physical RAM is larger. If I am not going to run virtual machines, or edit enormous video or audio or image files in RAM, I really have no use for userland address space, and giving a lot to the kernel can only help it to have things mapped rather than having to recreate information (although I don't have a good handle on the utility of address space without a storage mechanism like RAM or disk behind it... must be something akin to a page fault with pages mapped to a disk file so you don't have to walk the file hierarchy). Hence your suggestion to set kernelbase to 2G. But 1G is probably fine too (although the incremental benefit may be negligible - I am going for the principle here).

How am I doing?
-- This message posted from opensolaris.org
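If you do experiment with it, it is easy to sanity-check before and after the reboot; a rough sketch (the mdb line assumes a 32-bit kernel, where the kernelbase variable is a 32-bit value - treat it as illustrative rather than gospel):

    eeprom kernelbase                    # show the currently configured value, if any
    eeprom kernelbase=0x80000000         # set it; takes effect at the next boot
    echo 'kernelbase/X' | mdb -k         # what the running kernel is actually using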
Quentin Neill
2009-Feb-10 21:46 UTC
[zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> tcook,
> ...
> Regarding the GUI, I don't know how to disable it.
> There are no virtual consoles, and unlike older
> versions of SunOS and Solaris, it comes up in XDM
> and there is no [apparent] way to get a shell
> without running gnome. I am sure that there is, but
> again, I come from the BSD/SunOS/Linux line, and
> have not learned the ins and outs of Nevada/Indiana
> yet.

I had problems with turning off XDM as well. I tried changing the default milestone and ended up unable to log in using the GUI (hung on the BIOS splash screen). Fortunately, the grub installed by default had a text login entry, but it had problems with the TTY (stty broken, no line-feeds - hardware/BIOS issue?). I was able to enable sshd and avoided a reinstall, but I'm at a loss how to fix the problem I created.

FWIW, there is a virtual console project that has a working prototype; a colleague had initial success with it out of the box. See http://opensolaris.org/os/project/vconsole/

-- Quentin
-- This message posted from opensolaris.org
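In case it helps the next person who wants a GUI-free test box: on 2008.11 the graphical login is just an SMF service, so it can usually be turned off without touching milestones; a sketch (the exact FMRI can vary between builds, so confirm it with svcs first):

    svcs -a | grep -i graphical-login                     # find the display manager service
    svcadm disable svc:/application/graphical-login/gdm:default
    # ...and to bring the GUI back later:
    svcadm enable svc:/application/graphical-login/gdm:default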