Hi...

Here's my system:

2 Intel 3 GHz 5160 dual-core CPUs
10 SATA 750 GB disks running as a ZFS RAIDZ2 pool
8 GB memory
SunOS 5.11 snv_79a on a separate UFS mirror
ZFS pool version 10
No separate ZIL or ARC cache

I ran into a problem today where the ZFS pool jammed for an extended period of time. During that time, it seemed read-bound, doing only read I/Os (as observed with "zpool iostat 1"), and I saw 100% misses while running arcstat.pl (for "miss%", "dm%", "pm%" and "mm%"). Processes accessing the pool were jammed, including remote NFS mounts. At the time, I was: 1) running a scrub, 2) writing tens of MB/sec of data onto the pool as well as reading from the pool, and 3) deleting a large number of files on the pool. I tried killing one of the jammed "rm" processes and it eventually died. The number of misses seen in arcstat.pl eventually dropped back down to the 20-40% range ("miss%"). A while later, writes began occurring to the pool again, remote NFS access also freed up, and overall system behaviour seemed to normalize. This all occurred over the course of approximately an hour.

Does this kind of problem sound familiar to anyone? Is it a ZFS problem, or have I hit some sort of ZFS load maximum and this is the response? Any suggestions for ways to avoid this are welcome...

Thanks...
Art

Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: person at meteo.psu.edu, phone: 814-863-1563
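For reference, the observations described above amount to something like the following, with "tank" standing in for the real pool name:

    # per-second pool-wide I/O; during the jam the write columns sat at zero
    zpool iostat tank 1

    # ARC statistics each second; miss%, dm%, pm% and mm% were pegged at 100
    arcstat.pl 1

    # scrub progress and per-device state
    zpool status -v tank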
> Does this kind of problem sound familiar to anyone? Is it a ZFS problem,
> or have I hit some sort of ZFS load maximum and this is the response?
> Any suggestions for ways to avoid this are welcome...

Hi Art,

I have seen a similar problem that has been happening on several servers since a recent upgrade from b70 to b86/b87. For no obvious reason, the servers will stop writing to the pool for long periods of time. Watching "zpool iostat", I can see that 0 writes are being done for up to a minute at a time. Meanwhile, a large number of small (~3K) reads are happening. The servers behave like this for an hour or more at a time.

The server configuration is:

Dual-core Opteron 2212 HE
4 GB ECC DDR2 RAM
15 1 TB SATA drives in a RAID-Z2 pool
2 Supermicro SAT2-MV8 controllers
SunOS 5.11 snv_86
UFS root and swap are on their own disk

Have you made any progress with this problem? Has anyone else seen this behavior?
Scott,

On Sun, 4 May 2008, Scott wrote:

> Have you made any progress with this problem? Has anyone else seen this
> behavior?

I haven't seen it happen again, but I haven't hammered the system as I did above to try and make it fail either. Since then, I have also added two RiDATA 16 GB SSDs, one as a log device and one as a cache device, to see if I can improve performance into and out of the array. Writing data to the array has definitely improved with the log device, but I'm still having performance issues reading large numbers of small files off the array.

I'm curious about your array configuration above... did you create your RAIDZ2 as one vdev or multiple vdevs? If multiple, how many? On mine, I have all 10 disks set up as one RAIDZ2 vdev, which is supposed to be near the performance limit... I'm wondering how much I would gain by splitting it into two vdevs for the price of losing 1.5 TB (2 disks) worth of storage.

Art

Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: person at meteo.psu.edu, phone: 814-863-1563
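For anyone wanting to replicate the SSD experiment described above, attaching log and cache devices is a one-liner each. A sketch with placeholder names ("tank" and the c4 targets are illustrative, not the actual devices):

    # dedicate one SSD as a separate intent log (slog)
    zpool add tank log c4t0d0

    # dedicate the other SSD as an L2ARC cache device
    zpool add tank cache c4t1d0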
This sounds like an important problem.
On Tue, Apr 22, 2008 at 8:24 AM, Arthur Person <ap60 at meteo.psu.edu> wrote:

> Does this kind of problem sound familiar to anyone? Is it a ZFS problem,
> or have I hit some sort of ZFS load maximum and this is the response?
> Any suggestions for ways to avoid this are welcome...

I think I've seen reports of similar problems on the zfs list, but I don't know if there was any resolution. One suggestion was that a SATA drive could be attempting to correct a read error and that was causing ZFS to block on I/O.

-B

--
Brandon High bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
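If a single drive stuck in error recovery is the culprit, it generally shows up as one device with a far higher service time than its peers. A quick check using stock Solaris iostat:

    # extended device statistics at 1-second intervals; look for one disk
    # whose asvc_t (average service time) is wildly higher than the rest
    iostat -xn 1

    # per-device error counters, including hard, soft and transport errors
    iostat -En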
Hi Arthur,

I've seen a lockup-type situation which might possibly be similar to what you've described. I didn't wait an hour to see if it resolved itself, though, so I had to reboot. I have described the saga here:

http://www.opensolaris.org/jive/thread.jspa?threadID=59201&tstart=0

I haven't managed to debug it yet with DTrace, as I didn't get the time to learn DTrace, and I wasn't sure which version of the SATA driver source code was the correct one for snv_b87.

Simon
> I'm curious about your array configuration above... did you create your
> RAIDZ2 as one vdev or multiple vdevs? If multiple, how many? On mine, I
> have all 10 disks set up as one RAIDZ2 vdev, which is supposed to be near
> the performance limit... I'm wondering how much I would gain by splitting
> it into two vdevs for the price of losing 1.5 TB (2 disks) worth of
> storage.
>
> Art

I have all 15 drives under a single raidz2. In my case, capacity is more important than speed. I'm sure others can comment on any potential speed tradeoffs in your setup.

I'm still having this problem, and have been playing around with DTrace for the last few days. I downgraded my b87 servers to b86 in order to cut the new write-throttling code from the equation. That seems to have improved the performance, but hasn't completely eliminated the problem.
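Since the stalls coincide with the pool's transaction-group syncs, one starting point for that DTrace work is timing spa_sync(), the routine that writes out each txg. A minimal sketch, assuming the fbt provider exposes spa_sync on this build:

    # distribution of spa_sync() durations; long tails should line up with
    # the windows where "zpool iostat" shows zero writes
    dtrace -n '
    fbt::spa_sync:entry  { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
            @["spa_sync (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }'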
ap60 at meteo.psu.edu said:

> I'm curious about your array configuration above... did you create your
> RAIDZ2 as one vdev or multiple vdevs? If multiple, how many? On mine, I
> have all 10 disks set up as one RAIDZ2 vdev which is supposed to be near
> the performance limit... I'm wondering how much I would gain by splitting
> it into two vdevs for the price of losing 1.5 TB (2 disks) worth of
> storage.

You've probably already seen/heard this, but I haven't seen it mentioned in this thread. The consensus is, and measurements seem to confirm, that splitting it into two vdevs will double your available IOPS for small, random read loads on raidz/raidz2. Here are some references and examples:

http://blogs.sun.com/roch/entry/when_to_and_not_to
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance1
http://acc.ohsu.edu/~hakansom/thumper_bench.html

Regards,

Marion
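Concretely, the 10-disk split under discussion means rebuilding the pool as two 5-disk raidz2 vdevs instead of one 10-disk vdev. A sketch with placeholder device names (note this requires destroying and recreating the pool; existing vdevs cannot be restructured in place):

    # two 5-disk raidz2 vdevs; ZFS stripes across both, roughly doubling
    # small random-read IOPS at the cost of two more parity disks
    zpool create tank \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0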
On Tue, 20 May 2008, Marion Hakanson wrote:

> You've probably already seen/heard this, but I haven't seen it mentioned
> in this thread. The consensus is, and measurements seem to confirm, that
> splitting it into two vdevs will double your available IOPS for small,
> random read loads on raidz/raidz2. Here are some references and examples:
>
> http://blogs.sun.com/roch/entry/when_to_and_not_to
> http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
> http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance1
> http://acc.ohsu.edu/~hakansom/thumper_bench.html

The upshot of all this analysis is that mirroring offers the best multi-user performance with excellent reliability. A system comprised of mirrors and big, fat SATA-II disks will almost certainly beat a system using raidz and small, fast SAS disks in multi-user situations. A system comprised of mirrors with small, fast SAS disks will of course be fastest, but will be more expensive for the same storage.

Note that in the Roch blog, load-sharing across 40 4-disk raidzs only achieved 4,000 random I/Os per second, but mirrors achieved 20,000. Switching over to the Ranch Ramblings, we see that the fancy drives are only 78% faster than the big, fat SATA drives. Using the fancy SAS drives preferred at the Ranch only improves the raidz random I/Os to something like 7,120, which is still far less than 20,000.

It seems that it pains people to "waste" disk space. For example, the cost increase to use 1 TB disks may not be all that much compared to 500 GB disks, but it somehow seems like a huge cost not to maximize use of the available media space. This perception of cost and waste is completely irrational: there is more waste in crippling your investment.

The Roch "WHEN TO (AND NOT TO) USE RAID-Z" blog posting contains an error, since it blames ZFS mirroring for doubling the write IOPS. This is not actually true, since IOPS are measured at the per-disk level and each mirror disk sees the same IOPS. It is true that the host system needs to send twice as many transactions when using mirroring, but the transactions go to different disks.

In order to improve read performance further, triple mirroring can be used, with added write cost at the host level and more wasted disk space.

Bob

=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
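For comparison with the raidz2 layouts above, the mirrored configuration Bob describes would look something like this for the same 10 drives (placeholder device names; usable capacity drops to five disks' worth):

    # five 2-way mirrors; each mirror is an independent vdev, and reads can
    # be serviced by either side, which is where the random-read IOPS win
    # comes from
    zpool create tank \
        mirror c1t0d0 c2t0d0 \
        mirror c1t1d0 c2t1d0 \
        mirror c1t2d0 c2t2d0 \
        mirror c1t3d0 c2t3d0 \
        mirror c1t4d0 c2t4d0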