Yaniv Aknin
2007-Aug-15 08:11 UTC
[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Hello friends,

I've recently seen a strange phenomenon with ZFS on Solaris 10u3, and was wondering if someone may have more information.

The system uses several zpools, each a bit under 10T, each containing one zfs with lots and lots of small files (way too many: about 100m files and 75m directories). I have absolutely no control over the directory structure, and believe me, I tried to change it. Filesystem usage patterns are create and read, never delete and never rewrite.

When volumes approach 90% usage, and under medium/light load (zpool iostat reports 50MB/s and 750 IOPS, reads), some creat64 system calls take over 50 seconds to complete (observed with 'truss -D touch'). When doing manual tests, I've seen similar times on unlink() calls ('truss -D rm').

I'd like to stress that this happens on /some/ of the calls: maybe every 100th manual call (I scripted the test), which, counting normal system operations as well, would probably be every 10,000th or 100,000th call. Other system parameters (memory usage, load average, process count, etc.) appear nominal. The machine is an NFS server, though the crazy latencies were observed both locally and remotely.

What would you suggest to further diagnose this? Has anyone seen trouble with high utilization and medium load, with or without an insanely high file count?

Many thanks in advance,
- Yaniv
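Before truss-level digging, it can help to see how the latencies are distributed. The following is a minimal, untested sketch (the 10-second reporting interval is an arbitrary choice) that builds a power-of-two histogram of creat64 latencies; a healthy pool shows one tight cluster, while the problem described above would show a long tail reaching into the tens of seconds:

#!/usr/sbin/dtrace -qs

/* Record entry time per thread. */
syscall::creat64:entry
{
    self->ts = timestamp;
}

/* On return, add the elapsed time to a power-of-two histogram. */
syscall::creat64:return
/self->ts/
{
    @lat["creat64 latency (ns)"] = quantize(timestamp - self->ts);
    self->ts = 0;
}

/* Print the running histogram every 10 seconds. */
tick-10s
{
    printa(@lat);
}

If the histogram is cleanly bimodal (most calls in microseconds, a few stragglers at tens of seconds), the speculation approach discussed below is a natural next step for catching what the stragglers actually do.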
michael schuster
2007-Aug-15 13:33 UTC
[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Yaniv,

I'm adding dtrace-discuss to this email for reasons that will be obvious immediately :-) - see below

Yaniv Aknin wrote:
> When volumes approach 90% usage, and under medium/light load (zpool
> iostat reports 50mb/s and 750iops reads), some creat64 system calls take
> over 50 seconds to complete (observed with 'truss -D touch'). When doing
> manual tests, I've seen similar times on unlink() calls (truss -D rm).
>
> I'd like to stress this happens on /some/ of the calls, maybe every
> 100th manual call (I scripted the test), which (along with normal system
> operations) would probably be every 10,000th or 100,000th call.

I'd suggest you do something like this (not tested, so syntax errors etc may be lurking; I'd also suggest you get the DTrace guide off of opensolaris.org and read the chapter about speculations):

#!/usr/sbin/dtrace -Fs

int limit ONE_SECOND /* you need to replace this with 10^9, I think */

syscall::creat64:entry
{
    self->spec = speculation();
    speculate(self->spec);
    self->ts=timestamp();
    self->duration = 0;
}

fbt:::entry,
fbt:::return
/self->spec/
{
    speculate(self->spec);
}

syscall::creat64:return
/self->spec/
{
    speculate(self->spec);
    self->duration = timestamp() - self->ts;
}

syscall::creat64:return
/self->duration > limit/
{
    commit(self->spec);
    self->spec = 0;
}

syscall::creat64:return
/self->spec/
{
    discard(self->spec);
    self->spec = 0;
}

you may need to use a different timestamp (walltimestamp?); and perhaps you'll want to somehow reduce the number of fbt probes, but that's up to you.

I hope you can take it from here.

cheers
Michael
--
Michael Schuster
Sun Microsystems, Inc.
recursion, n: see 'recursion'
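On reducing the number of fbt probes: one possibility (an untested sketch, assuming the slow creat64 calls spend their time inside the zfs kernel module) is to narrow the fbt wildcard from the whole kernel to that one module, and to give the speculation machinery more headroom with the documented nspec and specsize options. The fragment below is a drop-in replacement for the fbt clause in the script above, not a standalone script; the option values are guesses and may need tuning if dtrace reports dropped speculations:

/* Allow several in-flight speculations and larger speculation buffers. */
#pragma D option nspec=4
#pragma D option specsize=16m

/* fbt:zfs:: matches only functions in the zfs module, instead of
   fbt::: over the entire kernel; far fewer probes, smaller buffers. */
fbt:zfs::entry,
fbt:zfs::return
/self->spec/
{
    speculate(self->spec);
}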
johansen-osdev at sun.com
2007-Aug-15 13:49 UTC
[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
You might also consider taking a look at this thread:

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041760.html

Although I'm not certain, this sounds a lot like the other pool fragmentation issues.

-j

On Wed, Aug 15, 2007 at 01:11:40AM -0700, Yaniv Aknin wrote:
> When volumes approach 90% usage, and under medium/light load (zpool
> iostat reports 50mb/s and 750iops reads), some creat64 system calls
> take over 50 seconds to complete (observed with 'truss -D touch').
> When doing manual tests, I've seen similar times on unlink() calls
> (truss -D rm).
> [...]
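If pool fragmentation is the culprit, the slowdown should be visible in the block allocator itself. The sketch below is untested and rests on an assumption: metaslab_alloc() is the allocation entry point in the OpenSolaris ZFS source, but whether fbt exposes it on a given Solaris 10 build is not guaranteed. If allocation latency spikes track the creat64 outliers, fragmentation becomes a much stronger theory:

#!/usr/sbin/dtrace -qs

/* Time each block allocation. */
fbt:zfs:metaslab_alloc:entry
{
    self->mts = timestamp;
}

/* Fragmented pools should show a long tail in this histogram. */
fbt:zfs:metaslab_alloc:return
/self->mts/
{
    @alloc["metaslab_alloc latency (ns)"] = quantize(timestamp - self->mts);
    self->mts = 0;
}

tick-10s
{
    printa(@alloc);
}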
Matthew Ahrens
2007-Aug-18 04:09 UTC
[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Yaniv Aknin wrote:
> When volumes approach 90% usage, and under medium/light load (zpool iostat
> reports 50mb/s and 750iops reads), some creat64 system calls take over 50
> seconds to complete (observed with 'truss -D touch'). When doing manual
> tests, I've seen similar times on unlink() calls (truss -D rm).

If it's only file creation that's slow, it's possible that you're hitting bug 6544140 (fixed in snv_68). If this is the case then you will see high system time from your touch process, and a high ratio of calls to dnode_hold_impl() to dmu_object_alloc(), which you can examine with dtrace.

If that isn't it, then try getting some stack traces (of the kernel stacks) when it's taking a long time. Again, dtrace is your friend here.

--matt
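Both checks can be sketched in a few lines of D (untested; dnode_hold_impl() and dmu_object_alloc() are the functions named above, and filtering the stack samples on execname == "touch" assumes the test is still being driven with touch):

#!/usr/sbin/dtrace -qs

/* Ratio check for bug 6544140: count both calls and compare at exit. */
fbt:zfs:dnode_hold_impl:entry
{
    @calls["dnode_hold_impl"] = count();
}

fbt:zfs:dmu_object_alloc:entry
{
    @calls["dmu_object_alloc"] = count();
}

/* Sample kernel stacks while the test process is on CPU in the kernel
   (arg0 is the kernel PC in profile probes; it is 0 in user mode). */
profile-997
/execname == "touch" && arg0 != 0/
{
    @stacks[stack()] = count();
}

dtrace:::END
{
    printa(@calls);
    trunc(@stacks, 10);
    printa(@stacks);
}

A dnode_hold_impl() count that dwarfs the dmu_object_alloc() count would point at the bug; otherwise the top kernel stacks should show where the time goes.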
Thanks Michael. Here is my (slightly corrected) version of the script, after investing a bit of time in the DTrace manual (I always knew it was powerful, but it's more than that...). It appears to run, and when I tried lowering the limit I started getting results. Still, I'd appreciate it if you could explain the actions attached to the last few probes with their different predicates (the numerous return probes from creat64).

It looks to me a lot like conditional program flow (first we calculate the duration, then we commit() the speculation if the duration is > limit and discard() it otherwise) rather than discrete probes that fire independently. I read the manual as saying that conditional flow isn't possible in D, but I could have been wrong. What guarantees that the last three probes will fire in order, producing IF-THEN-ELSE logic? If nothing guarantees that, why does the script work?

Thanks for your help,
- Yaniv

#!/usr/sbin/dtrace -Fs

int limit;

dtrace:::BEGIN
{
    limit = 1000000000;
}

syscall::creat64:entry
{
    self->spec = speculation();
    speculate(self->spec);
    self->ts=timestamp;
    self->duration = 0;
}

fbt:::entry,
fbt:::return
/self->spec/
{
    speculate(self->spec);
}

syscall::creat64:return
/self->spec/
{
    speculate(self->spec);
    self->duration = timestamp - self->ts;
}

syscall::creat64:return
/self->duration > limit/
{
    commit(self->spec);
    self->spec = 0;
}

syscall::creat64:return
/self->spec/
{
    discard(self->spec);
    self->spec = 0;
}
Michael Schuster
2007-Aug-21 14:37 UTC
[zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Yaniv Aknin wrote:
> It looks to me a lot like conditional program flow (first we calculate
> the duration, then we commit() the speculation if the duration is > limit
> and discard() it otherwise) rather than discrete probes that fire
> independently. I read the manual as saying that conditional flow isn't
> possible in D, but I could have been wrong. What guarantees that the last
> three probes will fire in order, producing IF-THEN-ELSE logic? If nothing
> guarantees that, why does the script work?

that's exactly the point: otherwise-identical probe clauses with different predicates fire in the order they appear in the script.

is there a reason you took dtrace-discuss off the distribution list?

cheers
Michael
--
Michael Schuster
Recursion, n.: see 'Recursion'
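The guarantee, in other words, is evaluation order within a single probe firing. A trivial standalone illustration (a hypothetical sketch, not from the thread): an earlier clause can clear a variable so that a later clause's predicate evaluates false, which is exactly the IF-THEN-ELSE effect used in the creat64 script:

#!/usr/sbin/dtrace -qs

int x;

dtrace:::BEGIN
{
    x = 5;
}

/* "then" branch: x > 3 holds, so this fires and clears x. */
tick-1s
/x > 3/
{
    printf("big\n");
    x = 0;
}

/* "else" branch: skipped, because the clause above already zeroed x. */
tick-1s
/x > 0/
{
    printf("small\n");
    x = 0;
}

tick-1s
{
    exit(0);
}

Run as-is this prints "big" exactly once and exits; swapping the two middle clauses would print "small" instead, which is the ordering guarantee at work.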
Michael Schuster
2007-Aug-21 16:15 UTC
[dtrace-discuss] [zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Yaniv,

I just found something I don't understand - since it's DTrace-specific, I added that list back to the distribution.

Yaniv Aknin wrote:
> #!/usr/sbin/dtrace -Fs
>
> int limit;
>
> dtrace:::BEGIN
> {
>     limit = 1000000000;
> }
>
> syscall::creat64:entry
> {
>     self->spec = speculation();
>     speculate(self->spec);
>     self->ts=timestamp;

aren't you missing " = timestamp" here?

>     self->duration = 0;
> }
>
> fbt:::entry,
> fbt:::return
> /self->spec/
> {
>     speculate(self->spec);
> }
>
> syscall::creat64:return
> /self->spec/
> {
>     speculate(self->spec);
>     self->duration = timestamp - self->ts;
> }
>
> syscall::creat64:return
> /self->duration > limit/
> {
>     commit(self->spec);
>     self->spec = 0;
> }
>
> syscall::creat64:return
> /self->spec/
> {
>     discard(self->spec);
>     self->spec = 0;
> }

Michael
--
Michael Schuster
Recursion, n.: see 'Recursion'
Michael Schuster
2007-Aug-21 16:28 UTC
[dtrace-discuss] [zfs-discuss] Extremely long creat64 latencies on highly utilized zpools
Yaniv Aknin wrote:
> I'm not sure I understood your question...
> I have a =timestamp there, but I corrected your earlier =timestamp().
> timestamp is a "magic variable", not a function.
> Is that what you meant?

no, sorry ... I read too quickly (and let the difference in formatting mislead me :-( - some assignments have spaces around the "=", some don't ...)

sorry again
Michael
--
Michael Schuster
Recursion, n.: see 'Recursion'