Hi all. Hope you can still help here.

Solaris 11 Express, x86 platform, E6600 with 6GB of RAM.

I have a fairly new S11E box I'm using as a file server: 3x 1.5TB HDDs in a raidz pool. When I first set it up I was getting 110MB/sec writes across the gigabit network via SMB shares. I noticed recently that the write rate has dropped off, and through testing I am now getting 35MB/sec writes. The pool is around 50-60% full.

I am getting a CONSTANT 30-35% kernel CPU utilisation, even when the machine is idle. I do not know whether this was the case when the write performance was better. I have tried reading from the server to a HDD on a Windows client and I get 50+MB/sec, which is probably the maximum that that HDD can sustain on a write.

It's interesting that when I boot into the S11E live CD, my CPU sits at 100% idle, but as soon as I install to HDD, it shows that 30-35% kernel utilisation. I do not know whether the CPU utilisation and the bad write performance are related.

I did the following while the system was idle:
ian@Nas:~# lockstat -gkIW sleep 60

Profiling interrupt: 11648 events in 60.038 seconds (194 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL  Caller
10534  90% ---- 0.00   165530 cpu[1]           thread_start
 9750  84% ---- 0.00   147359 cpu[1]           idle
 8128  70% ---- 0.00    94807 cpu[1]           cpu_idle_mwait
 7852  67% ---- 0.00    84202 cpu[1]           i86_mwait
 2541  22% ---- 0.00   407347 cpu[1]+11        cpupm_utilization_event
 2541  22% ---- 0.00   407347 cpu[1]+11        cpupm_change_state
 2540  22% ---- 0.00   407412 cpu[1]+11        cpupm_plat_change_state
 2539  22% ---- 0.00   407466 cpu[1]+11        cpupm_state_change
 2530  22% ---- 0.00   408167 cpu[1]+11        pg_ev_thread_swtch
 2512  22% ---- 0.00   401944 cpu[1]+11        swtch
 2511  22% ---- 0.00   407944 cpu[1]+11        cmt_ev_thread_swtch_pwr
 2509  22% ---- 0.00   408047 cpu[1]+11        speedstep_power
 1311  11% ---- 0.00   421383 cpu[0]+11        speedstep_pstate_transition
 1307  11% ---- 0.00   422634 cpu[0]+11        write_ctrl
 1307  11% ---- 0.00   422634 cpu[0]+11        cpu_acpi_write_port
 1306  11% ---- 0.00   422939 cpu[0]+11        outw
 1237  11% ---- 0.00   392091 cpu[1]+11        do_splx
 1197  10% ---- 0.00   393779 cpu[1]+11        xc_call
 1197  10% ---- 0.00   393779 cpu[1]+11        xc_common

Hope that might help.
-- This message posted from opensolaris.org
Markus Kovero
2011-Feb-11 08:55 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
> I noticed recently that write rate has dropped off and through testing now I am getting 35MB/sec writes. The pool is around 50-60% full.
> I am getting a CONSTANT 30-35% kernel cpu utilisation, even if the machine is idle. I do not know if this was the case when the write performance was better. I have tried reading from the server to a HDD on a windows client and I get 50+MB/sec which is probably the max that that HDD can sustain on a write.

Hi, do you have your zfs prefetch turned on or off? Turning prefetch off makes COMSTAR iSCSI shares unusable in Solaris 11 Express, while it might work fine in osol.

Yours
Markus Kovero
Edward Ned Harvey
2011-Feb-11 12:53 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Markus Kovero
>
>> I noticed recently that write rate has dropped off and through testing now I am getting 35MB/sec writes. The pool is around 50-60% full.
>
> Hi, do you have your zfs prefetch turned on or off? Turning prefetch off makes comstar iscsi shares unusable in Solaris 11 Express while it might work fine in osol.

On the other hand, that will only matter for reads. And the complaint is writes.
Edward Ned Harvey
2011-Feb-11 12:59 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of ian W
>
> Hope you can still help here. Solaris 11 Express.
> x86 platform E6600 with 6GB of RAM
>
> I have a fairly new S11E box Im using as a file server.
> 3x1.5TB HDDs in a raidz pool.

Just so you know, you're calling that "fairly new", but a Core2 Duo processor and 6G of RAM isn't the same class of machine this is meant for. Still, that should be irrelevant, cuz you should be able to fully max out a 1G ethernet easily enough.

> When I first set it up I was getting 110MB/sec writes across gigabit network via SMB shares.
>
> I noticed recently that write rate has dropped off and through testing now I am getting 35MB/sec writes. The pool is around 50-60% full.

How, precisely, are you performing your test? Step by step...

What do you get if you start "iostat 5" a minute before the test and keep it running till a little after the test?

What does "zpool status" tell you?
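For readers following along, those checks can be sketched as below. This is a hypothetical sketch: "tank" is a placeholder pool name, and the commands are wrapped in a dry-run function that prints them instead of executing them, so the sketch is harmless outside Solaris. On the real server you would run them (without the wrapper) while repeating the SMB write test.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# On the actual file server, change the body to: "$@"
run() { echo "+ $*"; }

run iostat -xn 5         # per-disk service times and %busy while the test runs
run zpool status tank    # pool health: errors, scrub/resilver in progress
run zpool iostat tank 5  # pool-level read/write bandwidth during the test
```

The point of watching iostat during the test is to spot a single slow or erroring disk, which in a raidz vdev drags the whole pool's write rate down to that disk's speed.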
Markus Kovero
2011-Feb-11 13:35 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
> On the other hand, that will only matter for reads. And the complaint is writes.

Actually, it also affects writes (due to checksum reads?).

Yours
Markus Kovero
Thanks for the responses.. I found the issue. It was due to power management, and probably a bug with event-driven power management states. Changing

cpupm enable

to

cpupm enable poll-mode

in /etc/power.conf fixed the issue for me. Back up to 110MB/sec+ now..
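For anyone hitting the same symptom, the change ian describes can be sketched like this. It is a minimal sketch, assuming the Solaris 11 Express power.conf syntax; the edit is made on a scratch copy here so the sketch is safe to run anywhere. On the real box you would edit /etc/power.conf itself and then run pmconfig to apply the new policy.

```shell
# Make the edit on a scratch copy; on the real system the file is /etc/power.conf.
conf=$(mktemp)
printf 'autopm enable\ncpupm enable\n' > "$conf"

# Switch cpupm from the (apparently buggy) event-driven mode to poll-mode.
sed 's/^cpupm enable$/cpupm enable poll-mode/' "$conf" > "$conf.new" &&
    mv "$conf.new" "$conf"

cat "$conf"
# On Solaris, re-read /etc/power.conf and apply it with: pmconfig
```

Poll-mode makes the dispatcher sample CPU utilisation periodically instead of reacting to every state-change event, which sidesteps the constant speedstep transitions visible in the lockstat output above.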
Edward Ned Harvey
2011-Feb-12 14:15 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> What does "zpool status" tell you?

Also, "zpool iostat 5"
Roy Sigurd Karlsbakk
2011-Feb-12 17:14 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
>> What does "zpool status" tell you?
>
> Also, "zpool iostat 5"

or even iostat -xn

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Roy Sigurd Karlsbakk
2011-Feb-12 17:15 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>>
>> What does "zpool status" tell you?
>
> Also, "zpool iostat 5"

Sorry - check iostat -en

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Sat, Feb 12, 2011 at 3:14 AM, ian W <dropbearsssf at yahoo.com.au> wrote:

> Thanks for the responses.. I found the issue. It was due to power management, and probably a bug with event-driven power management states.
>
> Changing
>
> cpupm enable
>
> to
>
> cpupm enable poll-mode
>
> in /etc/power.conf fixed the issue for me. Back up to 110MB/sec+ now..

Interesting - I have an E6600 also, and I will give this a try. I left 'cpupm enable' in /etc/power.conf because powertop/prtdiag properly reported all the available P/C-states of my CPU, so I assumed that power management was good to go. What do you have cpu-threshold set to?

(This may be a moot point for me, because my CPU is littering fault management with strings of L2 cache errors, so I might be upgrading to Nehalem soon.)
Hello,

my power.conf is as follows; any recommendations for improvement?

device-dependency-property removable-media /dev/fb
autopm          enable
autoS3          enable
cpu-threshold   1s
# Auto-Shutdown Idle(min)   Start/Finish(hh:mm) Behavior
autoshutdown    30          0:00 0:00           noshutdown
S3-support      enable
cpu_deep_idle   enable
cpupm           enable poll-mode
Richard Elling
2011-Feb-15 05:28 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
On Feb 14, 2011, at 4:49 PM, ian W wrote:

> Hello
>
> my power.conf is as follows; any recommendations for improvement?

For best performance, disable power management. For certain processors and BIOSes, some combinations of power management (below the OS) are also known to be toxic. At Nexenta, current best practice is to disable C-states for Nehalems.
 -- richard
Thanks..

Given this box runs 18 hours a day and is idle for maybe 17.5 of those hours, I'd rather have the best power management I can...

I would have loved to have upgraded to an i3 or even SB, but the Solaris 11 Express support for both is marginal (H55 chipset issues, no Sandy Bridge support at all, etc.).
Richard Elling
2011-Feb-16 05:10 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
On Feb 15, 2011, at 7:46 PM, ian W wrote:

> Thanks..
>
> given this box runs 18 hours a day and is idle for maybe 17.5 hrs of that, I'd rather have the best power management I can...
>
> I would have loved to have upgraded to a i3 or even SB but the solaris 11 express support for both is marginal. (h55 chipset issues, no sandybridge support at all etc)

I think there are options here, but there are few who will care enough to spend the time required to optimize... it is less expensive to buy lower-power processors than to spend even one man-hour trying to get savings out of a high-power processor. But if you are up to the challenge :-) try disabling cores entirely and leave the remaining two or three cores running without C-states. You will need to measure the actual power consumption, but you might be surprised at how much better that works for performance and power savings.
 -- richard
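Richard's suggestion can be sketched with psradm(1M). This is a hypothetical sketch: the CPU IDs are placeholders, and the commands are wrapped in a dry-run function that prints them instead of executing them, so it is safe to paste anywhere; on a real Solaris box you would run them as root with the wrapper removed.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# On a real Solaris system, change the body to: "$@" (and run as root).
run() { echo "+ $*"; }

run psradm -f 2 3        # take CPUs 2 and 3 offline (IDs are placeholders)
run psrinfo              # confirm which CPUs are now off-line

# For the CPUs left running, keep C-states out of play via /etc/power.conf:
#   cpu_deep_idle disable
# then re-read the file with:
run pmconfig
```

The idea is that fewer cores held at a fixed, shallow idle state can beat many cores churning through deep C-state transitions, both for latency and (measured at the wall) for power.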
Could you not also pin processes to cores? Preventing switching should help too. I've done this for performance reasons before on a 24-core Linux box.

Sent from my HTC Desire

On 16 Feb 2011 05:12, "Richard Elling" <richard.elling at gmail.com> wrote:

> On Feb 15, 2011, at 7:46 PM, ian W wrote:
>
>> Thanks..
>>
>> given this box runs 18 hours a day and is idle for maybe 17.5 hrs of that, I'd rather have the best power management I can...
>>
>> I would have loved to have upgraded to a i3 or even SB but the solaris 11 express support for both is marginal. (h55 chipset issues, no sandybridge support at all etc)
>
> I think there are options here, but there are few who will care enough to spend the time required to optimize... it is less expensive to buy lower-power processors than to spend even one man-hour trying to get savings out of a high-power processor.
> But if you are up to the challenge :-) try disabling cores entirely and leave the remaining two or three cores running without C-states. You will need to measure the actual power consumption, but you might be surprised at how much better that works for performance and power savings.
> -- richard
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Richard Elling
2011-Feb-16 14:43 UTC
[zfs-discuss] Very bad ZFS write performance. Ok Read.
On Feb 15, 2011, at 11:26 PM, Khushil Dep wrote:

> Could you not also pin processes to cores? Preventing switching should help too. I've done this for performance reasons before on a 24-core Linux box.

Yes. More importantly, you could send interrupts to a processor set. There are many ways to implement resource management in Solaris-based systems :-)
 -- richard
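Both ideas - pinning processes and steering interrupts - can be sketched with the Solaris processor-set tools. A hypothetical sketch: the CPU IDs and the PID 1234 are placeholders, and the commands are wrapped in a dry-run function that prints them instead of executing them; on a live system you would run them as root.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# On a real Solaris system, change the body to: "$@" (and run as root).
run() { echo "+ $*"; }

run psrset -c 1          # create a processor set containing CPU 1 (placeholder ID)
run psrset -b 1 1234     # bind PID 1234 (placeholder) to processor set 1
run pbind -b 0 1234      # alternative: bind PID 1234 directly to CPU 0

# Interrupt steering: mark a CPU no-intr so device interrupts land elsewhere;
# undo with: psradm -n -i <cpu>
run psradm -i 1
```

Processes bound to a set only run on its CPUs, and CPUs in a set only run bound processes, so combining a set for the workload with no-intr CPUs for everything else gives fairly strong isolation.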