Patrick Skerrett
2009-Apr-09 21:13 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Hi folks,

I would appreciate it if someone can help me understand some weird results I'm seeing while trying to do performance testing with an SSD-offloaded ZIL.

I'm attempting to improve my infrastructure's burstable write capacity (ZFS-based WebDAV servers), and naturally I'm looking at implementing SSD-based ZIL devices. I have a test machine with the crummiest hard drive I could find installed in it, a Quantum Fireball ATA-100, 4500 RPM, 128K cache, plus an Intel X25-E 32GB SSD. I'm trying to do A-B comparisons and am coming up with some very odd results.

The first test involves doing IOzone write testing on the Fireball standalone, the SSD standalone, and the Fireball with the SSD as a log device.

My test command is:

    time iozone -i 0 -a -y 64 -q 1024 -g 32M

Then I check the time it takes to complete this operation in each scenario:

    Fireball alone     - 2m15s (told you it was crappy)
    SSD alone          - 0m3s
    Fireball + SSD ZIL - 0m28s

This looks great! Watching 'zpool iostat -v' during this test further proves that the ZIL device is doing the brunt of the heavy lifting. If I can get these kinds of write results in my prod environment, I would be one happy camper.

However, ANY other test I can think of to run on this test machine shows absolutely no performance improvement of the Fireball+SSD ZIL over the Fireball by itself. Watching 'zpool iostat -v' shows no activity on the ZIL at all whatsoever. Other tests I've tried to run:

A scripted batch job of 10,000 -
    dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000

A scripted batch job of 10,000 -
    cat /sourcedrive/$file > /fireball/$file

A scripted batch job of 10,000 -
    cp /sourcedrive/$file /fireball/$file

And a scripted batch job moving 10,000 files onto the Fireball using Apache WebDAV mounted on the Fireball (similar to my prod environment):
    curl -T /sourcedrive/$file http://127.0.0.1/fireball/

So what is IOzone doing differently than any other write operation I can think of???

Thanks,

Pat S.
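P.S. For reference, here is roughly how the test pool is set up (the device names are just what they happen to be on this box, so treat them as placeholders):

    # single-disk pool on the Fireball, mounted at /fireball
    zpool create fireball c1d0

    # add the X25-E as a separate intent log (slog) device
    zpool add fireball log c2d0

    # what I watch during each test run
    zpool iostat -v fireball 1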
Neil Perrin
2009-Apr-09 21:52 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Patrick,

The ZIL is only used for synchronous requests like O_DSYNC/O_SYNC and fsync(). Your iozone command must be doing some synchronous writes. All the other tests (dd, cat, cp, ...) do everything asynchronously. That is, they do not require the data to be on stable storage on return from the write. So asynchronous writes get cached in memory (the ARC) and written out periodically (every 30 seconds or less) when the transaction group commits.

The ZIL would be heavily used if your system were an NFS server. Databases also do synchronous writes.

Neil.

On 04/09/09 15:13, Patrick Skerrett wrote:
> So what is IOzone doing differently than any other write operation I can
> think of???
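P.S. If you want to confirm that it really is the synchronous path that engages the slog, iozone can be told to open its files O_SYNC and include the flush in its timing (option letters from memory, so double-check the iozone man page):

    # every write must reach stable storage before returning,
    # so the log device should be busy for the whole run
    time iozone -i 0 -a -y 64 -q 1024 -g 32M -e -o

    # watch the slog in another window
    zpool iostat -v 1

Your dd/cp/cat/curl jobs never take that path, which is why the slog sits idle for them.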
Patrick Skerrett
2009-Apr-10 13:07 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Thanks for the explanation, folks.

So if I cannot get Apache/WebDAV to write synchronously (and it does not look like I can), then is it possible to tune the ARC to be more write-buffer heavy?

My biggest problem is with very quick spikes in writes periodically throughout the day. If I were able to buffer these better, I would be in pretty good shape. The machines are already (economically) maxed out on RAM at 32 gigs.

If I were to add in the SSD L2ARC devices for read caching, can I configure the ARC to give up some of its read caching for more write buffering?

Thanks.

Neil Perrin wrote:
> The ZIL is only used for synchronous requests like O_DSYNC/O_SYNC and
> fsync(). Your iozone command must be doing some synchronous writes.
> All the other tests (dd, cat, cp, ...) do everything asynchronously.
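P.S. In case it matters, what I had in mind on the L2ARC side is just the usual (the device name below is made up):

    # add an SSD as a cache (L2ARC) device
    zpool add fireball cache c2t1d0

    # watch ARC behaviour while the bursts come in
    kstat -m zfs -n arcstats 5

What I can't find is a knob that trades some of that read caching for more dirty-data buffering.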
Eric D. Mudama
2009-Apr-10 20:43 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On Fri, Apr 10 at 8:07, Patrick Skerrett wrote:
> My biggest problem is with very quick spikes in writes periodically
> throughout the day. If I were able to buffer these better, I would be
> in pretty good shape. The machines are already (economically) maxed
> out on RAM at 32 gigs.

I think in most cases, the raw spindle throughput should be enough to handle your load, or else you haven't sized your arrays properly. Bursts of async writes of relatively large size should be headed to the media at somewhere around 50-100MB/s/vdev, I would think. How much burst I/O do you have?

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
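P.S. A quick way to sanity-check what the vdevs themselves can absorb for async bursts (file name and sizes below are just examples):

    # push a few GB of buffered writes and watch per-vdev throughput
    time dd if=/dev/zero of=/fireball/burst.dat bs=1024k count=4096 &
    zpool iostat -v 1

If the per-vdev write rate during that burst is nowhere near the 50-100MB/s range, the arrays are the bottleneck rather than the caching.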
Patrick Skerrett
2009-Apr-10 20:46 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
More than that :) It's very, very short duration, but we have the potential for tens of thousands of clients (or more) doing writes all at the same time. I have the farm spread out over 16 servers, each with 2x 4Gb fiber cards into big disk arrays, but my reads do get slow (resulting in end-user experience degradation) when these write bursts come in, and if I could buffer them even for 60 seconds, it would make everything much smoother.

Is there a way to optimize the ARC for more write buffering, and push more read caching off into the L2ARC? Again, I'm only worried about short bursts that happen once or twice a day. The rest of the time everything runs very smoothly.

Thanks.

Eric D. Mudama wrote:
> I think in most cases, the raw spindle throughput should be enough to
> handle your load, or else you haven't sized your arrays properly.
> Bursts of async writes of relatively large size should be headed to
> the media at somewhere around 50-100MB/s/vdev, I would think. How much
> burst I/O do you have?
Mark J Musante
2009-Apr-10 21:05 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On Fri, 10 Apr 2009, Patrick Skerrett wrote:
> degradation) when these write bursts come in, and if I could buffer them
> even for 60 seconds, it would make everything much smoother.

ZFS already batches up writes into a transaction group, which currently happens every 30 seconds. Have you tested zfs against a real-world workload?


Regards,
markm
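P.S. You can actually watch the batching happen. Something like this prints a timestamped line each time a transaction group syncs out (typed from memory, so verify the probe on your build):

    dtrace -n 'fbt::spa_sync:entry { printf("%Y  txg sync", walltimestamp); }'

With a purely asynchronous workload, your writes should ride out with those periodic syncs rather than trickling to disk continuously.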
Patrick Skerrett
2009-Apr-10 21:06 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Yes, we are currently running ZFS, just without an L2ARC or an offloaded ZIL.

Mark J Musante wrote:
> ZFS already batches up writes into a transaction group, which
> currently happens every 30 seconds. Have you tested zfs against a
> real-world workload?
Toby Thain
2009-Apr-11 02:15 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On 10-Apr-09, at 5:05 PM, Mark J Musante wrote:
> ZFS already batches up writes into a transaction group, which
> currently happens every 30 seconds.

Isn't that 5 seconds?

--T
Neil Perrin
2009-Apr-11 02:31 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On 04/10/09 20:15, Toby Thain wrote:
> On 10-Apr-09, at 5:05 PM, Mark J Musante wrote:
>> ZFS already batches up writes into a transaction group, which
>> currently happens every 30 seconds.
>
> Isn't that 5 seconds?

It used to be, and it may still be for what you are running. However, Mark is right: it is now 30 seconds. In fact, 30s is the maximum. The actual time will depend on load. If the pool is heavily used then the txgs fire more frequently.

Neil.
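P.S. If you want to see what your particular build is using, the timeout can be read out of the running kernel, and it can be lowered through /etc/system if smaller, more frequent syncs would smooth your bursts (tunable name from memory; it has varied between builds, so verify on your release):

    # print the current txg timeout, in seconds
    echo 'zfs_txg_timeout/D' | mdb -k

    # /etc/system entry for more frequent txg syncs (takes effect at boot)
    set zfs:zfs_txg_timeout = 5

Shorter txgs mean smaller bursts to the disks, at the cost of less efficient write batching.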