Patrick Skerrett
2009-Apr-09  21:13 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Hi folks,

I would appreciate it if someone could help me understand some weird results I'm seeing while performance testing an SSD-offloaded ZIL.

I'm attempting to improve my infrastructure's burstable write capacity (ZFS based WebDav servers), and naturally I'm looking at implementing SSD based ZIL devices. I have a test machine with the crummiest hard drive I could find installed in it, a Quantum Fireball ATA-100 4500RPM 128K cache, and an Intel X25-E 32gig SSD drive. I'm trying to do A-B comparisons and am coming up with some very odd results.

The first test involves doing IOZone write testing on the fireball standalone, the SSD standalone, and the fireball with the SSD as a log device.

My test command is:

    time iozone -i 0 -a -y 64 -q 1024 -g 32M

Then I check the time it takes to complete this operation in each scenario:

    Fireball alone      - 2m15s (told you it was crappy)
    SSD alone           - 0m3s
    Fireball + SSD zil  - 0m28s

This looks great! Watching 'zpool iostat -v' during this test further proves that the ZIL device is doing the brunt of the heavy lifting. If I can get these kinds of write results in my prod environment, I will be one happy camper.

However, ANY other test I can think of to run on this test machine shows absolutely no performance improvement of the Fireball+SSD ZIL over the Fireball by itself. Watching zpool iostat -v shows no activity on the ZIL whatsoever.

Other tests I've tried to run:

A scripted batch job of 10,000 -

    dd if=/dev/urandom of=/fireball/file_$i.dat bs=1k count=1000

A scripted batch job of 10,000 -

    cat /sourcedrive/$file > /fireball/$file

A scripted batch job of 10,000 -

    cp /sourcedrive/$file /fireball/$file

And a scripted batch job moving 10,000 files onto the fireball using Apache Webdav mounted on the fireball (similar to my prod environment):

    curl -T /sourcedrive/$file http://127.0.0.1/fireball/

So what is IOZone doing differently than any other write operation I can think of???

Thanks,

Pat S.
Neil Perrin
2009-Apr-09  21:52 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Patrick,

The ZIL is only used for synchronous requests like O_DSYNC/O_SYNC and fsync(). Your iozone command must be doing some synchronous writes. All the other tests (dd, cat, cp, ...) do everything asynchronously. That is, they do not require the data to be on stable storage on return from the write. So asynchronous writes get cached in memory (the ARC) and written out periodically (every 30 seconds or less) when the transaction group commits.

The ZIL would be heavily used if your system were an NFS server. Databases also do synchronous writes.

Neil.
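[Editor's sketch] The distinction Neil draws can be illustrated with a small Python example. It is only a sketch under stated assumptions: the helper name write_file and the 1000 KiB payload (chosen to match the dd test's bs=1k count=1000) are illustrative, not anything from the thread. With sync=False the write returns once the data is buffered, as with dd, cat and cp; with sync=True the call to os.fsync() does not return until the data is on stable storage, which is the kind of request that goes through the ZIL.

```python
import os
import tempfile


def write_file(path, payload, sync=False):
    """Write payload to path and return the resulting file size.

    sync=False: plain buffered write, like dd/cat/cp in the thread.
    sync=True:  additionally calls fsync(), so the call only returns
                once the data has been pushed to stable storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = memoryview(payload)
        while len(remaining) > 0:
            written = os.write(fd, remaining)  # may be a partial write
            remaining = remaining[written:]
        if sync:
            os.fsync(fd)  # synchronous request: wait for stable storage
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical target directory; point this at a dataset on the
    # pool under test to compare the two modes with 'zpool iostat -v'.
    target_dir = tempfile.mkdtemp()
    payload = os.urandom(1000 * 1024)  # same size as bs=1k count=1000
    write_file(os.path.join(target_dir, "async.dat"), payload, sync=False)
    write_file(os.path.join(target_dir, "sync.dat"), payload, sync=True)
```

Running a batch of sync=True writes against the pool should, if Neil's explanation holds for the test machine, show log-device activity that the sync=False batch does not.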
Patrick Skerrett
2009-Apr-10  13:07 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Thanks for the explanation folks.

So if I cannot get Apache/Webdav to write synchronously (and it does not look like I can), is it possible to tune the ARC to be more write-buffer heavy?

My biggest problem is with very quick spikes in writes periodically throughout the day. If I were able to buffer these better, I would be in pretty good shape. The machines are already (economically) maxed out on RAM at 32 gigs.

If I were to add SSD L2ARC devices for read caching, can I configure the ARC to give up some of its read caching for more write buffering?

Thanks.
Eric D. Mudama
2009-Apr-10  20:43 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On Fri, Apr 10 at 8:07, Patrick Skerrett wrote:
> My biggest problem is with very quick spikes in writes periodically
> throughout the day. If I were able to buffer these better, I would be in
> pretty good shape. The machines are already (economically) maxed out on
> ram at 32 gigs.
>
> If I were to add in the SSD L2ARC devices for read caching, can I
> configure the ARC to give up some of its read caching for more write
> buffering?

I think in most cases, the raw spindle throughput should be enough to handle your load, or else you haven't sized your arrays properly. Bursts of async writes of relatively large size should be headed to the media at somewhere around 50-100MB/s/vdev, I would think. How much burst IO do you have?

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
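[Editor's sketch] Eric's sizing argument is simple arithmetic and can be written out as below. The function name vdevs_needed and the 400 MB/s burst figure are hypothetical, chosen only to show the calculation; the 50-100 MB/s per vdev range is Eric's rough estimate from the message above, not a measured value.

```python
import math


def vdevs_needed(burst_mb_per_s, per_vdev_mb_per_s):
    """Return how many vdevs are needed to sustain a write burst.

    Both arguments are in MB/s. The result is rounded up, because a
    fraction of a vdev cannot be deployed.
    """
    if per_vdev_mb_per_s <= 0:
        raise ValueError("per-vdev throughput must be positive")
    return math.ceil(burst_mb_per_s / per_vdev_mb_per_s)


if __name__ == "__main__":
    burst = 400  # hypothetical burst rate in MB/s, not from the thread
    # Eric's estimate spans 50-100 MB/s per vdev, so check both ends.
    print(vdevs_needed(burst, 50))
    print(vdevs_needed(burst, 100))
```

If the pool has fewer vdevs than the pessimistic end of the range suggests, the burst cannot drain at line rate and has to be buffered or throttled.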
Patrick Skerrett
2009-Apr-10  20:46 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
More than that :)

It's very, very short duration, but we have the potential for > 10's of thousands of clients doing writes all at the same time. I have the farm spread out over 16 servers, each with 2x 4GB fiber cards into big disk arrays, but my reads do get slow (resulting in end user experience degradation) when these write bursts come in, and if I could buffer them even for 60 seconds, it would make everything much smoother.

Is there a way to optimize the ARC for more write buffering, and push more read caching off into the L2ARC?

Again, I'm only worried about short bursts that happen once or twice a day. The rest of the time everything runs very smoothly.

Thanks.
Mark J Musante
2009-Apr-10  21:05 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On Fri, 10 Apr 2009, Patrick Skerrett wrote:
> degradation) when these write bursts come in, and if I could buffer them
> even for 60 seconds, it would make everything much smoother.

ZFS already batches up writes into a transaction group, which currently happens every 30 seconds. Have you tested zfs against a real-world workload?

Regards,
markm
Patrick Skerrett
2009-Apr-10  21:06 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
Yes, we are currently running ZFS, just without L2ARC or an offloaded ZIL.
Toby Thain
2009-Apr-11  02:15 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On 10-Apr-09, at 5:05 PM, Mark J Musante wrote:
> ZFS already batches up writes into a transaction group, which
> currently happens every 30 seconds.

Isn't that 5 seconds?

--T
Neil Perrin
2009-Apr-11  02:31 UTC
[zfs-discuss] ZIL SSD performance testing... -IOzone works great, others not so great
On 04/10/09 20:15, Toby Thain wrote:
> On 10-Apr-09, at 5:05 PM, Mark J Musante wrote:
>> ZFS already batches up writes into a transaction group, which
>> currently happens every 30 seconds.
>
> Isn't that 5 seconds?

It used to be, and it may still be for what you are running. However, Mark is right: it is now 30 seconds. In fact, 30s is the maximum. The actual time will depend on load. If the pool is heavily used, then the txgs fire more frequently.

Neil.
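[Editor's sketch] Neil's description (30 seconds is a ceiling, and txgs fire sooner under load) can be pictured with the toy model below. This is not how ZFS is implemented. The function name txg_commit_times and the dirty_limit_bytes threshold are assumptions invented for illustration, and the model commits on the timer even when nothing is dirty, purely to keep it short.

```python
MB = 1024 * 1024


def txg_commit_times(writes, max_interval_s=30.0, dirty_limit_bytes=64 * MB):
    """Toy model: return the times at which a txg would commit.

    writes is a list of (timestamp_s, nbytes) tuples sorted by time.
    A commit happens when max_interval_s has elapsed since the txg
    opened, or earlier when buffered data reaches dirty_limit_bytes
    (an invented threshold, not a real ZFS tunable).
    """
    commits = []
    txg_opened = 0.0
    dirty = 0
    for timestamp, nbytes in writes:
        # Close every txg whose timer expired before this write arrived.
        while timestamp - txg_opened >= max_interval_s:
            txg_opened += max_interval_s
            commits.append(txg_opened)
            dirty = 0
        dirty += nbytes
        # Heavy load: enough dirty data forces an early commit.
        if dirty >= dirty_limit_bytes:
            commits.append(timestamp)
            txg_opened = timestamp
            dirty = 0
    return commits


if __name__ == "__main__":
    light = [(5.0, 1 * MB), (40.0, 1 * MB)]
    heavy = [(1.0, 10 * MB), (2.0, 60 * MB)]
    print(txg_commit_times(light))  # timer-driven commit
    print(txg_commit_times(heavy))  # load-driven early commit
```

In this model a light trickle of writes is flushed only when the 30-second timer expires, while a burst that fills the (assumed) dirty limit is flushed right away, which matches the behaviour Neil describes in outline only.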