We have a setup with ZFS/ESX/NFS and I am looking to move our ZIL to a solid state drive.

So far I am looking into this one:
http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013

Does anyone have any experience with this drive as a poor man's Logzilla? Also, what have other people done for mounting these kinds of SSDs inside a PowerEdge case? Any other suggestions?

-D

--
HUGE
David Stahl
Sr. Systems Administrator
718 233 9164 / F 718 625 5157
www.hugeinc.com
Hi David,

We are using them in our Sun X4540 filers. We are actually using 2 SSDs per pool, to improve throughput (since the logbias feature isn't in an official release of OpenSolaris yet). I kind of wish they made an 8G or 16G part, since the 32G capacity is kind of a waste.

We had to go the NewEgg route though. We tried to buy some Sun-branded disks from Sun, but that's a different story. To summarize, we had to buy the NewEgg parts to ensure a project stayed on schedule.

Generally, we've been pretty pleased with them. Occasionally, we've had an SSD that wasn't behaving well. Looks like you can replace log devices now though... :)

We use the 2.5" to 3.5" SATA adapter from IcyDock, in a Sun X4540 drive sled. If you can attach a standard SATA disk to a Dell sled, this approach would most likely work for you as well. The only issue with using third-party parts is that the support organizations involved for the software/hardware will make it very clear that such a configuration is quite unsupported. That said, we've had pretty good luck with them.

-Greg

--
Greg Mason
System Administrator
High Performance Computing Center
Michigan State University

HUGE | David Stahl wrote:
> We have a setup with ZFS/ESX/NFS and I am looking to move our ZIL to a
> solid state drive.
> So far I am looking into this one:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820167013
> Does anyone have any experience with this drive as a poor man's Logzilla?
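For readers following along, adding SSDs as log (slog) devices is a one-line zpool operation; a minimal sketch, with "tank" and the c2t*d0 device names as placeholders for your own pool and SSDs:

  # two independent log devices (log writes are spread across them)
  zpool add tank log c2t0d0 c2t1d0

  # or a mirrored slog, trading some throughput for redundancy
  zpool add tank log mirror c2t0d0 c2t1d0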
Hello Greg,

I'm curious how much performance benefit you gain from the ZIL accelerator. Have you measured that? If not, do you have a gut feel about how much it helped? Also, for what kind of applications does it help?

(I know it helps with synchronous writes. I'm looking for real world answers like: "Our XYZ application was running like a dog and we added an SSD for ZIL and the response time improved by X%.")

Of course, I would welcome a reply from anyone who has experience with this, not just Greg.

Monish

----- Original Message -----
From: "Greg Mason" <gmason at msu.edu>
To: "HUGE | David Stahl" <dstahl at hugeinc.com>
Cc: "zfs-discuss" <zfs-discuss at opensolaris.org>
Sent: Thursday, August 20, 2009 4:04 AM
Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950

> We are using them in our Sun X4540 filers. We are actually using 2 SSDs
> per pool, to improve throughput (since the logbias feature isn't in an
> official release of OpenSolaris yet).
Does un-tarring something count? It is what I used for our tests.

I tested with the ZIL disabled, a ZIL cache on /tmp/zil, a CF card (300x) and a cheap SSD. Waiting for X25-E SSDs to arrive to test those:

http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html

If you want a quick answer, disable the ZIL (you need to unmount/mount, export/import or reboot) on your ZFS volume and try it. That is the theoretical maximum. You can get close to it using various technologies, SSD and all that.

I am no expert on this; I knew nothing about it 2 weeks ago.

But for our provisioning engine untarring Movable Type for customers, going from 5 mins to 45 secs is quite an improvement. I could get that to 11 seconds theoretically (ZIL disabled).

Lund

Monish Shah wrote:
> I'm curious how much performance benefit you gain from the ZIL
> accelerator. Have you measured that? If not, do you have a gut feel
> about how much it helped? Also, for what kind of applications does it
> help?

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500      (cell)
Japan                | +81 (0)3 -3375-1767      (home)
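For anyone who wants to try that quick ZIL-off comparison, a minimal sketch of how it was typically done on the OpenSolaris builds of this era; zil_disable is a global tunable (it affects every pool on the box), so only use it on a test system, and the dataset must be remounted for it to take effect:

  # disable the ZIL (test systems only!)
  echo zil_disable/W0t1 | mdb -kw

  # remount the dataset so the setting takes effect
  zfs umount tank/test && zfs mount tank/test

  # re-enable when done
  echo zil_disable/W0t0 | mdb -kw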
Something our users do quite a bit of is untarring archives with a lot of small files. Many small, quick writes are also one of the many workloads our users have.

Real-world test: our old Linux-based NFS server allowed us to unpack a particular tar file (the source for Boost 1.37) in around 2-4 minutes, depending on load. This machine wasn't special at all, but it had fancy SGI disk on the back end, and was using the Linux-specific async NFS option.

We turned up our X4540s, and this same tar unpack took over 17 minutes! We disabled the ZIL for testing, and we dropped this to under 1 minute. With the X25-E as a slog, we were able to run this test in 2-4 minutes, same as the old storage.

That said, I strongly recommend using Richard Elling's zilstat. He's posted about it previously on this list. It will help you determine whether adding a slog device will help your workload or not. I didn't know about this script at the time of our testing, so it ended up being some trial and error, running various tests on different hardware setups (which meant creating and destroying quite a few pools).

-Greg

Jorgen Lundman wrote:
> Does un-tarring something count? It is what I used for our tests.
>
> I tested with the ZIL disabled, a ZIL cache on /tmp/zil, a CF card (300x)
> and a cheap SSD. Waiting for X25-E SSDs to arrive to test those:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html
>
> If you want a quick answer, disable the ZIL (you need to unmount/mount,
> export/import or reboot) on your ZFS volume and try it. That is the
> theoretical maximum.
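On the zilstat suggestion: it is a DTrace-based script, so it has to run as root, and the exact filename and options depend on the version you download; a hedged sketch of a typical run:

  # sample ZIL activity every second for 10 samples; sustained non-zero
  # ops/bytes here suggest a slog would help the workload
  ./zilstat 1 10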
Greg Mason wrote:
> That said, I strongly recommend using Richard Elling's zilstat. He's
> posted about it previously on this list. It will help you determine
> whether adding a slog device will help your workload or not.

How about the bug "removing slog not possible"? What if this slog fails? Is there a plan for such a situation (the pool becomes inaccessible in this case)?

--
Roman
> How about the bug "removing slog not possible"? What if this slog fails?
> Is there a plan for such a situation (the pool becomes inaccessible in
> this case)?

You can "zpool replace" a bad slog device now.

-Greg
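A minimal sketch of that operation, with "tank" and the device names as placeholders; it swaps the failed log device for a new one without touching the data vdevs:

  # replace the bad slog c9d0 with a new SSD c10d0
  zpool replace tank c9d0 c10d0

  # confirm the log vdev comes back online
  zpool status tank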
Greg Mason wrote:
>> How about the bug "removing slog not possible"? What if this slog fails?
>> Is there a plan for such a situation (the pool becomes inaccessible in
>> this case)?
>
> You can "zpool replace" a bad slog device now.

And I can testify that it works as described.

Steve

--
Stephen Green           //  Stephen.Green at sun.com
Principal Investigator  \\  http://blogs.sun.com/searchguy
The AURA Project        //  Voice: +1 781-442-0926
Sun Microsystems Labs   \\  Fax: +1 781-442-0399
Stephen Green wrote:
> Greg Mason wrote:
>> You can "zpool replace" a bad slog device now.
>
> And I can testify that it works as described.

I meant this situation (and even if the slog is mirrored, it still might happen):

root@zsan0:~# zpool status zsan0store
  pool: zsan0store
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zsan0store  DEGRADED     0     0     0
          raidz2    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
        logs
          c9d0      FAULTED      0     0     0  corrupted data

errors: No known data errors

root@zsan0:~# zpool detach zsan0store c9d0
cannot detach c9d0: only applicable to mirror and replacing vdevs

root@zsan0:~# zpool remove zsan0store c9d0
cannot remove c9d0: only inactive hot spares or cache devices can be removed

--
Roman
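For context, a sketch of how a slog is usually mirrored so that a single SSD failure doesn't leave the pool degraded like the above; device names are placeholders, and it obviously has to be done before the lone log device faults:

  # attach a second SSD to the existing log device, converting it to a mirror
  zpool attach zsan0store c9d0 c10d0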
On 08/20/09 06:41, Greg Mason wrote:
> Something our users do quite a bit of is untarring archives with a lot
> of small files. Many small, quick writes are also one of the many
> workloads our users have.
>
> Real-world test: our old Linux-based NFS server allowed us to unpack a
> particular tar file (the source for Boost 1.37) in around 2-4 minutes,
> depending on load. This machine wasn't special at all, but it had fancy
> SGI disk on the back end, and was using the Linux-specific async NFS
> option.

I'm glad you mentioned this option. It turns all synchronous requests from the client into async, allowing the server to return immediately without making the data stable. This is the equivalent of setting zil_disable. Async used to be the default behaviour. It must have been a shock to Linux users when NFS suddenly slowed down after synchronous became the default! I wonder what the perf numbers were without the async option.

> We turned up our X4540s, and this same tar unpack took over 17 minutes!
> We disabled the ZIL for testing, and we dropped this to under 1 minute.
> With the X25-E as a slog, we were able to run this test in 2-4 minutes,
> same as the old storage.

That's pretty impressive. So with an X25-E slog, ZFS is as fast synchronously as your previous hardware was asynchronously, but with no risk of data corruption. Of course the hardware is different, so it's not really apples to apples.

Neil.
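For anyone unfamiliar with the Linux option being referred to, it is set per export in /etc/exports; a hedged sketch with placeholder paths and networks:

  # /etc/exports on the Linux server
  # 'async' lets the server acknowledge writes before they hit stable storage;
  # 'sync' (the default in modern nfs-utils) keeps the NFS durability guarantee
  /export/fast  192.168.1.0/24(rw,async)
  /export/safe  192.168.1.0/24(rw,sync)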
On Aug 22, 2009, at 5:21 PM, Neil Perrin <Neil.Perrin at Sun.COM> wrote:
> That's pretty impressive. So with an X25-E slog, ZFS is as fast
> synchronously as your previous hardware was asynchronously, but with no
> risk of data corruption. Of course the hardware is different, so it's
> not really apples to apples.

There was a thread not too long ago, either on the xfs mailing list or the mysql mailing list, that talked about the Intel X25-E and its on-board cache. The cache ignores flushes, but isn't persistent on power failure. Pulling the drive during a sync write caused data corruption. You can disable the write-back cache on these, but the performance is nowhere near as good with it disabled.

-Ross
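If anyone wants to reproduce that, the drive's volatile write cache is usually toggled with hdparm on Linux (on Solaris, format -e has an equivalent cache menu); the device name below is a placeholder, and some SSD firmware reportedly ignores the setting:

  # show the current write-cache setting
  hdparm -W /dev/sdb

  # disable the volatile write cache
  hdparm -W0 /dev/sdb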
On Aug 22, 2009, at 7:33 PM, Ross Walker <rswwalker at gmail.com> wrote:> On Aug 22, 2009, at 5:21 PM, Neil Perrin <Neil.Perrin at Sun.COM> wrote: > >> >> >> On 08/20/09 06:41, Greg Mason wrote: >>> Something our users do quite a bit of is untarring archives with a >>> lot of small files. Also, many small, quick writes are also one of >>> the many workloads our users have. >>> Real-world test: our old Linux-based NFS server allowed us to >>> unpack a particular tar file (the source for boost 1.37) in around >>> 2-4 minutes, depending on load. This machine wasn''t special at >>> all, but it had fancy SGI disk on the back end, and was using the >>> Linux-specific async NFS option. >> >> I''m glad you mentioned this option. It turns all synchronous requests >> from the client into async allowing the server to immediately return >> without making the data stable. This is the equivalent of setting >> zil_disable. >> Async used to be the default behaviour. It must have been a shock >> to Linux >> users when suddenly NFS slowed down when synchronous became the >> default! >> I wonder what the perf numbers were without the async option. >> >>> We turned up our X4540s, and this same tar unpack took over 17 >>> minutes! We disabled the ZIL for testing, and we dropped this to >>> under 1 minute. With the X25-E as a slog, we were able to run this >>> test in 2-4 minutes, same as the old storage. >> >> That''s pretty impressive. So with a X25-E slog ZFS is as fast >> synchronously as >> your previously hardware was asynchronously - but with no risk of >> data corruption. >> Of course the hardware is different so it''s not really apples to >> apples. > > There was a thread not too along ago either on the xfs mailing list > or mysql mailing list that talked about the Intel X25-E and it''s on > board cache. The cache ignores flushes, but isn''t persistent on > power failure. Pulling the drive during a sync write caused data > corruption. You can disable the write back cache of these, but the > performance is no where near as good with it disabled.Here is the blog post: http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ -Ross
Ross Walker wrote:
> There was a thread not too long ago, either on the xfs mailing list or
> the mysql mailing list, that talked about the Intel X25-E and its
> on-board cache. The cache ignores flushes, but isn't persistent on
> power failure. Pulling the drive during a sync write caused data
> corruption. You can disable the write-back cache on these, but the
> performance is nowhere near as good with it disabled.
>
> Here is the blog post:
>
> http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/

Hang on - reading that, his initial results were 50 writes a second with the default XFS write barriers, which to me implies that the drive is honouring the cache flush. The fact that the write rate jumps so significantly when he turns off barriers, but continues with O_DIRECT and innodb_flush_log_at_trx_commit=1, to me just says that XFS is returning success on writes as soon as the data has been handed to the drive - not when the drive has flushed its cache to make it persistent. Given that we told XFS to turn off write barriers, isn't it doing what it's told? Why are we expecting data to be consistent across power loss or device removal?

Couldn't this just be XFS only actually requesting cache flushes when barriers are enabled?

T
On Sun, Aug 23 at 14:11, Tristan Ball wrote:
> Hang on - reading that, his initial results were 50 writes a second with
> the default XFS write barriers, which to me implies that the drive is
> honouring the cache flush.
>
> Couldn't this just be XFS only actually requesting cache flushes when
> barriers are enabled?

Yea, I parsed the article the same way.

50 IOPS with XFS barriers (crappy by any standard); 5300 cache-enabled IOPS or 1200 cache-disabled IOPS with the X25-E. And he tried yanking the power while doing work with the cache disabled and it didn't lose any transactions.

Seems like it was behaving as expected, unless I misunderstood something.

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
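For reference, the barrier knob being debated is just an XFS mount option; a sketch with a placeholder device and mount point (disabling barriers is only sane on storage with a non-volatile or battery-backed cache):

  # default: barriers enabled, XFS issues cache-flush requests at commit points
  mount -t xfs /dev/sdb1 /data

  # barriers disabled, as in the faster-but-unsafe runs discussed above
  mount -t xfs -o nobarrier /dev/sdb1 /data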
On Aug 23, 2009, at 12:11 AM, Tristan Ball <Tristan.Ball at leica-microsystems.com> wrote:
> Hang on - reading that, his initial results were 50 writes a second with
> the default XFS write barriers, which to me implies that the drive is
> honouring the cache flush.
>
> Couldn't this just be XFS only actually requesting cache flushes when
> barriers are enabled?

I think it's more an illustration that write barriers on Linux need a little work; even with flushes it should do a lot better than 50 IOPS.

O_DIRECT does just that, with or without barriers: it flushes on each write, with an ever so slight delay to allow the queue to coalesce writes.

A barrier is more to enforce order and persistence when IO is async.

-Ross
On Aug 23, 2009, at 9:59 AM, Ross Walker <rswwalker at gmail.com> wrote:
> O_DIRECT does just that, with or without barriers: it flushes on each
> write, with an ever so slight delay to allow the queue to coalesce
> writes.

My bad - O_DIRECT does NOT do that. It just goes direct to the driver, bypassing the page cache. That allows for low-latency IO and arbitrary IO sizes for throughput (instead of page-sized IO), but it doesn't enforce persistence.

> A barrier is more to enforce order and persistence when IO is async.

I suspect that since XFS can use an internal or external log like ZFS does, when a barrier is issued it is issued across all devices in the file system, because XFS doesn't know about the actual physical layout the way ZFS does, and that is why the IOPS are so low with XFS barriers.

-Ross
On Sun, 23 Aug 2009, Ross Walker wrote:
>> O_DIRECT does just that, with or without barriers: it flushes on each
>> write, with an ever so slight delay to allow the queue to coalesce
>> writes.
>
> My bad - O_DIRECT does NOT do that. It just goes direct to the driver,
> bypassing the page cache. That allows for low-latency IO and arbitrary
> IO sizes for throughput (instead of page-sized IO), but it doesn't
> enforce persistence.

Right. And Solaris does not support O_DIRECT. It does provide a directio() function which requests similar functionality (as a hint), but zfs does not support direct I/O.

Linux O_DIRECT basically requires an application rewrite to use it, and its precise function tends to change across major kernel releases.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
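A small Linux-side illustration of the distinction being drawn here, using GNU dd with placeholder filenames: oflag=direct only bypasses the page cache, and durability still requires an explicit sync at the end:

  # direct I/O: skips the page cache, but data may still sit in the drive's
  # volatile write cache when dd exits
  dd if=/dev/zero of=/data/testfile bs=4k count=1000 oflag=direct

  # direct I/O followed by fsync, asking the storage stack to make the
  # data stable before dd reports success
  dd if=/dev/zero of=/data/testfile bs=4k count=1000 oflag=direct conv=fsync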