I've started reading up on this, and I know I have a lot more reading to do, but I've already got some questions... :)

I'm not sure yet that it will help for my purposes, but I was considering buying 2 SSDs for mirrored boot devices anyway. My main question is: can a pair of, say, 60GB SSDs be shared for both the root pool and an SSD ZIL? Can the installer be configured to make the slice for the root pool something less than the whole disk, leaving another slice for the ZIL? Or would a zvol in the root pool be a better idea?

I doubt 60GB will leave enough space, but would doing this for the L2ARC be useful also?

-Kyle
I can't speak to whether it's a good idea or not, but I also wanted to do this and it was rather difficult. The problem is that the OpenSolaris installer doesn't let you set up slices on a device to install to. The two ways I came up with were:

1) Use the automated installer (AI) for everything, because it has the option to configure slices before installing files. This requires learning a lot about the AI just to configure slices before installing.

2) Do it by hand (a rough sketch of the commands follows this message):
 - install like normal on drive #1
 - set up drive #2 with the partition map that you want to have
 - zpool replace drive #1 with drive #2 (the one with the altered partition map)
 - set up drive #1 with the new partition map
 - zpool attach drive #1
 - install grub on both drives

Even though approach #2 probably sounds more difficult, I ended up doing it that way and set up a root slice on each, a slog slice on each, and 2 independent swap slices.

I would also like to hear if there's any other way to make this easier, or any problems with my approach that I might have overlooked.
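For reference, a rough sketch of approach #2 under stated assumptions: hypothetical device names c0t0d0/c0t1d0, a root pool named rpool, root on slice 0 and the slog on slice 1 of each drive. The real device names and slice layout will differ on your system.

  # 1. Install normally onto drive #1, then partition drive #2 the way you actually want it
  format c0t1d0          # interactively create s0 (root), s1 (slog), s3 (swap)

  # 2. Move the root pool onto the newly sliced drive #2, then repartition drive #1 to match
  zpool replace rpool c0t0d0s0 c0t1d0s0
  #    (wait for the resilver to complete before touching drive #1)
  format c0t0d0

  # 3. Re-mirror the root pool onto drive #1 and make both drives bootable
  zpool attach rpool c0t1d0s0 c0t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0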
Thanks for posting this solution.

But I would like to point out that bug 6574286, "removing a slog doesn't work", still isn't resolved. A solution is on its way, according to George Wilson. But in the meantime, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot, for example, export your data pool, pull the drives and leave the root pool behind.

Don't get me wrong, I would like such a setup a lot. But I'm not going to implement it until the slog can be removed or the pool can be imported without the slog.

In the meantime, can someone confirm that in such a case, root pool and ZIL in two slices and mirrored, the write cache can be enabled with format? Only ZFS is using the disk, but perhaps I'm wrong on this. There have been posts regarding enabling the write_cache, but I couldn't find a conclusive answer for the above scenario.

Regards,

Frederik
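For anyone who wants to inspect or flip the cache setting by hand, format's expert mode exposes it. A rough sketch of the interactive session is below; the menu names are from memory and may vary by build and driver, and whether it is safe to enable when ZFS only has slices on the disk is exactly the open question above.

  # format -e
    (select the SSD from the disk list)
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable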
F. Wessels wrote:
> Thanks for posting this solution.
>
> But I would like to point out that bug 6574286, "removing a slog doesn't work", still isn't resolved. A solution is on its way, according to George Wilson. But in the meantime, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot, for example, export your data pool, pull the drives and leave the root pool behind.

In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool.

If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too.

> Don't get me wrong, I would like such a setup a lot. But I'm not going to implement it until the slog can be removed or the pool can be imported without the slog.
>
> In the meantime, can someone confirm that in such a case, root pool and ZIL in two slices and mirrored, the write cache can be enabled with format? Only ZFS is using the disk, but perhaps I'm wrong on this. There have been posts regarding enabling the write_cache, but I couldn't find a conclusive answer for the above scenario.

When you have just the root pool on a disk, ZFS won't enable the write cache by default. I think you can manually enable it, but I don't know the dangers. Adding the slog shouldn't be any different. To be honest, I don't know how closely the write caching on an SSD matches what a moving disk has.

-Kyle
On Thu, Jul 23, 2009 at 10:28:38AM -0400, Kyle McDonald wrote:
> In my case the slog slice wouldn't be the slog for the root pool, it
> would be the slog for a second data pool.

I didn't think you could add a slog to the root pool anyway. Or has that changed in recent builds? I'm a little behind on my SXCE versions, been too busy to keep up. :)

> When you have just the root pool on a disk, ZFS won't enable the write
> cache by default.

I don't think this is limited to root pools. None of my pools (root or non-root) seem to have the write cache enabled. Now that I think about it, all my disks are "hidden" behind an LSI 1078 controller, so I'm not sure what sort of impact that would have on the situation.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta tell them exactly what you want or you'll end up with a cupboard full of pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
Brian Hechinger wrote:
> On Thu, Jul 23, 2009 at 10:28:38AM -0400, Kyle McDonald wrote:
>> In my case the slog slice wouldn't be the slog for the root pool, it
>> would be the slog for a second data pool.
>
> I didn't think you could add a slog to the root pool anyway. Or has that
> changed in recent builds? I'm a little behind on my SXCE versions, been
> too busy to keep up. :)

I don't know either. It's not really what I was looking to do, so I never even thought of it. :)

>> When you have just the root pool on a disk, ZFS won't enable the write
>> cache by default.
>
> I don't think this is limited to root pools. None of my pools (root or
> non-root) seem to have the write cache enabled. Now that I think about
> it, all my disks are "hidden" behind an LSI 1078 controller so I'm not
> sure what sort of impact that would have on the situation.

When you give ZFS the full disk (device name 'cWtXdY', with no 'sZ'), it will usually instruct the drive to enable write caching. You're right, though: if your drives are really something like single-drive RAID 0 LUNs, then who knows what happens.

-Kyle
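To illustrate the distinction with hypothetical pool and device names (the two commands are alternatives, shown only for comparison): handing ZFS the whole disk lets it manage the drive's write cache, while handing it a slice does not.

  # Whole disk (no sZ suffix): ZFS controls the label and will normally enable the write cache
  zpool create tank c1t2d0

  # A slice: ZFS assumes something else may share the disk and leaves the write cache alone
  zpool create tank c1t2d0s0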
On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote:
> F. Wessels wrote:
>> But I would like to point out that bug 6574286, "removing a slog doesn't work", still isn't resolved. A solution is on its way, according to George Wilson. But in the meantime, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot, for example, export your data pool, pull the drives and leave the root pool behind.
>
> In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool.
>
> If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too.

Mirror the slog to match your mirrored root pool.

>> In the meantime, can someone confirm that in such a case, root pool and ZIL in two slices and mirrored, the write cache can be enabled with format? Only ZFS is using the disk, but perhaps I'm wrong on this. There have been posts regarding enabling the write_cache, but I couldn't find a conclusive answer for the above scenario.
>
> When you have just the root pool on a disk, ZFS won't enable the write cache by default. I think you can manually enable it, but I don't know the dangers. Adding the slog shouldn't be any different. To be honest, I don't know how closely the write caching on an SSD matches what a moving disk has.

Write caches only help hard disks. Most (all?) SSDs do not have volatile write buffers. Volatile write buffers are another "bad thing" you can forget when you go to SSDs :-)

-- richard
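A minimal sketch of the layout being discussed, with hypothetical names: s0 on each SSD holds the mirrored root pool, and s1 on each SSD is given to the separate data pool as a mirrored slog.

  # add a mirrored log device to the (separate) data pool
  zpool add datapool log mirror c0t0d0s1 c0t1d0s1

  # confirm the log mirror shows up
  zpool status datapool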
Richard Elling wrote:
> On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote:
>> In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool.
>>
>> If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too.
>
> Mirror the slog to match your mirrored root pool.

Yep. That was the plan. I was just explaining that not being able to remove the slog wasn't an issue for me, since I planned on always having that device available.

I was more curious about whether there were any downsides to sharing the SSD between the root pool and the slog?

Thanks for the valuable input, Richard.

-Kyle

> Write caches only help hard disks. Most (all?) SSDs do not have volatile write buffers.
> Volatile write buffers are another "bad thing" you can forget when you go to SSDs :-)
> -- richard
On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote:
> Richard Elling wrote:
>> Mirror the slog to match your mirrored root pool.
>
> Yep. That was the plan. I was just explaining that not being able to remove the slog wasn't an issue for me since I planned on always having that device available.
>
> I was more curious about whether there were any downsides to sharing the SSD between the root pool and the slog?

I think it is a great idea, assuming the SSD has good write performance.

-- richard
Richard Elling wrote:
> On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote:
>> I was more curious about whether there were any downsides to sharing the SSD between the root pool and the slog?
>
> I think it is a great idea, assuming the SSD has good write performance.

This one claims up to 230MB/s read and 180MB/s write, and it's only $196:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write), which is $699.

Are those claims really trustworthy? They sound too good to be true!

-Kyle
Kyle McDonald wrote:
> Richard Elling wrote:
>> I think it is a great idea, assuming the SSD has good write performance.
>
> This one claims up to 230MB/s read and 180MB/s write, and it's only $196:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
>
> Compared to this one (250MB/s read and 170MB/s write), which is $699.

Oops. Forgot the link:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014

> Are those claims really trustworthy? They sound too good to be true!
>
> -Kyle
>>> I think it is a great idea, assuming the SSD has good write performance.
>> This one claims up to 230MB/s read and 180MB/s write and it's only $196.
>>
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
>>
>> Compared to this one (250MB/s read and 170MB/s write) which is $699.
>
> Oops. Forgot the link:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014
>
> Are those claims really trustworthy? They sound too good to be true!
>
> -Kyle

Kyle-

The less expensive SSD is an MLC device. The Intel SSD is an SLC device. That right there accounts for the cost difference. The SLC device (Intel X25-E) will last quite a bit longer than the MLC device.

-Greg

--
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center
Greg Mason wrote:
>>>> I think it is a great idea, assuming the SSD has good write performance.
>>> This one claims up to 230MB/s read and 180MB/s write and it's only $196.
>>>
>>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
>>>
>>> Compared to this one (250MB/s read and 170MB/s write) which is $699.
>>>
>>> Are those claims really trustworthy? They sound too good to be true!
>
> The less expensive SSD is an MLC device. The Intel SSD is an SLC device.
> That right there accounts for the cost difference. The SLC device (Intel
> X25-E) will last quite a bit longer than the MLC device.

I understand that. That's why I picked that one to compare. It was my understanding that the MLC drives weren't even close, performance-wise, to the SLC ones. This one seems pretty close. How can that be?

-Kyle
In the context of a low-volume file server, for a few users, is the low-end Intel SSD sufficient?

A.

--
Adam Sherman
+1.613.797.6819

On 2009-07-23, at 14:09, Greg Mason <gmason at msu.edu> wrote:

> Kyle-
>
> The less expensive SSD is an MLC device. The Intel SSD is an SLC device.
> That right there accounts for the cost difference. The SLC device (Intel
> X25-E) will last quite a bit longer than the MLC device.
>
> -Greg
>
> --
> Greg Mason
> System Administrator
> Michigan State University
> High Performance Computing Center
Adam Sherman wrote:
> In the context of a low-volume file server, for a few users, is the
> low-end Intel SSD sufficient?

You're right, it supposedly has less than half the write speed, and that probably won't matter for me, but I can't find a 64GB version of it for sale, and the 80GB version is over 50% more at $314.

-Kyle
> I think it is a great idea, assuming the SSD has good write performance.
> This one claims up to 230MB/s read and 180MB/s write and it's only $196.
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
>
> Compared to this one (250MB/s read and 170MB/s write) which is $699.
>
> Are those claims really trustworthy? They sound too good to be true!

MB/s numbers are not a good indication of performance. What you should pay attention to is usually random write and read IOPS. They tend to correlate a bit, but those numbers on Newegg are probably just the best case from the manufacturer.

In the world of consumer-grade SSDs, Intel has crushed everyone on IOPS performance, but the other manufacturers are starting to catch up a bit.
> I don't think this is limited to root pools. None of my pools (root or
> non-root) seem to have the write cache enabled. Now that I think about
> it, all my disks are "hidden" behind an LSI1078 controller so I'm not
> sure what sort of impact that would have on the situation.

I have a few of those controllers as well. I wouldn't believe for a second that ZFS could change the controller config for an LD; I couldn't see how:

  # /usr/local/bin/CLI/MegaCli -LdGetProp -DskCache -LAll -a0

  Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
  Adapter 0-VD 1(target id: 1): Disk Write Cache : Disabled
  Adapter 0-VD 2(target id: 2): Disk Write Cache : Disabled
  Adapter 0-VD 3(target id: 3): Disk Write Cache : Disabled
  Adapter 0-VD 4(target id: 4): Disk Write Cache : Disabled

The comment later about defining a pool with and without the sX syntax warrants a test :) Good to keep in mind...

jlc
On Jul 23, 2009, at 11:09 AM, Greg Mason wrote:
> Kyle-
>
> The less expensive SSD is an MLC device. The Intel SSD is an SLC device.

Some newer designs use both SLC and MLC. It is no longer possible to use SLC vs MLC as a primary differentiator. Use the specifications.

-- richard
On Thu, 2009-07-23 at 14:24 -0700, Richard Elling wrote:
> Some newer designs use both SLC and MLC. It is no longer possible
> to use SLC vs MLC as a primary differentiator. Use the specifications.
> -- richard

I'm finding the new-generation MLC drives with a large DRAM cache and improved microcontroller to be more than sufficient for a workgroup server, e.g. the OCZ Summit series and similar. I suspect the Intel X25-M is likely good enough, too.

I'm using one SSD for both read and write caches, and it's good enough for a 20-person small workgroup server doing NFS. I suspect that write caches are much more sensitive to IOPS performance than read ones, but that's just my feeling. In any case, I'd pay more attention to the IOPS rating for things than to the sync read/write speeds.

I'm testing that setup right now for iSCSI-based xVM guests, so we'll see if it can stand the IOPS.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
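A rough sketch of that kind of single-SSD arrangement, assuming a hypothetical pool name, device name, and two-slice layout on the SSD:

  # one slice as the separate intent log (slog), another as the L2ARC read cache
  zpool add tank log c2t0d0s0
  zpool add tank cache c2t0d0s1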
I didn't mean using a slog for the root pool. I meant using the slog for a data pool, where the data pool consists of (rotating) hard disks complemented by an SSD-based slog. But instead of a dedicated SSD for the slog, I want the root pool to share the SSD with the slog. Both can be mirrored to a second SSD. I think that in this scenario my initial concern remains: since you cannot remove a slog from a pool, if you want to move the pool or something bad happens, you're in trouble.

Richard, I'm under the impression that most current SSDs have a DRAM buffer. Some are used only for reading, some are also used for writing. I'm pretty sure the Sun Logzilla devices (the STEC Zeus) have a DRAM write buffer. Some have a supercap to flush the caches, others don't.

I'm trying to compile some guidelines regarding write caching, SSDs and ZFS. I don't like the posts like "I can't import my pool", "my pool went down the Niagara Falls", etc. So in order to prevent more of these stories, I think it's important to get it out in the open whether write caching can be enabled on SSDs (both full-disk and slice usage). I'm really looking for a conclusive test to determine whether or not it can be enabled.

Regards,

Frederik
It just so happens I have one of the 128G and two of the 32G versions in my drawer, waiting to go into our "DR" disk array when it arrives.

I dropped the 128G into a spare Dell 745 (2GB RAM) and used an Ubuntu liveCD to run some simple iozone tests on it. I had some stability issues with iozone crashing, however I did get some results...

Attached are what I've got. I intended to do two sets of tests, one for each of sequential reads, writes, and a "random IO" mix. I also wanted to do a second set of tests, running a streaming read or streaming write in parallel with the random IO mix, as I understand many SSDs have trouble with those kinds of workloads.

As it turns out, so did my test PC. :-)

I've used 8K IO sizes for all the stage one tests - I know I might get it to go faster with a larger size, but I like to know how well systems will do when I treat them badly!

The Stage_1_Ops_thru_run is interesting: 2000+ ops/sec on random writes, 5000 on reads.

The streaming write load and "random over writes" were started at the same time - although I didn't see which one finished first, so it's possible that the stream finished first and allowed the random run to finish strong. Basically, take these numbers with several large grains of salt!

Interestingly, the random IO mix doesn't slow down much, but the streaming writes are hurt a lot.

Regards,
Tristan.

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of thomas
Sent: Friday, 24 July 2009 5:23 AM
To: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] SSD's and ZFS...

MB/s numbers are not a good indication of performance. What you should pay attention to are usually random IOPS write and read. They tend to correlate a bit, but those numbers on newegg are probably just best case from the manufacturer.

In the world of consumer grade SSDs, Intel has crushed everyone on IOPS performance.. but the other manufacturers are starting to catch up a bit.

[Attachments: Stage_1_Latency_Run.txt, Stage_1_MBsThru_Run.txt, Stage_1_OpsThru_Run.txt, Streaming_Write_Load_1.txt, Random_over_Writes.txt]
Tristan Ball wrote:
> It just so happens I have one of the 128G and two of the 32G versions in
> my drawer, waiting to go into our "DR" disk array when it arrives.

Hi Tristan,

Just so I can be clear, what model/brand are the drives you were testing?

-Kyle

> I dropped the 128G into a spare Dell 745 (2GB RAM) and used an Ubuntu
> liveCD to run some simple iozone tests on it. I had some stability
> issues with iozone crashing, however I did get some results...
>
> Attached are what I've got. I intended to do two sets of tests, one for
> each of sequential reads, writes, and a "random IO" mix. I also wanted
> to do a second set of tests, running a streaming read or streaming write
> in parallel with the random IO mix, as I understand many SSDs have
> trouble with those kinds of workloads.
>
> As it turns out, so did my test PC. :-)
>
> I've used 8K IO sizes for all the stage one tests - I know I might get
> it to go faster with a larger size, but I like to know how well systems
> will do when I treat them badly!
>
> The Stage_1_Ops_thru_run is interesting: 2000+ ops/sec on random writes,
> 5000 on reads.
>
> The streaming write load and "random over writes" were started at the
> same time - although I didn't see which one finished first, so it's
> possible that the stream finished first and allowed the random run to
> finish strong. Basically, take these numbers with several large grains of
> salt!
>
> Interestingly, the random IO mix doesn't slow down much, but the
> streaming writes are hurt a lot.
>
> Regards,
> Tristan.
On Fri, 24 Jul 2009, Tristan Ball wrote:
> I've used 8K IO sizes for all the stage one tests - I know I might get
> it to go faster with a larger size, but I like to know how well systems
> will do when I treat them badly!
>
> The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes,
> 5000 on reads.

This seems like rather low random write performance. My 12-drive array of rotating rust obtains 3708.89 ops/sec. In order to be effective, it seems that a synchronous write log should perform considerably better than the backing store.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Fri, 24 Jul 2009, Bob Friesenhahn wrote:
> This seems like rather low random write performance. My 12-drive array of
> rotating rust obtains 3708.89 ops/sec. In order to be effective, it seems
> that a synchronous write log should perform considerably better than the
> backing store.

Actually, it seems that the options used when testing the SSD do not request synchronous writes, so your results are flawed. This is what I get with rotating rust when using identical iozone options to yours:

  Children see throughput for 6 random writers = 28567.95 ops/sec
  Parent sees throughput for 6 random writers  = 16274.03 ops/sec
  Min throughput per process                   =  4134.08 ops/sec
  Max throughput per process                   =  5529.43 ops/sec
  Avg throughput per process                   =  4761.33 ops/sec
  Min xfer                                     = 98038.00 ops

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Ok, I re-tested my rotating rust with these iozone options (note that -o requests synchronous writes):

  iozone -t 6 -k 8 -i 0 -i 2 -O -r 8K -o -s 1G

and obtained these results:

  Children see throughput for 6 random writers =  5700.49 ops/sec
  Parent sees throughput for 6 random writers  =  4698.40 ops/sec
  Min throughput per process                   =   834.67 ops/sec
  Max throughput per process                   =  1120.91 ops/sec
  Avg throughput per process                   =   950.08 ops/sec
  Min xfer                                     = 97593.00 ops

I think that any SSD used as a cache should surely be able to do much better than that. Of course, it is my understanding that the zfs slog is written sequentially, so perhaps this applies instead:

  Children see throughput for 6 initial writers =  7522.42 ops/sec
  Parent sees throughput for 6 initial writers  =  5645.58 ops/sec
  Min throughput per process                    =  1095.62 ops/sec
  Max throughput per process                    =  1676.02 ops/sec
  Avg throughput per process                    =  1253.74 ops/sec
  Min xfer                                      = 85589.00 ops

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Kyle McDonald | 2009-Jul-24 17:46 UTC | [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)
Bob Friesenhahn wrote:
> Of course, it is my understanding that the zfs slog is written
> sequentially so perhaps this applies instead:

Actually, reading up on these drives, I've started to wonder about the slog writing pattern. While these drives do seem to do a great job at random writes, most of the promise shows at sequential writes, so does the slog attempt to write sequentially through the space given to it?

Also, there are all sorts of analyses out there about how the drives always attempt to write new data to the pages and blocks they know are empty, since they can't overwrite one page (usually 4K) without erasing the whole (512K) block the page is in. This leads to a drop in write performance after all the space (both the space you paid for, and any extra space the vendor put in to work around this issue) has been used once.

This shows up in regular filesystems because when a file is deleted, the drive only sees a new (over)write of some metadata so the OS can record that the file is gone; the drive is never told that the blocks the file was occupying are now free and can be pre-erased at the drive's convenience. The drive vendors have come up with a new TRIM command, which some OSes (Win7) are talking about supporting in their filesystems.

Obviously, for use only as a slog device, ZFS itself doesn't need to know how to use TRIM (until people start using SSDs as regular pool devices), but I would think that the slog code would need to use it in order to keep write speeds up and latencies down. No?

If so, what's the current consensus, thoughts, plans, etc. on if and when TRIM will be usable in Solaris/ZFS?

-Kyle
>>>>> "km" == Kyle McDonald <KMcDonald at Egenera.COM> writes:

    km> These drives do seem to do a great job at random writes, most
    km> of the promise shows at sequential writes, so does the slog
    km> attempt to write sequentially through the space given to it?

When writing to the slog, some user-visible application has to wait for the slog to return that the write was committed to disk. So whether it's random or not, you're waiting for it, and IO/s translates closely into latency because the writes cannot be batched with the normal level of aggressiveness (or they should just go to the main pool, which will eventually have to handle the entire workload anyway). IO/s is the number that matters.

``but but but but!'' <thwack> NO! Everyone who is using the code, writing the code, and building the systems says: IO/s is the number that matters. If you've got some experience otherwise, fine, odd things turn up all the time, but AFAICT the consensus is clear right now.

    km> they can't overwrite one page (usually 4k) without erasing the
    km> whole (512k) block the page is in.

Don't presume to get into the business of their black box so far. That's almost certainly not what they do. They probably do COW like ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to partly-empty pages until the page is full. In the background a gc thread will evacuate and rewrite pages that have become spattered with unreferenced sectors. They will write to the flash filesystem to keep track of things about itself, like half-erased cells, toasted cells, per-cell erase counts. Then there is probably a defragmenter thread, or else the gc is itself data-reorganizing. And there is some lookup state kept in DRAM during operation, and reconstructed from post-mortem observation of what's in the FLASH at boot, like with any filesystem.

Just look at the observed performance in microbenchmarks or in actual use, rather than trying to reverse-reason about these fancy and otherwise-unobtainable closed-source filesystems, which is what they are really selling, in disk/``game cartridge'' form factor.

    km> The Drive vendors have come up with a new TRIM command, which
    km> some OS's (Win7) are talking about supporting in their
    km> Filesystems.

This would be useful for VMs with thin-provisioned disks, too.

    km> I would think that the slog code would need to use it in order
    km> to keep write speeds up and latencies down. No?

Read the goofy gamer site review, please. No, not with the latest Intel firmware, it's not needed.
Richard Elling | 2009-Jul-24 20:35 UTC | [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)
On Jul 24, 2009, at 10:46 AM, Kyle McDonald wrote:
> Bob Friesenhahn wrote:
>> Of course, it is my understanding that the zfs slog is written
>> sequentially so perhaps this applies instead:
>
> Actually, reading up on these drives, I've started to wonder about
> the slog writing pattern. While these drives do seem to do a great
> job at random writes, most of the promise shows at sequential
> writes, so does the slog attempt to write sequentially through the
> space given to it?

Short answer is yes. But you can measure it with iopattern:

http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

Use the -d option to look at your slog device.

-- richard
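For example, a hypothetical invocation: the device name is a placeholder, and the trailing interval/count arguments are an assumption based on the usual DTraceToolkit-style conventions, so check the script's usage message.

  # watch whether I/O to the slog slice is sequential or random, sampling every 10 seconds
  ./iopattern -d c0t1d0s1 10 6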
Miles Nordin wrote:
>     km> These drives do seem to do a great job at random writes, most
>     km> of the promise shows at sequential writes, so does the slog
>     km> attempt to write sequentially through the space given to it?
>
> <thwack> NO! Everyone who is using the code, writing the code, and
> building the systems says: IO/s is the number that matters. If you've
> got some experience otherwise, fine, odd things turn up all the time,
> but AFAICT the consensus is clear right now.

Yeah, I know. I get it. I screwed up and used the wrong term. OK? I agree with you. Still, when all the previously erased pages are gone, write latencies go up (drastically - in some cases worse than a spinning HD) and IO/s goes down. So what I really wanted to get into was the question below.

>     km> they can't overwrite one page (usually 4k) without erasing the
>     km> whole (512k) block the page is in.
>
> Don't presume to get into the business of their black box so far.

I'm not. Guys like this are:

http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8

> That's almost certainly not what they do. They probably do COW like
> ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to
> partly-empty pages until the page is full. In the background a gc
> thread will evacuate and rewrite pages that have become spattered with
> unreferenced sectors.

That's where the problem comes in. They have no knowledge of the upper filesystem, and don't know which previously written blocks are still referenced. When the OS filesystem rewrites a directory to remove a pointer to the string of blocks the file used to use, and updates its list of which LBA sectors are now free vs. in use, it probably happens pretty much exactly like you say. But that doesn't let the SSD mark the sectors the file used as unreferenced, so the gc thread can't "evacuate" them ahead of time and add them to the empty page pool.

>     km> The Drive vendors have come up with a new TRIM command, which
>     km> some OS's (Win7) are talking about supporting in their
>     km> Filesystems.
>
> This would be useful for VMs with thin-provisioned disks, too.

True. Keeping or putting the 'holes' back in the 'holey' disk files when the VM frees up space would be very useful.

>     km> I would think that the slog code would need to use it in order
>     km> to keep write speeds up and latencies down. No?
>
> Read the goofy gamer site review, please. No, not with the latest
> Intel firmware, it's not needed.

I did read at least one review that compared old and new firmware on the Intel M model. In that, I'm pretty sure they still saw a performance hit (in latency) when the entire drive had been written to. It may have taken longer to hit, and it may not have been as drastic, but it was still there. Which review are you talking about?

So what if Intel has fixed it? Not everyone is going to use the Intel drives. If the TRIM command (assuming it can help at all) can keep the other brands and models performing close to how they performed when new, then I'd say it's useful in the ZFS slog too. Just because one vendor might have made it unnecessary doesn't mean it is for everyone. Does it?

-Kyle
On Fri, 24 Jul 2009, Kyle McDonald wrote:
> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8

This is an interesting test report. Something quite interesting for zfs is that if the write rate is continually high, then the write performance will be limited by the FLASH erase performance, regardless of the use of something like TRIM. TRIM only improves write latency in the case that the FLASH erase is able to keep ahead of the write rate. If the writes are bottlenecked, then using TRIM is likely to decrease the write performance. If data is written at an almost constant rate, then a time may come when the drive suddenly "hits the wall" and is no longer able to erase the data as fast as it comes in.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Jul 24, 2009, at 2:33 PM, Bob Friesenhahn wrote:
> On Fri, 24 Jul 2009, Kyle McDonald wrote:
>> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8
>
> This is an interesting test report. Something quite interesting for
> zfs is that if the write rate is continually high, then the write
> performance will be limited by the FLASH erase performance,
> regardless of the use of something like TRIM. TRIM only improves
> write latency in the case that the FLASH erase is able to keep ahead
> of the write rate. If the writes are bottlenecked, then using TRIM
> is likely to decrease the write performance. If data is written at
> an almost constant rate, then a time may come when the drive
> suddenly "hits the wall" and is no longer able to erase the data as
> fast as it comes in.

Yep. Good thing we can "zpool add" a log to spread the load.

NB: "zpool add" log != "zpool attach" log

-- richard
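A brief illustration of that distinction, with hypothetical pool and device names:

  # "add" creates a second, independent log vdev -- writes are spread across both logs
  zpool add tank log c3t0d0s1

  # "attach" mirrors an existing log device -- redundancy, not extra log bandwidth
  zpool attach tank c2t0d0s1 c3t0d0s1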
The 128G Supertalent UltraDrive ME - the larger version of the drive mentioned in the original post.

Sorry, should have made that a little clearer. :-)

T

Kyle McDonald wrote:
> Tristan Ball wrote:
>> It just so happens I have one of the 128G and two of the 32G versions in
>> my drawer, waiting to go into our "DR" disk array when it arrives.
>
> Hi Tristan,
>
> Just so I can be clear, what model/brand are the drives you were testing?
>
> -Kyle
Bob Friesenhahn wrote:
> On Fri, 24 Jul 2009, Tristan Ball wrote:
>> I've used 8K IO sizes for all the stage one tests - I know I might get
>> it to go faster with a larger size, but I like to know how well systems
>> will do when I treat them badly!
>>
>> The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes,
>> 5000 on reads.
>
> This seems like rather low random write performance. My 12-drive
> array of rotating rust obtains 3708.89 ops/sec. In order to be
> effective, it seems that a synchronous write log should perform
> considerably better than the backing store.

That really depends on what you're trying to achieve. Even if this single drive is only showing equivalent performance to a twelve-drive array (and I suspect your 3700 ops/sec would slow down over a bigger data set, as seeks make more of an impact), that still means that if the SSD is used as a ZIL, those sync writes don't have to be written to the spinning disks immediately, giving the scheduler a better chance to order the I/Os and providing better overall latency response for the requests that are going to disk.

And while I didn't make it clear, I actually intend to use the 128G drive as an L2ARC. While its effectiveness will obviously depend on the access patterns, the cost of adding the drive to the array is basically trivial, and it significantly increases the total ops/sec the array is capable of during those times that the access patterns provide for it.

For my use, it was a case of "might as well". :-)

Tristan.