Hi,

I have a dual-Xeon, 64GB 1U server with two free 3.5" drive slots. I also have a free PCI-E slot.

I'm going to run a PostgreSQL database with a business intelligence application. The database size is not really set; it will be between 250-500GB, running on Solaris 10 or b134.

My storage choices are:

1. OCZ Z-Drive R2, which fits in a 1U PCI-E slot. The docs say it has a RAID controller built in; I don't know if that can be disabled.
2. Mirrored OCZ Talos C Series 3.5" SAS <http://www.ocztechnology.com/products/solid-state-drives/sas.html> drives.
3. Mirrored OCZ SATA II 3.5" <http://www.ocztechnology.com/products/solid_state_drives/sata_3_5_solid_state_drives> drives.

I'm looking for comments on the above drives or recommendations on other affordable drives running in a pure SSD pool. Also, what drives do you run as a pure SSD pool?

Thanks,
Karl
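Purely as an illustration of options 2 or 3 above, here is a minimal sketch of a mirrored pure-SSD pool for the database. The device names, pool/dataset names, and the 8 KiB recordsize (matching PostgreSQL's page size) are illustrative assumptions, not taken from the thread:

    # Mirrored pure-SSD pool on two placeholder devices
    zpool create dbpool mirror c1t0d0 c1t1d0

    # Dataset for the PostgreSQL data directory; recordsize=8k matches
    # PostgreSQL's 8 KiB page size (a common, but optional, tuning)
    zfs create -o recordsize=8k -o mountpoint=/dbpool/pgdata dbpool/pgdata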
> I have a dual xeon 64GB 1U server with two free 3.5" drive slots. I
> also have a free PCI-E slot.
>
> I'm going to run a PostgreSQL database with a business intelligence
> application.
>
> The database size is not really set. It will be between 250-500GB
> running on Solaris 10 or b134.

Running business-critical stuff on b134 isn't what I'd recommend, since there are no updates anymore. Either use S10, S11 Express, or perhaps OpenIndiana.

> My storage choices are
>
> 1. OCZ Z-Drive R2 which fits in a 1U PCI slot. The docs say it has a
>    RAID controller built in. I don't know if that can be disabled.
> 2. Mirrored OCZ Talos C Series 3.5" SAS drives.
> 3. Mirrored OCZ SATA II 3.5" drives.
>
> I'm looking for comments on the above drives or recommendations on
> other affordable drives running in a pure SSD pool.
>
> Also, what drives do you run as a pure SSD pool?

Most drives should work well for a pure SSD pool. I have a PostgreSQL database on a Linux box on a mirrored set of C300s. AFAIK ZFS doesn't yet support TRIM, so that can be an issue. Apart from that, it should work well.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
On Sat, Jul 9, 2011 at 2:19 PM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:
> Most drives should work well for a pure SSD pool. I have a PostgreSQL
> database on a Linux box on a mirrored set of C300s. AFAIK ZFS doesn't
> yet support TRIM, so that can be an issue. Apart from that, it should
> work well.

Interesting-- what is the suspected impact of not having TRIM support?

I would imagine this might cause slow writes after some time of use, since the OS isn't able to tell the drive which blocks have been freed, so the drive doesn't get a chance to "pre-erase" those blocks before another write comes along that would occupy them anew.

I'm contemplating a similar setup for some servers, so I'm interested to hear whether other people have been operating pure-SSD zpools and what their experiences have been.

Eric
On Mon, Jul 11, 2011 at 7:03 AM, Eric Sproul <esproul at omniti.com> wrote:
> Interesting-- what is the suspected impact of not having TRIM support?

There shouldn't be much, since ZFS isn't changing data in place. Any drive with reasonable garbage collection (which is pretty much everything these days) should be fine until the volume gets very full.

-B
--
Brandon High : bhigh at freaks.com
2011-07-12 9:06, Brandon High wrote:
> On Mon, Jul 11, 2011 at 7:03 AM, Eric Sproul <esproul at omniti.com> wrote:
>> Interesting-- what is the suspected impact of not having TRIM support?
> There shouldn't be much, since ZFS isn't changing data in place. Any
> drive with reasonable garbage collection (which is pretty much
> everything these days) should be fine until the volume gets very full.

I wonder if in this case it would be beneficial to slice, say, 90% of an SSD for use in ZFS pool(s) and leave the rest of the disk unassigned to any partition or slice? This would reserve some sectors as never written to by the OS. Would this ease the life of SSD devices without TRIM between them and the OS?

Curious,
//Jim
On Tue, Jul 12, 2011 at 6:18 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> 2011-07-12 9:06, Brandon High wrote:
>> On Mon, Jul 11, 2011 at 7:03 AM, Eric Sproul <esproul at omniti.com> wrote:
>>> Interesting-- what is the suspected impact of not having TRIM support?
>>
>> There shouldn't be much, since ZFS isn't changing data in place. Any
>> drive with reasonable garbage collection (which is pretty much
>> everything these days) should be fine until the volume gets very full.
>
> I wonder if in this case it would be beneficial to slice, say, 90%
> of an SSD for use in ZFS pool(s) and leave the rest of the
> disk unassigned to any partition or slice? This would reserve
> some sectors as never written to by the OS. Would this ease the
> life of SSD devices without TRIM between them and the OS?

Possibly so. That is, assuming your SSD has a controller (e.g. SandForce-based) that is able to do some kind of wear leveling. These controllers maximize the number of unused sectors by using compression, dedup, and reserving some space internally, but if you keep some space unused it should add to the number of "free" sectors (thus letting the controller rewrite the same cells less often, prolonging the drive's lifetime).

--
Fajar
On Tue, Jul 12, 2011 at 1:06 AM, Brandon High <bhigh at freaks.com> wrote:
> On Mon, Jul 11, 2011 at 7:03 AM, Eric Sproul <esproul at omniti.com> wrote:
>> Interesting-- what is the suspected impact of not having TRIM support?
>
> There shouldn't be much, since ZFS isn't changing data in place. Any
> drive with reasonable garbage collection (which is pretty much
> everything these days) should be fine until the volume gets very full.

But that's exactly the problem-- ZFS, being copy-on-write, will eventually have written to all of the available LBA addresses on the drive, regardless of how much live data exists. It's the rate of change, in other words, rather than the absolute amount, that gets us into trouble with SSDs. The SSD has no way of knowing which blocks contain live data and which have been freed, because the OS never tells it (that's what TRIM is supposed to do). So after ZFS has written to almost every LBA, it starts writing to addresses previously used (and freed by ZFS, but unknown to the SSD), so the SSD has to erase the cell before it can be written anew. This incurs a heavy performance penalty and seems like a worst-case usage scenario.

Now, others have hinted that certain controllers are better than others in the absence of TRIM, but I don't see how GC could know which blocks are available to be erased without information from the OS.

Those with deep knowledge of SSD models/controllers: how does the Intel 320 perform under ZFS as primary storage (not ZIL or L2ARC)?

Eric
I think high-end SSDs, like those from Pliant, use a significant amount of over-allocation, internal remapping, and internal COW, so that they can garbage-collect automatically when they need to, without TRIM. This only works if the drive has enough extra free space that it knows about (because it uses over-allocation, for example).

TRIM support is still something we want in ZFS, for a variety of reasons, including SSD performance. I think you can expect to hear more on this front before too much longer, so stay tuned.

-- Garrett D'Amore

On Jul 12, 2011, at 7:42 AM, Eric Sproul <esproul at omniti.com> wrote:
> [...]
> Now, others have hinted that certain controllers are better than
> others in the absence of TRIM, but I don't see how GC could know
> which blocks are available to be erased without information from the OS.
> [...]
On Tue, 12 Jul 2011, Eric Sproul wrote:
> Now, others have hinted that certain controllers are better than
> others in the absence of TRIM, but I don't see how GC could know
> which blocks are available to be erased without information from the OS.

Drives which keep spare space in reserve (as any responsible product will do) can be assured that any flash block which is requested to be overwritten is available to be erased. The main issue occurs when the overwrites don't span a full flash erasure block, since then the SSD controller needs to make a value judgement about what to do.

> Those with deep knowledge of SSD models/controllers: how does the
> Intel 320 perform under ZFS as primary storage (not ZIL or L2ARC)?

My biggest fear with using consumer SSDs is whether the SSD properly honors cache flush requests. If cache flush requests are not properly honored, then the pool could be bricked (or at least require a recovery action) due to a power failure.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
It is hard to say whether 90% or 80% is the right figure. The SSD already reserves over-provisioned space for garbage collection and wear leveling. The OS level only knows file LBAs, not the physical LBA mapping to flash pages/blocks. Uberblock updates and COW from ZFS will use a new page/block each time. A TRIM command from the ZFS level would be a better solution, but RAID is still a problem for TRIM at the OS level.

Henry

----- Original Message ----
From: Jim Klimov <jimklimov at cos.ru>
Cc: ZFS Discussions <zfs-discuss at opensolaris.org>
Sent: Tue, July 12, 2011 4:18:28 AM
Subject: Re: [zfs-discuss] Pure SSD Pool

> I wonder if in this case it would be beneficial to slice, say, 90%
> of an SSD for use in ZFS pool(s) and leave the rest of the
> disk unassigned to any partition or slice? [...]
FYI - virtually all non-super-low-end SSDs are already significantly over-provisioned for GC and scratch use inside the controller. In fact, the only difference between the OCZ "extended" models and the non-extended models (e.g. Vertex 2 50G (OCZSSD2-2VTX50G) and Vertex 2 Extended 60G (OCZSSD2-2VTXE60G)) is the amount of extra flash dedicated to scratch. Both of the aforementioned drives have 64G of flash chips - it's just that the 50G one uses significantly more for scratch, and thus will perform better under heavy use.

Over-provisioning at the filesystem level is unlikely to significantly improve things, as the SSD controller generally only uses what it considers "scratch" as such. That is, while not using 10G at the filesystem level might seem useful, my understanding of SSD controllers' usage patterns is that this generally isn't much of a performance gain. E.g. you'd be better off buying the 50G Vertex 2 and fully using it than buying the 60G model and only using 50G of it.

-Erik

On Tue, 2011-07-12 at 10:10 -0700, Henry Lau wrote:
> It is hard to say whether 90% or 80% is the right figure. The SSD
> already reserves over-provisioned space for garbage collection and
> wear leveling. [...]

--
Erik Trimble
Java Platform Group - Infrastructure
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)
On Tue, Jul 12, 2011 at 7:41 AM, Eric Sproul <esproul at omniti.com> wrote:
> But that's exactly the problem-- ZFS, being copy-on-write, will
> eventually have written to all of the available LBA addresses on the
> drive, regardless of how much live data exists. It's the rate of
> change, in other words, rather than the absolute amount, that gets us
> into trouble with SSDs. The SSD has no way of knowing which blocks [...]

Most "enterprise" SSDs use something like 30% for spare area. So a drive with 128GiB (base 2) of flash will have 100GB (base 10) of available storage. A consumer-level drive will have ~6% spare, or 128GiB of flash and 128GB of available storage. Some drives have 120GB available, but still have 128GiB of flash and therefore slightly more spare area. Controllers like the SandForce that do some dedup can give you even more effective spare area, depending on the type of data.

When the OS starts reusing LBAs, the drive will re-map them into new flash blocks in the spare area and may perform garbage collection on the now partially used blocks. The effectiveness of this depends on how quickly the system is writing and how full the drive is.

I failed to mention earlier that ZFS's write aggregation is also helpful when used with flash drives, since it can help to ensure that a whole flash block is written at once. Setting ashift to 12 (4 KiB sectors) when the pool is created may also help.

> Now, others have hinted that certain controllers are better than
> others in the absence of TRIM, but I don't see how GC could know
> which blocks are available to be erased without information from the OS.

The changed LBAs are remapped rather than overwritten in place. The drive knows which LBAs in a flash block have been re-mapped, and can do garbage collection when the right criteria are met.

-B
--
Brandon High : bhigh at freaks.com
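For reference, ashift is the per-vdev sector-size exponent (2^12 = 4 KiB). Stock Solaris 10/b134 does not expose it as a user-settable option; some later illumos builds and ZFS on Linux accept it as a property at pool creation time. A hedged sketch under that assumption, with placeholder device names:

    # Assumes a ZFS build that accepts the ashift property at creation time;
    # stock b134 derives ashift from the device's reported sector size instead.
    zpool create -o ashift=12 ssdpool mirror c1t0d0 c1t1d0

    # Check what the pool actually ended up with (zdb reports ashift per vdev)
    zdb -C ssdpool | grep ashift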
On Tue, Jul 12, 2011 at 1:35 PM, Brandon High <bhigh at freaks.com> wrote:
> Most "enterprise" SSDs use something like 30% for spare area. So a
> drive with 128GiB (base 2) of flash will have 100GB (base 10) of
> available storage. A consumer-level drive will have ~6% spare, or
> 128GiB of flash and 128GB of available storage. [...]

I see, thanks for that explanation. So finding drives that keep more space in reserve is key to getting consistent performance under ZFS.

Eric
On Tue, Jul 12, 2011 at 12:14 PM, Eric Sproul <esproul at omniti.com> wrote:
> I see, thanks for that explanation. So finding drives that keep more
> space in reserve is key to getting consistent performance under ZFS.

More spare area might give you more performance, but the big difference is the lifetime of the device. A device with more spare area can handle more writes. Within a given capacity range (e.g. 50-64 GB usable on 64 GiB of flash), the drive with more spare will last longer but may not offer a performance benefit.

Higher-capacity drives will offer better performance because they have more flash channels to write to, and they should last longer because, while the spare area is the same percentage of total capacity, it's numerically larger. A "consumer" 240GB drive (256GiB of flash) will have about 32GiB of spare area. An "enterprise" 50GB drive (64GiB of flash) will have about 17GiB of spare area, or roughly 27% of the total flash. Even though the consumer drive only sets aside ~13% for spare, it's so much larger that it will last longer at any given rate of writing.

If you were to completely fill and re-fill each drive, the consumer drive would fail earlier, but you'd have to write nearly 5x as much data to fill it even once.

-B
--
Brandon High : bhigh at freaks.com
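The spare-area figures above fall out of the GiB-vs-GB gap. A quick back-of-the-envelope check using the example drive sizes from this message (assumes a shell with 64-bit arithmetic, e.g. bash or ksh93):

    # Flash is sized in binary GiB, advertised capacity in decimal GB
    GiB=$((1024 * 1024 * 1024))
    GB=$((1000 * 1000 * 1000))

    # "Consumer" 240 GB drive built on 256 GiB of flash
    echo $(( (256 * GiB - 240 * GB) / GiB ))   # ~32 GiB spare, ~13% of the flash

    # "Enterprise" 50 GB drive built on 64 GiB of flash
    echo $(( (64 * GiB - 50 * GB) / GiB ))     # ~17 GiB spare, ~27% of the flash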
2011-07-12 23:14, Eric Sproul wrote:
> So finding drives that keep more space in reserve is key to getting
> consistent performance under ZFS.

I think I've read in a number of early SSD reviews (possibly regarding Intel devices - not certain now) that the vendor provided some low-level formatting tools which essentially allowed the user to tune how much flash would be usable and how much would be set aside as the reserve...

Perhaps this rumour is worth exploring too - do any modern SSDs have similar tools to switch them between capacity and performance modes, or some such?

//Jim
I am now using S11E and an OCZ Vertex 3 240GB SSD. I am using it in a SATA 2 port (not the new SATA 6Gbps). The PC seems to work better now; the worst lag is gone. For instance, I use Sun Ray, and if my girlfriend was using the PC while I was doing BitTorrent downloads, the PC could lock up for a while when there were lots of reads/writes to the single disk drive. Now it seems the lock-ups and the worst lag are gone; everything runs fluently, as if I were the only user.

When I boot the S11E system, there is a white splash screen with "Oracle Solaris" and a thin red stripe that moves across the screen from left to right. With my SSD, I see two red stripes before the system has booted. With the old hard disk, I could count several red stripes before the system booted, maybe 10 stripes flowing from left to right on the splash screen. So yes, the system boots quicker.

But there is a bug in the OCZ Vertex 3 240GB SSD firmware: it might lock up the PC sometimes. With Win7, the PC will BSOD sometimes. This bug is well known, but OCZ has not been able to fix it yet. The OCZ disk uses a SandForce controller, but neither vendor has a fix for this bug. And the Intel 320 SSD seems to have a serious firmware bug where you lose all your data and the disk size changes to 8MB. Google this to read more about the Intel 320 SSD bug. (I am not sure about this Intel bug.)

On Solaris, the OCZ bug manifests as an application locking up sometimes. For instance, I surf the web or use Gnome Commander, and suddenly the application turns dark grey and does not respond to any actions. I wait for a while, but nothing happens, so I just kill the application and restart it. This is annoying. At first I suspected a bug in S11E, but after reading about the OCZ lock-ups, I now suspect this is an OCZ bug. What do you say - there is no S11E bug that behaves like this, no?

--
This message posted from opensolaris.org
On Tue, Jul 12 at 23:44, Jim Klimov wrote:
> I think I've read in a number of early SSD reviews
> (possibly regarding Intel devices - not certain now)
> that the vendor provided some low-level formatting
> tools which essentially allowed the user to tune how
> much flash would be usable and how much would
> be set aside as the reserve...
>
> Perhaps this rumour is worth exploring too -
> do any modern SSDs have similar tools to switch
> them between capacity and performance modes,
> or some such?

It doesn't require special tools; just partition the device. Since ZFS will stay within a partition boundary if told to, that should allow you to guarantee a certain minimum reserve area available for other purposes.

E.g. take a 100GB drive and partition it to 80GB. Assuming the original drive was a 100GB/100GiB design, you now have (100*0.07)+20 GB of spare area, which, depending on the design, may significantly lower write amplification and thus increase performance on a device that is "full."

--eric
--
Eric D. Mudama
edmudama at bounceswoosh.org
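A hedged sketch of that approach on Solaris, with placeholder device names (the slicing step is interactive via format(1M)). One caveat worth noting: when ZFS is given a slice rather than a whole disk, it does not automatically enable the drive's write cache, so cache settings are worth re-checking given the earlier cache-flush concerns:

    # 1. Create a slice covering roughly 80% of the SSD, leaving the rest
    #    untouched (format is interactive; use the "partition" menu to size
    #    slice 0 accordingly)
    format -e c1t0d0

    # 2. Build the pool on the slice instead of the whole disk
    zpool create ssdpool c1t0d0s0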