Jorgen Lundman
2009-Jul-29  00:36 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
This thread started over in nfs-discuss, as it appeared to be an NFS problem initially. Or at the very least, an interaction between NFS and the ZIL.

Just summarising the speeds we have found when untarring something. Always in a new/empty directory. Only looking at write speed; read is always very fast.

The reason we started to look at this was because the 7-year-old NetApp being phased out could untar the test file in 11 seconds. The x4500/x4540 Suns took 5 minutes.

For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test.
(http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz)

The command executed, generally, is:

# mkdir .test34 && time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz

Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3
**** 0m11.114s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4
**** 5m11.654s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3
**** 8m55.911s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4
**** 10m32.629s

Just untarring the tarball on the x4500 itself:

----------------------------: x4500 OpenSolaris svn117 server
**** 0m0.478s

----------------------------: x4500 Solaris 10 10/08 server
**** 0m1.361s

So ZFS itself is very fast. Next, replacing NFS with different protocols: identical setup, just swapping tar for rsync, and nfsd for sshd.

The baseline test, using:
"rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX"

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4
**** 3m44.857s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh
**** 0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Let's share it with smb, and mount it:

OS X 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar
**** 0m24.480s

Neat, even SMB can beat nfs in default settings.
This would then indicate to me that nfsd is broken somehow, but then we try again after only disabling ZIL.

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE ZIL: nfsv4
**** 0m8.453s
**** 0m8.284s
**** 0m8.264s

Nice, so this is theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, for which it would probably be safe to not use the ZIL. The other type is FTP/WWW/CGI, which has more active writes/updates. Probably not as good. Comments?

Enable ZIL, but disable zfscacheflush (just as a test; I have been told disabling zfscacheflush is far more dangerous).

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE zfscacheflush: nfsv4
**** 0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a whole lot about slog.

First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog boot pool: nfsv4
**** 1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling ZIL. But it should give me the theoretical maximum speed with ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog /tmp/junk: nfsv4
**** 0m8.916s

Nice! Same speed as ZIL disabled. Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So, we bought a 300X (40MB/s) card.

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4
**** 0m26.566s

Not too bad really. But you have to reboot to see a CF card, fiddle with the BIOS for the boot order, etc. Just not an easy add on a live system. A SATA-emulated SSD disk can be hot-swapped.

Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk.

I am hoping to pick up a SATA SSD device today and see what speeds we get out of that.
That rsync (1s) vs nfs (8s) I can accept as overhead on a much more complicated protocol, but why would it take 3 minutes to write the same data on the same pool, with rsync (1s) vs nfs (3m)? The ZIL was on, slog is default, but both are writing the same way. Does nfsd add FD_SYNC to every close regardless of whether the application did or not? This I have not yet wrapped my head around.

For example, I know rsync and tar do not use fdsync (but dovecot does) on close(), but does NFS make it fdsync anyway?

Sorry for the giant email.

--
Jorgen Lundman | <lundman at lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
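To make the spread in the numbers above easier to compare, here is a small sketch that converts the `time` output to seconds and expresses each run relative to the NetApp baseline. The timings are copied from the post; the dictionary labels and the helper names (`to_seconds`, `relative_to`) are made up for this illustration.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '5m11.654s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Wall-clock times copied from the post; the labels are shorthand chosen
# for this sketch, not the original headings.
RESULTS = {
    "netapp FAS960, NFSv3": "0m11.114s",
    "x4500, NFSv4, defaults": "5m11.654s",
    "x4500, NFSv4, slog on boot mirror": "1m59.970s",
    "x4500, NFSv4, slog on 300X CF card": "0m26.566s",
    "x4500, NFSv4, slog on /tmp file": "0m8.916s",
    "x4500, NFSv4, ZIL disabled": "0m8.453s",
}

def relative_to(results: dict, baseline: str) -> dict:
    """Return each run's time divided by the baseline run's time."""
    base = to_seconds(results[baseline])
    return {name: to_seconds(stamp) / base for name, stamp in results.items()}

if __name__ == "__main__":
    for name, ratio in relative_to(RESULTS, "netapp FAS960, NFSv3").items():
        print(f"{name:40s} {ratio:6.2f}x the NetApp time")
```

With these figures the default NFSv4 run comes out at roughly 28 times the NetApp time, while the ZIL-disabled and /tmp-slog runs are both faster than the NetApp.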
Bob Friesenhahn
2009-Jul-29  01:07 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
>
> For example, I know rsync and tar does not use fdsync (but dovecot does) on
> its close(), but does NFS make it fdsync anyway?

NFS is required to do synchronous writes. This is what allows NFS clients to recover seamlessly if the server spontaneously reboots. If the NFS client supports it, it can send substantial data (multiple writes) to the server, and then commit it all via an NFS commit. Note that this requires more work by the client, since the NFS client is required to replay the uncommitted writes if the server goes away.

> Sorry for the giant email.

No, thank you very much for the interesting measurements and data.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
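As a rough local analogue of the point above (not an implementation of the NFS protocol), the sketch below creates many small files twice: once letting the OS buffer the writes, as tar and rsync do, and once forcing each file to stable storage before closing it, which is approximately the guarantee an NFS server has to give. The function name `write_files`, the file count, and the payload size are arbitrary choices for illustration; actual timings depend entirely on the storage underneath.

```python
import os
import tempfile
import time

def write_files(directory: str, count: int, payload: bytes, sync: bool) -> float:
    """Create `count` small files, optionally fsync-ing each; return seconds taken."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file{i:04d}")
        with open(path, "wb") as handle:
            handle.write(payload)
            if sync:
                handle.flush()             # push Python's buffer to the OS
                os.fsync(handle.fileno())  # ask the OS to commit to stable storage
    return time.perf_counter() - start

if __name__ == "__main__":
    payload = b"x" * 4096
    with tempfile.TemporaryDirectory() as scratch:
        for sync in (False, True):
            target = os.path.join(scratch, f"sync_{sync}")
            os.mkdir(target)
            elapsed = write_files(target, 200, payload, sync)
            print(f"sync={sync}: {elapsed:.3f}s for 200 files")
```

On a pool without a fast log device, the synchronous pass is the one that pays a storage round trip per file, which is the pattern the untar-over-NFS test exercises.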
Jorgen Lundman
2009-Jul-29  06:15 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could in the local Bic Camera, which turned out to be a CSSD-SM32NI, with supposedly 95MB/s write speed.

I put it in place, and switched the slog over to it:

**** 0m49.173s
**** 0m48.809s

So, it is slower than the CF test. This is disappointing. Everyone else seems to use Intel X25-M, which have a write-speed of 170MB/s (2nd generation) so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares with so many other SATA devices?

Oh and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund
Ross
2009-Jul-29  07:47 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Everyone else should be using the Intel X25-E. There's a massive difference between the M and E models, and for a slog it's IOPS and low latency that you need.

I've heard that Sun use X25-E's, but I'm sure that original reports had them using STEC. I have a feeling the 2nd generation X25-E's are going to give STEC a run for their money though. If I were you, I'd see if you can get your hands on an X25-E for evaluation purposes.

Also, if you're just running NFS over gigabit ethernet, a single X25-E may be enough, but at around 90MB/s sustained performance for each, you might need to stripe a few of them to match the speeds your Thumper is capable of.

We're not running an x4500, but we were lucky enough to get our hands on some PCI 512MB nvram cards a while back, and I can confirm they make a huge difference to NFS speeds - for our purposes they're identical to ramdisk slog performance.

--
This message posted from opensolaris.org
James Lever
2009-Jul-29  08:13 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On 29/07/2009, at 5:47 PM, Ross wrote:
> Everyone else should be using the Intel X25-E. There's a massive
> difference between the M and E models, and for a slog it's IOPS and
> low latency that you need.

Do they have any capacitor backed cache? Is this cache considered stable storage? If so, then they would be a fine solution.

Are there any details of the cache size and capacitor support time? SSD manufacturers aren't releasing this type of information and for use as a ZIL/slog for an NFS server, it's a pretty critical piece of the puzzle.

> We're not running an x4500, but we were lucky enough to get our
> hands on some PCI 512MB nvram cards a while back, and I can confirm
> they make a huge difference to NFS speeds - for our purposes they're
> identical to ramdisk slog performance.

At the moment, short of an STEC ZEUS, this is the only viable solution I've been able to come up with. What is the NVRAM card you're using?

For me, I'm putting it behind a raid controller with battery backed DRAM write cache. That works really well.

cheers,
James
Ross
2009-Jul-29  09:47 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Hi James, I'll not reply in line since the forum software is completely munging your post.

On the X25-E I believe there is cache, and it's not backed up. While I haven't tested it, I would expect the X25-E to have the cache turned off while used as a ZIL.

The 2nd generation X25-E announced by Intel does have 'safe storage' as they term it. I believe it has more cache, a faster write speed, and is able to guarantee that the contents of the cache will always make it to stable storage.

My guess would be that since it's designed for the server market, the cache on the X25-E would be irrelevant - the device is going to honor flush requests and the ZIL will be stable. I suspect that the X25-E G2 will ignore flush requests, with Intel's engineers confident that the data in the cache is safe.

The NVRAM card we're using is a MM-5425, identical to the one used in the famous 'blog on slogs'. I was lucky to get my hands on a pair and some drivers :-)

I think the raid controller approach is a nice idea though, and should work just as well.

I'd love an 80GB ioDrive to use as our ZIL, I think that's the best hardware solution out there right now, but until Fusion-IO release Solaris drivers I'm going to have to stick with my 512MB...
Bob Friesenhahn
2009-Jul-29  15:46 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
>
> So, it is slower than the CF test. This is disappointing. Everyone else seems
> to use Intel X25-M, which have a write-speed of 170MB/s (2nd generation) so
> perhaps that is why it works better for them. It is curious that it is slower
> than the CF card. Perhaps because it shares with so many other SATA devices?

Something to be aware of is that not all SSDs are the same. In fact, some "faster" SSDs may use a RAM write cache (they all do) and then ignore a cache sync request while not including hardware/firmware support to ensure that the data is persisted if there is power loss. Perhaps your "fast" CF device does that. If so, that would be really bad for zfs if your server was to spontaneously reboot or lose power. This is why you really want a true enterprise-capable SSD device for your slog.

Bob
Jorgen Lundman
2009-Jul-30  06:54 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Bob Friesenhahn wrote:
> Something to be aware of is that not all SSDs are the same. In fact,
> some "faster" SSDs may use a RAM write cache (they all do) and then
> ignore a cache sync request while not including hardware/firmware
> support to ensure that the data is persisted if there is power loss.
> Perhaps your "fast" CF device does that. If so, that would be really
> bad for zfs if your server was to spontaneously reboot or lose power.
> This is why you really want a true enterprise-capable SSD device for
> your slog.

Naturally, we just wanted to try the various technologies to see how they compared. The store-bought CF card took 26s, the store-bought SSD 48s. We have not found a PCI NVRAM card yet.

When talking to our Sun vendor, they have no solutions, which is annoying.

X25-E would be good, but some pools have no spares, and since you can't remove vdevs, we'd have to move all customers off the x4500 before we can use it.

CF cards need a reboot before they are seen, and 6 of our servers are x4500, not x4540, so it is not really a global solution.

PCI NVRAM cards need a reboot, but should work in both x4500 and x4540 without zpool rebuilding. But we can't actually find any with Solaris drivers.

Peculiar.

Lund
Markus Kovero
2009-Jul-30  07:40 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
btw, a new Intel X25-M (G2) is coming next month that will offer better random read/write performance than the E-series at a seriously cheap price tag. Worth a try, I'd say.

Yours
Markus Kovero
Ross
2009-Jul-30  10:27 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Without spare drive bays I don't think you're going to find one solution that works for x4500 and x4540 servers. However, are these servers physically close together? Have you considered running the slog devices externally?

One possible choice may be to run something like the Supermicro SC216 chassis (2U with 24x 2.5" drive bays):
http://www.supermicro.com/products/chassis/2U/216/SC216E2-R900U.cfm

Buy the chassis with redundant power (SC216E2-R900UB), and the JBOD power module (CSE-PTJBOD-CB1) to convert it to a dumb JBOD unit. The standard backplane has six SAS connectors, each of which connects to four drives. You might struggle if you need to connect more than six servers, although it may be possible to run it in a rather non-standard configuration, removing the backplane and powering and connecting drives individually.

However, for up to six servers, you can just fit Adaptec raid cards with external ports to each (PCI-e or PCI-x as needed), and use external cables to connect those to the SSD drives in the external chassis.

If you felt like splashing out on the raid cards, that would let you run the ZIL on up to four Intel X25-E drives per server, backed up by 512MB of battery backed cache.

I think that would have a dramatic effect on NFS speed to say the least :-)
Mike Gerdts
2009-Jul-30  13:07 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Thu, Jul 30, 2009 at 5:27 AM, Ross <no-reply at opensolaris.org> wrote:
> Without spare drive bays I don't think you're going to find one solution that works for x4500 and x4540 servers. However, are these servers physically close together? Have you considered running the slog devices externally?

It appears as though there is an upgrade path.

http://www.c0t0d0s0.org/archives/5750-Upgrade-of-a-X4500-to-a-X4540.html

However, the troll that you have to pay to follow that path demands a hefty sum ($7995 list). Oh, and a reboot is required. :)

--
Mike Gerdts
http://mgerdts.blogspot.com/
Kyle McDonald
2009-Jul-30  16:22 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Markus Kovero wrote:
> btw, there's coming new Intel X25-M (G2) next month that will offer better random read/writes than E-series and seriously cheap pricetag, worth for a try I'd say.

The suggested MSRP of the 80GB generation 2 (G2) is supposed to be $225. Even though the G2 is not shipping yet, this has already caused the prices on the G1 model to fall significantly to $229 here:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167005

and maybe lower elsewhere. If there is any G1 stock left when the G2 ships, I can imagine we'll see the G1 available for less than $200.

-Kyle
Bob Friesenhahn
2009-Jul-30  16:26 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Thu, 30 Jul 2009, Ross wrote:
> Without spare drive bays I don't think you're going to find one
> solution that works for x4500 and x4540 servers. However, are these
> servers physically close together? Have you considered running the
> slog devices externally?

This all sounds really sophisticated and complicated. While I have not yet held one of these SSDs in my own hands, it seems that they are rather small (laptop sized) SATA devices. If a SATA port can be found (or installed) into the chassis, how about just using stout velcro to affix the drive to the inside of the chassis, and run cables to it? Shouldn't that work?

Do these SSDs require a lot of cooling? Traditional drive slots are designed for hard drives which need to avoid vibration and have specific cooling requirements. What are the environmental requirements for the Intel X25-E?

Bob
Ross
2009-Jul-30  17:04 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
That should work just as well Bob, although rather than velcro I'd be tempted to drill some holes into the server chassis somewhere and screw the drives on. These things do use a bit of power, but with the airflow in a thumper I don't think I'd be worried.

If they were my own servers I'd be very tempted, but it really depends on how happy you would be voiding the warranty on a rather expensive piece of kit :-)
Richard Elling
2009-Jul-30  17:07 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Jul 30, 2009, at 9:26 AM, Bob Friesenhahn wrote:
> On Thu, 30 Jul 2009, Ross wrote:
>
>> Without spare drive bays I don't think you're going to find one
>> solution that works for x4500 and x4540 servers. However, are
>> these servers physically close together? Have you considered
>> running the slog devices externally?
>
> This all sounds really sophisticated and complicated. While I have
> not yet held one of these SSDs in my own hands, it seems that they
> are rather small (laptop sized) SATA devices. If a SATA port can be
> found (or installed) into the chassis, how about just using stout
> velcro to affix the drive to the inside of the chassis, and run
> cables to it? Shouldn't that work?

If you want to go down the path of "unsupported" then, IIRC, there is an "unsupported" CF slot on the X4500 mobo. This was brought out to the back for the X4540. This is consistent with other Sun designs at the time where CFs are available as boot devices. Now they are using the MiniFlashDIMMs.

> Do these SSDs require a lot of cooling? Traditional drive slots are
> designed for hard drives which need to avoid vibration and have
> specific cooling requirements. What are the environmental
> requirements for the Intel X25-E?

Operating and non-operating shock: 1,000 G/0.5 msec (vs operating shock for Barracuda ES.2 of 63G/2ms)
Power spec: 2.4 W @ 32 GB, 2.6W @ 64 GB (less than HDDs @ ~8-15W)
MTBF: 2M hours (vs 1.2M hours for Barracuda ES.2)
Vibration specs are not consistent for comparison.

Compare:
http://download.intel.com/design/flash/nand/extreme/319984.pdf
vs
http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_es_2.pdf

Interesting that they are now specifying write endurance as:
1 PB of random writes for 32GB, 2 PB of random writes for 64GB.

Except for price/GB, it is game over for HDDs. Since price/GB is based on Moore's Law, it is just a matter of time.
 -- richard
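The write-endurance figures quoted above can be turned into more tangible numbers. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and uses a hypothetical sustained write rate of 10 MB/s purely as an example; neither the rate nor the helper names come from the thread or the datasheet.

```python
PB = 10**15  # decimal petabyte (assumption: vendor-style decimal units)
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 86_400

def full_drive_writes(endurance_bytes: int, capacity_bytes: int) -> float:
    """How many times the whole device can be overwritten within the rating."""
    return endurance_bytes / capacity_bytes

def days_until_worn(endurance_bytes: int, write_rate_bytes_per_s: float) -> float:
    """Days of continuous writing at the given rate before the rating is used up."""
    return endurance_bytes / write_rate_bytes_per_s / SECONDS_PER_DAY

if __name__ == "__main__":
    # (capacity in GB, rated endurance in PB) as quoted in the post above.
    for capacity_gb, endurance_pb in ((32, 1), (64, 2)):
        writes = full_drive_writes(endurance_pb * PB, capacity_gb * GB)
        print(f"{capacity_gb} GB model: {writes:.0f} full-drive writes")
    # Hypothetical workload: 10 MB/s written around the clock to the 32 GB model.
    print(f"{days_until_worn(1 * PB, 10 * MB):.0f} days at a steady 10 MB/s")
```

Both models work out to the same number of full-drive overwrites, so the rating scales linearly with capacity.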
Andrew Gabriel
2009-Jul-30  17:38 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Richard Elling wrote:
> On Jul 30, 2009, at 9:26 AM, Bob Friesenhahn wrote:
>
>> Do these SSDs require a lot of cooling?

No. During the "Turbo Charge your Apps" presentations I was doing around the UK, I often pulled one out of a server to hand around the audience when I'd finished the demos on it. The first thing I noticed when doing this is that the disk is stone cold, which is not what you expect when you pull an operating disk out of a system.

Note that they draw all their power from the 5V rail, and can draw more current on the 5V rail than some HDDs, which is something to check if you're putting lots in a disk rack.

>> Traditional drive slots are designed for hard drives which need to
>> avoid vibration and have specific cooling requirements. What are the
>> environmental requirements for the Intel X25-E?
>
> Operating and non-operating shock: 1,000 G/0.5 msec (vs operating shock
> for Barracuda ES.2 of 63G/2ms)
> Power spec: 2.4 W @ 32 GB, 2.6W @ 64 GB. (less than HDDs @ ~8-15W)
> MTBF: 2M hours (vs 1.2M hours for Barracuda ES.2)
> Vibration specs are not consistent for comparison.
> Compare:
> http://download.intel.com/design/flash/nand/extreme/319984.pdf
> vs
> http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_es_2.pdf
>
> Interesting that they are now specifying write endurance as:
> 1 PB of random writes for 32GB, 2 PB of random writes for 64GB.
>
> Except for price/GB, it is game over for HDDs. Since price/GB is
> based on
> Moore's Law, it is just a matter of time.

SSDs are a sufficiently new technology that I suspect there's a significant probability of discovering new techniques which give larger step improvements than Moore's Law for some years yet. However, HDDs aren't standing still either when it comes to capacity, although improvements in other HDD performance characteristics have been very disappointing this decade (e.g. IOPS haven't improved much at all; indeed they've only seen a 10-fold improvement over the last 25 years).
-- Andrew
Kurt Olsen
2009-Jul-30  18:50 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10
I'm using an Acard ANS-9010B (configured with 12 GB battery backed ECC RAM w/ 16 GB CF card for longer term power losses. Device cost $250, RAM cost about $120, and the CF around $100.) It just shows up as a SATA drive. Works fine attached to an LSI 1068E. Since -- as I understand it -- one's ZIL doesn't need to be particularly large, I've split that into 2 GB of ZIL and 10 GB of L2ARC. Simple tests show it can do around 3200 sync 4k writes/sec over NFS into a RAID-Z pool of five Western Digital 1 TB Caviar Green drives.
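For a sense of scale, the 3200 sync 4k writes/sec figure above can be converted into throughput and an average time per write. This is a minimal sketch that assumes "4k" means 4096 bytes and, for the latency figure, that writes are issued strictly one after another; if several writes are in flight at once, individual writes may take longer than the average shown. The function name is made up for the example.

```python
def sync_write_profile(ops_per_second: float, op_size_bytes: int) -> tuple:
    """Return (throughput in MiB/s, average milliseconds per operation)."""
    throughput_mib = ops_per_second * op_size_bytes / 2**20
    average_ms = 1000.0 / ops_per_second
    return throughput_mib, average_ms

if __name__ == "__main__":
    throughput, average = sync_write_profile(3200, 4096)
    print(f"{throughput:.1f} MiB/s, {average:.4f} ms per write on average")
```

The throughput is modest compared with what the device can stream, which fits the earlier point in the thread that a slog needs IOPS and low latency rather than raw bandwidth.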
Bob Friesenhahn
2009-Jul-30  19:07 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Thu, 30 Jul 2009, Andrew Gabriel wrote:
>>
>> Except for price/GB, it is game over for HDDs. Since price/GB is based on
>> Moore's Law, it is just a matter of time.
>
> SSD's are a sufficiently new technology that I suspect there's significant
> probably of discovering new techniques which give larger step improvements
> than Moore's Law for some years yet. However, HDD's aren't standing still

FLASH technology is highly mature and has been around since the '80s. Given this, it is perhaps the case that (through continual refinement) FLASH has finally made it to the point of usability for bulk mass storage. It is not clear if FLASH will obey Moore's Law or if it has already started its trailing off stage (similar to what happened with single-core CPU performance).

Only time will tell. Currently (after rounding) SSDs occupy 0% of the enterprise storage market even though they dominate in some other markets.

Bob
Will Murnane
2009-Jul-30  19:13 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10
On Thu, Jul 30, 2009 at 14:50, Kurt Olsen <no-reply at opensolaris.org> wrote:
> I'm using an Acard ANS-9010B (configured with 12 GB battery backed ECC RAM w/ 16 GB CF card for longer term power losses. Device cost $250, RAM cost about $120, and the CF around $100.) It just shows up as a SATA drive. Works fine attached to an LSI 1068E. Since -- as I understand it -- one's ZIL doesn't need to be particularly large, I've split that into 2 GB of ZIL and 10 GB of L2ARC. Simple tests show it can do around 3200 sync 4k writes/sec over NFS into a RAID-Z pool of five western digital 1 TB caviar green drives.

I, too, have one of these, and am mostly happy with it. The biggest inconvenience about it is the form factor: it occupies a 5.25" bay. Since my case has no 5.25" bays (Norco RPC-4220) I improvised by drilling a pair of correctly spaced holes into the lid of the case and screwing it in there. This isn't really recommended for enterprise use, where drilling holes in the equipment is discouraged.

I don't have benchmarks for my setup, but anecdotally I no longer see the stalls accessing files over NFS that I had before adding the Acard to my pool as a log device. I only have 1GB in it, and that seems plenty for the purpose: it only ever seems to show up as 8k used, even with 100 MB/s or more of writes to it.

Also, I should point out that the device doesn't support SMART. Some raid controllers may be unhappy about this.

Will
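One way to see why a small log device can be enough: log records only need to survive until the pool commits the corresponding data in its next transaction group, so the slog has to hold just a few seconds of synchronous writes. The sketch below is a back-of-the-envelope estimate, not a sizing rule from the thread; the 5-second commit interval and the safety factor of 2 are assumptions picked for illustration, and the right values depend on the ZFS release and tuning.

```python
def slog_size_bytes(sync_write_rate_bytes_per_s: float,
                    seconds_until_commit: float,
                    safety_factor: float = 2.0) -> float:
    """Estimate log-device capacity needed to hold uncommitted sync writes."""
    return sync_write_rate_bytes_per_s * seconds_until_commit * safety_factor

if __name__ == "__main__":
    MB = 10**6
    # 100 MB/s is the write rate mentioned in the post above; the 5 s interval
    # is a hypothetical value, not something measured on that system.
    estimate = slog_size_bytes(100 * MB, seconds_until_commit=5)
    print(f"about {estimate / 10**9:.1f} GB of log space")
```

Under those assumptions the estimate lands around 1 GB, in the same range as the log device sizes reported in the two posts above.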
Richard Elling
2009-Jul-30  21:04 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Jul 30, 2009, at 12:07 PM, Bob Friesenhahn wrote:
> On Thu, 30 Jul 2009, Andrew Gabriel wrote:
>>> Except for price/GB, it is game over for HDDs. Since price/GB is
>>> based on Moore's Law, it is just a matter of time.
>>
>> SSD's are a sufficiently new technology that I suspect there's
>> significant probably of discovering new techniques which give
>> larger step improvements than Moore's Law for some years yet.
>> However, HDD's aren't standing still
>
> FLASH technology is highly mature and has been around since the
> '80s. Given this, it is perhaps the case that (through continual
> refinement) FLASH has finally made it to the point of usability for
> bulk mass storage. It is not clear if FLASH will obey Moore's Law
> or if it has already started its trailing off stage (similar to what
> happened with single-core CPU performance).
>
> Only time will tell. Currently (after rounding) SSDs occupy 0% of
> the enterprise storage market even though they dominate in some
> other markets.

According to Gartner, enterprise SSDs accounted for $92.6M of a $585.5M SSD market in June 2009, representing 15.8% of the SSD market. STEC recently announced an order for $120M of ZeusIOPS drives from "a single enterprise storage customer." From 2007 to 2008, SSD market grew by 100%. IDC reports Q1CY09 had $4,203M for the external disk storage factory revenue, down 16% from Q1CY08 while total disk storage systems were down 25.8% YOY to $5,616M[*]. So while it looks like enterprise SSDs represented less than 1% of total storage revenue in 2008, it is the part that is growing rapidly. I would not be surprised to see enterprise SSDs at 5-10% of the total disk storage systems market in 2010. I would also expect to see total disk storage systems revenue continue to decline as fewer customers buy expensive RAID controllers. IMHO, the total disk storage systems market has already peaked, so the enterprise SSD gains at the expense of overall market size.

Needless to say, whether or not Sun can capitalize on its OpenStorage strategy, the market is moving in the same direction, perhaps at a more rapid pace due to current economic conditions.

[*] IDC defines a Disk Storage System as a set of storage elements, including controllers, cables, and (in some instances) host bus adapters, associated with three or more disks. A system may be located outside of or within a server cabinet and the average cost of the disk storage systems does not include infrastructure storage hardware (i.e. switches) and non-bundled storage software.

-- richard
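The percentages quoted above can be re-derived with a few lines of arithmetic. The sketch below uses only the dollar figures from the post; the "implied Q1CY08" values are back-calculated from the stated year-over-year declines and do not appear in the thread, and the helper names are made up for illustration.

```python
# Re-deriving the figures quoted above. The only inputs are numbers from
# the post; the "implied Q1CY08" values are back-calculated here and are
# not stated in the thread.

def share_pct(part, whole):
    """Percentage that `part` represents of `whole`."""
    return 100.0 * part / whole

def prior_value(current, decline_pct):
    """Value before a year-over-year decline of `decline_pct` percent."""
    return current / (1.0 - decline_pct / 100.0)

enterprise_share = share_pct(92.6, 585.5)    # enterprise share of the SSD market
external_q1cy08 = prior_value(4203.0, 16.0)  # implied external disk revenue, $M
total_q1cy08 = prior_value(5616.0, 25.8)     # implied total disk systems revenue, $M

print(f"enterprise share of SSD market: {enterprise_share:.1f}%")
print(f"implied Q1CY08 external disk storage: ${external_q1cy08:,.0f}M")
print(f"implied Q1CY08 total disk storage systems: ${total_q1cy08:,.0f}M")
```

The first line reproduces the 15.8% figure; the other two are only as good as the quoted decline percentages.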
Bob Friesenhahn
2009-Jul-30  21:29 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Thu, 30 Jul 2009, Richard Elling wrote:
>
> According to Gartner, enterprise SSDs accounted for $92.6M of a
> $585.5M SSD market in June 2009, representing 15.8% of the SSD
> market. STEC recently announced an order for $120M of ZeusIOPS
> drives from "a single enterprise storage customer." From 2007 to
> 2008, SSD market grew by 100%. IDC reports Q1CY09 had $4,203M for
> the external disk storage factory revenue, down 16% from Q1CY08
> while total disk storage systems were down 25.8% YOY to $5,616M[*].
> So while it looks like enterprise SSDs represented less than 1% of
> total storage revenue in 2008, it is the part that is growing
> rapidly. I would not be surprised to see enterprise SSDs at 5-10%

While $$$ are important for corporate bottom lines, when it comes to the number of units deployed, $$$ are a useless measure when comparing disk drives to SSDs since SSDs are much more expensive and offer much less storage space.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Jorgen Lundman
2009-Jul-31  02:04 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
> X25-E would be good, but some pools have no spares, and since you can't
> remove vdevs, we'd have to move all customers off the x4500 before we
> can use it.

Ah, it just occurred to me that perhaps for our specific problem, we will buy two X25-Es and replace the root mirror. The OS and ZIL logs can live together, and we can put /var in the data pool. That way we would not need to rebuild the data pool, with all the work that comes with that.

Shame I can't zpool replace to a smaller disk (500GB HDD to 32GB SSD) though; I will have to lucreate and reboot one time.

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Ross
2009-Jul-31  05:42 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Great idea, much neater than most of my suggestions too :-) -- This message posted from opensolaris.org
Ian Collins
2009-Jul-31  07:16 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Ross wrote:
> Great idea, much neater than most of my suggestions too :-)
>

What is? Please keep some context for those of us on email!

--
Ian.
Ross
2009-Jul-31  10:19 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
> Ross wrote:
> > Great idea, much neater than most of my suggestions too :-)
>
> What is? Please keep some context for those of us on email!

X25-E drives as a mirrored boot volume on an x4500, partitioning off some of the space for the slog.
Joerg Moellenkamp
2009-Aug-01  18:37 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Hi Jorgen,

warning ... weird idea inside ...

> Ah it just occurred to me that perhaps for our specific problem, we
> will buy two X25-Es and replace the root mirror. The OS and ZIL logs
> can live together and put /var in the data pool. That way we would
> not need to rebuild the data-pool and all the work that comes with that.
>
> Shame I can't zpool replace to a smaller disk (500GB HDD to 32GB SSD)
> though, I will have to lucreate and reboot one time.

Oh, you have a solution ... I just had a weird idea and thought about suggesting something of a hack: put SSDs in a central server, build a pool out of them, perhaps activate compression (after all, even small machines today are 4-core systems, and they shouldn't idle for their money), create some zvols on that pool, share them via iSCSI and assign them as slog devices. For high speed usage: create a ramdisk, use it as the slog on the SSD server, and put a UPS under the SSD server. In the end an SSD is nothing else (a flash memory controller with DRAM, some storage and some caps to keep the DRAM powered until it is flushed).

Regards
Joerg
Roch
2009-Aug-04  07:44 UTC
[zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Bob Friesenhahn writes:
 > On Wed, 29 Jul 2009, Jorgen Lundman wrote:
 > >
 > > For example, I know rsync and tar does not use fdsync (but dovecot does) on
 > > its close(), but does NFS make it fdsync anyway?
 >
 > NFS is required to do synchronous writes. This is what allows NFS
 > clients to recover seamlessly if the server spontaneously reboots.
 > If the NFS client supports it, it can send substantial data (multiple
 > writes) to the server, and then commit it all via an NFS commit.

In theory; but for lots of single-threaded file creation (the tar process) the NFS server is fairly constrained in what it can do. We need something like directory delegation to allow the client to interact with local caches like a DAS filesystem can. A slog on SSD can help, but that SSD needs to have low-latency writes, which typically implies DRAM buffers, and a capacitor so that it can be made to ignore cache flushes.

-r

 > Note that this requires more work by the client since the NFS client
 > is required to replay the uncommitted writes if the server goes away.
 >
 > > Sorry for the giant email.
 >
 > No, thank you very much for the interesting measurements and data.
 >
 > Bob
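To make the point about single-threaded file creation concrete, here is a small local sketch that creates many small files one at a time, with and without an fsync per file. It is not the gtar test from the thread and does not involve NFS at all: it only approximates, on a local filesystem, the per-file commit cost a server pays when every create must reach stable storage. The file count, file size and function names are arbitrary assumptions.

```python
import os
import tempfile
import time

def create_files(directory, count, payload, sync):
    """Create `count` small files one at a time; optionally fsync each one."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as fh:
            fh.write(payload)
            if sync:
                fh.flush()             # push Python's buffer to the OS
                os.fsync(fh.fileno())  # ask the OS to commit to stable storage
    return time.perf_counter() - start

def compare(count=200, size=4096):
    """Return {sync_flag: (elapsed_seconds, files_created)} for both modes."""
    payload = b"x" * size
    results = {}
    for sync in (False, True):
        with tempfile.TemporaryDirectory() as d:
            elapsed = create_files(d, count, payload, sync)
            results[sync] = (elapsed, len(os.listdir(d)))
    return results

if __name__ == "__main__":
    for sync, (elapsed, n) in compare().items():
        label = "fsync per file" if sync else "buffered"
        print(f"{label}: {n} files in {elapsed:.3f}s")
```

How large the gap is depends entirely on the storage underneath; on a device with a fast, non-volatile write cache the two runs may be close, which is the property being asked of a slog device above.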