I have been searching this forum and just about every ZFS document I can find trying to find the answer to my questions, but I believe the answer I am looking for is not going to be documented and is probably best learned from experience.

This is my first time playing around with OpenSolaris and ZFS. I am in the midst of replacing my home-based file server. This server hosts all of my media files, from MP3s to Blu-ray ISOs. I stream media from this file server to several media players throughout my house.

The server consists of a Supermicro X6DHE-XG2 motherboard, 2 x 2.8 GHz Xeon processors, 4 GB of RAM and 2 Supermicro SAT2MV8 controllers. I have 14 1TB Hitachi hard drives connected to the controllers.

My initial thought was to just create a single 14-drive RaidZ2 pool, but I have read over and over again that I should be limiting each array to a max of 9 drives. So then I would end up with 2 x 7-drive RaidZ arrays. To keep the pool size at 12TB I would have to give up my extra parity drive going to this 2-array setup, which is concerning as I have no room for hot spares in this system. So in my mind I am left with only one other choice, and that is going to 2 x RaidZ2 vdevs and losing an additional 2TB, so I am left with a 10TB ZFS pool.

So my big question is: given that I am working with 4MB - 50GB files, is going with 14 spindles going to incur a huge performance hit? I was hoping to be able to saturate a single GigE link with this setup, but I am concerned the single large array won't let me achieve this.

aaaahhhhhh, decisions, decisions.... Any advice would be greatly appreciated.

-- 
This message posted from opensolaris.org
On Wed, 2010-04-07 at 10:40 -0700, Jason S wrote:
> I have been searching this forum and just about every ZFS document I can find trying to find the answer to my questions, but I believe the answer I am looking for is not going to be documented and is probably best learned from experience.
>
> This is my first time playing around with OpenSolaris and ZFS. I am in
> the midst of replacing my home-based file server. This server hosts
> all of my media files, from MP3s to Blu-ray ISOs. I stream media
> from this file server to several media players throughout my house.
>
> The server consists of a Supermicro X6DHE-XG2 motherboard, 2 x 2.8 GHz
> Xeon processors, 4 GB of RAM and 2 Supermicro SAT2MV8 controllers. I
> have 14 1TB Hitachi hard drives connected to the controllers.

If you can at all afford it, upgrade your RAM to 8GB. More than anything else, I've found that additional RAM makes up for any other deficiencies with a ZFS setup. 4GB is OK, but 8GB is a pretty sweet spot for price/performance for a small NAS server.

> My initial thought was to just create a single 14-drive RaidZ2 pool,
> but I have read over and over again that I should be limiting each
> array to a max of 9 drives. So then I would end up with 2 x 7-drive
> RaidZ arrays.

That's correct. You can certainly do a 14-drive RaidZ2, but given how data is accessed and stored in such a setup, you'll likely suffer noticeable slowness vs. a 2x7-drive setup.

> To keep the pool size at 12TB I would have to give up my extra parity
> drive going to this 2-array setup, which is concerning as I have no
> room for hot spares in this system. So in my mind I am left with only
> one other choice, and that is going to 2 x RaidZ2 vdevs and losing an
> additional 2TB, so I am left with a 10TB ZFS pool.

You've pretty much hit it right there. There is *one* other option: create a zpool of two raidz1 vdevs, one with 6 drives and one with 7 drives. Then add a hot spare for the pool. That will give you most of the performance of a 2x7 setup, with the capacity of 11 disks. The tradeoff is that it's a bit less reliable, as you have to trust the ability of the hot spare to resilver before any additional drives fail in the degraded array. For a home NAS, it's likely a reasonable bet, though.

> So my big question is: given that I am working with 4MB - 50GB files, is
> going with 14 spindles going to incur a huge performance hit? I was
> hoping to be able to saturate a single GigE link with this setup, but
> I am concerned the single large array won't let me achieve this.

Frankly, testing is the only way to be sure. :-)

Writing files that large (and reading them back more frequently, I assume...) will tend to reduce the differences in performance between a 1x14 and 2x7 setup. One way to keep your 1Gb Ethernet saturated is to increase the RAM (as noted above). With 8GB of RAM, you should have enough buffer space in play to mask the differences in large-file I/O between the 1x14 and 2x7 setups. 12GB or 16GB would most certainly erase pretty much any noticeable difference.

For small random I/O, even with larger amounts of RAM, you'll notice some difference between the two setups - exactly how noticeable I can't say, and you'd have to try it to see, as it depends heavily on your access pattern.

> aaaahhhhhh, decisions, decisions....
>
> Any advice would be greatly appreciated.

One thing Richard or Bob might be able to answer better is the tradeoff between getting a cheap/small SSD for L2ARC and buying more RAM. That is, I don't have a good feel for whether (for your normal usage case) it would be better to get 8GB of more RAM, or buy something like a cheap 40-60GB SSD for use as an L2ARC (or some combination of the two). SSDs in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will likely cost.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
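For reference, here is a rough sketch of how the three layouts being discussed would be created. The controller/target device names below are placeholders for however the 14 drives actually enumerate on the two SAT2MV8 controllers, and the capacity notes are raw figures:

    # single 14-drive raidz2 vdev (~12 TB raw, widest stripe, fewest IOPS)
    zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
                             c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0

    # two 7-drive raidz2 vdevs in one pool (~10 TB raw, better IOPS)
    zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
                      raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0

    # 6-drive + 7-drive raidz1 vdevs plus a hot spare (~11 TB raw, less redundancy per vdev)
    zpool create tank raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
                      raidz1 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 \
                      spare  c2t6d0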
On Wed, 7 Apr 2010, Jason S wrote:
> To keep the pool size at 12TB I would have to give up my extra
> parity drive going to this 2-array setup, which is concerning as I
> have no room for hot spares in this system. So in my mind I am left
> with only one other choice, and that is going to 2 x RaidZ2 vdevs and
> losing an additional 2TB, so I am left with a 10TB ZFS pool.

I would go with a single pool with two raidz2 vdevs, even if you don't get the maximum possible space. Raidz1 is best avoided when using 1TB SATA disk drives because of the relatively high probability of data loss during a resilver and the long resilver times. I would trade the hot spare for the improved security of raidz2. The hot spare is more helpful for mirrored setups or raidz1, where the data reliability is more sensitive to how long it takes to recover a lost drive. Just buy a spare drive so that you can replace a failed drive expediently.

> So my big question is: given that I am working with 4MB - 50GB files,
> is going with 14 spindles going to incur a huge performance hit? I was
> hoping to be able to saturate a single GigE link with this setup,
> but I am concerned the single large array won't let me achieve this.

It is not difficult to saturate a gigabit link. It can be easily accomplished with just a couple of drives. The main factor is whether zfs's prefetch is aggressive enough. Each raidz2 vdev will offer the useful IOPS of a single disk drive, so from an IOPS standpoint the pool would behave like two drives.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Wed, 7 Apr 2010, Erik Trimble wrote:
> One thing Richard or Bob might be able to answer better is the tradeoff
> between getting a cheap/small SSD for L2ARC and buying more RAM. That
> is, I don't have a good feel for whether (for your normal usage case)
> it would be better to get 8GB of more RAM, or buy something like a cheap
> 40-60GB SSD for use as an L2ARC (or some combination of the two). SSDs
> in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will
> likely cost.

If the storage is primarily used for single-user streamed video playback, data caching will have little value (data is accessed only once) and there may even be value in disabling data caching entirely (but still caching metadata). The only useful data caching would be to support file prefetch. If data caching is disabled then the total RAM requirement may be reduced. If the storage will serve other purposes as well, then retaining the caching and buying more RAM is a wise idea.

Zfs has a design weakness in that any substantial writes during streaming playback may temporarily interrupt (hiccup) the streaming playback. This weakness seems to be inherent to zfs, although there are tuning options to reduce its effect.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
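For what it's worth, the "cache metadata but not data" idea above maps to the per-dataset primarycache property. A minimal sketch, assuming a hypothetical filesystem named tank/media:

    # keep metadata in the ARC, but don't cache file data that is only ever read once
    zfs set primarycache=metadata tank/media

    # revert to the default behaviour if the workload changes later
    zfs set primarycache=all tank/media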
Thank you for the replies guys!

I was actually already planning to get another 4 GB of RAM for the box right away anyway, but thank you for mentioning it! As there appear to be a couple of ways to "skin the cat" here, I think I am going to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and see what the performance is like. I have a few days of grace before I need to have this server ready for duty.

Something I forgot to note in my original post is that the performance numbers I am concerned with are primarily for reads. There could at any one point be 4 media players attempting to stream media from this server. The media players all have 100 Mb/s interfaces, so as long as I can reliably stream 400 Mb/s it should be OK (this is assuming all the media players were playing high-bitrate Blu-ray streams at one time). Any writing to this array would happen pretty infrequently, and I normally schedule any file transfers for the wee hours of the morning anyway.

-- 
This message posted from opensolaris.org
Hello,

More for my own edification than to help Jason (sorry Jason!) I would like to clarify something. If read performance is paramount, am I correct in thinking RAIDZ is not the best way to go? Would not the ZFS equivalent of RAID 10 (striped mirror sets) offer better read performance? In this case, I realize that Jason also needs to maximize the space he has in order to store all of those legitimately copied Blu-ray movies. ;-)

Regards,
Chris

On Apr 7, 2010, at 3:09 PM, Jason S wrote:
> Thank you for the replies guys!
>
> I was actually already planning to get another 4 GB of RAM for the box right away anyway, but thank you for mentioning it! As there appear to be a couple of ways to "skin the cat" here, I think I am going to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and see what the performance is like. I have a few days of grace before I need to have this server ready for duty.
>
> Something I forgot to note in my original post is that the performance numbers I am concerned with are primarily for reads. There could at any one point be 4 media players attempting to stream media from this server. The media players all have 100 Mb/s interfaces, so as long as I can reliably stream 400 Mb/s it should be OK (this is assuming all the media players were playing high-bitrate Blu-ray streams at one time). Any writing to this array would happen pretty infrequently, and I normally schedule any file transfers for the wee hours of the morning anyway.
> --
> This message posted from opensolaris.org
On Wed, 7 Apr 2010, Jason S wrote:
> I was actually already planning to get another 4 GB of RAM for the
> box right away anyway, but thank you for mentioning it! As there
> appear to be a couple of ways to "skin the cat" here, I think I am
> going to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration
> and see what the performance is like. I have a few days of grace
> before I need to have this server ready for duty.

It will be very difficult to test actual performance because the pool will behave differently after it has been in service for six months or a year. There will be some fragmentation, and if the pool is allowed to grow close to 100% full, the fragmentation in that last-written data may be severe. If you have background zfs snapshots enabled, then the performance is likely to degrade more quickly than if you don't use snapshots, since the disks will tend to be more full and the disk heads will need to seek further.

If you choose the 14-spindle raidz2, the chances are good that you will soon be complaining about performance. You mentioned that there may be four simultaneous readers, and it is worth noting that this is already twice the number of vdevs if you go for the two-vdev solution. With two vdevs and four readers, there will have to be disk seeking for data even if the data is perfectly sequentially organized.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
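One crude way to sanity-check raw sequential read throughput on a freshly built (still empty) pool, before committing data to it - file and pool names here are made up, and the numbers will be optimistic compared to an aged, fuller pool for the reasons above:

    # write ~20 GB of test data
    dd if=/dev/zero of=/tank/bigfile bs=1M count=20480

    # export/import the pool so the read below comes from disk rather than the ARC
    zpool export tank && zpool import tank

    # time a sequential read; a single GigE link tops out around 110-120 MB/s
    time dd if=/tank/bigfile of=/dev/null bs=1M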
On 04/ 7/10 03:09 PM, Jason S wrote:
> I was actually already planning to get another 4 GB of RAM for the
> box right away anyway, but thank you for mentioning it! As there
> appear to be a couple of ways to "skin the cat" here, I think I am going
> to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and
> see what the performance is like. I have a few days of grace before
> I need to have this server ready for duty.

Just curious, what are you planning to boot from? AFAIK you can't boot ZFS from anything much more complicated than a mirror.

Cheers -- Frank
On Wed, 7 Apr 2010, Chris Dunbar wrote:
> More for my own edification than to help Jason (sorry Jason!) I
> would like to clarify something. If read performance is paramount,
> am I correct in thinking RAIDZ is not the best way to go? Would not
> the ZFS equivalent of RAID 10 (striped mirror sets) offer better
> read performance? In this case, I realize that Jason also needs to

Striped mirror vdevs are assured to offer peak performance. One would (naively) think that the striping in a raidz2 would allow it to offer more sequential performance, but zfs's sequential file prefetch allows mirrors to offer about the same level of sequential performance. With the mirror setup, 128K blocks are pulled from each disk, whereas with the raidz setup the 128K block is split across the drives constituting a vdev. Zfs is very good at ramping up prefetch for large sequential files. Due to this, raidz2 should be seen as a way to improve storage efficiency and data reliability, and not so much as a way to improve sequential performance.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Wed, Apr 7, 2010 at 12:09 PM, Jason S <j.single at shaw.ca> wrote:
> I was actually already planning to get another 4 GB of RAM for the box
> right away anyway, but thank you for mentioning it! As there appear to be a
> couple of ways to "skin the cat" here, I think I am going to try both a
> 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and see what the performance
> is like. I have a few days of grace before I need to have this server ready
> for duty.

Don't bother with the 14-drive raidz2. I can attest to just how horrible the performance is for a single, large raidz2 vdev: atrocious. Especially when it comes time to scrub or resilver. You'll end up thrashing all the disks, taking close to a week to resilver a dead drive (if you can actually get it to complete), and pulling your hair out in frustration.

Our original configuration in our storage servers used a single 24-drive raidz2 vdev using 7200 RPM SATA drives. It worked, not well, but it worked ... until the first drive died. After 3 weeks, the resilver still hadn't finished, the backup processes weren't completing overnight due to the resilver, and things just went downhill. We redid the pool using 3x raidz2 vdevs of 8 drives each, and things are much better. (If I had to re-do it again today, I'd use 4x raidz2 vdevs of 6 drives each.)

The more vdevs you can add to a pool, the better the raw I/O performance of the pool will be. Go with lots of smaller vdevs. With 14 drives, play around with the following:

  2x raidz2 vdevs using 7 drives each
  3x raidz2 vdevs using 5 drives each (with two hot-spares, or a mirror vdev for root?)
  4x raidz2 vdevs using 4 drives each (with one hot-spare, perhaps?)
  4x raidz1 vdevs using 4 drives each (maybe not enough redundancy?)
  5x mirror vdevs using 3 drives each (maybe too much lost space for redundancy?)
  7x mirror vdevs using 2 drives each

You really need to decide which is more important: raw storage space or raw I/O throughput. They're almost (not quite, but almost) mutually exclusive.

> Something I forgot to note in my original post is that the performance numbers I
> am concerned with are primarily for reads. There could be at
> any one point 4 media players attempting to stream media from this server.
> The media players all have 100 Mb/s interfaces, so as long as I can reliably
> stream 400 Mb/s it should be OK (this is assuming all the media players were
> playing high-bitrate Blu-ray streams at one time). Any writing to this array
> would happen pretty infrequently, and I normally schedule any file transfers
> for the wee hours of the morning anyway.

-- 
Freddie Cash
fjwcash at gmail.com
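Whichever layout is chosen, it is worth timing a scrub while the pool is still mostly empty and again once it starts to fill, since scrub time is a reasonable proxy for how a resilver will behave. The pool name below is a placeholder:

    # start a scrub of the whole pool
    zpool scrub tank

    # shows scrub/resilver progress, percent complete and an estimated time to go
    zpool status -v tank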
On Wed, Apr 7, 2010 at 12:29 PM, Frank Middleton <f.middleton at apogeect.com> wrote:
> On 04/ 7/10 03:09 PM, Jason S wrote:
>> I was actually already planning to get another 4 GB of RAM for the
>> box right away anyway, but thank you for mentioning it! As there
>> appear to be a couple of ways to "skin the cat" here, I think I am going
>> to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and
>> see what the performance is like. I have a few days of grace before
>> I need to have this server ready for duty.
>
> Just curious, what are you planning to boot from? AFAIK you can't
> boot ZFS from anything much more complicated than a mirror.

The OP mentioned OpenSolaris, so I can't comment on what can/can't be booted from on that OS. However, FreeBSD 8 can boot from a mirror pool, a raidz1 pool, and a raidz2 pool. So it's not a limitation in ZFS itself. :)

-- 
Freddie Cash
fjwcash at gmail.com
Ahh, thank you for the reply Bob, that is the info I was after. It looks like I will be going with the 2 x 7 RaidZ2 option.

And just to clarify: as far as expanding this pool in the future, my only option is to add another 7-spindle RaidZ2 array, correct?

Thanks for all the help guys!

-- 
This message posted from opensolaris.org
I am booting from a single 74 GB WD Raptor attached to the motherboard's onboard SATA port.

-- 
This message posted from opensolaris.org
On Wed, Apr 7 at 12:41, Jason S wrote:
> And just to clarify: as far as expanding this pool in the future, my
> only option is to add another 7-spindle RaidZ2 array, correct?

That is correct, unless you want to use the -f option to force-allow an asymmetric expansion of your pool.

--eric

-- 
Eric D. Mudama
edmudama at mail.bounceswoosh.org
On Wed, 2010-04-07 at 12:41 -0700, Jason S wrote:
> Ahh, thank you for the reply Bob, that is the info I was after. It looks like I will be going with the 2 x 7 RaidZ2 option.
>
> And just to clarify: as far as expanding this pool in the future, my only option is to add another 7-spindle RaidZ2 array, correct?
>
> Thanks for all the help guys!

You can add arbitrary-sized vdevs to a pool, but you can't add any drives to an existing raidz[123] vdev. You can even add things like a mirrored vdev to a pool consisting of several raidz[123] vdevs. :-)

Thus, it would certainly be possible to add, say, a 4-drive raidz1 to your 2x7 pool. It wouldn't perform quite the same as a 3x7 pool, but it still would perform better than the 2x7 pool.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
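A sketch of what that later expansion looks like, using made-up device names. The -f in the second example is the force flag Eric mentioned; it is only needed because the new vdev's redundancy level doesn't match the existing raidz2 vdevs:

    # add a third 7-drive raidz2 vdev to the existing pool
    zpool add tank raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0

    # add a mismatched vdev (e.g. a 4-drive raidz1); zpool refuses unless forced
    zpool add -f tank raidz1 c5t0d0 c5t1d0 c5t2d0 c5t3d0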
Freddie,

Now you have brought up another question :) I had always assumed that I would just use OpenSolaris for this file server build, as I had not actually done any research into other operating systems that support ZFS. Does anyone have any advice as to whether I should be considering FreeBSD instead of OpenSolaris? Both operating systems are somewhat foreign to me, as I come from the Windows world with a little bit of Linux experience as well.

-- 
This message posted from opensolaris.org
On Wed, Apr 7, 2010 at 1:22 PM, Jason S <j.single at shaw.ca> wrote:
> Now you have brought up another question :) I had always assumed that I
> would just use OpenSolaris for this file server build, as I had not
> actually done any research into other operating systems that support
> ZFS. Does anyone have any advice as to whether I should be considering
> FreeBSD instead of OpenSolaris? Both operating systems are somewhat foreign
> to me, as I come from the Windows world with a little bit of Linux
> experience as well.

If you want access to the latest and greatest ZFS features as soon as they are available, you'll need to use OpenSolaris (currently at ZFSv22 or newer). If you don't mind waiting up to a year for new ZFS features, you can use FreeBSD (currently at ZFSv13 in 7.3 and 8.0).

Hardware support for enterprise server gear may be better in OSol. Hardware support for general server gear should be about the same. Hardware support for desktop gear may be better in FreeBSD.

Each has fancy software features that the other doesn't (GEOM, HAST, IPFW/PF, Jails, etc. in FreeBSD; Zones, Crossbow, whatever that fancy admin framework is called, integrated iSCSI, integrated CIFS, etc. in OSol).

I'm biased toward FreeBSD, but that's because I've never used OSol. Anything is better than Linux. ;)

-- 
Freddie Cash
fjwcash at gmail.com
On Wed, 7 Apr 2010, Jason S wrote:
> systems that support ZFS. Does anyone have any advice as to whether I
> should be considering FreeBSD instead of OpenSolaris? Both
> operating systems are somewhat foreign to me, as I come from the

FreeBSD zfs does clearly work, although it is an older version of zfs (version 13) than comes with the latest Solaris 10 (version 15) or development OpenSolaris. Zfs is better integrated into Solaris than it is in FreeBSD, since it was designed for Solaris.

While I have not used FreeBSD zfs, my experience with Solaris 10 and FreeBSD is that Solaris 10 (and later) is an extremely feature-rich system which can take considerable time to figure out if you really want to use all of those features (but you don't have to). FreeBSD is simpler because it does not do as much. FreeBSD boots extremely fast.

If your only interest is in zfs, my impression is that in a year or two it will not really matter whether you are using Solaris or FreeBSD, because FreeBSD will have an updated zfs (with deduplication) and will be more mature than it is now. Today zfs is more mature and stable in Solaris. Solaris NFS is clearly more mature and performant than in FreeBSD. OpenSolaris native CIFS is apparently quite a good performer. I find that Solaris 10 with Samba works well for me.

Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite valuable in that it allows you to upgrade the OS without more than a few minutes of down-time and with a quick fall-back if things don't work as expected.

It is more straightforward to update a FreeBSD install from source code because that is the way it is normally delivered. Sometimes this is useful in order to incorporate a fix as soon as possible without needing to wait for someone to produce binaries.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Since I already have OpenSolaris installed on the box, I probably won't jump over to FreeBSD. However, someone has suggested that I look into www.nexenta.org, and I must say it is quite interesting. Someone correct me if I am wrong, but it looks like it is OpenSolaris-based and has basically everything I am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if the GUI is easy enough to use.

Does anyone have any experience or pointers with this NAS software?

-- 
This message posted from opensolaris.org
On Wednesday, April 7, 2010, Jason S <j.single at shaw.ca> wrote:
> Since I already have OpenSolaris installed on the box, I probably won't jump over to FreeBSD. However, someone has suggested that I look into www.nexenta.org, and I must say it is quite interesting. Someone correct me if I am wrong, but it looks like it is OpenSolaris-based and has basically everything I am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if the GUI is easy enough to use.
>
> Does anyone have any experience or pointers with this NAS software?
> --
> This message posted from opensolaris.org

I wouldn't waste your time. My last go-round, LACP was completely broken for no apparent reason. The community is basically non-existent.

--Tim
On Apr 7, 2010, at 16:47, Bob Friesenhahn wrote:
> Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite
> valuable in that it allows you to upgrade the OS without more than a
> few minutes of down-time and with a quick fall-back if things don't
> work as expected.
>
> It is more straightforward to update a FreeBSD install from source
> code because that is the way it is normally delivered. Sometimes
> this is useful in order to incorporate a fix as soon as possible
> without needing to wait for someone to produce binaries.

If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software:

http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc

It's not as comprehensive as FreeBSD Ports (21,500 and counting), but it has the major stuff and is quite good. I'd also look into the FreeBSD Handbook:

http://freebsd.org/handbook
On Apr 7, 2010, at 3:24 PM, Tim Cook wrote:
> On Wednesday, April 7, 2010, Jason S <j.single at shaw.ca> wrote:
>> Since I already have OpenSolaris installed on the box, I probably won't jump over to FreeBSD. However, someone has suggested that I look into www.nexenta.org, and I must say it is quite interesting. Someone correct me if I am wrong, but it looks like it is OpenSolaris-based and has basically everything I am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if the GUI is easy enough to use.
>>
>> Does anyone have any experience or pointers with this NAS software?
>
> I wouldn't waste your time. My last go-round, LACP was completely
> broken for no apparent reason. The community is basically
> non-existent.

[richard pinches himself... yep, still there :-)]

NexentaStor version 3.0 is based on b134, so it has the same basic foundation as the yet-unreleased OpenSolaris 2010.next. For an easy-to-use NAS box for the masses, it is much more friendly and usable than a basic OpenSolaris or Solaris 10 release.
-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
On Wed, 7 Apr 2010, David Magda wrote:
>> It is more straightforward to update a FreeBSD install from source code
>> because that is the way it is normally delivered. Sometimes this is useful
>> in order to incorporate a fix as soon as possible without needing to wait
>> for someone to produce binaries.
>
> If you're going to go with (Open)Solaris, the OP may also want to look into
> the multi-platform pkgsrc for third-party open source software:
>
> http://www.pkgsrc.org/
> http://en.wikipedia.org/wiki/Pkgsrc

But this does not update the OS kernel. It is for application packages. I did have to apply a source patch to the FreeBSD kernel the last time around.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Wed, Apr 7, 2010 at 4:27 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Wed, 7 Apr 2010, David Magda wrote:
>>> It is more straightforward to update a FreeBSD install from source code
>>> because that is the way it is normally delivered. Sometimes this is useful
>>> in order to incorporate a fix as soon as possible without needing to wait
>>> for someone to produce binaries.
>>
>> If you're going to go with (Open)Solaris, the OP may also want to look
>> into the multi-platform pkgsrc for third-party open source software:
>>
>> http://www.pkgsrc.org/
>> http://en.wikipedia.org/wiki/Pkgsrc
>
> But this does not update the OS kernel. It is for application packages. I
> did have to apply a source patch to the FreeBSD kernel the last time around.

This is getting a bit off-topic regarding ZFS, but you only need to patch the FreeBSD kernel if you don't want to wait for an errata/security notice to be made. If you can wait, then you can just use the freebsd-update tool to do a binary update of just the affected file(s), or even to the next (major or minor) release. Not sure what the equivalent process would be on (Open)Solaris (or if you can even do a patch/source update).

However, I believe the mention of pkgsrc was for use on OSol. There's very little reason to use pkgsrc on FreeBSD.

-- 
Freddie Cash
fjwcash at gmail.com
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Chris Dunbar
>
> like to clarify something. If read performance is paramount, am I
> correct in thinking RAIDZ is not the best way to go? Would not the ZFS
> equivalent of RAID 10 (striped mirror sets) offer better read
> performance? In this case, I realize that Jason also needs to maximize
> the space he has in order to store all of those legitimately copied
> Blu-ray movies. ;-)

During my testing, for sequential reads using 6 disks, I got these numbers (normalized to the performance of a single disk):

  Stripe of 3 mirrors: 10.89
  Raidz,  6 disks:      9.84
  Raidz2, 6 disks:      7.17

Any number >2 would max out a GigE. The main performance advantage of the stripe of mirrors is in random reads, which aren't very significant for Jason's case.

http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of David Magda
>
> If you're going to go with (Open)Solaris, the OP may also want to look
> into the multi-platform pkgsrc for third-party open source software:
>
> http://www.pkgsrc.org/
> http://en.wikipedia.org/wiki/Pkgsrc

Am I mistaken? I thought pkgsrc was for NetBSD. For Solaris/OpenSolaris, I would normally say OpenCSW or Blastwave. (And in some circumstances, Sunfreeware.)
On Apr 7, 2010, at 19:58, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of David Magda
>>
>> If you're going to go with (Open)Solaris, the OP may also want to look
>> into the multi-platform pkgsrc for third-party open source software:
>>
>> http://www.pkgsrc.org/
>> http://en.wikipedia.org/wiki/Pkgsrc
>
> Am I mistaken? I thought pkgsrc was for NetBSD.
> For Solaris/OpenSolaris, I would normally say OpenCSW or Blastwave. (And in
> some circumstances, Sunfreeware.)

It was originally created by the NetBSD project (forked from FreeBSD's ports), but like everything else they seem to do, it's multi-platform: the BSDs, Linux, Darwin/OS X, IRIX, AIX, Interix, QNX, HP-UX, and Solaris. AFAIK you can do cross-compiling as well (i.e., use pkgsrc on Linux/AMD to compile a package for IRIX/MIPS).

Pkgsrc currently has 9,500 packages; Blastwave 4,500; OpenCSW about 2,300 AFAICT; FreeBSD Ports, 21,500. YMMV.
On Wed, Apr 7, 2010 at 5:59 PM, Richard Elling <richard.elling at gmail.com> wrote:
> On Apr 7, 2010, at 3:24 PM, Tim Cook wrote:
>> I wouldn't waste your time. My last go-round, LACP was completely
>> broken for no apparent reason. The community is basically
>> non-existent.
>
> [richard pinches himself... yep, still there :-)]
>
> NexentaStor version 3.0 is based on b134, so it has the same basic foundation
> as the yet-unreleased OpenSolaris 2010.next. For an easy-to-use NAS box
> for the masses, it is much more friendly and usable than a basic OpenSolaris
> or Solaris 10 release.
> -- richard

Unless of course you were looking for any community support or basic LACP functionality. ;)

--Tim
On Wed, Apr 7, 2010 at 4:58 PM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of David Magda
>>
>> If you're going to go with (Open)Solaris, the OP may also want to look
>> into the multi-platform pkgsrc for third-party open source software:
>>
>> http://www.pkgsrc.org/
>> http://en.wikipedia.org/wiki/Pkgsrc
>
> Am I mistaken? I thought pkgsrc was for NetBSD.
> For Solaris/OpenSolaris, I would normally say OpenCSW or Blastwave. (And in
> some circumstances, Sunfreeware.)

pkgsrc is available for several Unix-like systems. NetBSD is just the origin of it, and the main development environment. It's even available for Mac OS X, DragonFlyBSD, Linux distros, and more.

-- 
Freddie Cash
fjwcash at gmail.com
Go with the 2x7 raidz2. When you start to really run out of space, replace the drives with bigger ones. You will run out of space eventually regardless; this way you can replace 7 at a time, not 14 at a time. With luck, each replacement will last you long enough that the next replacement will come when the next generation of drive sizes is at the price sweet spot.

--
Dan.
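The drive-by-drive upgrade Dan describes is just a series of replace operations; once every member of a vdev has been swapped for a larger disk, the extra space becomes available. A sketch with made-up device names; the autoexpand property only exists on reasonably recent builds, and on older ones the pool grows after an export/import once the last disk in the vdev has been replaced:

    # optional, on builds that have it: let the pool grow automatically
    zpool set autoexpand=on tank

    # replace each 1TB disk in the vdev with its 2TB successor, waiting for the
    # resilver to finish (check with 'zpool status tank') before starting the next
    zpool replace tank c2t0d0 c5t0d0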
Daniel Carosone wrote:
> Go with the 2x7 raidz2. When you start to really run out of space,
> replace the drives with bigger ones. You will run out of space
> eventually regardless; this way you can replace 7 at a time, not 14 at
> a time. With luck, each replacement will last you long enough that
> the next replacement will come when the next generation of drive sizes
> is at the price sweet spot.
>
> --
> Dan.

While that's great in theory, there's getting to be a consensus that 1TB 7200RPM 3.5" SATA drives are really going to be the last usable capacity. Drives above that simply hit the "too big and too slow" barrier - that is, their capacity has outstripped their performance to a degree that it's actively harmful. Maybe if 15k SAS drives in the 1TB+ range become available (and cheap) they'll be useful, but it looks like hard drives are really at the end of their advancement, as far as capacities per drive go. >1TB drives currently have excessively long resilver times, inferior reliability (for the most part), and increased power consumption. I'd generally recommend that folks NOT step beyond the 1TB capacity at the 3.5" hard drive format.

Bottom line: while densities of storage are going up exponentially, throughput is barely linear, and IOPS has effectively brick-walled. And error rates are actually dropping (i.e. hard errors per full drive capacity). All of which combine to make the larger sized drives unreliable enough to avoid them.

So, while it's nice that you can indeed seamlessly swap up drive sizes (and your recommendation of using 2x7 helps that process), in reality it's not a good idea to upgrade from his existing 1TB drives.

Now, in the Real Near Future when we have 1TB+ SSDs that are 1 cent/GB, well, then it will be nice to swap up. But not until then...

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
On Thu, 8 Apr 2010, Erik Trimble wrote:
> While that's great in theory, there's getting to be a consensus that 1TB
> 7200RPM 3.5" SATA drives are really going to be the last usable capacity.

Agreed. The 2.5" form factor is rapidly emerging. I see that enterprise 6-Gb/s SAS drives are available with 600GB capacity already. It won't be long until they also reach your 1TB "barrier".

> So, while it's nice that you can indeed seamlessly swap up drive sizes (and
> your recommendation of using 2x7 helps that process), in reality it's not a
> good idea to upgrade from his existing 1TB drives.

It would make more sense to add a new chassis, or replace the existing chassis with one which supports more (physically smaller) drives. While products are often sold based on their ability to be upgraded, upgrades often don't make sense.

> Now, in the Real Near Future when we have 1TB+ SSDs that are 1 cent/GB, well,
> then it will be nice to swap up. But not until then...

I don't see that happening any time soon. FLASH is close to hitting the wall on device geometries, and tri-level and quad-level only gets you so far. A new type of device will need to be invented.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Apr 8, 2010, at 8:52 AM, Bob Friesenhahn wrote:
> On Thu, 8 Apr 2010, Erik Trimble wrote:
>> While that's great in theory, there's getting to be a consensus that 1TB 7200RPM 3.5" SATA drives are really going to be the last usable capacity.

I doubt that 1TB (or even 1.5TB) 3.5" disks are being manufactured anymore. These have dropped to the $100 price barrier already. 2TB are hanging out around $150.

> Agreed. The 2.5" form factor is rapidly emerging. I see that enterprise 6-Gb/s SAS drives are available with 600GB capacity already. It won't be long until they also reach your 1TB "barrier".

Yep, seeing some nice movement in this space.

>> So, while it's nice that you can indeed seamlessly swap up drive sizes (and your recommendation of using 2x7 helps that process), in reality it's not a good idea to upgrade from his existing 1TB drives.
>
> It would make more sense to add a new chassis, or replace the existing chassis with one which supports more (physically smaller) drives. While products are often sold based on their ability to be upgraded, upgrades often don't make sense.
>
>> Now, in the Real Near Future when we have 1TB+ SSDs that are 1 cent/GB, well, then it will be nice to swap up. But not until then...
>
> I don't see that happening any time soon. FLASH is close to hitting the wall on device geometries, and tri-level and quad-level only gets you so far. A new type of device will need to be invented.

It is a good idea to not bet against Moore's Law :-)

The current state of the art is an 8GB (byte, not bit) MLC flash chip which is 162 mm^2. In the space of a 2.5" disk, with some clever packaging, you could pack dozens of TB.
-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
>>>>> "dm" == David Magda <dmagda at ee.ryerson.ca> writes: >>>>> "bf" == Bob Friesenhahn <bfriesen at simple.dallas.tx.us> writes:dm> OP may also want to look into the multi-platform pkgsrc for dm> third-party open source software: +1. jucr.opensolaris.org seems to be based on RPM which is totally fail. RPM is the oldest, crappiest, most frustrating thing! packages are always frustrating but pkgsrc is designed to isolate itself from the idiosyncracies of each host platform, through factoring. Its major weakness is upgrades, but with Solaris you can use zones and snapshots to make this a lot less painful: * run their ``bulk build'''' inside a zone. The ``bulk build'''' feature is like the jucr: it downloads stuff from all over the internet and bulids it, generates a tree of static web pages to report its results, plus a repository of binary packages. Like jucr it does not build packages on an ordinary machine, but in a well-specified minimal environment which has installed only the packages named as build dependencies---between each package build the bulk scripts remove all not-needed packages. Thus you really need a separate machine, like a zone, for bulk building. There is a non-bulk way to build pkgsrc, but it''s not as good. Except that unlike the jucr, the implementation of the bulk build is included in the pkgsrc distribution and supported and ordinary people who run pkgsrc are expected to use it themselves. * clone a zone, upgrade the packages inside it using the binary packages produced by the bulk build, and cut services over to the clone only after everything''s working right. Both of these things are a bit painful with pkgsrc on normal systems and much easier with zones and ZFS. The type of upgrade that''s guaranteed to work on pkgsrc, is: * to take a snapshot of /usr/pkgsrc which *is* pkgsrc, all packages'' build instructions, and no binaries under this tree * ``bulk build'''' * replace all your current running packages with the new binary packages in the repository the bulk build made. In practice people usually rebuild less than that to upgrade a package, and it often works anyway, but if it doesn''t work then you''re left wondering ``is pkgsrc just broken again, or will a more thorough upgrade actually work?'''' The coolest immediate trick is that you can run more than one bulk build with different starting options, ex SunPro vs gcc, 32 vs 64-bit. The first step of using pkgsrc is to ``bootstrap'''' it, and during bootstrap you choose the C compiler and also whether to use host''s or pkgsrc''s versions of things like perl and pax and awk. You also choose prefixes for /usr /var and /etc and /var/db/pkg that will isolate all pkgsrc files from the rest of the system. In general this level of pathname flexibility is only achievable at build time, so only a source-based package system can pull off this trick. The corrolary is that you can install more than one pkgsrc on a single system and choose between them with PATH. pkgsrc is generally designed to embed full pathnames of its shared libs, so this has got a good shot of working. You could have /usr/pkg64 and /usr/pkg32, or /usr/pkg-gcc and /usr/pkg-spro. pkgsrc will also build pkg_add, pkg_info, u.s.w. under /usr/pkg-gcc/bin which will point to /var/db/pkg-gcc or whatever to track what''s installed, so you can have more than one pkg_add on a single system pointing to different sets of directories. 
You could also do weirder things like use different paths every time you do a bulk build, like /usr/pkg-20100130 and /usr/pkg-20100408, although it''s very strange to do that so far. It would also be possible to use ugly post-Unix directory layouts, ex /pkg/<marker>/usr/bin and /pkg/<marker>/etc and /pkg/<marker>/var/db/pkg, and then make /pkg/<marker> into a ZFS that could be snapshotted and rolled back. It is odd in pkgsrc world to put /var/db/pkg tracking-database of what''s installed into the same subtree as the installed stuff itself, but in the context of ZFS it makes sense to do that. However the pathnames will be fixed for a given set of binary packages, so whatever you do with the ZFS the results of bulk builds sharing a common ``bootstrap'''' phase would have to stay mounted on the same directory. You cannot clone something to a new directory then add/remove packages. There was an attempt called ``pkgviews'''' to do something like this, but I think it''s ultimately doomed because the idea''s not compartmentalized enough to work with every package. In general pkgsrc gives you a toolkit for dealing with suboptimal package trees where a lot of shit is broken. It''s well-adapted to the ugly modern way we run Unixes, sealed, with only web facing the users, because you can dedicate an entire bulk build to one user-facing app. If you have an app that needs a one-line change to openldap, pkgsrc makes it easy to perform this 1-line change and rebuild 100 interdependent packages linked to your mutant library, culminating in your app''s package, and partition all of it inside /usr/pkg-weirdapp/bin and /etc/pkg-weirdapp and /var/pkg-weirdapp while the rest of the system uses the normal ldap library. IPS repositories will always have fixed install paths because most of these things are only easy to set at build time. Developing packages with pkgtool/rpm is really unpleasant. Here are some things pkgsrc does for me that pkgtool won''t: * I had to learn you must incant ''-lldap_r-2.4 -llber-2.4'' whenever you want to use ``real'''' LDAP client on Solaris, and nonsense has to go inside every single ``spec'''' file. With pkgsrc there is a ``buildlink'''' framework for ldap. If a host OS is weird, like Solaris, ``buildlink'''' will make it look normal, quietly adding -L<blah> symlink spaghetti to either make the modern Solaris libs appear under the plain -lldap names without any -2.4 garbage, or else build openldap from pkgsrc and point to the local copy. I will have to learn how to use LDAP properly within pkgsrc, regardless of what is my host platform. But if i were using pkgsrc, I probably wouldn''t even know about the -lldap_r-2.4 -llber-2.4 stupidity. Life is too short for this garbage. And when Solaris finally undoes the ldap crazy, you can fix it in one place in the pkgsrc framework instead of having to undo ad-hoc patches in 1000 .spec files. * If you want to patch a configure.in or Makefile.in under pkgsrc, there are macros to rerun autoconf, automake, and libtool. You can ask for specific versions of any of them, and they''ll be automatically built for you. Anything a package builder does all the time is factored out like this and becomes a single line in the package Makefile expressing what you want. And no one ever patches a ''configure'' script itself. That''s painful garbage, and the patch can''t be submitted upstream so it''s wasted work of the packager. We only patch the .in files and then wave a hand and say ``rebuild it''''. 
* When you are developing a RPM/pkgbuild package, over and over you say ''pkgtool build-only'', and the stupid thing rm -rf''s the build tree and starts over. pkgsrc is Makefile-based, and will start from where it left off. You can hit ^C, change something, and restart the build even, just like normal software development. Of course bulk builds do not work this way, but pkgsrc understand a tight development cycle is needed for the developers of packages, and this enforced clean, a, b, c, PANIC shit is no good when you are invoking the tool from the command line. There is even some imperfect provision within buildlink for running binaries from the not-installed build directory and having them pull in shared libs from other build directories inside pkgsrc instead of from /usr/pkg/lib, but IMHO this kind of thing is doomed and silly in the long run, and build dependencies should be fully installed inside the development zone. There are actually a lot of things gentoo does better than pkgsrc IMHO, but then pkgsrc bulk builds are smarter about the hidden dependency problem, and about marking whether two different versioned packages are binary-compatible or only source-compatible with each other. Neither one''s perfect, but I''ll take either one over RPM-derived straightjacketed chaos. and like I said ZFS+zones help with lots of what I hated about pkgsrc before (to make this a little more OT). Clonable filesystems and lightweight zones are real revolution-makers for packaging tools, and are the main reason I think source-based package systems have the strongest future. It is really possible, if you feel like it, to rebuild from source everything linked to libpng just to fix a small security bug in libpng, then push out the result with something rsync-ish, and maybe even install the result atomically, and it is cheaper to do this kind of thing than drive yourself blind with manual regression tests. Most of the time in a bulk build is spent installing and uninstalling packages, but with some clever AI the bulk build framework could map out the entire process and mark off certain nodes of the dependency graph which are unusually dense, and take snapshots of a filesystem with that set of packages installed which it coudl start with, then install or uninstall a few packages to reach the exact set specified by the build dependencies. This could make it possible to finish a lot more bulk builds per day. The type of regression tests the developers of big messy things like X11 and GNOME really need are the sort you can only do with an autobuild cluster, so I really think this is the way forward. bf> But this does not update the OS kernel. It is for application bf> packages. I did have to apply a source patch to the FreeBSD bf> kernel the last time around. yes, well, that is why it works on Solaris at all. It has nothing to do with kernels, and starts from the quaint and ancient tradition that any unix system will provide some form of basically standard application development environment. It''s a userland package system that works on a variety of ``less equal'''' operating systems. Of course it will not take over anything outside itself like kernels. and it''s not part of FreeBSD at all. It will run there, but you may as well use ports instead on FreeBSD because a lot more packages will be working properly then. pkgsrc is used on NetBSD, DragonFly, Mac OS X, u.s.w. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100408/bf362be4/attachment.bin>
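For anyone who wants to try the parallel-prefix trick described above, the bootstrap step looks roughly like this. The prefixes, the package chosen, and the pkgsrc checkout location are all just examples, and compiler selection is normally done via mk.conf or the environment at bootstrap time:

    # bootstrap an isolated pkgsrc instance under /usr/pkg-gcc
    cd /usr/pkgsrc/bootstrap
    ./bootstrap --prefix=/usr/pkg-gcc --pkgdbdir=/var/db/pkg-gcc

    # build and install an individual package with the bmake that bootstrap installed
    cd /usr/pkgsrc/net/rsync
    /usr/pkg-gcc/bin/bmake install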
On Thu, Apr 08, 2010 at 12:14:55AM -0700, Erik Trimble wrote:
> Daniel Carosone wrote:
>> Go with the 2x7 raidz2. When you start to really run out of space,
>> replace the drives with bigger ones.
>
> While that's great in theory, there's getting to be a consensus that 1TB
> 7200RPM 3.5" SATA drives are really going to be the last usable capacity.

I dunno. The 'forces' and issues you describe are real, but 'usable' depends very heavily on the user's requirements.

For example, a large amount of the extra space available on a larger drive may be very rarely accessed in normal use (scrubs and resilvers aside). In the OP's example of an ever-expanding home media collection, much of it will never or very rarely get re-watched. Another common use for the extra space is simply storing more historical snapshots, against the unlikely future need to access them. For such data, speed is really not a concern at all.

For the subset of users for whom these forces are not overwhelming for real usage, that leaves scrubs and resilvers. There is room for improvement in zfs here, too - a more sequential streaming access pattern would help.

To me, the biggest issue you left unmentioned is the problem of backup. There's little option for backing up these larger drives, other than more of the same drives. In turn, lots of the use such drives will be put to is for backing up other data stores, and there again, the usage pattern fits the above profile well.

Another usage pattern we may see more of, and that helps address some of the performance issues, is this. Say I currently have 2 pools of 1TB disks, one as a backup for the other. I want to expand the space. I replace all the disks with 2TB units, but I also change my data distribution as it grows: now, each pool is to be at most half-full of data, and the other half is used as a backup of the opposite pool. ZFS send is fast enough that the backup windows are short, and I now have effectively twice as many spindles in active service.

> [..] it looks like hard drives are really at the end of their
> advancement, as far as capacities per drive go.

The challenges are undeniable, but that's way too big a call. Those are words you will regret in future; at least, I hope the future will be one in which those words are regrettable. :-)

> >1TB drives currently have excessively long resilver times, inferior
> reliability (for the most part), and increased power consumption.

Yes, for the most part. However, a 2TB drive has dramatically less power consumption than 2x1TB drives (and uses less of other valuable resources, like bays and controller slots).

> I'd generally recommend that folks NOT step beyond the 1TB capacity
> at the 3.5" hard drive format.

A general recommendation is fine, and this is one I agree with for many scenarios. At least, I'd recommend that folks look more closely at alternatives using 2.5" drives and SAS expander bays than they might otherwise.

> So, while it's nice that you can indeed seamlessly swap up drive sizes
> (and your recommendation of using 2x7 helps that process), in reality
> it's not a good idea to upgrade from his existing 1TB drives.

So what does he do instead, when he's running out of space and 1TB drives are hard to come by? The advice still stands, as far as I'm concerned: do something now that will leave you room for different expansion choices later - and evaluate the best expansion choice later, when the parameters of the time are known.

--
Dan.
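A minimal sketch of the mutual-backup arrangement described above, with invented pool, filesystem and snapshot names:

    # initial full copy of one pool's data filesystem into the other pool's backup area
    zfs snapshot -r poolA/data@mon
    zfs send -R poolA/data@mon | zfs receive -d poolB/backup

    # subsequent backup windows only ship the changes since the previous snapshot
    zfs snapshot -r poolA/data@tue
    zfs send -R -i @mon poolA/data@tue | zfs receive -d poolB/backup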
On Fri, 2010-04-09 at 08:07 +1000, Daniel Carosone wrote:> On Thu, Apr 08, 2010 at 12:14:55AM -0700, Erik Trimble wrote: > > Daniel Carosone wrote: > >> Go with the 2x7 raidz2. When you start to really run out of space, > >> replace the drives with bigger ones. > > > > While that''s great in theory, there''s getting to be a consensus that 1TB > > 7200RPM 3.5" Sata drives are really going to be the last usable capacity. > > I dunno. The ''forces'' and issues you describe are real, but ''usable'' > depends very heavily on the user''s requirements. >Well.... The problem is (and this isn''t just a ZFS issue) that resilver and scrub times /are/ very bad for >1TB disks. This goes directly to the problem of redundancy - if you don''t really care about resilver/scrub issues, then you really shouldn''t bother to use Raidz or mirroring. It''s pretty much in the same ballpark. That is, >1TB 3.5" drives have such long resilver/scrub times that with ZFS, it''s a good bet you can kill a second (or third) drive before you can scrub or resilver in time to compensate for the already-failed one. Put it another way, you get more errors before you have time to fix the old ones, which effectively means you now can''t fix errors before they become permanent. Permanent errors = data loss.> For example, a large amount of the extra space available on a larger > drive may be very rarely accessed in normal use (scrubs and resilvers > aside). In the OP''s example of an ever-expanding home media > collection, much of it will never or very rarely get > re-watched. Another common use for the extra space is simply storing > more historical snapshots, against the unlikely future need to access > them. For such data, speed is really not a concern at all. >Yes, it is. It''s still a concern, and not just in the scrub/resilver arena. Big drives have considerably lower performance, to the point where that replacing 1TB drives with 2TB drives may very well drop them below the threshold where they start to see stutter. That is, while the setup may work with 1TB drives, it won''t with 2TB drives. It''s not a no-brainer to just upgrade the size. For example, the 2TB 5900RPM 3.5" drives are (on average) over 2x as slow as the 1TB 7200RPM 3.5" drives for most operations. Access time is slower by 40%, and throughput is slower on by 30-50%.> For the subset of users for whom these forces are not overwhelming for > real usage, that leaves scrubs and resilvers. There is room for > improvement in zfs here, too - a more sequential streaming access > pattern would help. >While ZFS certainly has problems with randomly written small-data pools, scrubs and silvers on large streaming writes (like the media server) is rather straightforward. Note that RAID-6 and many RAID-5/3 hardware setups have similar issues. In any case, resilver/scrub times are becoming the dominant factor in reliability of these large drives.> To me, the biggest issue you left unmentioned is the problem of > backup. There''s little option for backing up these larger drives, > other than more of the same drives. In turn, lots of the use such > drives will be put to, is for backing up other data stores, and there > again, the usage pattern fits the above profile well. > > Another usage pattern we may see more of, and that helps address some > of the performance issues, is this. Say I currently have 2 pools of > 1TB disks, one as a backup for the other. I want to expand the > space. 
> I replace all the disks with 2TB units, but I also change my data distribution as it grows: now, each pool is to be at most half-full of data, and the other half is used as a backup of the opposite pool. ZFS send is fast enough that the backup windows are short, and I now have effectively twice as many spindles in active service.

Don't count on 'zfs send' being fast enough. Even for liberal values of "fast enough", it's highly data dependent, so measure it before you build a plan around it (see the quick test sketched after this message). For the situation you describe, you're actually making it worse - now, both pools carry a backup I/O load which reduces their available throughput. If you're talking about a pool that's already 50% slower than one made of 1TB drives, then, well, you're hosed.

> > [..] it looks like hard drives are really at the end of their advancement, as far as capacities per drive go.
>
> The challenges are undeniable, but that's way too big a call. Those are words you will regret in future; at least, I hope the future will be one in which those words are regrettable. :-)

Honestly, from what I've seen and heard both here and on other forums, the writing is on the wall, the fat lady has sung, and Mighty Casey has struck out. The 3.5" winchester hard drive is on terminal life support for use in enterprises. It will linger a little longer in commodity places, where its cost/GB overcomes its weaknesses. 2.5" HDs will last out the decade, as their slightly higher performance/GB and space/power savings will allow them to hold off solid-state media for a bit. But solid-state is the future, and a very near future it is.

> > >1TB drives currently have excessively long resilver time, inferior reliability (for the most part), and increased power consumption.
>
> Yes, for the most part. However, a 2TB drive has dramatically less power consumption than 2x1TB drives (and less of other valuable resources, like bays and controller slots).

GB/watt, yes. Performance/watt, no. And you chew up additional bays/slots/etc. trying to get back the performance with larger drives.

> > I'd generally recommend that folks NOT step beyond the 1TB capacity at the 3.5" hard drive format.
>
> A general recommendation is fine, and this is one I agree with for many scenarios. At least, I'd recommend that folks look more closely at alternatives using 2.5" drives and SAS expander bays than they might otherwise.
>
> > So, while it's nice that you can indeed seamlessly swap up drive sizes (and your recommendation of using 2x7 helps that process), in reality, it's not a good idea to upgrade from his existing 1TB drives.
>
> So what does he do instead, when he's running out of space and 1TB drives are hard to come by? The advice still stands, as far as I'm concerned: do something now that will leave you room for different expansion choices later - and evaluate the best expansion choice later, when the parameters of the time are known.
> --
> Dan.

I echo what Bob said earlier: don't plan on being able to upgrade these disks in-place. Plan for expanding the setup, but you won't be able to upgrade the 1TB disks to larger capacities. At least not for the better part of this decade, until you can replace them with solid-state drives of some sort.

As a practical matter, small setups are for the most part not expandable/upgradable much, if at all. Buy what you need now, and plan on rebuying something new in 5-10 years, but don't think that what you put together now can be continuously upgraded for a decade.
You make too many tradeoffs in the initial design to allow for that kind of upgradability. Even enterprise stuff these days is pretty much "dispose and replace", not "upgrade".

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123, Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
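Since Erik's caution about 'zfs send' throughput is easy to check empirically, here is a minimal sketch of how one might measure it before committing to a backup window. The pool and filesystem names are placeholders, and sending to /dev/null only measures the sending side (a real receive over USB or a network will be slower):

    # snapshot a representative filesystem
    zfs snapshot tank/media@sendtest
    # time how quickly the stream can be generated, discarding the output
    time zfs send tank/media@sendtest > /dev/null
    # clean up the test snapshot
    zfs destroy tank/media@sendtest

Dividing the snapshot's referenced size (from 'zfs list -t snapshot') by the elapsed time gives a rough MB/s figure to compare against the window you have in mind.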
On 04/ 9/10 10:48 AM, Erik Trimble wrote:
> Well....
>
> The problem is (and this isn't just a ZFS issue) that resilver and scrub times /are/ very bad for >1TB disks. This goes directly to the problem of redundancy - if you don't really care about resilver/scrub issues, then you really shouldn't bother to use raidz or mirroring. It's pretty much in the same ballpark.
>
> That is, >1TB 3.5" drives have such long resilver/scrub times that with ZFS, it's a good bet you can kill a second (or third) drive before you can scrub or resilver in time to compensate for the already-failed one. Put it another way, you get more errors before you have time to fix the old ones, which effectively means you now can't fix errors before they become permanent. Permanent errors = data loss.

That's one of the big problems with the build-it-now, expand-with-bigger-drives-later approach. If you were designing from scratch with 2TB drives, you would be wise to consider triple-parity raid, where double parity has acceptable reliability for 1TB drives. Each time drive capacity doubles (and performance does not), an extra level of parity is required. I guess this extrapolates to one data drive and N parity drives.

-- 
Ian.
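For readers who haven't used it yet, triple-parity RAID-Z is exposed as the 'raidz3' vdev type on builds and pool versions recent enough to support it. A minimal sketch of what Ian is suggesting for a from-scratch 2TB-drive design - the pool name and c-t-d device names are placeholders, not a recommendation for any particular box:

    # one 9-wide triple-parity vdev: six drives of data, three of parity
    zpool create tank raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
        c1t5d0 c1t6d0 c1t7d0 c2t0d0
    zpool status tank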
On Thu, Apr 08, 2010 at 03:48:54PM -0700, Erik Trimble wrote:
> Well....

To be clear, I don't disagree with you; in fact for a specific part of the market (at least) and a large part of your commentary, I agree. I just think you're overstating the case for the rest.

> The problem is (and this isn't just a ZFS issue) that resilver and scrub times /are/ very bad for >1TB disks. This goes directly to the problem of redundancy - if you don't really care about resilver/scrub issues, then you really shouldn't bother to use raidz or mirroring. It's pretty much in the same ballpark.

Sure, and that's why you have raidz3 now; also why multi-way mirrors are getting more attention, as the drives are getting large enough that capacities and redundancies previously only available via raidz constructions can now be had with mirrors and a reasonable number of spindles.

Large drives (with the constraints you describe) certainly change the deployment scenarios. I don't agree that they shouldn't be deployed at all, ever - which seems to be what you're saying. Take 6x1TB in raidz2, replace with 6x2TB in a three-way mirror. Chances are, you've just improved performance. I'm just trying to show it's really not all that black and white.

As for error rates, this is something zfs should not be afraid of. Indeed, many of us would be happy to get drives with less internal ECC overhead and complexity for greater capacity, and tolerate the resultant higher error rates, specifically for use with zfs (sector errors, not overall drive failure, of course). Even if it means I need raidz4, and wind up with the same overall usable space, I may prefer the redundancy across drives rather than within.

> That is, >1TB 3.5" drives have such long resilver/scrub times that with ZFS, it's a good bet you can kill a second (or third) drive before you can scrub or resilver in time to compensate for the already-failed one. Put it another way, you get more errors before you have time to fix the old ones, which effectively means you now can't fix errors before they become permanent. Permanent errors = data loss.

Again, potential zfs improvements could help here:
 - resilver in parallel for multiply-redundant vdevs with multiple failures/replacements (currently, I think resilver restarts in this case?)
 - scrub a (top-level) vdev at a time, rather than a whole pool. If I know I'm about to replace a drive, perhaps for a capacity upgrade, I'll scrub first to minimise the chances of tripping over a latent error, especially on the previous drive I just replaced (a sketch of that workflow follows this message). No need to scrub other vdevs right now.
 - scrub/resilver selectively by dataset, to allow higher priority data to be given better protection.

> For example, the 2TB 5900RPM 3.5" drives are (on average) over 2x as slow as the 1TB 7200RPM 3.5" drives for most operations. Access time is slower by 40%, and throughput is slower by 30-50%.

Please, be fair and compare like with like - say, replacing 5400rpm 1TB drives. Your same problem would apply if replacing 1TB 7200s with 1TB 5400s; it has little to do with the capacity. Indeed, at the same rpm, the higher density has the potential to be faster.

> In any case, resilver/scrub times are becoming the dominant factor in reliability of these large drives.

Agreed; I'd argue they have been for some time (ie, even at the 1TB size).

> As a practical matter, small setups are for the most part not expandable/upgradable much, if at all.
> Buy what you need now, and plan on rebuying something new in 5-10 years, but don't think that what you put together now can be continuously upgraded for a decade.

On this, I agree completely, even on a shorter time-scale (say 3-5 years). On each generation, repurpose the previous generation for backup or something else as appropriate. This applies to drives, and to the boxes that house them. Even so, leave yourself wiggle room for upgrades and other unanticipated developments in the meantime where you can.

-- 
Dan.
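Dan's scrub-before-replace habit maps onto a short sequence of commands today (there is no per-vdev scrub yet, so the scrub covers the whole pool). Pool and device names below are placeholders:

    # confirm existing data is readable before touching anything
    zpool scrub tank
    zpool status -v tank      # wait until the scrub completes with no errors
    # swap in the larger drive and let ZFS resilver onto it
    zpool replace tank c1t3d0 c3t0d0
    zpool status tank         # watch resilver progress

Once every drive in a vdev has been replaced with a larger one, the extra capacity becomes usable; on recent builds this is governed by the pool's autoexpand property, while older ones pick it up after an export/import.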
Well, I would like to thank everyone for their comments and ideas. I finally have this machine up and running with Nexenta Community edition and am really liking the GUI for administering it. It suits my needs perfectly and is running very well. I ended up going with 2 x 7-drive raidz2 vdevs in one pool, for a total capacity of 10 TB.

One thing I have noticed that seems a little different from my previous hardware RAID controller (Areca) is that data is not constantly being written to the spindles. For example, I am copying some large files to the array right now (approx 4 gigs a file) and my network performance is showing a transfer rate of 75MB/s on average. When I physically watch the server, I only see a 1-2 second flurry of activity on the drives, then about 10 seconds of no activity. Is this the nature of ZFS?

Thanks for all the help!
On Thu, 8 Apr 2010, Jason S wrote:
> One thing I have noticed that seems a little different from my previous hardware RAID controller (Areca) is that data is not constantly being written to the spindles. For example, I am copying some large files to the array right now (approx 4 gigs a file) and my network performance is showing a transfer rate of 75MB/s on average. When I physically watch the server, I only see a 1-2 second flurry of activity on the drives, then about 10 seconds of no activity. Is this the nature of ZFS?

Yes, this is the nature of ZFS. ZFS batches up writes and writes them in bulk. On a large-memory system with a very high write rate, up to 5 seconds worth of low-level writes may be batched up. With a slow write rate, up to 30 seconds of user-level writes may be batched up.

The reasons for doing this become obvious when you think about it a bit. ZFS writes data as large transactions (transaction groups) and uses copy on write (COW). Batching up the writes allows more full blocks to be written, which decreases fragmentation, improves space allocation efficiency, improves write performance, and uses fewer precious IOPS. The main drawback is that reads/writes are temporarily stalled during part of the TXG write cycle.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
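If you want to watch the transaction-group cadence directly rather than infer it from drive LEDs, sampling the pool once a second makes it obvious; 'tank' is a placeholder for your pool name:

    # per-vdev read/write ops and bandwidth, sampled every second
    zpool iostat -v tank 1
    # per-disk view from the OS side, for comparison
    iostat -xn 1

You should see the network feeding data steadily while the disks sit mostly idle, then a burst of writes every several seconds when the transaction group is pushed out - exactly the pattern Jason describes.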
On Apr 8, 2010, at 6:19 PM, Daniel Carosone wrote:
> As for error rates, this is something zfs should not be afraid of. Indeed, many of us would be happy to get drives with less internal ECC overhead and complexity for greater capacity, and tolerate the resultant higher error rates, specifically for use with zfs (sector errors, not overall drive failure, of course). Even if it means I need raidz4, and wind up with the same overall usable space, I may prefer the redundancy across drives rather than within.

Disagree. Reliability trumps availability every time. And the problem with the availability provided by redundancy techniques is that the amount of work needed to recover is increasing. This work is limited by latency, and HDDs are not winning any latency competitions anymore.

To combat this, some vendors are moving to an overprovision model. Current products deliver multiple "disks" in a single FRU with built-in, fine-grained redundancy. Because the size and scope of the FRU is bounded, the recovery can be optimized and the reliability of the FRU is increased. From a market perspective, these solutions are not suitable for the home user because the size and cost of the FRU is high. It remains to be seen how such products survive in the enterprise space as HDDs become relegated to backup roles.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
On Thu, Apr 08, 2010 at 08:36:43PM -0700, Richard Elling wrote:
> On Apr 8, 2010, at 6:19 PM, Daniel Carosone wrote:
> >
> > As for error rates, this is something zfs should not be afraid of. Indeed, many of us would be happy to get drives with less internal ECC overhead and complexity for greater capacity, and tolerate the resultant higher error rates, specifically for use with zfs (sector errors, not overall drive failure, of course). Even if it means I need raidz4, and wind up with the same overall usable space, I may prefer the redundancy across drives rather than within.
>
> Disagree. Reliability trumps availability every time.

Often, but not sure about every. The economics shift around too fast for such truisms to be reliable, and there's always room for an upstart (often in a niche) to make great economic advantages out of questioning this established wisdom. The oft-touted example is google's servers, but there are many others.

> And the problem with the availability provided by redundancy techniques is that the amount of work needed to recover is increasing. This work is limited by latency, and HDDs are not winning any latency competitions anymore.

We're talking about generalities; the niche can be very important to enable these kinds of tricks by holding some of the other troubling variables constant (e.g. application/programming platform). It doesn't really matter whether you're talking about 1 dual-PSU server vs 2 single-PSU servers, or whole datacentres - except that solid large-scale diversity tends to lessen your concentration (and perhaps spend) on internal redundancy within a datacentre (or disk).

Put another way: some application niches are much more able to adopt redundancy techniques that don't require so much work. Again, for the google example: if you're big and diverse enough that shifting load between data centres on failure is no work, then moving the load for other reasons is viable too - such as moving to where it's night time and power and cooling are cheaper. The work has been done once, up front, and the benefits are repeatable.

> To combat this, some vendors are moving to an overprovision model. Current products deliver multiple "disks" in a single FRU with built-in, fine-grained redundancy. Because the size and scope of the FRU is bounded, the recovery can be optimized and the reliability of the FRU is increased.

That's not new. Past examples in the direct experience of this community include the BladeStor and SSA-1000 storage units, which aggregated disks into failure domains (e.g. drawers) for a (big) density win.

-- 
Dan.
I thought I might chime in with my thoughts and experiences. For starters, I am very new to both OpenSolaris and ZFS, so take anything I say with a grain of salt.

I have a home media server / backup server very similar to what the OP is looking for. I am currently using 4 x 1TB and 4 x 2TB drives set up as mirrors. Tomorrow, I'm going to wipe my pool and go to 4 x 1TB and 4 x 2TB in two 4-disk raidz vdevs.

I backup my pool to 2 external 2TB drives that are simply striped, using zfs send/receive followed by a scrub. As of right now, I only have 1.58TB of actual data. ZFS send over USB 2.0 capped out at 27MB/s. The scrub for 1.5TB of backup data on the USB drives took roughly 14 hours. As needed, I'll destroy the backup pool and add more drives. I looked at a lot of different options for external backup, and decided to go with cheap (USB).

I am using 1TB and 2TB WD Caviar Green drives for my storage pool, which are about the cheapest and probably close to the slowest consumer drives you can buy. I've only been at this for about 4-5 months now, and thankfully I haven't had a drive fail yet, so I cannot attest to resilver times. I do weekly scrubs on both my rpool and storage pool via a script called through cron. I just set things up to do scrubs during a timeframe when I know I'm not going to be using the server for anything. I can't recall the exact times it took for the scrubs to complete, but it wasn't anything that interfered with my usage (yet...)

The vast majority of any streaming media I do (up to 1080p) is over wireless-n. Occasionally, I will get stuttering (on the HD stuff), but I haven't looked into whether it was due to a network or I/O bottleneck. Personally, I would think it was due to network traffic, but that is pure speculation. The vast majority of the time, I don't have any issues whatsoever. The main point I'm trying to make is that I'm not I/O bound at this point. I'm also not streaming to 4 media players simultaneously.

I currently have far more storage space than I am using. When I do end up running low on space, I plan to start with replacing the 1TB drives with, hopefully much cheaper at that point, 2TB drives. If using 2 x raidz vdevs doesn't work well for me, I'll go back to mirrors and start looking at other options for expansion.

I find Erik Trimble's statements regarding a 1 TB limit on drives to be a very bold statement. I don't have the knowledge or the inclination to argue the point, but I am betting that we will continue to see advances in storage technology on par with what we have seen in the past. If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name. Again, I emphasize that this assumption is not based on any sort of knowledge other than past experience with the ever-growing storage capacity of physical disks.

My personal advice to the OP would be to set up three 4 x 1TB raidz vdevs, and invest in a reasonable backup solution. If you have to use the last two drives, set them up as a mirror. Redundancy is great, but in my humble opinion, for the home user that is using "cheap" hardware, it's not as critical as performance and available storage space. That particular configuration would give you more IOPS than just two raidz2 vdevs, with slightly less redundancy and slightly more storage space. For my own needs, I don't see redundancy as being as high a priority as IOPS and available storage space.
Everyone has to make their own decision on that, and the ability of ZFS to accommodate a vast array of different individual needs is a big part of what makes it such an excellent filesystem. With a solid backup, there is really no reason you can't redesign your pool at a later date if need be. Try out what you think will work best, and if that configuration doesn't work well in some way, adjust and move on...

There are a few different schools of thought on how to back up ZFS filesystems. ZFS send/receive works for me, but there are certainly weaknesses with using it as a backup solution (as has been much discussed on this list.) Hopefully, in the future it will be possible to remove vdevs from a pool and to restripe data across a pool. Those particular features would certainly be great for me.

Just my thoughts.

Eric
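For anyone wanting to copy this approach, the cycle Eric describes (striped external pool, send/receive, then a verifying scrub) is only a handful of commands. Pool, filesystem, and device names are placeholders, the snapshot label is just one convention, and it is worth rehearsing on scratch data first:

    # one-time: build the backup pool as a plain stripe of the two externals
    zpool create backup c5t0d0 c6t0d0
    # each backup run: snapshot, send, then verify what landed on the externals
    zfs snapshot -r tank@2010-04-09
    zfs send -R tank@2010-04-09 | zfs receive -dF backup
    zpool scrub backup
    zpool status backup

The scrub after the receive is the step that catches anything mangled on the way through a flaky USB bridge, which is exactly the failure mode discussed further down the thread.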
On Apr 8, 2010, at 9:06 PM, Daniel Carosone wrote:
> On Thu, Apr 08, 2010 at 08:36:43PM -0700, Richard Elling wrote:
>> On Apr 8, 2010, at 6:19 PM, Daniel Carosone wrote:
>>>
>>> As for error rates, this is something zfs should not be afraid of. Indeed, many of us would be happy to get drives with less internal ECC overhead and complexity for greater capacity, and tolerate the resultant higher error rates, specifically for use with zfs (sector errors, not overall drive failure, of course). Even if it means I need raidz4, and wind up with the same overall usable space, I may prefer the redundancy across drives rather than within.
>>
>> Disagree. Reliability trumps availability every time.
>
> Often, but not sure about every.

I am quite sure.

> The economics shift around too fast for such truisms to be reliable, and there's always room for an upstart (often in a niche) to make great economic advantages out of questioning this established wisdom. The oft-touted example is google's servers, but there are many others.

A small change in reliability for massively parallel systems has a significant, multiplicative effect on the overall system. Companies like Google weigh many factors, including component reliability, when designing their systems.

>> And the problem with the availability provided by redundancy techniques is that the amount of work needed to recover is increasing. This work is limited by latency, and HDDs are not winning any latency competitions anymore.
>
> We're talking about generalities; the niche can be very important to enable these kinds of tricks by holding some of the other troubling variables constant (e.g. application/programming platform). It doesn't really matter whether you're talking about 1 dual-PSU server vs 2 single-PSU servers, or whole datacentres - except that solid large-scale diversity tends to lessen your concentration (and perhaps spend) on internal redundancy within a datacentre (or disk).
>
> Put another way: some application niches are much more able to adopt redundancy techniques that don't require so much work.

At the other extreme, if disks were truly reliable, the only RAID that would matter is RAID-0.

> Again, for the google example: if you're big and diverse enough that shifting load between data centres on failure is no work, then moving the load for other reasons is viable too - such as moving to where it's night time and power and cooling are cheaper. The work has been done once, up front, and the benefits are repeatable.

Most folks never even get to a decent disaster recovery design, let alone a full datacenter mirror :-(

>> To combat this, some vendors are moving to an overprovision model. Current products deliver multiple "disks" in a single FRU with built-in, fine-grained redundancy. Because the size and scope of the FRU is bounded, the recovery can be optimized and the reliability of the FRU is increased.
>
> That's not new. Past examples in the direct experience of this community include the BladeStor and SSA-1000 storage units, which aggregated disks into failure domains (e.g. drawers) for a (big) density win.

Nope. The FRUs for BladeStor and SSA-1000 were traditional disks. To see something different you need to rethink the "disk" -- something like a Xiotech ISE.
-- richard
Eric Andersen wrote:
> I find Erik Trimble's statements regarding a 1 TB limit on drives to be a very bold statement. I don't have the knowledge or the inclination to argue the point, but I am betting that we will continue to see advances in storage technology on par with what we have seen in the past. If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name. Again, I emphasize that this assumption is not based on any sort of knowledge other than past experience with the ever-growing storage capacity of physical disks.

Why thank you for recognizing my bold, God-like predictive powers. It comes from my obviously self-descriptive name, which means "Powerful/Eternal Ruler" <wink>

Ahem.

I'm not saying that hard drive manufacturers have (quite yet) hit the limit of their ability to increase storage densities - indeed, I do expect to see 4TB drives some time in the next couple of years.

What I am saying is that it doesn't matter if areal densities continue to increase - we're at the point now with 1TB drives where the predictable hard-error rate is just below the level we can tolerate. That is, error rates (errors per X bits read/written) have dropped only linearly over the past 3 decades, while densities are on a rather severe geometric increase, and data transfer rates have effectively stopped increasing at all. What this means is that while you can build a higher-capacity disk, the time you can effectively use it is dropping (i.e. before it experiences a non-recoverable error and has to be replaced), and the time that it takes to copy all the data off one drive to another is increasing. If X = (time to use) and Y = (time to copy off data), then when X < 2*Y, you're screwed. In fact, from an economic standpoint, when X < 100*Y, you're pretty much screwed. And 1TB drives are about the place where they can still just pass this test. 1.5TB drives and up aren't going to be able to pass it.

Everything I've said applies not only to 3.5" drives, but to 2.5" drives. It's a problem with the basic winchester hard drive technology. We just get a bit more breathing space (maybe two technology cycles, which in the HD sector means about 3 years) with the 2.5" form factor. But even they are doomed shortly.

I got a pack of Bud with your name on it. :-)

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123, Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
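To put a rough number on Erik's Y term: a full copy-off (or best-case resilver) of a 2TB drive at an assumed ~100 MB/s sustained - a generous figure for a 2010-era SATA drive, and far better than a fragmented pool resilvering under load will actually achieve - works out to:

    2 TB / 100 MB/s  =  ~20,000 seconds  =  ~5.5 hours   (sequential best case)
    at an effective 20-30 MB/s (random I/O, busy pool):  roughly 18-28 hours

Every hour added to Y widens the window in which a second failure or a latent sector error can turn a degraded vdev into data loss, which is why X needs to stay comfortably larger than Y in Erik's framing.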
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Eric Andersen
>
> I backup my pool to 2 external 2TB drives that are simply striped, using zfs send/receive followed by a scrub. As of right now, I only have 1.58TB of actual data. ZFS send over USB 2.0 capped out at 27MB/s. The scrub for 1.5TB of backup data on the USB drives took roughly 14 hours. As needed, I'll destroy the backup pool and add more drives. I looked at a lot of different options for external backup, and decided to go with cheap (USB).

I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub. You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s.

I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris. The machine crashes, you have to power cycle, boot up in failsafe mode, import the pool(s) and then reboot once normal.

I am wondering, how long have you been doing what you're doing? Do you leave your drives connected all the time? Have you seen similar reliability issues? What external hardware are you using?

I started doing this on one system (via eSATA) about a year ago. It worked flawlessly for about 4 months before the disk started crashing. I started doing it on another system (via USB) about 6 months ago. It just started crashing a couple of weeks ago.

I am now in the market to try and identify any *well made* external enclosures. The best I've seen so far is the Dell RD1000, but we're talking crazy overpriced, and hard drives that are too small to be useful to me.

> If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name.

I solemnly pledge to do it anyway. And why wait? ;-)
No idea about the build quality, but is this the sort of thing you're looking for?

Not cheap, integrated RAID (sigh), but one cable only:
http://www.pc-pitstop.com/das/fit-500.asp

Cheap, simple, 4 eSATA connections on one box:
http://www.pc-pitstop.com/sata_enclosures/scsat4eb.asp

Still cheap, uses 4x SFF-8470 for a single cable connection:
http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

Slightly more expensive, but the integrated port multiplier means one standard eSATA cable required:
http://www.pc-pitstop.com/sata_port_multipliers/scsat05b.asp

On Apr 9, 2010, at 15:14, Edward Ned Harvey wrote:
> I am now in the market to try and identify any *well made* external enclosures. The best I've seen so far is the Dell RD1000, but we're talking crazy overpriced, and hard drives that are too small to be useful to me.
You may be absolutely right. CPU clock frequency certainly has hit a wall at around 4GHz. However, this hasn't stopped CPUs from getting progressively faster. I know this is mixing apples and oranges, but my point is that no matter what limits or barriers computing technology hits, someone comes along and finds a way to engineer around it. I have no idea what storage technology will look like years from now, but I will be very surprised if the limitations you've listed have held back advances in storage devices. No idea what those devices will look like or how they'll work. If someone told me roughly 10 years ago that I would be using multi-core processors at the same clock speed as my Pentium 4, I would have probably scoffed at the idea. Here we are. I'm a drinker, not a prophet ;-)

Like I said, I've built my system planning to upgrade to bigger-capacity drives when I start running out of space, rather than adding more drives. This is almost certainly unrealistic. I've always built my systems around planned upgradeability, but whenever it does come time for an upgrade, it never makes sense to do so. It's usually much more cost effective to just build a new system with newer and better technology. It should take me a long while to fill up 9TB, but there was a time when I thought a single gigabyte was a ridiculous amount of storage too.

Eric

On Apr 8, 2010, at 11:21 PM, Erik Trimble wrote:
> Eric Andersen wrote:
>> I find Erik Trimble's statements regarding a 1 TB limit on drives to be a very bold statement. I don't have the knowledge or the inclination to argue the point, but I am betting that we will continue to see advances in storage technology on par with what we have seen in the past. If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name. Again, I emphasize that this assumption is not based on any sort of knowledge other than past experience with the ever-growing storage capacity of physical disks.
>
> Why thank you for recognizing my bold, God-like predictive powers. It comes from my obviously self-descriptive name, which means "Powerful/Eternal Ruler" <wink>
>
> Ahem.
>
> I'm not saying that hard drive manufacturers have (quite yet) hit the limit of their ability to increase storage densities - indeed, I do expect to see 4TB drives some time in the next couple of years.
>
> What I am saying is that it doesn't matter if areal densities continue to increase - we're at the point now with 1TB drives where the predictable hard-error rate is just below the level we can tolerate. That is, error rates (errors per X bits read/written) have dropped only linearly over the past 3 decades, while densities are on a rather severe geometric increase, and data transfer rates have effectively stopped increasing at all. What this means is that while you can build a higher-capacity disk, the time you can effectively use it is dropping (i.e. before it experiences a non-recoverable error and has to be replaced), and the time that it takes to copy all the data off one drive to another is increasing. If X = (time to use) and Y = (time to copy off data), then when X < 2*Y, you're screwed. In fact, from an economic standpoint, when X < 100*Y, you're pretty much screwed. And 1TB drives are about the place where they can still just pass this test. 1.5TB drives and up aren't going to be able to pass it.
>
> Everything I've said applies not only to 3.5" drives, but to 2.5" drives. It's a problem with the basic winchester hard drive technology. We just get a bit more breathing space (maybe two technology cycles, which in the HD sector means about 3 years) with the 2.5" form factor. But even they are doomed shortly.
>
> I got a pack of Bud with your name on it. :-)
>
> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123, Phone: x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
> I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub. You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s.

I did strongly consider going with eSATA. What I really wanted to use was FireWire 800, as it is reasonably fast and the ability to daisy-chain devices is very appealing, but some of the stuff I've read regarding the state of OpenSolaris FireWire drivers scared me off. I decided against eSATA because I don't have any eSATA ports. I could buy a controller or run SATA-to-eSATA cables off the four available onboard ports, but either way, when/if I run out of ports, that's it. With USB, I can always use a hub if needed (at even slower speeds). If OpenSolaris supported SATA port multipliers, I'd have definitely gone with eSATA.

The speed issue isn't really critical to me, especially if I'm doing incremental send/receives. Recovering my data from backup will be a drag, but it is what it is. I decided cheap and simple was best, and went with USB.

> I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris. The machine crashes, you have to power cycle, boot up in failsafe mode, import the pool(s) and then reboot once normal.

This is what I've overwhelmingly heard as well. Most people point to the controllers in the enclosures. If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route. For cost and simplicity it's hard to beat externals.

> I am wondering, how long have you been doing what you're doing? Do you leave your drives connected all the time? Have you seen similar reliability issues? What external hardware are you using?

Not long (1 week), so I'm just getting started. I don't leave the drives connected. Plug them in, do a backup, zpool export, unplug and throw them in my safe. It's far from great, but it beats what I had before (nothing). I plan to do an incremental zfs send/receive every 2-4 weeks, depending on how much new data I have. I can't attest to any sort of reliability as I've only been at it for a very short period of time.

I am using 2TB WD Elements drives (cheap). This particular model (WDBAAU0020HBK-NESN) hasn't been on the market too terribly long. There is one review on Newegg of someone having issues with one from the start. It sucks, but I think the reality is that it's pretty much a crapshoot when it comes to reliability of external drives/enclosures.

> I started doing this on one system (via eSATA) about a year ago. It worked flawlessly for about 4 months before the disk started crashing. I started doing it on another system (via USB) about 6 months ago. It just started crashing a couple of weeks ago.
>
> I am now in the market to try and identify any *well made* external enclosures. The best I've seen so far is the Dell RD1000, but we're talking crazy overpriced, and hard drives that are too small to be useful to me.

If you find something good, please let me know. There are a lot of different solutions for a lot of different scenarios and price points. I went with cheap. I won't be terribly surprised if these drives end up flaking out on me. You usually get what you pay for. What I have isn't great, but it's better than nothing. Hopefully, I'll never need to recover data from them. If they end up proving to be too unreliable, I'll have to look at other options.

Eric

>> If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name.
>
> I solemnly pledge to do it anyway. And why wait? ;-)
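Eric's plug-in/backup/export routine, with incremental sends between visits to the safe, looks roughly like the following; pool names and snapshot labels are placeholders, and the -R/-F combination is worth rehearsing on scratch datasets first:

    zpool import backup                    # after plugging the drives in
    zfs snapshot -r tank@2010-04-23
    # send only the changes since the previous backup snapshot
    zfs send -R -i tank@2010-04-09 tank@2010-04-23 | zfs receive -dF backup
    zpool scrub backup                     # optional, but cheap insurance
    zpool export backup                    # then unplug and store the drives

Exporting before unplugging is what keeps the backup pool cleanly importable next time, rather than leaving it looking faulted.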
On Fri, Apr 09, 2010 at 10:21:08AM -0700, Eric Andersen wrote:
> If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route.

I'm tending to like bare drives.

If you have the chassis space, there are 5-in-3 bays that don't need extra drive carriers; they just slot a bare 3.5" drive. For example: http://www.newegg.com/Product/Product.aspx?Item=N82E16817994077

A 5-way raidz backup pool would be quite useful. Otherwise, there are eSATA "docking stations" for 1 or 2 drives. Overall, it's cheap and you're far more in control of the unknowns of controllers and chips. Then there are simple boxes to protect the drives in storage/transport, ranging from little silicone sleeves to 5-way hard plastic boxes.

>>> If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name.
>>
>> I solemnly pledge to do it anyway. And why wait? ;-)

+6

-- 
Dan.
On Fri, Apr 9, 2010 at 6:14 AM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Eric Andersen
>>
>> I backup my pool to 2 external 2TB drives that are simply striped, using zfs send/receive followed by a scrub. As of right now, I only have 1.58TB of actual data. ZFS send over USB 2.0 capped out at 27MB/s. The scrub for 1.5TB of backup data on the USB drives took roughly 14 hours. As needed, I'll destroy the backup pool and add more drives. I looked at a lot of different options for external backup, and decided to go with cheap (USB).
>
> I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub. You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s.
>
> I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris. The machine crashes, you have to power cycle, boot up in failsafe mode, import the pool(s) and then reboot once normal.

I think your best bet for an external enclosure is to use a real chassis, like a Supermicro with a SAS backplane or similar. Your local whitebox seller (or Newegg, or Silicon Mechanics) should be able to sell something like this.

Sans Digital makes a few 4- and 8-drive cases that (for the money) look like they may not suck, with eSATA, USB or SAS connections. $300 for an 8-drive eSATA/PMP chassis, $400 for 8-drive SAS/SATA. I haven't used them, but from the specs they look not horrible.

http://www.newegg.com/Product/Product.aspx?Item=N82E16816111071
http://www.newegg.com/Product/Product.aspx?Item=N82E16816111092

-B

-- 
Brandon High : bhigh at freaks.com
On Sat, Apr 10 at 7:22, Daniel Carosone wrote:
> On Fri, Apr 09, 2010 at 10:21:08AM -0700, Eric Andersen wrote:
>> If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route.
>
> I'm tending to like bare drives.
>
> If you have the chassis space, there are 5-in-3 bays that don't need extra drive carriers; they just slot a bare 3.5" drive. For example: http://www.newegg.com/Product/Product.aspx?Item=N82E16817994077

I have a few of the 3-in-2 versions of that same enclosure from the same manufacturer, and they installed in about 2 minutes in my tower case.

The 5-in-3 doesn't have grooves in the sides like their 3-in-2 does, so some cases may not accept the 5-in-3 if your case has tabs to support devices like DVD drives in the 5.25" slots. The grooves are clearly visible in this picture:

http://www.newegg.com/Product/Product.aspx?Item=N82E16817994075

The doors are a bit "light" perhaps, but it works just fine for my needs and holds drives securely. The small fans are a bit noisy, but since the box lives in the basement I don't really care.

--eric

-- 
Eric D. Mudama
edmudama at mail.bounceswoosh.org
On Fri, Apr 9, 2010 at 9:31 PM, Eric D. Mudama <edmudama at bounceswoosh.org> wrote:
> On Sat, Apr 10 at 7:22, Daniel Carosone wrote:
>> On Fri, Apr 09, 2010 at 10:21:08AM -0700, Eric Andersen wrote:
>>> If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route.
>>
>> I'm tending to like bare drives.
>>
>> If you have the chassis space, there are 5-in-3 bays that don't need extra drive carriers; they just slot a bare 3.5" drive. For example: http://www.newegg.com/Product/Product.aspx?Item=N82E16817994077
>
> I have a few of the 3-in-2 versions of that same enclosure from the same manufacturer, and they installed in about 2 minutes in my tower case.
>
> The 5-in-3 doesn't have grooves in the sides like their 3-in-2 does, so some cases may not accept the 5-in-3 if your case has tabs to support devices like DVD drives in the 5.25" slots.
>
> The grooves are clearly visible in this picture:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16817994075
>
> The doors are a bit "light" perhaps, but it works just fine for my needs and holds drives securely. The small fans are a bit noisy, but since the box lives in the basement I don't really care.

At that price, for the 5-in-3 at least, I'd go with Supermicro. For $20 more, you get what appears to be a far more solid enclosure.

--Tim
On Sat, Apr 10, 2010 at 12:56:04PM -0500, Tim Cook wrote:
> At that price, for the 5-in-3 at least, I'd go with Supermicro. For $20 more, you get what appears to be a far more solid enclosure.

My intent with that link was only to show an example, not make a recommendation. I'm glad others have experience with the one I picked as the easiest search result; I don't. There are plenty of other choices.

Note also, the Supermicro ones are not trayless. The example was specifically of a trayless model. Supermicro may be good for permanent drives, but the trayless option is convenient for backup bays, where you have 2 or more sets of drives that rotate through the bays in the backup cycle. Getting extra trays can be irritating, and at least some trays make handling and storing the drives outside their slots rather cumbersome (odd corners and edges and stacking difficulties).

Having bays that take bare drives is also great for recovering data from disks taken from other machines. Bare drives are also most easily interchangeable between racks from different makers - say, if a better trayless model became available between the purchase times of different machines.

-- 
Dan.