Hello,

I have a situation here at the office I would like some advice on.

I have 6 Sun Fire X4500 (Thumper) servers whose storage I want to aggregate into a unified namespace for my client machines. My plan was to export the zpools from each Thumper as an iSCSI target to a Solaris machine and create a RAIDZ zpool from those iSCSI targets. I think this is what they call RAID plaiding (RAID on RAID). This Solaris front-end machine would then share the zpool out via NFS or CIFS.

My question is: what is the best solution for this? The issue I'm facing is how to add additional Thumpers later, since you cannot expand a RAIDZ array.

Thanks for taking the time to read this.

Larry
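For concreteness, the plan as described might translate to commands roughly like the following. This is only a sketch: the pool, volume, and device names are invented, and the LUN names on the front end stand in for whatever cXtYdZ names the iSCSI initiator actually assigns.

  # On each of the six Thumpers: local raidz2 pool, one large zvol, exported via iSCSI.
  # (Only six disks are shown; a real Thumper has many more.)
  zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
  zfs create -V 15T tank/export
  zfs set shareiscsi=on tank/export

  # On the Solaris front end: one RAIDZ vdev built from the six iSCSI LUNs.
  zpool create bigpool raidz t1-lun t2-lun t3-lun t4-lun t5-lun t6-lun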
What you want to do should actually be pretty easy. On the Thumpers, just do your normal raidz/raidz2 and export them to the Solaris box. Then on the Solaris box, you just create a zpool and add the LUNs one at a time, no RAID at all. The system will stripe across all of the LUNs automagically, and since you're already doing your RAID on the Thumpers, they're *protected*. You can keep growing the zpool indefinitely; I'm not aware of any maximum disk limitation.

On Tue, Mar 25, 2008 at 6:12 PM, Larry Lui <llui at ncmir.ucsd.edu> wrote:
> I have 6 Sun Fire X4500 (Thumper) servers whose storage I want to aggregate
> into a unified namespace for my client machines. My plan was to export the
> zpools from each Thumper as an iSCSI target to a Solaris machine and create
> a RAIDZ zpool from those iSCSI targets.
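A minimal sketch of this suggestion on the front-end box, assuming the Thumpers already export their pools as iSCSI targets (the discovery address and the t*-lun device names are placeholders):

  # Point the initiator at the Thumpers (repeat per Thumper) and enable discovery
  iscsiadm add discovery-address 192.168.10.11:3260
  iscsiadm modify discovery --sendtargets enable

  # Plain striped pool across the LUNs, no redundancy at this level
  zpool create bigpool t1-lun t2-lun t3-lun t4-lun t5-lun t6-lun

  # Growing later is just adding the next Thumper's LUN to the stripe
  zpool add bigpool t7-lun

The trade-off, as discussed below, is that the striped pool depends on every Thumper being reachable.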
I suspect there might be some defaults that are quite conservative, and that I can probably get more performance out of the X4500. Using the default Sun setup:

  load averages:  0.06, 0.05, 0.05                                 14:52:55
  62 processes: 61 sleeping, 1 on cpu
  CPU states: 98.8% idle, 0.0% user, 1.2% kernel, 0.0% iowait, 0.0% swap
  Memory: 16G real, 1639M free, 209M swap in use, 16G swap free

    PID USERNAME LWP PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
    477 daemon    18  60  -20 2424K 1640K sleep 821:14  0.52% nfsd
    385 daemon     5  60  -20 2424K 1548K sleep  44:28  0.03% lockd

The server is rather busy, and yet shows almost no CPU use. Do I need to tell it to use more CPU for NFS? What are common settings used for relatively large NFS workloads?

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
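One knob commonly examined in this situation is the nfsd thread limit in /etc/default/nfs on Solaris 10. The value below is only an illustration, not a recommendation; the right number depends on the workload.

  # /etc/default/nfs: raise the cap on concurrent nfsd server threads
  NFSD_SERVERS=1024

  # Restart the NFS server service for the change to take effect
  svcadm restart svc:/network/nfs/server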
If you are using 6 Thumpers via iSCSI to provide storage to your zpool and don't use either mirroring or RAIDZ/RAIDZ2 across the Thumpers, then if one Thumper goes down your storage pool is unavailable. I think you want some form of RAID at both levels.
On Wed, 26 Mar 2008, Tim wrote:
> No raid at all. The system should just stripe across all of the LUNs
> automagically, and since you're already doing your raid on the Thumpers,
> they're *protected*. You can keep growing the zpool indefinitely, I'm not
> aware of any maximum disk limitation.

The data may be protected, but the uptime will be dependent on the uptime of all of those systems. Downtime of *any* of the systems in a load-share configuration means downtime for the entire pool. Of course this is the case with any storage system as more hardware is added, but autonomously administered hardware is more likely to encounter a problem. Local disk is usually more reliable than remote disk.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Larry Lui wrote:
> I have 6 Sun Fire X4500 (Thumper) servers whose storage I want to aggregate
> into a unified namespace for my client machines. My plan was to export the
> zpools from each Thumper as an iSCSI target to a Solaris machine and create
> a RAIDZ zpool from those iSCSI targets. I think this is what they call RAID
> plaiding (RAID on RAID). This Solaris front-end machine would then share
> the zpool out via NFS or CIFS.

What is your operating definition of "unified namespace"? In my mind, I've been providing a unified namespace for 20+ years; it is a process rather than a product. For example, here at Sun, no matter where I log in, I get my home directory.

> My question is: what is the best solution for this? The issue I'm facing is
> how to add additional Thumpers later, since you cannot expand a RAIDZ array.

Don't think of a Thumper as a whole disk. Then expanding a raidz2 (preferred) can be accomplished quite easily. For example, something like 6 Thumpers, each providing N iSCSI volumes. You can add another Thumper, move the data around, and end up with 7 Thumpers providing data, online with no downtime. This will take a good long while because you are moving TBytes between Thumpers, but it can be done.

IMHO, there is some ugliness here. You might see if pNFS, QFS, or Lustre would better suit the requirements at the "unified namespace" level.
 -- richard
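A sketch of the layout Richard describes, with each Thumper exporting several modest iSCSI volumes instead of one huge LUN. All names and sizes here are invented for illustration, and the t*-vol* names stand in for the device names the initiator would actually see.

  # On each Thumper: several smaller zvols rather than one giant one
  zfs create -V 2T tank/vol0
  zfs create -V 2T tank/vol1
  zfs set shareiscsi=on tank/vol0
  zfs set shareiscsi=on tank/vol1

  # On the head node: each raidz2 vdev takes one volume from every Thumper
  zpool create bigpool \
      raidz2 t1-vol0 t2-vol0 t3-vol0 t4-vol0 t5-vol0 t6-vol0 \
      raidz2 t1-vol1 t2-vol1 t3-vol1 t4-vol1 t5-vol1 t6-vol1

  # A 7th Thumper can be folded in by replacing volumes one at a time
  # (ZFS resilvers onto the new device) and/or by adding new raidz2 vdevs.
  zpool replace bigpool t1-vol1 t7-vol0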
The issue with not having RAID on the front-end Solaris box is what happens when one of the back-end Thumpers dies. I would imagine that the entire zpool would become useless if one of the Thumpers should die, since the data would be striped across all the Thumpers.

Tim wrote:
> On the Thumpers, just do your normal raidz/raidz2 and export them to the
> Solaris box. Then on the Solaris box, you just create a zpool and add the
> LUNs one at a time, no RAID at all.

--
Larry Lui
BIRN Coordinating Center
UC San Diego
9500 Gilman Drive
La Jolla, CA 92093

email: llui at ncmir dot ucsd dot edu
phone: 858-822-0702
fax: 858-822-0828
My definition of a "unified namespace" is to provide the end user with one logical mount point comprising an aggregate of all the Thumpers. A very simple example: 6 Thumpers at 17TB each, and I want the end user to see one mount point that is 102TB.

I agree with you that there is some ugliness here. That's why I'm hoping to get some better suggestions on how to accomplish this. I looked at Lustre, but it seems to be Linux-only.

Thanks for your input.

Richard Elling wrote:
> Don't think of a Thumper as a whole disk. Then expanding a raidz2 (preferred)
> can be accomplished quite easily. For example, something like 6 Thumpers,
> each providing N iSCSI volumes.
>
> IMHO, there is some ugliness here. You might see if pNFS, QFS, or Lustre
> would better suit the requirements at the "unified namespace" level.

--
Larry Lui
BIRN Coordinating Center
UC San Diego
9500 Gilman Drive
La Jolla, CA 92093

email: llui at ncmir dot ucsd dot edu
phone: 858-822-0702
fax: 858-822-0828
On Wed, Mar 26, 2008 at 11:04 AM, Larry Lui <llui at ncmir.ucsd.edu> wrote:
> The issue with not having RAID on the front-end Solaris box is what happens
> when one of the back-end Thumpers dies. I would imagine that the entire
> zpool would become useless if one of the Thumpers should die, since the
> data would be striped across all the Thumpers.

True, but then it becomes a matter of assumed risk vs. payoff. It seems to me it would be cheaper in the long run to keep an entire spare Thumper chassis that you could throw the drives into than to take the price/performance hit of doing RAID on both the front end and the back end.

If this is so mission-critical that it can't ever be down, my first response would be "find a different way". In fact, my response would be to buy a USP-VM or a Symm if it's that mission-critical, and put a cluster of *whatever* in front of them to serve your NFS traffic.

--Tim
Larry Lui wrote:
> I agree with you that there is some ugliness here. That's why I'm hoping to
> get some better suggestions on how to accomplish this. I looked at Lustre,
> but it seems to be Linux-only.

WIP, see
http://wiki.lustre.org/index.php?title=Lustre_OSS/MDS_with_ZFS_DMU

But I'm not convinced this is what you are after, either. There are a number of people exporting ZFS+iSCSI to hosts running ZFS and subsequently exporting NFS. While maybe not the best possible performance, it should work OK. I'd suggest a migration plan for expansion, which will determine your logical volume size. AFAIK, little performance characterization has been done for this, and there are zillions of possible permutations.

Of course, backups will be challenging until ADM arrives:
http://opensolaris.org/os/project/adm
 -- richard
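For the final export step mentioned here, the head node's aggregated pool can be shared with the usual ZFS properties. The filesystem name below is invented; the CIFS line assumes a build that includes the in-kernel SMB server.

  # Carve filesystems out of the aggregated pool and export them over NFS
  zfs create bigpool/projects
  zfs set sharenfs=on bigpool/projects

  # Or over CIFS, where the native SMB service is available
  zfs set sharesmb=on bigpool/projects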
The best option is to stripe over pairs of mirrors. In your case, create a pool which stripes over 3 mirrors, which will look like:

  pool
    mirror: thumper1 thumper2
    mirror: thumper3 thumper4
    mirror: thumper5 thumper6

The pool will stripe over those 3 mirrors, and you can add mirrors if extra space is needed. That's the way we implement it right now. You can lose up to 3 servers, as long as no two of them belong to the same mirror.

Of course the NAS head is a single point of failure, and clustering iSCSI zpools isn't as easy as you would hope :-( iSCSI is not yet supported by the Sun Cluster framework.

One of the drawbacks you can also expect is if you ever have to boot the NAS head while some or all of the targets are unavailable. Instead of just booting and putting the pool in degraded mode, Solaris behaves very badly during start-up: the NAS head hangs during boot until you fix the targets, and only then continues to boot.

If you are really concerned about speed, I would advise you to use InfiniBand rather than Ethernet; it is also a good idea to isolate the iSCSI traffic.

K
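As a hedged sketch, the layout K describes could be created and later grown roughly like this (the *-lun names are placeholders for the iSCSI devices each Thumper exports to the NAS head):

  # Stripe over three two-way mirrors, each mirror spanning two Thumpers
  zpool create pool \
      mirror thumper1-lun thumper2-lun \
      mirror thumper3-lun thumper4-lun \
      mirror thumper5-lun thumper6-lun

  # Add capacity later by striping in another mirrored pair
  zpool add pool mirror thumper7-lun thumper8-lun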