David Runyon
2007-Sep-26 16:20 UTC
[zfs-discuss] Rule of Thumb for zfs server sizing with (192) 500 GB SATA disks?
I'm trying to get maybe 200 MB/sec over NFS for large movie files (I need the large capacity to hold all of them). Are there any rules of thumb for how much RAM is needed to handle this with ZFS (probably RAIDZ for all the disks), and how large a server should be used? The throughput required is not that large, so I am thinking an X4100 M2 or X4150 should be plenty.
I've been presented with the following scenario:

- this is to be used primarily for ORACLE, including usage of ORACLE RMAN backups (to disk)
- HP SAN (will NOT do JBOD)
- 256 GB available on high-speed Fibre Channel disk, currently on one LUN (1)
- 256 GB available on slower-speed SATA disk, currently on one LUN (2)
- the mount point must be /ora01 to match legacy scripts

First proposed solution:

- create one zpool using the FC disk to hold ORACLE apps and data
- create one zpool using the SATA disk to hold user directories, RMAN backup files, etc.

I think this lacks vision.

My proposal (which I'm rethinking and request suggestions on):

- create one zpool out of both LUNs, perhaps a zpool mirror
- create /ora01, /ora_RMAN and /users ZFS file systems underneath

Hence:

1. My understanding is that having a zpool on a single LUN precludes some ZFS technology such as self-healing, and hence is not best practice. Right?

2. My understanding is that ZFS is capable of detecting the fast FC disk vs. the slow SATA disk and arranging the underlying storage to take best advantage of this, e.g. shuffling snapshots off to the slower disk. Right?

3. Suggestions for zpool and ZFS file system layout? (A rough command sketch of my proposal follows after this message.)
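A minimal sketch of the proposed single-pool layout, assuming hypothetical device names for the two LUNs (c2t0d0 for the FC LUN, c3t0d0 for the SATA LUN) and default properties; the mirror across fast FC and slow SATA is shown exactly as proposed, though the latency caveat raised later in the thread applies:

  # one pool spanning both LUNs as a mirror (device names are placeholders)
  zpool create san01 mirror c2t0d0 c3t0d0

  # file systems underneath, preserving the legacy /ora01 mount point
  zfs create -o mountpoint=/ora01    san01/ora01
  zfs create -o mountpoint=/ora_RMAN san01/ora_RMAN
  zfs create -o mountpoint=/users    san01/users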
Marc Bevand
2007-Sep-27 05:02 UTC
[zfs-discuss] Rule of Thumb for zfs server sizing with (192) 500 GB SATA disks?
David Runyon <david.runyon <at> sun.com> writes:

> I'm trying to get maybe 200 MB/sec over NFS for large movie files (need

(I assume you meant 200 Mb/sec with a lower case "b".)

> large capacity to hold all of them). Are there any rules of thumb on how
> much RAM is needed to handle this (probably RAIDZ for all the disks) with
> zfs, and how large a server should be used?

If you have a handful of users streaming large movie files over NFS, RAM is not going to be a bottleneck. One of my ultra low-end servers (Turion MT-37 2.0 GHz, 512 MB RAM, five 500-GB SATA disks in a raidz1, consumer-grade Nvidia GbE NIC) running an old Nevada b55 install can serve large files at about 650-670 Mb/sec over NFS. CPU is the bottleneck at this level. The same box with a slightly better CPU, or a better NIC (with a less CPU-intensive driver that doesn't generate 45k interrupts/sec), would be capable of maxing out the GbE link.

-marc
Marc Bevand wrote:
> If you have a handful of users streaming large movie files over NFS,
> RAM is not going to be a bottleneck. [...]

Thanks!
Bruce Shaw wrote:
> I've been presented with the following scenario:
>
> - this is to be used primarily for ORACLE, including usage of ORACLE
> RMAN backups (to disk)
> - HP SAN (will NOT do JBOD)
> - 256 GB available on high-speed Fibre Channel disk, currently on
> one LUN (1)

I presume this means a RAID array with nonvolatile write cache...

> - 256 GB available on slower-speed SATA disk, currently on one LUN (2)
> - the mount point must be /ora01 to match legacy scripts
>
> First proposed solution:
>
> - create one zpool using the FC disk to hold ORACLE apps and data
> - create one zpool using the SATA disk to hold user directories, RMAN
> backup files, etc.
>
> I think this lacks vision.
>
> My proposal (which I'm rethinking and request suggestions on):
>
> - create one zpool out of both LUNs, perhaps a zpool mirror
> - create /ora01, /ora_RMAN and /users ZFS file systems underneath
>
> Hence:
>
> 1. My understanding is that having a zpool on a single LUN precludes
> some ZFS technology such as self-healing, and hence is not best
> practice. Right?

Not completely correct, nor incorrect. You could use copies with a single LUN and be (mostly) protected against data loss. For important data, and when you have only one LUN (e.g. arrays), using copies is a way to manage your redundancy policies for each file system. In general, we like to see ZFS have the ability to repair data.

> 2. My understanding is that ZFS is capable of detecting the fast FC
> disk vs. the slow SATA disk and arranging the underlying storage to
> take best advantage of this, e.g. shuffling snapshots off to the
> slower disk. Right?

No, not today.

> 3. Suggestions for zpool and ZFS file system layout?

I presume you'll use Solaris 10, so slogs are not available. Follow the procedures for databases on ZFS at:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Prior to Solaris 10u5 (spring '08?) you will want to put your Oracle redo log on a separate storage pool, preferably on the fastest storage. But if you follow this advice, then I wouldn't try to mirror the redo log pool with a slow device, because it is a write-mostly workload and will be sensitive to the latency of commits to both sides of the mirror.

NB: since you only have one LUN, the performance won't be great anyway. Or, to put this in perspective, competitive TPC-C benchmarks consume thousands of LUNs. In that case, you might consider investing in memory for a larger SGA.
-- richard
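A short illustration of the copies property Richard mentions above (pool and file system names are placeholders). Note that extra copies only protect against localized corruption on the single LUN; they do not protect against losing the LUN itself:

  # keep two copies of every data block for this file system; applies only
  # to data written after the property is set
  zfs set copies=2 san01/ora01

  # confirm the setting
  zfs get copies san01/ora01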
>> 1. My understanding is that having a zpool on a single LUN precludes
>> some ZFS technology such as self-healing, and hence is not best
>> practice. Right?

> For important data, and when you have only one LUN (e.g. arrays), using
> copies is a way to manage your redundancy policies for each file system.
> In general, we like to see ZFS have the ability to repair data.

I'm not sure what you mean by copies. ZFS snapshots? Clones?

>> 3. Suggestions for zpool and ZFS file system layout?

> Follow the procedures for databases on ZFS at:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Yeah, I read that. Thanks.

I'm assuming he means something like:

zfs create -o recsize=8k -o mountpoint=/ora01 san01/ora01

> Prior to Solaris 10u5 (spring '08?) you will want to put your Oracle
> redo log on a separate storage pool, preferably on the fastest storage.
> But if you follow this advice, then I wouldn't try to mirror the redo
> log pool with a slow device, because it is a write-mostly workload and
> will be sensitive to the latency of commits to both sides of the mirror.

I eventually decided mirroring fast storage to slow was a stupid idea.

Does the redo log file system need the same recsize as the file system holding the ORACLE tables themselves? I thought the redo logs were more of a flat file. (See the layout sketch after this message.)

> NB: since you only have one LUN, the performance won't be great anyway.
> Or, to put this in perspective, competitive TPC-C benchmarks consume
> thousands of LUNs. In that case, you might consider investing in memory
> for a larger SGA.

I would have assumed the SAN would be the bottleneck on this. Would having the SAN publish more LUNs buy me anything in terms of overall performance? Wouldn't the SAN be spinning its hamster wheel faster trying to deliver the data over multiple LUNs vs. one, losing us any performance gain on the ZFS end?
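A hedged sketch of the layout under discussion, assuming hypothetical pool names san01 (the FC LUN) and san02 (the SATA LUN) and an 8 KB Oracle db_block_size. The usual recommendation at the time was to match recordsize to the database block size for datafiles and to leave the redo log file system at the default recordsize; with only one FC LUN, a truly separate redo pool would need the SAN to present an extra LUN, so redo is shown here as its own file system on the same pool:

  # datafiles: recordsize matched to the 8 KB Oracle block size
  zfs create -o recordsize=8k -o mountpoint=/ora01 san01/ora01

  # redo logs: own file system, default 128 KB recordsize (ideally its own
  # pool on the fastest storage, per Richard's advice)
  zfs create -o mountpoint=/ora_redo san01/ora_redo

  # RMAN backup area and home directories on the slower SATA pool
  zfs create -o mountpoint=/ora_RMAN san02/ora_RMAN
  zfs create -o mountpoint=/users    san02/users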
Actually, I want 200 megabytes/sec (200 MB/sec); I'm OK with using 2 or 4 GbE ports to the network as needed.
David Runyon wrote:
> Actually, I want 200 megabytes/sec (200 MB/sec); I'm OK with using 2 or
> 4 GbE ports to the network as needed.

200 MBytes/s isochronous sustained is generally difficult for a small system. Even if you have enough "port bandwidth", you often run into the internal bottlenecks of small systems (e.g. memory bandwidth). If you look at the large system architectures which implement VOD, such as the Sun Streaming System,

http://www.sun.com/servers/networking/streamingsystem/

you'll notice large (RAM) buffers between the disks and the wire.
-- richard
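A rough back-of-the-envelope check on the "port bandwidth" side, assuming roughly 940 Mb/s of usable NFS payload per GbE link after Ethernet/IP/TCP overhead (an estimate, not a measurement):

  200 MB/s  x 8 bits/byte             = 1,600 Mb/s on the wire
  1,600 Mb/s / ~940 Mb/s per GbE link = ~1.7 links

So two aggregated GbE ports are the bare minimum and four leave headroom for overhead and uneven load across the links; as Richard notes, the harder limits are usually internal (memory and I/O bandwidth), not the NICs.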
William Papolis
2007-Sep-30 03:48 UTC
[zfs-discuss] Rule of Thumb for zfs server sizing with (192) 500 GB SATA disks?
I checked this out at the Solaris Internals link above, because I am also interested in the best setup for ZFS.

Assuming 500 GB drives ...

It turns out that the most cost-effective option (meaning the least "lost" drive space due to redundancy) is to:

1. Set up a RAID-Z of up to 8 drives (all must be the SAME size; 8 x 500 GB).
2. Choose either single or double parity (raidz1 or raidz2); you lose 1 or 2 drives' worth of space.

Bottom line is ...

1. You have either 3.5 TB or 3 TB of usable space, with 500 GB or 1 TB devoted to redundancy (parity).
2. You can lose 1 or 2 drives simultaneously and your ARRAY is still intact!

This maximizes space, and performance should be up there too, right?

How does this compare, performance-wise, to 4 sets of 2-way mirrors? (4 sets of 2-way mirrors is the fastest setup, right? But you lose half the available capacity!) A capacity comparison is sketched after this message.

Thanks,

Bill

HERE IS THE QUOTE FROM SOLARIS INTERNALS FOR SIZING AN ARRAY

"RAID-Z Configuration Requirements and Recommendations

A RAID-Z configuration with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.

* Start a single-parity RAID-Z (raidz) configuration at 3 disks (2+1)
* Start a double-parity RAID-Z (raidz2) configuration at 5 disks (3+2)
* (N+P) with P = 1 (raidz) or 2 (raidz2) and N equals 2, 4, or 8
* The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups."
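A rough capacity and availability comparison for 8 x 500 GB drives, using the (N-P)*X approximation from the Solaris Internals quote above (actual usable space will be a little lower once ZFS metadata and reservations are accounted for):

  raidz1 (8 disks):    (8-1) x 500 GB = 3.5 TB usable; survives any 1 disk failure
  raidz2 (8 disks):    (8-2) x 500 GB = 3.0 TB usable; survives any 2 disk failures
  4 x 2-way mirrors:    4    x 500 GB = 2.0 TB usable; survives 1 failure per pair

For large sequential streaming, the RAID-Z layouts deliver good bandwidth, but for small random reads a RAID-Z group performs roughly like a single disk, while the mirrored layout can service reads from either side of each pair. That is why mirrors are usually the faster choice for random-I/O workloads, at the cost of half the raw capacity.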
Richard Elling
2007-Oct-01 16:13 UTC
[zfs-discuss] Rule of Thumb for zfs server sizing with (192) 500 GB SATA disks?
Comment below...

William Papolis wrote:
> I checked this out at the Solaris Internals link above, because I am
> also interested in the best setup for ZFS.
>
> Assuming 500 GB drives ...
>
> [...]
>
> HERE IS THE QUOTE FROM SOLARIS INTERNALS FOR SIZING AN ARRAY
>
> "RAID-Z Configuration Requirements and Recommendations
>
> A RAID-Z configuration with N disks of size X with P parity disks can
> hold approximately (N-P)*X bytes and can withstand P device(s) failing
> before data integrity is compromised.
>
> * Start a single-parity RAID-Z (raidz) configuration at 3 disks (2+1)
> * Start a double-parity RAID-Z (raidz2) configuration at 5 disks (3+2)
> * (N+P) with P = 1 (raidz) or 2 (raidz2) and N equals 2, 4, or 8
> * The recommended number of disks per group is between 3 and 9. If you
> have more disks, use multiple groups."

As I read this, after two cups of coffee, it seems confusing. The parity disks will be part of the "N" you give to zpool create. Cindy, let's try to make this more consistent with the actual commands for managing pools.
-- richard
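To illustrate Richard's point with the actual commands (device names below are placeholders): zpool create is given the total number of disks in the group, and parity comes from the vdev type (raidz or raidz2) rather than from separately designated parity disks.

  # 8-disk raidz2 group: all 8 disks are listed; two disks' worth of space
  # goes to parity, leaving roughly (8-2) x 500 GB usable
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                           c1t4d0 c1t5d0 c1t6d0 c1t7d0

  # the equivalent mirrored layout: four two-way mirror vdevs
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
                    mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0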