Hi everyone,

We are looking at ZFS as the back end for a pool of Java servers that do
image processing and serve the content over the internet. Our current
solution works well, but the cost of scaling and our ability to scale are
becoming a problem.

Currently:

- 20TB NFS servers running FreeBSD
- Load balancer in front of them

A bit about the workload:

- 99.999% large reads, very small write requirement.
- Reads average from ~1MB to 60MB.
- Peak read bandwidth we see is ~180MB/s, with an average around 20MB/s
  during peak hours.

Proposed hardware:

- Dell PowerEdge 2970's, 16GB RAM, quad-core AMD.
- LSI 1068-based SAS cards * 2 per server
- 4 * MD1000, each with 15 * 1TB ES.2 drives
- Configured as 2 * 7-disk raidz2 with 1 hot spare per chassis (a rough
  zpool sketch follows below)
- Intel 10 GigE to the switching infrastructure

Questions:

1) Solaris, OpenSolaris, etc.? What's the best for production?
2) Anything wrong with the hardware we selected?
3) Any other words of wisdom? We are just starting out with ZFS but do
   have some Solaris background.

Thanks!
John
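For reference, a minimal sketch of how one MD1000 shelf could be carved up
the way described above (2 * 7-disk raidz2 plus a hot spare). The cXtYdZ
names are placeholders; the real device names depend on how the SAS HBAs
enumerate the enclosures, and the remaining shelves would simply be added
as further raidz2 vdevs in the same pool:

  # One shelf: two 7-disk raidz2 vdevs plus one hot spare.
  # Device names below are placeholders for the 15 drives in one MD1000.
  zpool create tank \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0 c2t13d0 \
      spare  c2t14d0

  # Mostly-read NFS workload: share it out and let the ARC do the caching.
  zfs set sharenfs=on tank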
On Mon, 9 Feb 2009, John Welter wrote:

> A bit about the workload:
>
> - 99.999% large reads, very small write requirement.
> - Reads average from ~1MB to 60MB.
> - Peak read bandwidth we see is ~180MB/s, with an average around 20MB/s
>   during peak hours.

This is something that ZFS is particularly good at.

> Proposed hardware:
>
> - Dell PowerEdge 2970's, 16GB RAM, quad-core AMD.
> - LSI 1068-based SAS cards * 2 per server
> - 4 * MD1000, each with 15 * 1TB ES.2 drives
> - Configured as 2 * 7-disk raidz2 with 1 hot spare per chassis
> - Intel 10 GigE to the switching infrastructure

The only concern might be with the MD1000. Make sure that you can obtain
it in a JBOD SAS configuration without the advertised PERC RAID
controller. The PERC RAID controller is likely to get in the way when
using ZFS. There has been mention here of unpleasant behavior when
hot-swapping a failed drive in a Dell drive array behind their RAID
controller (the replaced drive does not come back automatically).
Typically such simplified hardware is offered as "expansion" enclosures.
Sun, IBM, and Adaptec also offer good JBOD SAS enclosures.

It seems that you have done your homework well.

> 1) Solaris, OpenSolaris, etc.? What's the best for production?

Choose Solaris 10U6 if OS stability and incremental patches are important
to you. ZFS boot from mirrored drives in the PowerEdge 2970 should help
make things very reliable, and the OS becomes easier to live-upgrade.

> 3) Any other words of wisdom? We are just starting out with ZFS but do
>    have some Solaris background.

You didn't say whether you will continue using FreeBSD. While FreeBSD is
a fine OS, my experience is that its client NFS read performance is
considerably lower than Solaris'. With Solaris clients and a Solaris
server, NFS reads run close to "wire speed". FreeBSD's NFS client is not
so good for bulk reads, presumably due to its read-ahead/caching strategy.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
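A minimal sketch of the mirrored ZFS boot Bob describes, assuming a
Solaris 10U6 x86 install whose root pool landed on c1t0d0s0 and a second
internal disk at c1t1d0 (both device names are placeholders):

  # Attach the second disk to the root pool to create a mirror
  # (Solaris 10U6 root pools live on a slice, hence s0).
  zpool attach rpool c1t0d0s0 c1t1d0s0

  # On x86, make the second disk bootable as well.
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

  # Wait for the resilver to finish before relying on the mirror.
  zpool status rpool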
Sorry, I wasn't clear: the clients that hit this NFS back end are all
CentOS 5.2. FreeBSD is only used for the current NFS servers (a legacy
deal), and that would go away with the new Solaris/ZFS back end.

Dell will sell their boxes with SAS 5/E controllers, which are just LSI
1068 boards; these work with the MD1000 as a JBOD (we are doing some
testing as we speak and it seems to work). The rest of the infrastructure
is Dell, so we are trying to stick with them... the devil we know... ;^)

Homework was easy with excellent resources like this list -- just lurked
a while and picked up a lot from the traffic.

Thanks again.

John

-----Original Message-----
From: Bob Friesenhahn [mailto:bfriesen at simple.dallas.tx.us]
Sent: Monday, February 09, 2009 11:28 AM
To: John Welter
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] ZFS for NFS back end
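If it helps anyone doing the same SAS 5/E + MD1000 testing, one way to
sanity-check that the enclosure really is presenting plain JBOD disks is
to confirm every spindle shows up individually and then exercise them
with a throwaway pool; the pool and device names below are only examples:

  # Every physical drive in the shelf should appear as its own disk.
  format < /dev/null
  cfgadm -al

  # Build a scratch pool and scrub it to exercise all the spindles.
  zpool create -f testpool raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0
  zpool scrub testpool
  zpool status testpool
  zpool destroy testpool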