Patrick M. Hausen
2016-Feb-09 15:54 UTC
Best practices for ZFS setup for a strictly SSD based system?
Hi, all,

while there is quite a bit of documentation on how to improve ZFS performance by using a combination of rotating disks and SSDs, I have not found much about an SSD only setup.

We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am not at all sure about:

* Does the recommended limit of 6 disks for a RAIDZ2 still hold? 2x 4 disks is quite a bit of overhead, could I use all 8 in one vdev and get away with it? (The maximum of 6 recommendation is in some old Sun doc)

* Will e.g. MySQL still profit from residing on a mirror instead of a RAIDZ2, even if all disks are SSDs?

* Does a separate ZIL and/or ARC cache device still make sense?

Any pointers or direct help greatly appreciated. Or should I take this to freebsd-fs@?

Thanks and best regards,
Patrick
--
punkt.de GmbH * Kaiserallee 13a * 76133 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
info at punkt.de http://www.punkt.de
Gf: Jürgen Egeling AG Mannheim 108285
Alan Somers
2016-Feb-09 16:32 UTC
Best practices for ZFS setup for a strictly SSD based system?
On Tue, Feb 9, 2016 at 8:54 AM, Patrick M. Hausen <hausen at punkt.de> wrote:
> Hi, all,
>
> while there is quite a bit of documentation on how to improve ZFS performance
> by using a combination of rotating disks and SSDs, I have not found much about
> an SSD only setup.
>
> We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am
> not at all sure about:
>
> * Does the recommended limit of 6 disks for a RAIDZ2 still
> hold? 2x 4 disks is quite a bit of overhead, could I use all 8
> in one vdev and get away with it?
> (The maximum of 6 recommendation is in some old Sun doc)

Nah, you can go much higher. This post describes the RAIDZ overhead. The main penalty to larger stripes is lower IOPS. Your RAIDZ array will have the same read IOPS as a single SSD, no matter how large it is. So, for example, a pool of two RAIDZ stripes each containing 4+2 disks will have twice the IOPS as a pool containing one RAIDZ stripe with 8+2 disks, and about the same storage overhead.

http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

> * Will e.g. MySQL still profit from residing on a mirror
> instead of a RAIDZ2, even if all disks are SSDs?

Yes, because a mirrored vdev has as many read IOPS as all of its disks combined. So a RAID10 of SSDs will have many read IOPS indeed.

> * Does a separate ZIL and/or ARC cache device still
> make sense?

Usually no. But it might make a difference if the ZIL or L2ARC devices have different characteristics from the regular devices. For example, you might use medium speed MLC flash for your regular vdevs and a very fast, small SLC device for the ZIL. But I wouldn't do it unless you thoroughly test it with your workload.

> Any pointers or direct help greatly appreciated. Or should I take this to freebsd-fs@?

Will MySQL access its files in fixed-size records? If so, you can set the recsize filesystem property accordingly. If not, you should probably leave recsize at the default. If you profile MySQL's disk accesses and determine that there is a dominant record size, then go ahead and set ZFS's recsize to the next highest power of two. As usual, disable atime.

> Thanks and best regards,
> Patrick
> --

-Alan
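The recsize rule of thumb above ("next highest power of two") can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the clamp limits of 512 bytes and 128 KiB are assumptions based on the usual recordsize range without the large_blocks pool feature. The measured I/O size has to come from your own profiling.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n."""
    if n <= 0:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def suggested_recordsize(dominant_io_bytes: int,
                         minimum: int = 512,
                         maximum: int = 128 * 1024) -> int:
    """Round a measured dominant I/O size up to a power of two.

    The minimum/maximum defaults are assumptions (see the text above);
    adjust them if your pool allows larger records.
    """
    rounded = next_power_of_two(dominant_io_bytes)
    return max(minimum, min(maximum, rounded))


if __name__ == "__main__":
    # Hypothetical profiling results, in bytes.
    for measured in (12_000, 16_384, 200, 1_000_000):
        print(measured, "->", suggested_recordsize(measured))
```

A size that is already a power of two is returned unchanged, so a workload dominated by 16384-byte accesses would map to a 16K recsize.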
Jan Bramkamp
2016-Feb-09 17:28 UTC
Best practices for ZFS setup for a strictly SSD based system?
On 09/02/16 16:54, Patrick M. Hausen wrote:
> Hi, all,
>
> while there is quite a bit of documentation on how to improve ZFS performance
> by using a combination of rotating disks and SSDs, I have not found much about
> an SSD only setup.
>
> We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am
> not at all sure about:
>
> * Does the recommended limit of 6 disks for a RAIDZ2 still
> hold? 2x 4 disks is quite a bit of overhead, could I use all 8
> in one vdev and get away with it?
> (The maximum of 6 recommendation is in some old Sun doc)

There are multiple reasons to limit the number of disks per RAID-Z VDEV.

* Resilver time: ZFS has to process all objects ordered by transaction id to resilver a RAID-Z. Resilvering is a torture test for the remaining disks of your degraded RAID-Z, and with the ratio of bandwidth to capacity of current hard disks resilvering takes too long. This isn't an issue for SSDs.

* For performance estimations think of the RAID-Z as one huge disk with larger blocks but the same IOPS as the slowest disk in the RAID-Z. Databases perform disk I/O in small blocks, limiting your RAID-Z to the performance of about one of its member disks.

* A ZFS pool can only grow by adding whole VDEVs or replacing all disks in a VDEV one at a time. Using mirrors allows the pool to grow in smaller increments.

> * Will e.g. MySQL still profit from residing on a mirror
> instead of a RAIDZ2, even if all disks are SSDs?

Yes. OpenZFS schedules reads on mirrors to the disk with the shortest queue, thus a mirror offers about the sum of its member disks in read performance (IOPS and bandwidth) and the minimum of its member disks in write performance (IOPS and bandwidth). A pool with as many mirrored VDEVs as possible will offer the optimal performance for a given number of disks.

For write-heavy workloads the quality of the SSDs matters a lot as well. Cheap consumer SSDs can't sustain high write rates for any length of time. Even medium quality SSDs have a lot of jitter and suffer from throughput degradation under sustained write loads. Optimized server SSDs can sustain random write workloads with little jitter and bounded latency. An NVMe SSD can offer an additional order of magnitude performance increase over SATA SSDs, but at a significant increase in price. With multiple NVMe SSDs you will run into the current scalability limits of ZFS and GEOM.

> * Does a separate ZIL and/or ARC cache device still
> make sense?

Most likely not.

Another optimization is splitting the log and table space and creating a dedicated ZFS dataset for each. Create the dataset containing the table space with the fixed record size of your MySQL backend. ZFS also offers a lot more consistency and atomicity guarantees than required by a minimal POSIX file system. This allows you to further reduce the syncing overhead by tuning MySQL to take advantage of the ZFS guarantees.
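The rules of thumb in this thread (a mirror reads at about the sum of its members and writes at about one member, a RAID-Z behaves like about one member disk for small random I/O, and a pool adds up its vdevs) can be turned into a rough back-of-the-envelope comparison for the 8-disk case. The sketch below is only that: the `Vdev` class, the function names and the 50,000 IOPS per-disk figure are invented for illustration, and RAID-Z padding, caching and real SSD behaviour are ignored. It is no substitute for benchmarking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vdev:
    kind: str        # "mirror" or "raidz"
    disks: int       # total member disks in the vdev
    parity: int = 0  # parity disks; only meaningful for raidz


def vdev_read_iops(vdev: Vdev, disk_iops: int) -> int:
    # Mirror: reads are spread over all members, so they add up.
    # RAID-Z: small random reads behave like about one member disk.
    return disk_iops * vdev.disks if vdev.kind == "mirror" else disk_iops


def vdev_write_iops(vdev: Vdev, disk_iops: int) -> int:
    # Both layouts are limited to roughly one member disk for writes.
    return disk_iops


def vdev_data_disks(vdev: Vdev) -> int:
    # Disks' worth of usable space, ignoring RAID-Z padding and metadata.
    return 1 if vdev.kind == "mirror" else vdev.disks - vdev.parity


def pool_estimate(vdevs: list[Vdev], disk_iops: int) -> dict[str, int]:
    # A pool stripes across its vdevs, so per-vdev figures are summed.
    return {
        "read_iops": sum(vdev_read_iops(v, disk_iops) for v in vdevs),
        "write_iops": sum(vdev_write_iops(v, disk_iops) for v in vdevs),
        "data_disks": sum(vdev_data_disks(v) for v in vdevs),
    }


if __name__ == "__main__":
    DISK_IOPS = 50_000  # hypothetical per-SSD figure, not a measurement
    layouts = {
        "4 x 2-way mirror": [Vdev("mirror", 2)] * 4,
        "1 x 8-disk RAIDZ2": [Vdev("raidz", 8, 2)],
        "2 x 4-disk RAIDZ2": [Vdev("raidz", 4, 2)] * 2,
    }
    for name, vdevs in layouts.items():
        print(name, pool_estimate(vdevs, DISK_IOPS))
```

Under these assumptions the four mirrors give roughly eight disks' worth of read IOPS and four disks of capacity, while a single 8-disk RAIDZ2 gives about one disk's worth of IOPS and six disks of capacity, which is the trade-off described above.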
Don't forget alignment and ashift.

You may also want to test compression. If you have spare CPU cycles, I would imagine the system's CPU will handle it faster than any onboard SSD compression. Benchmarking would be of use here though.
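For reference, ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, and a partition is aligned when its starting offset is a multiple of that sector size. A minimal Python sketch of both checks follows; the function names are made up for this example, and the sector size and partition offset have to come from your own hardware and partition table.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size, e.g. 4096 -> 12."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def is_aligned(offset_bytes: int, sector_bytes: int) -> bool:
    """True if a partition start offset is a multiple of the sector size."""
    return offset_bytes % sector_bytes == 0


if __name__ == "__main__":
    print(ashift_for(512), ashift_for(4096), ashift_for(8192))
    # A partition starting at 1 MiB versus one starting at sector 63
    # of a 512-byte-sector layout, both checked against 4 KiB sectors.
    print(is_aligned(1024 * 1024, 4096), is_aligned(63 * 512, 4096))
```

SSDs do not always report their real internal page size, so treat the sector size you feed in as an assumption to verify rather than a given.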