Gray Carper
2009-Jan-15 07:36 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
Hey, all!

Using iozone (with the sequential read, sequential write, random read, and random write categories), on a Sun X4240 system running OpenSolaris b104 (NexentaStor 1.1.2, actually), we recently ran a number of relative performance tests using a few ZIL and L2ARC configurations (meant to try and uncover which configuration would be the best choice). I'd like to share the highlights with you all (without bogging you down with raw data) to see if anything strikes you.

Our first (baseline) test used a ZFS pool which had a self-contained ZIL and L2ARC (i.e. not moved to other devices, the default configuration). Note that this system had both SSDs and SAS drives attached to the controller, but only the SAS drives were in use.

In the second test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and the L2ARC on four 146GB SAS drives. Random reads were significantly worse than the baseline, but all other categories were slightly better.

In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and the L2ARC on four 80GB SSDs. Sequential reads were better than the baseline, but all other categories were worse.

In the fourth test, we rebuilt the ZFS pool with no separate ZIL, but with the L2ARC on four 146GB SAS drives. Random reads were significantly worse than the baseline and all other categories were about the same as the baseline.

As you can imagine, we were disappointed. None of those configurations resulted in any significant improvements, and all of the configurations resulted in at least one category being worse. This was very much not what we expected.

For the sake of sanity checking, we decided to run the baseline case again (ZFS pool with a self-contained ZIL and L2ARC), but this time removed the SSDs completely from the box. Amazingly, the simple presence of the SSDs seemed to be a negative influence - the new SSD-free test showed improvement in every single category when compared to the original baseline test.

So, this has led us to the conclusion that we shouldn't be mixing SSDs with SAS drives on the same controller (at least, not the controller we have in this box). Has anyone else seen problems like this before that might validate that conclusion? If so, we think we should probably build an SSD JBOD, hook it up to the box, and re-run the tests. This leads us to another question: does anyone have any recommendations for SSD-performant controllers that have great OpenSolaris driver support?

Thanks!
-Gray
--
Gray Carper
MSIS Technical Services
University of Michigan Medical School
gcarper at umich.edu | skype: graycarper | 734.418.8506
http://www.umms.med.umich.edu/msis/
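For reference, a minimal sketch of the kind of iozone run described above. The exact flags, file size, and mount point used in these tests are not given in the thread, so everything below is an illustrative assumption; the test file should be sized past RAM so the numbers are not pure ARC cache hits.

  # sequential write (-i 0), sequential read (-i 1), random read/write (-i 2);
  # 8 GB file and 128 KB records are assumptions - size past installed RAM
  iozone -i 0 -i 1 -i 2 -s 8g -r 128k -f /data/iozone.tmp -R -b iozone-results.xls
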
Will Murnane
2009-Jan-15 14:00 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
On Thu, Jan 15, 2009 at 02:36, Gray Carper <gcarper at umich.edu> wrote:
> In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and
> the L2ARC on four 80GB SSDs.

An obvious question: what SSDs are these? Where did you get them? Many consumer-level MLC SSDs have controllers by JMicron (also known for their lousy SATA controllers, BTW) which cause stalling of all I/O under certain fairly common conditions (see [1]). Spending the cash for an SLC drive (such as the Intel X25-E) may solve the problem.

Will

[1]: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=8
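One quick way to confirm which SSD models (and hence which controllers) are actually attached is to pull the vendor/product inquiry strings from the OS. A small sketch; both commands just report what the system already sees:

  # per-device vendor, product, and revision strings, plus error counters
  iostat -En

  # or dump the disk list from format and exit immediately
  format </dev/null
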
Gray Carper
2009-Jan-15 15:12 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
Hey there, Will! Thanks for the quick reply and the link.

And: Oops! Yes - the SSD models would probably be useful information. ;>

The 32GB SSD is an Intel X25-E (SLC). The 80GB SSDs are Intel X25-M (MLC). If MLC drives can be naughty, perhaps we should try an additional test: keep the 80GB SSDs out of the chassis, but leave the 32GB SSD in.

-Gray

On Thu, Jan 15, 2009 at 10:00 PM, Will Murnane <will.murnane at gmail.com> wrote:

> On Thu, Jan 15, 2009 at 02:36, Gray Carper <gcarper at umich.edu> wrote:
> > In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and
> > the L2ARC on four 80GB SSDs.
> An obvious question: what SSDs are these? Where did you get them?
> Many consumer-level MLC SSDs have controllers by JMicron (also
> known for their lousy SATA controllers, BTW) which cause stalling of
> all I/O under certain fairly common conditions (see [1]). Spending
> the cash for an SLC drive (such as the Intel X25-E) may solve the
> problem.
>
> Will
>
> [1]: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=8

--
Gray Carper
MSIS Technical Services
University of Michigan Medical School
gcarper at umich.edu | skype: graycarper | 734.418.8506
http://www.umms.med.umich.edu/msis/
Gray Carper
2009-Jan-15 15:16 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
D'oh - I take that back. Upon re-reading, I expect that you weren't indicting MLC drives generally, just the JMicron-controlled ones. It looks like we aren't suffering from those, though.

-Gray

On Thu, Jan 15, 2009 at 11:12 PM, Gray Carper <gcarper at umich.edu> wrote:

> Hey there, Will! Thanks for the quick reply and the link.
>
> And: Oops! Yes - the SSD models would probably be useful information. ;>
>
> The 32GB SSD is an Intel X25-E (SLC). The 80GB SSDs are Intel X25-M (MLC).
> If MLC drives can be naughty, perhaps we should try an additional test: keep
> the 80GB SSDs out of the chassis, but leave the 32GB SSD in.
>
> -Gray
>
> On Thu, Jan 15, 2009 at 10:00 PM, Will Murnane <will.murnane at gmail.com> wrote:
>
>> On Thu, Jan 15, 2009 at 02:36, Gray Carper <gcarper at umich.edu> wrote:
>> > In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD
>> > and the L2ARC on four 80GB SSDs.
>> An obvious question: what SSDs are these? Where did you get them?
>> Many consumer-level MLC SSDs have controllers by JMicron (also
>> known for their lousy SATA controllers, BTW) which cause stalling of
>> all I/O under certain fairly common conditions (see [1]). Spending
>> the cash for an SLC drive (such as the Intel X25-E) may solve the
>> problem.
>>
>> Will
>>
>> [1]: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=8
>
> --
> Gray Carper
> MSIS Technical Services
> University of Michigan Medical School
> gcarper at umich.edu | skype: graycarper | 734.418.8506
> http://www.umms.med.umich.edu/msis/

--
Gray Carper
MSIS Technical Services
University of Michigan Medical School
gcarper at umich.edu | skype: graycarper | 734.418.8506
http://www.umms.med.umich.edu/msis/
Gray Carper
2009-Jan-16 01:29 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
Hey, Eric!

Now things get complicated. ;> I was naively hoping to avoid revealing our exact pool configuration, fearing that it might lead to lots of tangential discussion, but I can see how it may be useful so that you have the whole picture. Time for the big reveal, then...

Here's the exact line used for the baseline test...

  create volume data raidz1 c3t600144F0494719240000000000000000d0 c3t600144F0494719D40000000000000000d0 c3t600144F049471A5F0000000000000000d0 c3t600144F049471A6C0000000000000000d0 c3t600144F049471A820000000000000000d0 c3t600144F049471A8E0000000000000000d0

...the line for the 32GB SSD ZIL + 4x146GB SAS L2ARC test...

  create volume data raidz1 c3t600144F0494719240000000000000000d0 c3t600144F0494719D40000000000000000d0 c3t600144F049471A5F0000000000000000d0 c3t600144F049471A6C0000000000000000d0 c3t600144F049471A820000000000000000d0 c3t600144F049471A8E0000000000000000d0 cache c1t2d0 c1t3d0 c1t5d0 c1t6d0 log c1t4d0

...and the line for the 32GB SSD ZIL + 80GB SSD L2ARC test...

  create volume data raidz1 c3t600144F0494719240000000000000000d0 c3t600144F0494719D40000000000000000d0 c3t600144F049471A5F0000000000000000d0 c3t600144F049471A6C0000000000000000d0 c3t600144F049471A820000000000000000d0 c3t600144F049471A8E0000000000000000d0 cache c1t7d0 c1t8d0 c1t9d0 c1t10d0 log c1t4d0

Now I'm sure someone is asking, "What are those crazy c3t600144F0494719240000000000000000d0, etc. pool devices?". They are iSCSI targets. Our X4240 is the head node for virtualizing and aggregating six Thumpers' worth of storage. Each X4500 has its own raidz2 pool that is exported via 10GbE iSCSI, the X4240 collects them all with raidz1, and the resulting pool is about 140TB. To head off a few questions that might lead us astray: we have compelling NAS use-cases for this, it does work, and it is surprisingly fault-tolerant (for example: while under heavy load, we can reboot an entire iSCSI node without losing client connections, data, etc.).

Using the X25-E for the L2ARC, but having no separate ZIL, sounds like a worthwhile test. Is 32GB large enough for a good L2ARC, though?

Thanks!
-Gray

On Fri, Jan 16, 2009 at 1:16 AM, Eric D. Mudama <edmudama at bounceswoosh.org> wrote:

> On Thu, Jan 15 at 15:36, Gray Carper wrote:
>> Hey, all!
>>
>> Using iozone (with the sequential read, sequential write, random read, and
>> random write categories), on a Sun X4240 system running OpenSolaris b104
>> (NexentaStor 1.1.2, actually), we recently ran a number of relative
>> performance tests using a few ZIL and L2ARC configurations (meant to try
>> and uncover which configuration would be the best choice). I'd like to
>> share the highlights with you all (without bogging you down with raw data)
>> to see if anything strikes you.
>>
>> Our first (baseline) test used a ZFS pool which had a self-contained ZIL
>> and L2ARC (i.e. not moved to other devices, the default configuration).
>> Note that this system had both SSDs and SAS drives attached to the
>> controller, but only the SAS drives were in use.
>
> Can you please provide the exact config, in terms of how the zpool was
> built?
>
>> In the second test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and
>> the L2ARC on four 146GB SAS drives. Random reads were significantly worse
>> than the baseline, but all other categories were slightly better.
>
> In this case, ZIL on the X25-E makes sense for writes, but the SAS
> drives read slower than SSDs, so they're probably not the best L2ARC
> units unless you're using 7200RPM devices in your main zpool.
>
>> In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD and
>> the L2ARC on four 80GB SSDs. Sequential reads were better than the
>> baseline, but all other categories were worse.
>
> I'm wondering if the single X25-E is not enough faster than the core
> pool, making a separate ZIL not worth it.
>
>> In the fourth test, we rebuilt the ZFS pool with no separate ZIL, but with
>> the L2ARC on four 146GB SAS drives. Random reads were significantly worse
>> than the baseline and all other categories were about the same as the
>> baseline.
>>
>> As you can imagine, we were disappointed. None of those configurations
>> resulted in any significant improvements, and all of the configurations
>> resulted in at least one category being worse. This was very much not what
>> we expected.
>
> Have you tried using the X25-E as a L2ARC, keep the ZIL default, and
> use the SAS drives as your core pool?
>
> Or were you using X25-M devices as your core pool before? How much
> data is in the zpool?
>
> --
> Eric D. Mudama
> edmudama at mail.bounceswoosh.org

--
Gray Carper
MSIS Technical Services
University of Michigan Medical School
gcarper at umich.edu | skype: graycarper | 734.418.8506
http://www.umms.med.umich.edu/msis/
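For anyone following along at the raw zpool level rather than through the NexentaStor NMC, a minimal sketch of what these layouts correspond to, assuming the "create volume" lines map onto plain zpool create (device names taken from the listing above; the equivalence is an assumption):

  zpool create data raidz1 \
      c3t600144F0494719240000000000000000d0 \
      c3t600144F0494719D40000000000000000d0 \
      c3t600144F049471A5F0000000000000000d0 \
      c3t600144F049471A6C0000000000000000d0 \
      c3t600144F049471A820000000000000000d0 \
      c3t600144F049471A8E0000000000000000d0

  # the suggested variant - X25-E as L2ARC, ZIL left in the pool - could be
  # tried without a full rebuild, since cache devices can be added to a
  # live pool (c1t4d0 is the X25-E per the listing above)
  zpool add data cache c1t4d0

  # watch per-device activity during a run to see whether the cache and
  # log devices are actually being hit
  zpool iostat -v data 5
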
Brendan Gregg - Sun Microsystems
2009-Jan-16 01:59 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
G'Day Gray,

On Thu, Jan 15, 2009 at 03:36:47PM +0800, Gray Carper wrote:
> Hey, all!
>
> Using iozone (with the sequential read, sequential write, random read,
> and random write categories), on a Sun X4240 system running
> OpenSolaris b104 (NexentaStor 1.1.2, actually), we recently ran a
> number of relative performance tests using a few ZIL and L2ARC
> configurations (meant to try and uncover which configuration would be
> the best choice). I'd like to share the highlights with you all
> (without bogging you down with raw data) to see if anything strikes
> you.
>
> Our first (baseline) test used a ZFS pool which had a self-contained
> ZIL and L2ARC (i.e. not moved to other devices, the default
> configuration). Note that this system had both SSDs and SAS drives
> attached to the controller, but only the SAS drives were in use.
>
> In the second test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD
> and the L2ARC on four 146GB SAS drives. Random reads were
> significantly worse than the baseline, but all other categories were
> slightly better.
>
> In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD
> and the L2ARC on four 80GB SSDs. Sequential reads were better than the
> baseline, but all other categories were worse.

The L2ARC trickle charges (especially since it feeds from random I/O, which by nature has low throughput), and with 4 x 80GB of it online, you could be looking at an 8 hour warmup, or longer. How long did you run iozone for?

Also, the zfs recsize makes a difference for random I/O to the L2ARC - you probably want it set to 8 Kbytes or so, before creating files.

The L2ARC code shipped with the Sun Storage 7000 has had some performance improvements that aren't in OpenSolaris yet, but will be soon.

Brendan

--
Brendan Gregg, Sun Microsystems Fishworks.  http://blogs.sun.com/brendan
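A short sketch of the two points above. The dataset name is an assumption; the kstat names are the standard ARC statistics exposed by ZFS at the time.

  # set a smaller recordsize on the test filesystem *before* creating the
  # iozone files - existing files keep the block size they were written with
  # ("data/bench" is a hypothetical dataset name)
  zfs set recordsize=8k data/bench

  # watch the L2ARC fill and start serving hits while the benchmark runs
  kstat -p zfs:0:arcstats:l2_size
  kstat -p zfs:0:arcstats:l2_hits
  kstat -p zfs:0:arcstats:l2_misses
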
Eric D. Mudama
2009-Jan-16 04:46 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
On Fri, Jan 16 at 9:29, Gray Carper wrote:
> Using the X25-E for the L2ARC, but having no separate ZIL, sounds like a
> worthwhile test. Is 32GB large enough for a good L2ARC, though?

Without knowing much about ZFS internals, I'd just ask how your average working data set compares to the sizes of the SSDs.

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
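A rough way to put numbers on that question: total pool usage gives an upper bound on the working set, and the ARC kstats show how much of it is currently being held hot in RAM. The pool name is taken from the listing earlier in the thread.

  # how much data the pool currently holds
  zpool list data
  zfs list -r data

  # current ARC size and its configured ceiling, in bytes
  kstat -p zfs:0:arcstats:size
  kstat -p zfs:0:arcstats:c_max
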
Neil Perrin
2009-Jan-16 18:06 UTC
[zfs-discuss] Lackluster ZFS performance trials using various ZIL and L2ARC configurations...
I don't believe that iozone does any synchronous calls (fsync/O_DSYNC/O_SYNC), so the ZIL and separate logs (slogs) would be unused. I'd recommend performance testing by configuring filebench to do synchronous writes:

http://opensolaris.org/os/community/performance/filebench/

Neil.

On 01/15/09 00:36, Gray Carper wrote:
> Hey, all!
>
> Using iozone (with the sequential read, sequential write, random read,
> and random write categories), on a Sun X4240 system running OpenSolaris
> b104 (NexentaStor 1.1.2, actually), we recently ran a number of relative
> performance tests using a few ZIL and L2ARC configurations (meant to try
> and uncover which configuration would be the best choice). I'd like to
> share the highlights with you all (without bogging you down with raw
> data) to see if anything strikes you.
>
> Our first (baseline) test used a ZFS pool which had a self-contained ZIL
> and L2ARC (i.e. not moved to other devices, the default configuration).
> Note that this system had both SSDs and SAS drives attached to the
> controller, but only the SAS drives were in use.
>
> In the second test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD
> and the L2ARC on four 146GB SAS drives. Random reads were significantly
> worse than the baseline, but all other categories were slightly better.
>
> In the third test, we rebuilt the ZFS pool with the ZIL on a 32GB SSD
> and the L2ARC on four 80GB SSDs. Sequential reads were better than the
> baseline, but all other categories were worse.
>
> In the fourth test, we rebuilt the ZFS pool with no separate ZIL, but
> with the L2ARC on four 146GB SAS drives. Random reads were significantly
> worse than the baseline and all other categories were about the same as
> the baseline.
>
> As you can imagine, we were disappointed. None of those configurations
> resulted in any significant improvements, and all of the configurations
> resulted in at least one category being worse. This was very much not
> what we expected.
>
> For the sake of sanity checking, we decided to run the baseline case
> again (ZFS pool with a self-contained ZIL and L2ARC), but this time
> removed the SSDs completely from the box. Amazingly, the simple presence
> of the SSDs seemed to be a negative influence - the new SSD-free test
> showed improvement in every single category when compared to the
> original baseline test.
>
> So, this has led us to the conclusion that we shouldn't be mixing SSDs
> with SAS drives on the same controller (at least, not the controller we
> have in this box). Has anyone else seen problems like this before that
> might validate that conclusion? If so, we think we should probably build
> an SSD JBOD, hook it up to the box, and re-run the tests. This leads us
> to another question: does anyone have any recommendations for
> SSD-performant controllers that have great OpenSolaris driver support?
>
> Thanks!
> -Gray
> --
> Gray Carper
> MSIS Technical Services
> University of Michigan Medical School
> gcarper at umich.edu | skype: graycarper | 734.418.8506
> http://www.umms.med.umich.edu/msis/
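Two hedged sketches for getting synchronous writes into the test, so the slog actually sees traffic. The mount point and run length are assumptions; iozone's -o flag forces O_SYNC writes, and filebench's bundled varmail workload is fsync-heavy.

  # iozone again, but with -o so every write is synchronous and exercises
  # the ZIL / slog ("/data/bench" is an assumed mount point)
  iozone -i 0 -i 2 -o -s 8g -r 128k -f /data/bench/syncfile

  # or filebench's varmail workload; depending on packaging the interactive
  # binary may be named "filebench" or "go_filebench"
  filebench
  filebench> load varmail
  filebench> set $dir=/data/bench
  filebench> run 60
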