Is it true that Solaris 10 u4 does not have any of the nice ZIL controls that exist in the various recent Open Solaris flavors? I would like to move my ZIL to solid state storage, but I fear I can't do it until I have another update. Heck, I would be happy to just be able to turn the ZIL off to see how my NFS on ZFS performance is affected before spending the $'s. Anyone know when we will see this in Solaris 10?

Thanks,

Jon

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146
jloran at ssl.berkeley.edu
AST:7731^29u18e3
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

The above link shows how to disable the ZIL for testing purposes (it's not generally recommended to keep it disabled in production).

As to the putback schedule of recent ZFS features into Solaris 10, I'm afraid I don't have the information. Hopefully, someone else will know...

Thanks,
/jim
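For readers who want the short version of that guide section, here is a sketch of the two usual ways to flip the tunable on a test box. The syntax is recalled from the Evil Tuning Guide of that era, so treat it as an assumption and check the guide against your release before relying on it. As far as I know, zil_disable is only examined when a filesystem is mounted, so the live change needs a remount (or an export/import of the pool) before it has any effect.

```shell
# Persistent form: add this line to /etc/system and reboot.
#   set zfs:zil_disable = 1

# Live form, as root on a running kernel (lost at the next reboot):
echo zil_disable/W0t1 | mdb -kw

# Put it back once the test is done:
echo zil_disable/W0t0 | mdb -kw
```

This is for measuring only; the rest of the thread covers what a server crash means for NFS clients while the ZIL is off.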
On Tue, Jan 29, 2008 at 08:28:42PM -0500, Jim Mauro wrote:
> As to the putback schedule of recent ZFS features into Solaris 10, I'm
> afraid I don't have the information. Hopefully, someone else will know...

I've got a box that I'm setting up soon (now, really) and I'd love to know when the next 10 update is being released, and what is expected to make it into that update from ZFS. Boot/Install would be great, but I don't expect that to happen. :)

-brian
--
"Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke
As the other poster noted, you can disable it completely for testing.

From my understanding though, it's not as "production-catastrophic" as it sounds to delay or disable the ZIL.

Many people run Linux boxes with ext3 in the standard setting, which only journals metadata, not file content. So the purpose of journalling for them is only to preserve structural integrity, at the expense of correctness. If you turn on full data journalling in ext3 you pay a speed penalty, and very few people do it.

With COW you get the same thing even with the ZIL off. Maybe you miss some transactions in progress if you lose power, meh... but the important thing is that even with the ZIL disabled you should never have concerns about filesystem corruption. Good stuff!

OpenSolaris Nevada 78 performed much better in our tests compared to Solaris 10u4. They've done a lot of performance work since then.

This message posted from opensolaris.org
Jonathan Loran writes:
> Is it true that Solaris 10 u4 does not have any of the nice ZIL controls
> that exist in the various recent Open Solaris flavors? I would like to
> move my ZIL to solid state storage, but I fear I can't do it until I
> have another update. Heck, I would be happy to just be able to turn the
> ZIL off to see how my NFS on ZFS performance is effected before spending
> the $'s. Anyone know when will we see this in Solaris 10?

You can certainly turn it off with any release (Jim's link).

It's true that S10u4 does not have the "Separate Intent Log" to allow using an SSD for ZIL blocks. I believe S10U5 will have that feature.

As noted, disabling the ZIL won't lead to ZFS pool corruption, just DB corruption (that includes NFS clients). To protect against that, in the event of a server crash with zil_disable=1, you'd need to reboot all NFS clients of the server (clear the clients' caches) and better do this before the server comes back up (kind of a raw proposition here).

-r

> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Roch - PAE wrote:
> You can certainly turn it off with any release (Jim's link).
>
> It's true that S10u4 does not have the "Separate Intent Log"
> to allow using an SSD for ZIL blocks. I believe S10U5 will
> have that feature.

Unfortunately it will not. A lot of ZFS fixes and features that had existed for a while will not be in U5 (for reasons I can't go into here). They should be in S10U6...

Neil.
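For context, on OpenSolaris builds that do include the separate intent log feature, attaching a log device is a single zpool operation. A minimal sketch follows; the pool name tank and the device name c4t0d0 are placeholders of mine, not anything from this thread, and none of this works on S10u4.

```shell
# Add a dedicated intent log (slog) device to an existing pool.
# "tank" and "c4t0d0" are hypothetical names.
zpool add tank log c4t0d0

# The device should then be listed under a separate "logs" heading.
zpool status tank
```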
Neil Perrin wrote:
> Roch - PAE wrote:
>> It's true that S10u4 does not have the "Separate Intent Log" to allow
>> using an SSD for ZIL blocks. I believe S10U5 will have that feature.

Don't think we can live with this. Thanks

> Unfortunately it will not. A lot of ZFS fixes and features
> that had existed for a while will not be in U5 (for reasons I
> can't go into here). They should be in S10U6...
>
> Neil.

I feel like we're being hung out to dry here. I've got 70TB on 9 various Solaris 10 u4 servers, with different data sets. All of these are NFS servers. Two servers have a ton of small files, with a lot of read and write updating, and NFS performance on these is abysmal. ZFS is installed on SAN arrays (my first mistake). I will test by disabling the ZIL, but if it turns out the ZIL needs to be on a separate device, we're hosed.

Before ranting any more, I'll do the test of disabling the ZIL. We may have to build out these systems with Open Solaris, but that will be hard as they are in production. I would have to install the new OS on test systems and swap out the drives during scheduled down time. Ouch.

Jon
Are you already running with zfs_nocacheflush=1? We have SAN arrays with dual battery-backed controllers for the cache, so we definitely have this set on all our production systems. It makes a big difference for us.

As I said before, I don't see the catastrophe in disabling the ZIL though.

We actually run our production Cyrus mail servers using failover servers, so our downtime is typically just the small interval to switch active & idle nodes anyhow. We did this mainly for patching purposes.

But we toyed with the idea of running OpenSolaris on them, then just upgrading the idle node to a new OpenSolaris image every month using JumpStart and switching to it. If anything goes wrong, switch back to the other node.

What we ended up doing, for political reasons, was putting the squeeze on our Sun reps and getting a 10u4 kernel spin patch with... what did they call it? Oh yeah, "a big wad of ZFS fixes". So this ends up being a huge PITA, because for the next 6 months to a year we are tied to getting any kernel patches through this other channel rather than the usual way. But it does work for us, so there you are.

Given my choice I'd go with OpenSolaris, but that's a hard sell for datacenter management types. I think it's no big deal in a production shop with good JumpStart and CFengine setups, where any host should be rebuildable from scratch in a matter of hours.

Good luck.
On Jan 30, 2008, at 3:44 PM, Vincent Fox wrote:
> What we ended up doing, for political reasons, was putting the
> squeeze on our Sun reps and getting a 10u4 kernel spin patch with...
> what did they call it? Oh yeah "a big wad of ZFS fixes". So this
> ends up being a huge PITA because for the next 6 months to a year we
> are tied to getting any kernel patches through this other channel
> rather than the usual way. But it does work for us, so there you
> are.

Speaking of "big wad of ZFS fixes", is it me or is anyone else here getting kind of displeased over the glacial speed of the backporting of ZFS stability fixes to s10? It seems that we have to wait around 4-5 months for an oft-delayed s10 update for any fixes of substance to come out.

Not only that, but one day ZFS is its own patch, then it is part of the current KU, and now it's part of the NFS patch, where "zfs" isn't mentioned anywhere in the patch's synopsis.

/dale
jloran at ssl.berkeley.edu said:
> I feel like we're being hung out to dry here. I've got 70TB on 9 various
> Solaris 10 u4 servers, with different data sets. All of these are NFS
> servers. Two servers have a ton of small files, with a lot of read and
> write updating, and NFS performance on these are abysmal. ZFS is installed
> on SAN array's (my first mistake). I will test by disabling the ZIL, but if
> it turns out the ZIL needs to be on a separate device, we're hosed.

If you're using SAN arrays, you should be in good shape. I'll echo what Vincent Fox said about using either zfs_nocacheflush=1 (which is in S10U4), or setting the arrays to ignore the cache flush (SYNC_CACHE) requests. We do the latter here, and it makes a huge difference for NFS clients, basically putting the ZIL in NVRAM.

However, I'm also unhappy about having to wait for S10U6 for the separate ZIL and/or cache features of ZFS. The lack of NV ZIL on our new Thumper makes it painfully slow over NFS for workloads with a large number of file creates and deletes.

Here's a question: Would having the client mount with "-o nocto" have the same effect (for that particular client) as disabling the ZIL on the server? If so, it might be less drastic than losing the ZIL for everyone.

Regards,

Marion
Vincent Fox wrote:
> Are you already running with zfs_nocacheflush=1? We have SAN arrays
> with dual battery-backed controllers for the cache, so we definitely
> have this set on all our production systems. It makes a big
> difference for us.

No, we're not using zfs_nocacheflush=1, but our SAN arrays are set to cache all writebacks, so it shouldn't be needed. I may test this, if I get the chance to reboot one of the servers, but I'll bet the storage arrays are working correctly.

> As I said before I don't see the catastrophe in disabling ZIL though.

No catastrophe, just a potential mess.

> We actually run our production Cyrus mail servers using failover
> servers so our downtime is typically just the small interval to
> switch active & idle nodes anyhow. We did this mainly for patching
> purposes.

Wish we could afford such replication. Poor EDU environment here, I'm afraid.

> But we toyed with the idea of running OpenSolaris on them, then just
> upgrading the idle node to new OpenSolaris image every month using
> Jumpstart and switching to it. Anything goes wrong switch back to
> the other node.
>
> What we ended up doing, for political reasons, was putting the
> squeeze on our Sun reps and getting a 10u4 kernel spin patch with...
> what did they call it? Oh yeah "a big wad of ZFS fixes". So this
> ends up being a huge PITA because for the next 6 months to a year we
> are tied to getting any kernel patches through this other channel
> rather than the usual way. But it does work for us, so there you are.

Mmmm, for us, Open Solaris may be easier. I mainly was after stability, to be honest. Our ongoing experience with bleeding edge Linux is painful at times, and on our big iron, I want them to just work. But if they're so slow, they're not really working right, are they? Sigh...

> Given my choice I'd go with OpenSolaris but that's a hard sell for
> datacenter management types. I think it's no big deal in a
> production shop with good JumpStart and CFengine setups, where any
> host should be rebuildable from scratch in a matter of hours. Good
> luck.

True, I'll think about that going forward.

Thanks,

Jon
Jonathan Loran wrote:
> No, we're not using the zfs_nocacheflush=1, but our SAN arrays are set
> to cache all writebacks, so it shouldn't be needed. I may test this, if
> I get the chance to reboot one of the servers, but I'll bet the storage
> arrays are working correctly.

I think there's some confusion. ZFS and the ZIL issue controller commands to force the disk cache to be flushed, to ensure data is on stable storage. If the disk cache is battery backed then the costly flush is unnecessary. As Vincent said, setting zfs_nocacheflush=1 can make a huge difference.

Note that this is a system-wide variable, so every controller serving ZFS devices should have non-volatile cache before you enable it.

Neil.
> No, we're not using the zfs_nocacheflush=1, but our SAN arrays are set
> to cache all writebacks, so it shouldn't be needed. I may test this, if
> I get the chance to reboot one of the servers, but I'll bet the storage
> arrays are working correctly.

Bzzzt, wrong. Read up on a few threads about this variable. The ZFS flush command used equates to "flush to rust" for most any array. What this works out to is that your array is not using its NV for what it's supposed to. You get a little data in the NV, but it's tagged with this command, which requires the NV to finish its job and report back that the data is on disk before proceeding.

Hopefully at some point the array people and the ZFS people will have a meeting of the minds on this issue of having the array report to the OS "yes I have battery-backed SAFE NV" and it will all just automagically work. Until then, we set the variable in /etc/system.
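A sketch of what setting the variable looks like in practice. The /etc/system line is the form given in the Evil Tuning Guide as I remember it, so verify it against the guide for your patch level. Per Neil's note above, it applies to every pool on the host, so it only makes sense when all the storage behind ZFS has battery-backed cache.

```shell
# Persistent form: add this line to /etc/system and reboot.
#   set zfs:zfs_nocacheflush = 1

# Live form, as root on a running kernel (lost at the next reboot):
echo zfs_nocacheflush/W0t1 | mdb -kw

# Read back the current value in decimal:
echo zfs_nocacheflush/D | mdb -k
```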
> However, I'm also unhappy about having to wait for S10U6 for the separate
> ZIL and/or cache features of ZFS. The lack of NV ZIL on our new Thumper
> makes it painfully slow over NFS for the large number of file create/delete
> type of workload.

I did a bit of testing on this (because I'm in the same boat) and was able to work around it by breaking my filesystem up into lots of individual zfs filesystems. Although the performance of each one isn't great, as long as your load is threaded and distributed across filesystems, it should balance out.

Steve Hillman
Simon Fraser University
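One way to picture the workaround Steve describes: give each top-level directory its own ZFS filesystem. The sketch below only prints the commands so they can be reviewed first. The function name plan_split, the dataset tank/home, and the user names are all made up for illustration; the thread does not say how the split was actually done.

```shell
#!/bin/sh
# Print one "zfs create" command per name, so that each top-level
# directory becomes its own ZFS filesystem under the parent dataset.
# plan_split, tank/home and the names below are hypothetical.
plan_split() {
    parent=$1
    shift
    for name in "$@"; do
        echo "zfs create $parent/$name"
    done
}

# Review the plan; pipe it to sh (as root) only once it looks right.
plan_split tank/home alice bob carol
```

Each new filesystem is a separate NFS share point, so clients need matching mounts (or automounter maps) for the load to actually spread out.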
I don't know when U5 or U6 is coming, so I just set zfs_nocacheflush=1 in /etc/system. Performance speeds up about as much as with zil_disable=1, and it is safer for the filesystem.

The separate intent log feature is not in U4, so NFS performance on ZFS will be too slow if you do not set zfs_nocacheflush=1 in your /etc/system file.
Guanghui Wang wrote:
> I don't know when U5 or U6 is coming, so I just set zfs_nocacheflush=1
> in /etc/system. Performance speeds up about as much as with
> zil_disable=1, and it is safer for the filesystem.
>
> The separate intent log feature is not in U4, so NFS performance on ZFS
> will be too slow if you do not set zfs_nocacheflush=1 in your
> /etc/system file.

Yeah, on one of my systems, I was able to set zfs_nocacheflush=1, but the other machine that's suffering isn't patched up enough to use it. I have to schedule down time to patch it up. I don't have hard numbers yet, but the seat of the pants impression is that stopping cache flushes has helped. On our SAN arrays, I thought the settings I chose would have had them ignore cache flushing, but apparently not.

Thanks everyone for the help. I still look forward to using fast SSD for the ZIL when it comes to Solaris 10 U? as a preferred method.

Jon
Someone told me that s10u5 will not contain what you need, namely SSD or NVRAM as a separate device for the ZFS intent log:

"Finally, s10u5 will only contain a small part of bugfix. But s10u6 will be a quite huge wad of features/fixes."

Setting zfs_nocacheflush=1 will hugely improve your NFS clients' performance when you use an NVRAM-based ZFS server. There is a blog post which shows how to make intelligent arrays ignore the requests, triggered by NFS commits, to flush to stable storage:

http://blogs.digitar.com/jjww/index.php?blogid=3&archive=2006-12
On Jan 30, 2008 2:27 PM, Jonathan Loran <jloran at ssl.berkeley.edu> wrote:
> Before ranting any more, I'll do the test of disabling the ZIL. We may
> have to build out these systems with Open Solaris, but that will be hard
> as they are in production. I would have to install the new OS on test
> systems and swap out the drives during scheduled down time. Ouch.

Live upgrade can be very helpful here, either for upgrading or applying a flash archive. Once you are comfortable that Nevada performs like you want, you could prep the new OS on alternate slices or broken mirrors. Activating the updated OS should take only a few seconds longer than a standard "init 6". Failback is similarly easy.

I can't remember the last time I swapped physical drives to minimize the outage during an upgrade.

--
Mike Gerdts
http://mgerdts.blogspot.com/
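A rough sketch of the Live Upgrade flow Mike describes, for readers who have not used it. The boot environment name, the slice, and the install image path are placeholders of mine; a real run depends on the disk layout and on having the Live Upgrade packages from the target release installed first.

```shell
# Create an alternate boot environment on a spare slice.
# "nevada" and c0t1d0s0 are hypothetical names.
lucreate -n nevada -m /:/dev/dsk/c0t1d0s0:ufs

# Upgrade the inactive environment from an install image (hypothetical path).
luupgrade -u -n nevada -s /net/installserver/export/nevada

# Mark it active for the next boot, then reboot with init,
# not reboot or halt, so the switch is carried out.
luactivate nevada
init 6
```

Falling back is the same last step in reverse: luactivate the previous boot environment (lustatus lists the names) and init 6 again.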
This is true, but I think it's the testing bit that worries me. It's hard to lab out and fully test an equivalent setup that has 350 active clients pounding on it to test usability and stability. One of our boxes has a boat load of special software running and various tweaks that would also need to be validated. In other words, upgrades have tended to be painful. We don't really have any Open Solaris experience yet, and we've more or less trusted Sun to wring out the issues to minimize the problems and make these upgrades smoother.

Of course, the irony is that the requirement for this very stability is why we haven't seen the features we need in the ZFS code in Solaris 10.

Thanks,

Jon