Constantin Gonzalez
2008-Oct-22 16:26 UTC
[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

On a busy NFS server, performance can be very modest when handling large numbers of small files, due to the well-known effects of ZFS and the ZIL honoring the NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he is doing, there are three possibilities:

1. Live with it. Hard, if you see performance ten times lower than it could be and your users complain a lot.

2. Use a flash disk for a ZIL, a slog. This can add considerable extra cost, especially if you're using an X4500/X4540 and can't swap out fast SAS drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

3. Disable the ZIL[1]. This is of course evil, but one customer pointed out to me that if a tar xvf were writing locally to a ZFS file system, the writes wouldn't be synchronous either, so there's no point in forcing NFS users to have a better availability experience at the expense of performance.

So, if the sysadmin draws the informed and conscious conclusion that (s)he doesn't want to honor NFS COMMIT operations, what options are less disruptive than disabling the ZIL completely?

- I checked the NFS tunables at:
  http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
  but could not find a tunable that would disable COMMIT honoring. Is there already an RFE asking for a share option that disables the translation of COMMIT into synchronous writes?

- The ZIL exists on a per-filesystem basis in ZFS. Is there already an RFE that asks for the ability to disable the ZIL on a per-filesystem basis? Once admins start to disable the ZIL for whole pools because the extra performance is too tempting, wouldn't it be the lesser evil to let them disable it on a per-filesystem basis?

Comments?
Cheers,
   Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

--
Constantin Gonzalez, Sun Microsystems GmbH, Germany
Principal Field Technologist, http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91, http://google.com/search?q=constantin+gonzalez
Registered office: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Commercial register: Amtsgericht Muenchen, HRB 161028
Managing directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Chairman of the supervisory board: Martin Haering
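To get a feel for the cost described above, here is a minimal local sketch (an illustration only, not the NFS or ZFS code path) that creates many small files with and without an fsync() per file. A server that honours COMMIT has to reach stable storage before acknowledging, which is roughly what the per-file fsync() stands in for; the function names, file count and file size are arbitrary assumptions.

```python
import os
import tempfile
import time


def create_small_files(directory: str, count: int, size: int, sync: bool) -> float:
    """Create `count` files of `size` bytes in `directory`; return elapsed seconds.

    With sync=True every file is fsync()ed before it is closed, roughly what a
    server honouring COMMIT must do before it can acknowledge the client.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file_{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                f.flush()              # move Python's buffer into the OS
                os.fsync(f.fileno())   # wait until the OS reports stable storage
    return time.perf_counter() - start


def compare(count: int = 200, size: int = 4096) -> dict:
    """Time the same workload without and with a per-file fsync()."""
    results = {}
    for sync in (False, True):
        with tempfile.TemporaryDirectory() as d:
            results["sync" if sync else "async"] = create_small_files(d, count, size, sync)
    return results


if __name__ == "__main__":
    for mode, seconds in compare().items():
        print(f"{mode:>5}: {seconds:.3f} s")
```

Results depend heavily on the storage behind the temporary directory; on a RAM-backed /tmp the two timings may be nearly identical.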
Neil Perrin
2008-Oct-22 16:45 UTC
On 10/22/08 10:26, Constantin Gonzalez wrote:
> - I checked the NFS tunables from:
>   http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
>   But could not find a tunable that would disable COMMIT honoring.
>   Is there already an RFE asking for a share option that disables the
>   translation of COMMIT to synchronous writes?

None that I know of...

> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
>   that asks for the ability to disable the ZIL on a per filesystem basis?

Yes: 6280630 zil synchronicity

Though personally I've been unhappy with the exposure that zil_disable has got. It was originally meant for debug purposes only. So providing an official way to make synchronous behaviour asynchronous is, to me, dangerous.
Bob Friesenhahn
2008-Oct-22 17:20 UTC
On Wed, 22 Oct 2008, Neil Perrin wrote:
> On 10/22/08 10:26, Constantin Gonzalez wrote:
>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
>>    that if a tar xvf were writing locally to a ZFS file system, the writes
>>    wouldn't be synchronous either, so there's no point in forcing NFS users
>>    to have a better availability experience at the expense of performance.

The conclusion reached here is quite seriously wrong and no Sun employee should suggest it to a customer. If the system writing to a local filesystem reboots, then the applications which were running are also lost and will see the new filesystem state when they are restarted. If an NFS server spontaneously reboots, the applications on the many clients are still running and the client systems are using cached data, trusting that every write the server acknowledged is safely on stable storage. This means that clients could do very bad things if the filesystem state (as seen by NFS) is suddenly not consistent. One of the joys of NFS is that the client continues unhindered once the server returns.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Marcelo Leal
2008-Oct-22 17:41 UTC
I agree with you, Constantin, that sync is a performance problem; at the same time, I think that in an NFS environment it is simply *required*. If sync can be relaxed in a "specific NFS environment", my first thought is that NFS is not necessary in that environment in the first place. IMHO a protocol like iSCSI would perform much better in such a situation; at least there would be no need to take care of consistency between multiple clients.

That said, options are always good, and the ability to disable the ZIL per filesystem is one more *gun* in the world. As always, it can reach both the cops and the bad guys. Keep in mind that JB is trying to send to jail anyone who wins performance benchmarks without syncing to disk. ;-)

Keep up the good work on your blog!

Leal
--
This message posted from opensolaris.org
Ross
2008-Oct-22 17:52 UTC
Well, it might be even more of a bodge than disabling the ZIL, but how about:

- Create a 512MB ramdisk, use that for the ZIL.
- Buy a Micro Memory nvram PCI card for £100 or so.
- Wait 3-6 months, and hopefully buy a fully supported PCI-e SSD to replace the Micro Memory card.

The ramdisk isn't an ideal solution, but provided you don't export the pool with it offline, it does work. We used it as a stop-gap solution for a couple of weeks while waiting for a Micro Memory nvram card.

Our reasoning was that our server's on a UPS and we figured if something crashed badly enough to take out something like the UPS, the motherboard, etc., we'd be losing data anyway. We just made sure we had good backups in case the pool got corrupted and crossed our fingers.

The reason I say wait 3-6 months is that there's a huge amount of activity with SSDs at the moment. Sun said that they were planning to have flash storage launched by Christmas, so I figure there's a fair chance that we'll see some supported PCIe cards by next spring.
Ross
2008-Oct-22 18:42 UTC
Bah, I've done it again. I meant use it as a slog device, not as the ZIL...

I'll get this terminology in my head eventually.
Marcelo Leal
2008-Oct-22 18:54 UTC
> Bah, I've done it again. I meant use it as a slog
> device, not as the ZIL...

But the slog is the ZIL, formally a *separate* intent log. What's the matter? I think everyone understood. I think you confused the ZIL and the L2ARC a few threads back; that is a different thing. ;-)

Leal.

> I'll get this terminology in my head eventually.
Bill Sommerfeld
2008-Oct-22 18:55 UTC
On Wed, 2008-10-22 at 10:45 -0600, Neil Perrin wrote:
> Yes: 6280630 zil synchronicity
>
> Though personally I've been unhappy with the exposure that zil_disable has got.
> It was originally meant for debug purposes only. So providing an official
> way to make synchronous behaviour asynchronous is to me dangerous.

It seems far more dangerous to provide only a global knob instead of a local one. I want it in conjunction with bulk operations (like an ON "nightly" build, database reloads, etc.) where the response to a partial failure will be to rm -rf and start over. Any time spent waiting for intermediate states of the filesystem to be committed to stable store is wasted time.

> > Once Admins start to disable the ZIL for whole pools because the extra
> > performance is too tempting, wouldn't it be the lesser evil to let them
> > disable it on a per filesystem basis?

Agreed.
Neil Perrin
2008-Oct-22 19:16 UTC
> But the slog is the ZIL, formally a *separate* intent log.

No, the slog is not the ZIL! Here's the definition of the terms as we've been trying to use them:

ZIL:
    The body of code that supports synchronous requests, which writes out to the Intent Logs.
Intent Log:
    A stable storage log. There is one per file system & zvol.
slog:
    An Intent Log on a separate stable device, preferably high speed.

We don't really have a name for an Intent Log when it's embedded in the main pool. I have in the past used the term clog, for chained log. Originally, before slogs existed, it was just the Intent Log.

Neil.
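Neil's terminology can be summarised as a small data model. This is a hypothetical sketch: the class and field names are invented for illustration and are not ZFS source identifiers. It encodes the three definitions above: the ZIL is code, each file system or zvol owns exactly one Intent Log, and that log lives either embedded in the main pool (what Neil calls a "clog") or on a separate device (a "slog").

```python
from dataclasses import dataclass, field
from enum import Enum


class LogPlacement(Enum):
    EMBEDDED = "clog"   # chained log blocks allocated from the main pool
    SEPARATE = "slog"   # log blocks on a dedicated, ideally fast, device


@dataclass
class IntentLog:
    """Stable-storage log; one per file system or zvol."""
    dataset: str
    placement: LogPlacement
    records: list = field(default_factory=list)


class ZIL:
    """The code that services synchronous requests by writing to Intent Logs."""

    def __init__(self, has_slog: bool):
        self.has_slog = has_slog
        self.logs = {}  # dataset name -> IntentLog

    def log_for(self, dataset: str) -> IntentLog:
        # One Intent Log per dataset, created on first use.
        if dataset not in self.logs:
            placement = LogPlacement.SEPARATE if self.has_slog else LogPlacement.EMBEDDED
            self.logs[dataset] = IntentLog(dataset, placement)
        return self.logs[dataset]

    def commit(self, dataset: str, record: str) -> None:
        # In this model a synchronous request "returns" once its record is logged.
        self.log_for(dataset).records.append(record)
```

The point of the model is only the naming: "slog" describes where an Intent Log is placed, not the code that writes it.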
Marcelo Leal
2008-Oct-22 19:56 UTC
> > But the slog is the ZIL, formally a *separate* intent log.
>
> No the slog is not the ZIL!

OK, but when you wrote this:
"I've been slogging for a while on support for separate intent logs (slogs) for ZFS. Without slogs, the ZIL is allocated dynamically from the main pool."

were you talking about "the body of code" in the statement "the ZIL is allocated"? So I misunderstood you...

Leal.
Neil Perrin
2008-Oct-22 20:06 UTC
On 10/22/08 13:56, Marcelo Leal wrote:
> OK, when you wrote this:
> "I've been slogging for a while on support for separate intent logs (slogs) for ZFS.
> Without slogs, the ZIL is allocated dynamically from the main pool".
>
> You were talking about "The body of code" in the statement: "the ZIL is allocated"?
> So I have misunderstood you...

I guess I need to fix that! Anyway, the slog is not the ZIL; it's one of the two currently possible Intent Log types.

Sorry for the confusion: Neil.
Miles Nordin
2008-Oct-22 20:46 UTC
>>>>> "cg" == Constantin Gonzalez <Constantin.Gonzalez at Sun.COM> writes:

    cg> if a tar xvf were writing locally to a ZFS file system, the
    cg> writes wouldn't be synchronous either, so there's no point in
    cg> forcing NFS users to have a better

It's worse for NFS because breaking the commit/lease/batch state machine destroys the illusion of statelessness. When the server reboots after acknowledging writes it never committed, you'll have to reboot all the clients to get them to behave consistently again.

Actually, that is already my experience with NFSv4 diskless machines, but the very old Nevada builds I'm using are probably the culprit.

I thought NFSv2 -> NFSv3 was supposed to make this Prestoserve, SSD, battery-backed DRAM stuff unnecessary for good performance. I guess not, though.
Bob Friesenhahn
2008-Oct-22 20:56 UTC
On Wed, 22 Oct 2008, Miles Nordin wrote:
>
> I thought NFSv2 -> NFSv3 was supposed to make this prestoserv, SSD,
> battery-backed DRAM stuff not needed for good performance any more. I
> guess not though.

The intent was to allow the server to buffer more uncommitted data before the client system requested that it be committed to stable store. In this case, if the server spontaneously rebooted, the client is responsible for remembering the uncommitted data that it already sent so that it can send it again. This means that client behavior has quite a lot to do with perceived NFSv3 performance.

Bob
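The client-side bookkeeping Bob describes rests on the write verifier defined in RFC 1813: the server returns a verifier with each UNSTABLE WRITE and with COMMIT, and a verifier that changed in between tells the client the server rebooted, so its uncommitted data must be resent. Below is a toy Python model of that exchange under those assumptions, not a real NFS implementation; all class and method names (and the `honor_commit` switch) are hypothetical.

```python
import itertools


class ToyServer:
    """Buffers UNSTABLE writes in RAM; COMMIT moves them to 'disk'."""

    _boot_ids = itertools.count(1)   # source of per-boot write verifiers

    def __init__(self, honor_commit=True):
        self.honor_commit = honor_commit
        self.disk = {}
        self.reboot()

    def reboot(self):
        # A crash loses everything not yet on stable storage,
        # and the server comes back with a new verifier.
        self.cache = {}
        self.verifier = next(self._boot_ids)

    def write_unstable(self, offset, data):
        self.cache[offset] = data
        return self.verifier

    def commit(self):
        if self.honor_commit:
            self.disk.update(self.cache)
            self.cache.clear()
        # With honor_commit=False the data stays in volatile memory,
        # yet the reply looks exactly like a successful COMMIT.
        return self.verifier


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.pending = {}   # offset -> (data, verifier returned by its WRITE)

    def write(self, offset, data):
        verf = self.server.write_unstable(offset, data)
        self.pending[offset] = (data, verf)

    def commit(self):
        while self.pending:
            verf = self.server.commit()
            stale = {off: data for off, (data, v) in self.pending.items() if v != verf}
            if not stale:
                self.pending.clear()   # every pending write matches the COMMIT verifier
                return
            # These writes were acknowledged by an earlier server instance,
            # so they may have been lost in a reboot: resend them.
            for off, data in stale.items():
                self.write(off, data)
```

With `honor_commit=False` the model loosely mimics a server that acknowledges COMMIT without reaching stable storage: the client discards its copy, and a later crash loses the data without any resend.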
Nicolas Williams
2008-Oct-22 21:01 UTC
On Wed, Oct 22, 2008 at 04:46:00PM -0400, Miles Nordin wrote:
> I thought NFSv2 -> NFSv3 was supposed to make this prestoserv, SSD,
> battery-backed DRAM stuff not needed for good performance any more. I
> guess not though.

There are still a number of operations in NFSv3 and NFSv4 which the client must wait for synchronously: things like file creation, fsync() (duh) and therefore close(), since the client flushes and commits dirty data on close. This mostly hurts untarring and restores.

Ideally, applications like tar should be able to do asynchronous open()s and close()s, but the OS doesn't provide those, so such apps would have to use threads. In reality, though, those apps are single-threaded and not remotely asynchronous.

Nico
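Nico's point about threads can be illustrated with a minimal Python sketch: because open(), the writes and close() each block on a server round trip, an extractor that wants to overlap those waits has to issue them from several threads. The function names and worker count are hypothetical, and the explicit fsync() is an assumption standing in for the synchronous flush-on-close behaviour of an NFS client.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_one(directory: str, name: str, data: bytes) -> str:
    """Create one file and wait until it is on stable storage."""
    path = os.path.join(directory, name)
    # Over NFS, each of these steps may block on a server round trip.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return path


def extract_parallel(directory: str, members: dict, workers: int = 8) -> list:
    """Create every name -> bytes member, overlapping the synchronous waits."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_one, directory, name, data)
                   for name, data in members.items()]
        # result() re-raises any exception raised in a worker thread.
        return [f.result() for f in futures]


if __name__ == "__main__":
    files = {f"member_{i:04d}.txt": b"hello\n" for i in range(100)}
    with tempfile.TemporaryDirectory() as d:
        paths = extract_parallel(d, files)
        print(len(paths), "files written")
```

Each thread still waits for its own file; the gain comes only from having several of those waits in flight at once.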
Richard Elling
2008-Oct-22 21:04 UTC
Constantin Gonzalez wrote:
> 2. Use a flash disk for a ZIL, a slog. Can add considerable extra cost,
>    especially if you're using an X4500/X4540 and can't swap out fast SAS
>    drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

It is more important to use a separate disk than to use a separate and fast disk. Anecdotal evidence suggests that using a USB hard disk works well. Remember, slogs are a write-only workload and tend to use very modest amounts of data, so you should see very few seeks on a dedicated slog device. Personally, I'd use a slice from the boot disk, because people tend to leave tons of available space there.
 -- richard
Marcelo Leal

> On 10/22/08 13:56, Marcelo Leal wrote:
> > So I have misunderstood you...
>
> I guess I need to fix that!

See? I think you are being a little dramatic... OK, there is the ZIL (code), and the "ZFS intent log". It's just inevitable that people will call the "ZFS intent log" the "ZIL" for short. But I respect you! You write the code... ;-) Let's get back to the point.

Leal

> Anyway the slog is not the ZIL it's one of the two
> currently possible Intent Log types.
>
> Sorry for the confusion: Neil.
Ricardo M. Correia
2008-Oct-22 22:47 UTC
Hi Richard,

On Wed, 2008-10-22 at 14:04 -0700, Richard Elling wrote:
> It is more important to use a separate disk, than to use a separate and fast
> disk. Anecdotal evidence suggests that using a USB hard disk works
> well.

While I don't necessarily disagree with your statement, please note that (as far as I'm aware) USB disks don't respect the "flush write cache" command, so the disk may appear to be faster than it actually is because it's not maintaining proper transactional consistency.

Cheers,
Ricardo
Richard Elling
2008-Oct-22 23:22 UTC
Ricardo M. Correia wrote:
> While I don't necessarily disagree with your statement, please note that
> (as far as I'm aware) USB disks don't respect the "flush write cache"
> command, so in fact the disk may appear to be faster than it actually is
> because it's not maintaining proper transactional consistency.

YMMV. Some USB-to-SATA converters seem to have switches to enable or disable the "optimized for quick removal" mode.
 -- richard
Constantin Gonzalez
2008-Oct-23 07:58 UTC
Hi,

>> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
>>   that asks for the ability to disable the ZIL on a per filesystem basis?
>
> Yes: 6280630 zil synchronicity

Good, thanks for the pointer!

> Though personally I've been unhappy with the exposure that zil_disable has got.
> It was originally meant for debug purposes only. So providing an official
> way to make synchronous behaviour asynchronous is to me dangerous.

IMHO, the need here is to give admins control over the way they want their file servers to behave. In this particular case, the admin argues that he knows what he's doing, that he doesn't want his NFS server to give stronger guarantees than a local filesystem, and that he deserves control over that behaviour.

Ideally, there would be an NFS option that lets customers choose whether they want to honor COMMIT requests or not. Disabling the ZIL on a per-filesystem basis is only the second-best solution, but since that CR already exists, it seems to be the more realistic route.

Thanks,
   Constantin
Constantin Gonzalez
2008-Oct-23 12:40 UTC
Hi,

Bob Friesenhahn wrote:
> On Wed, 22 Oct 2008, Neil Perrin wrote:
>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
>>>    that if a tar xvf were writing locally to a ZFS file system, the writes
>>>    wouldn't be synchronous either, so there's no point in forcing NFS users
>>>    to have a better availability experience at the expense of performance.
>
> The conclusion reached here is quite seriously wrong and no Sun
> employee should suggest it to a customer.

I'm not suggesting it to any customer. Actually, I argued with the customer for quite a long time, trying to convince him that "slow but correct" is better.

The conclusion above is a conscious decision by the customer. He says that he does not want NFS to turn any write into a synchronous write; he's happy if all writes are asynchronous, because in this case the NFS server is a backup-to-disk device, and if power fails he simply restarts the backup, since he has the data in multiple copies anyway.

> If the system writing to a local filesystem reboots then the applications
> which were running are also lost and will see the new filesystem state when
> they are restarted. If an NFS server spontaneously reboots, the applications
> on the many clients are still running and the client systems are using
> cached data. This means that clients could do very bad things if the
> filesystem state (as seen by NFS) is suddenly not consistent. One of
> the joys of NFS is that the client continues unhindered once the
> server returns.

Yes, we're both aware of this. In this particular situation, the customer would restart his backup job (and thus the client application) in case the server dies.

Thanks for pointing out the difference; this is indeed an important distinction.

Cheers,
   Constantin
Constantin Gonzalez
2008-Oct-23 12:56 UTC
Hi,

yes, using slogs is the best solution.

Meanwhile, using mirrored slogs from other servers' RAM disks running on UPSs seems like an interesting idea, if UPS-backed RAM is deemed reliable enough for the purposes of the NFS server.

Thanks for suggesting this!

Cheers,
   Constantin

Ross wrote:
> Well, it might be even more of a bodge than disabling the ZIL, but how about:
>
> - Create a 512MB ramdisk, use that for the ZIL
> - Buy a Micro Memory nvram PCI card for £100 or so.
> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the Micro Memory card.
Bob Friesenhahn
2008-Oct-23 13:03 UTC
On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>
> Yes, we're both aware of this. In this particular situation, the customer
> would restart his backup job (and thus the client application) in case the
> server dies.

So it is OK for this customer if their backup becomes silently corrupted and the backup software continues running? Consider that some of the backup files may have missing or corrupted data in the middle. Your customer is quite dedicated, in that he will monitor the situation very well and remember to reboot the backup system, correct any corrupted files, and restart the backup software whenever the server panics and reboots.

A properly built server should be able to handle NFS writes at gigabit wire speed.

Bob
Constantin Gonzalez
2008-Oct-23 13:25 UTC
Hi,

Bob Friesenhahn wrote:
> So it is ok for this customer if their backup becomes silently corrupted
> and the backup software continues running? Consider that some of the
> backup files may have missing or corrupted data in the middle. Your
> customer is quite dedicated in that he will monitor the situation very
> well and remember to reboot the backup system, correct any corrupted
> files, and restart the backup software whenever the server panics and
> reboots.

This is what the customer told me. He uses rsync and he is OK with restarting the rsync whenever the NFS server restarts.

> A properly built server should be able to handle NFS writes at gigabit
> wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
   Constantin
Bob Friesenhahn
2008-Oct-23 13:36 UTC
On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>
> This is what the customer told me. He uses rsync and he is ok with restarting
> the rsync whenever the NFS server restarts.

Then remind your customer to tell rsync to inspect the data rather than trusting time stamps (for example with rsync's --checksum option). Rsync will then run quite a bit slower, but at least it will catch a corrupted file. There is still the problem that the client OS may have cached data which it thinks is correct but which no longer matches what is on the server. This may result in rsync making wrong decisions.

A better approach is to run rsync on the server so that there is rsync-to-rsync communication rather than rsync-to-NFS. This can result in far better performance, without the NFS synchronous write problem. For my own backups, I initiate rsync on the server side and have a special secure rsync service set up on the clients so that the server sucks files from the clients. This works very well and helps with administration because any error conditions will be noted in just one place.

Bob
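The difference between trusting time stamps and inspecting the data can be sketched in Python. rsync's default "quick check" skips a file when size and modification time match, while its --checksum option compares file contents instead. The functions below are a hypothetical illustration of those two decisions, not rsync's own algorithm (which uses its own checksums and delta transfer).

```python
import hashlib
import os
import tempfile


def quick_check_says_same(src: str, dst: str) -> bool:
    """Size + whole-second mtime comparison: cheap, but blind to damaged content."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path: str, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large backups do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def content_says_same(src: str, dst: str) -> bool:
    """Reads both files completely: slower, but catches a corrupted copy."""
    return (os.path.getsize(src) == os.path.getsize(dst)
            and file_digest(src) == file_digest(dst))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"good data")
        with open(dst, "wb") as f:
            f.write(b"good\0data")               # same size, one damaged byte
        for p in (src, dst):
            os.utime(p, (1_000_000, 1_000_000))  # identical timestamps
        print("quick check says same:", quick_check_says_same(src, dst))
        print("content check says same:", content_says_same(src, dst))
```

In the demo the quick check reports the two files as identical, while the content check notices the damaged byte.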
Ross Smith
2008-Oct-23 17:51 UTC
No problem. I didn''t use mirrored slogs myself, but that''s certainly a step up for reliability. It''s pretty easy to create a boot script to re-create the ramdisk and re-attach it to the pool too. So long as you use the same device name for the ramdisk you can add it each time with a simple "zpool replace pool ramdisk" On Thu, Oct 23, 2008 at 1:56 PM, Constantin Gonzalez <Constantin.Gonzalez at sun.com> wrote:> Hi, > > yes, using slogs is the best solution. > > Meanwhile, using mirrored slogs from other servers'' RAM-Disks running on > UPSs > seem like an interesting idea, if the reliability of UPS-backed RAM is > deemed > reliable enough for the purposes of the NFS server. > > Thanks for siggesting this! > > Cheers, > Constantin > > Ross wrote: >> >> Well, it might be even more of a bodge than disabling the ZIL, but how >> about: >> >> - Create a 512MB ramdisk, use that for the ZIL >> - Buy a Micro Memory nvram PCI card for ?100 or so. >> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace >> the Micro Memory card. >> >> The ramdisk isn''t an ideal solution, but provided you don''t export the >> pool with it offline, it does work. We used it as a stop gap solution for a >> couple of weeks while waiting for a Micro Memory nvram card. >> >> Our reasoning was that our server''s on a UPS and we figured if something >> crashed badly enough to take out something like the UPS, the motherboard, >> etc, we''d be loosing data anyway. We just made sure we had good backups in >> case the pool got corrupted and crossed our fingers. >> >> The reason I say wait 3-6 months is that there''s a huge amount of activity >> with SSD''s at the moment. Sun said that they were planning to have flash >> storage launched by Christmas, so I figure there''s a fair chance that we''ll >> see some supported PCIe cards by next Spring. 
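Ross's boot-script idea could look roughly like the sketch below, which prints (or, with `dry_run=False`, executes) two Solaris commands: `ramdiskadm -a` to re-create the RAM disk, then `zpool replace` with a single device argument so the fresh ramdisk takes the place of the log device the pool still remembers. The pool name `tank` and ramdisk name `zil` are hypothetical, the 512m size follows Ross's 512MB ramdisk, and the single-argument `zpool replace` step relies on Ross's report rather than on anything tested here. Whatever log records sat in the RAM disk at the moment of a power loss are gone, which is exactly the risk Ross describes accepting.

```python
import subprocess
from typing import List


def ramdisk_slog_commands(pool: str, name: str = "zil",
                          size: str = "512m") -> List[List[str]]:
    """Return the commands a boot script would run (names are hypothetical)."""
    device = f"/dev/ramdisk/{name}"  # ramdiskadm creates the device here
    return [
        # A RAM disk does not survive a reboot, so re-create it first.
        ["ramdiskadm", "-a", name, size],
        # Reuse the same device path; with no new device given,
        # zpool replace puts the fresh, empty ramdisk back in place
        # of the log device the pool remembers.
        ["zpool", "replace", pool, device],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for argv in commands:
        if dry_run:
            print(" ".join(argv))  # show what would be executed
        else:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(ramdisk_slog_commands("tank"))
```

Keeping `dry_run=True` as the default is deliberate: on a real server the script should run early in boot, before NFS sharing starts, and it is worth checking the printed commands by hand first.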
Roch Bourbonnais
2008-Oct-25 07:49 UTC
[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
On 23 Oct 2008, at 05:40, Constantin Gonzalez wrote:
> Hi,
>
> Bob Friesenhahn wrote:
>> On Wed, 22 Oct 2008, Neil Perrin wrote:
>>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out
>>>> to me that if a tar xvf were writing locally to a ZFS file system, the
>>>> writes wouldn't be synchronous either, so there's no point in forcing
>>>> NFS users to have a better availability experience at the expense of
>>>> performance.
>>
>> The conclusion reached here is quite seriously wrong and no Sun
>> employee should suggest it to a customer. If the system writing to a
>
> I'm not suggesting it to any customer. Actually, I argued with the customer
> for quite a long time, trying to convince him that "slow but correct" is
> better.
>
> The conclusion above is a conscious decision by the customer. He says that
> he does not want NFS to turn any write into a synchronous write; he's happy
> if all writes are asynchronous, because in this case the NFS server is a
> backup-to-disk device, and if power fails he simply restarts the backup,
> since he has the data in multiple copies anyway.

The case of a full (but not incremental) backup, where an operator makes sure the server stays up for the full duration (or manually restarts the operation), seems like a singular case where this might make half sense. But as was stated, when performance is the goal, it is better to use a bulk transfer of data through some specific protocol (as opposed to NFS small-file manipulations). With such a protocol, a failure of the server has an immediate, obvious repercussion on the client, and things can be restarted without further coordination.

I also understand that NFS directory delegation or exclusive mount points could solve this NFS peculiarity (which is totally unrelated to ZFS, and not to be confused with the ZFS/SAN storage cache flush condition).

If CIFS is not subject to the same penalty, I can only assume that the integrity of the client's view cannot be guaranteed after a server crash. Does anyone know this for sure?

-r

>> local filesystem reboots then the applications which were running are
>> also lost and will see the new filesystem state when they are
>> restarted. If an NFS server spontaneously reboots, the applications
>> on the many clients are still running and the client systems are using
>> cached data. This means that clients could do very bad things if the
>> filesystem state (as seen by NFS) is suddenly not consistent. One of
>> the joys of NFS is that the client continues unhindered once the
>> server returns.
>
> Yes, we're both aware of this. In this particular situation, the customer
> would restart his backup job (and thus the client application) in case the
> server dies.
>
> Thanks for pointing out the difference; this is indeed an important
> distinction.
>
> Cheers,
> Constantin
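Bob's point about clients carrying on with cached data, and the "NFS peculiarity" Roch mentions, come down to how NFSv3 pairs unstable WRITEs with COMMIT: the server returns a write verifier that changes whenever it reboots, and the client keeps its copy of uncommitted data until a COMMIT reply carries the same verifier as the WRITE. The toy model below is a simplified illustration, not NFS or ZFS code: the `Server`/`Client` classes are hypothetical, byte offsets are ignored, and it ignores that ZFS without a ZIL would still write the data out with the next transaction group. It shows why a server that acknowledges COMMIT without making the data stable loses it silently, while an honest server lets the client notice the reboot and resend.

```python
import itertools
from typing import Dict, Tuple

_boots = itertools.count(1)  # each server boot gets a fresh verifier value


class Server:
    """Toy NFSv3-style server; not real NFS or ZFS code."""

    def __init__(self, honor_commit: bool) -> None:
        self.honor_commit = honor_commit
        self.stable: Dict[str, bytes] = {}  # survives a crash
        self.cache: Dict[str, bytes] = {}   # lost on a crash
        self.verifier = next(_boots)

    def write_unstable(self, name: str, data: bytes) -> int:
        self.cache[name] = data
        return self.verifier

    def commit(self) -> int:
        if self.honor_commit:
            # Honest server: data reaches stable storage before the reply
            # (the job the ZIL does for ZFS).
            self.stable.update(self.cache)
            self.cache.clear()
        # A server that skips the flush still replies with the same
        # verifier, so the client cannot tell the difference.
        return self.verifier

    def crash_and_reboot(self) -> None:
        self.cache.clear()
        self.verifier = next(_boots)


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        # name -> (data, verifier returned by the WRITE)
        self.uncommitted: Dict[str, Tuple[bytes, int]] = {}

    def write(self, name: str, data: bytes) -> None:
        verf = self.server.write_unstable(name, data)
        self.uncommitted[name] = (data, verf)

    def commit(self) -> None:
        verf = self.server.commit()
        stale = {n: d for n, (d, v) in self.uncommitted.items() if v != verf}
        if stale:
            # Verifier changed: the server rebooted, so resend and retry.
            for name, data in stale.items():
                self.write(name, data)
            self.commit()
        else:
            self.uncommitted.clear()  # client drops its copy for good


def scenario(honor_commit: bool) -> bool:
    """Write, COMMIT, then crash: is the data still on the server?"""
    server = Server(honor_commit)
    client = Client(server)
    client.write("file.tar", b"payload")
    client.commit()
    server.crash_and_reboot()
    return "file.tar" in server.stable


def crash_before_commit() -> bool:
    """Honest server crashes between WRITE and COMMIT."""
    server = Server(honor_commit=True)
    client = Client(server)
    client.write("file.tar", b"payload")
    server.crash_and_reboot()
    client.commit()  # verifier mismatch triggers a resend
    return server.stable.get("file.tar") == b"payload"


if __name__ == "__main__":
    print(scenario(True))         # True
    print(scenario(False))        # False: acknowledged data is gone
    print(crash_before_commit())  # True
```

The middle case is the one the thread is about: the client has already discarded its copy, so nothing short of restarting the whole job (as Constantin's customer plans to) recovers the lost writes.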