Is it a supported configuration to have a single LUN presented to 4 different Sun servers over a Fibre Channel network and then mounting that LUN on each host as the same ZFS filesystem? We need any of the 4 servers to be able to write data to this shared FC disk. We are not using NFS as we do not want to go over the network, just direct to the FC disk from any of the hosts.

Thanks
> Is it a supported configuration to have a single LUN presented to 4
> different Sun servers over a fiber channel network and then mounting
> that LUN on each host as the same ZFS filesystem?

ZFS today does not support multi-host simultaneous mounts. There's no arbitration for the pool metadata, so you'll end up corrupting the filesystem if you force it.

--
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
On Friday, August 24, 2007 at 20:14:05 CEST, Matt B wrote:

Hi,

> Is it a supported configuration to have a single LUN presented to 4 different Sun servers over a fiber channel network and then mounting that LUN on each host as the same ZFS filesystem?

No. You can neither access ZFS nor UFS in that way. Only one host can mount the file system at the same time (read/write or read-only doesn't matter here).

> We need any of the 4 servers to be able to write data to this shared FC disk. We are not using NFS as we do not want to go over the network, just direct to the FC disk from any of the hosts.

If you don't want to use NFS, you can use QFS in such a configuration. The shared writer approach of QFS allows mounting the same file system on different hosts at the same time.

Ronald

--
Sun Microsystems GmbH            Ronald Kühn, TSC - Solaris
Sonnenallee 1                    ronald.kuehn at sun.com
D-85551 Kirchheim-Heimstetten    Tel: +49-89-46008-2901
Amtsgericht München: HRB 161028  Fax: +49-89-46008-2954
Geschäftsführer: Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring
That is what I was afraid of.

In regards to QFS and NFS, isn't QFS something that must be purchased? I looked on the Sun website and it appears to be a little pricey.

NFS is free, but is there a way to use NFS without traversing the network? We already have our SAN presenting this disk to each of the four hosts using Fibre Channel HBAs, so the network is not part of the picture at this point. Is there some way to utilize NFS with the SAN and the 4 hosts that are fiber attached?

Thanks
On Friday, August 24, 2007 at 20:41:04 CEST, Matt B wrote:

> That is what I was afraid of.
>
> In regards to QFS and NFS, isn't QFS something that must be purchased? I looked on the Sun website and it appears to be a little pricey.
>
> NFS is free, but is there a way to use NFS without traversing the network? We already have our SAN presenting this disk to each of the four hosts using Fibre Channel HBAs, so the network is not part of the picture at this point.
> Is there some way to utilize NFS with the SAN and the 4 hosts that are fiber attached?

You cannot use NFS to talk directly to SAN devices. What's wrong with using the network? Attach the SAN devices to one host (or more hosts in a Sun Cluster configuration to get HA) and share the file systems using NFS. That way you are able to enjoy the benefits of ZFS.

Ronald
Can't use the network because these 4 hosts are database servers that will be dumping close to a terabyte every night. If we put that over the network, all the other servers would be starved.
On Friday, August 24, 2007 at 21:06:28 CEST, Matt B wrote:

> Can't use the network because these 4 hosts are database servers that will be dumping close to a terabyte every night. If we put that over the network, all the other servers would be starved.

I'm afraid there aren't many other options than:

- a shared file system like QFS to directly access the SAN devices from different hosts in parallel
- adding more network capacity (either additional network connections or faster links like 10 gigabit ethernet) to get the required performance with NFS

Ronald
> That is what I was afraid of.
>
> In regards to QFS and NFS, isn't QFS something that must be purchased?
> I looked on the Sun website and it appears to be a little pricey.

That's correct. Earlier this year Sun declared an intent to open-source QFS/SAMFS, but that doesn't help you install it today.

--
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
James C. McPherson
2007-Aug-25 06:45 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Ronald Kuehn wrote:
> On Friday, August 24, 2007 at 21:06:28 CEST, Matt B wrote:
>> Can't use the network because these 4 hosts are database servers
>> that will be dumping close to a terabyte every night. If we put
>> that over the network all the other servers would be starved
>
> I'm afraid there aren't many other options than
>
> - a shared file system like QFS to directly access the SAN devices
>   from different hosts in parallel
> - add more network capacity (either additional network connections
>   or faster links like 10 gigabit ethernet) to get the required
>   performance with NFS

Background - I used to work for Sun's CPRE and PTS support organisations, getting customers back on their feet after large scale disasters in the storage area. Here's where I start to take a hard line about configs.

If your 4 hosts are db servers, dumping ~1TB per night, and you cannot afford either:

- sufficient space for them to have their own large LUNs, or
- a dedicated GigE network for dumping that data,

then you need to make an investment in your business, and do so very soon. That would be either purchasing QFS, and/or purchasing extra space for your array.

You haven't - as far as I can see - explained why you must have each of these hosts able to read and write that data in a shared configuration. Perhaps if you explain what you are really trying to achieve we could help you more appropriately.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
The 4 database servers are part of an Oracle RAC configuration. 3 databases are hosted on these servers: BIGDB1 on all 4, littledb1 on the first 2, and littledb2 on the last two. The Oracle backup system spawns db backup jobs that could occur on any node based on traffic and load. All nodes are fiber attached to a SAN. They all have FC access to the same set of SAN disks where the nightly dumps must go. The plan all along was to save the GigE network for network traffic and have the nightly backups occur over the dedicated FC network.

Originally, we tried using our tape backup software to read the Oracle flash recovery area (an Oracle raw device on a separate set of SAN disks); however, our backup software has a known issue with the particular version of Oracle we are using. So we scavenged up a SAN disk that would be mounted with a filesystem so that the tape backup software can just read the Oracle dump file like a regular file. However, this does not work because all four hosts need access to the backup indexes which are stored on the shared disk. As I mentioned earlier, this is not working with ZFS and apparently is fostering corruption in the ZFS.

We haven't done separate dedicated disks for each host because dividing the available disk space would not leave enough space once distributed. Also, our failover capabilities for backup would be gone: if one of the hosts fails that happens to have the disk attached for a certain database, no other host can step in and do the backup, whereas the original plan was that all 4 servers read/write to the same set of shared storage. Any host can back up any of the three databases, and the next night a different host could do the backup with no problem, as it would have access to the shared indexes on the shared disk.

Now it seems our only option is to switch to NFS (and use the network) while the dedicated fiber laid to each of these four hosts goes unused, or buy QFS for tens of thousands of dollars. All the physical infrastructure is there for a dedicated backup FC network; what's missing is just a shared filesystem to lay on top of the V490s to arbitrate between them and the shared disk.

Too bad the SAN we are using can't export NFS shares directly over the FC to host HBAs. I am all for storage servers that have FC but publish NFS over the network; we just don't want to use the network in this case, we want to use the FC. I still wonder if NFS could be used over the FC network in some way similar to how NFS works over an ethernet/TCP network.

Let me know if I am overlooking something. The last hope here is to see if GlusterFS can run reliably on the Solaris 10 V490s talking to our SAN. Maybe IP over Fibre Channel and just treat the FC as if it was a network.
On 8/25/07, Matt B <mattbreedlove at yahoo.com> wrote:
> The 4 database servers are part of an Oracle RAC configuration. 3 databases are hosted on these servers: BIGDB1 on all 4, littledb1 on the first 2, and littledb2 on the last two. The Oracle backup system spawns db backup jobs that could occur on any node based on traffic and load. All nodes are fiber attached to a SAN. They all have FC access to the same set of SAN disks where the nightly dumps must go. The plan all along was to save the GigE network for network traffic and have the nightly backups occur over the dedicated FC network.

Matt,

Can you just alter the backup job that Oracle spawns to import the pool, then do the backup, and finally export the pool?

Regards,
Vic
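Roughly, the job wrapper could look something like the sketch below. The pool name, mount point and RMAN script path here are just placeholders, and it only works if the scheduler guarantees that a single node runs it at a time - two nodes importing the pool simultaneously would corrupt it.

    #!/bin/sh
    # run on whichever node the scheduler picked for tonight's backup
    zpool import backup || exit 1                 # take the shared pool on this node only
    rman target / @/opt/scripts/nightly.rman      # dump to /backup (script path is a placeholder)
    zpool export backup                           # release the pool so another node can import it tomorrow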
> Originally, we tried using our tape backup software
> to read the Oracle flash recovery area (an Oracle raw
> device on a separate set of SAN disks); however, our
> backup software has a known issue with the
> particular version of Oracle we are using.

So one option is to get the backup vendor to update their software, or to upgrade Oracle? That doesn't sound likely to be practical, though.

> However, this does not work because all four hosts
> need access to the backup indexes which are stored on
> the shared disk. As I mentioned earlier, this is not
> working with ZFS and apparently is fostering
> corruption in the ZFS.

Yes. It won't work with any non-shared file system.

> the original plan was that all 4 servers read/write to
> the same set of shared storage.

So you need a shared file system (or you need to use Oracle's sharing capabilities, but it sounds like your tape backup software can't deal with that).

> Now it seems our only option is to switch to NFS (and
> use the network) while the dedicated fiber laid to
> each of these four hosts goes unused, or buy QFS for
> tens of thousands of dollars.

Another possibility might be to buy SANergy (which allows NFS traffic to be re-routed from either QFS or UFS file systems to direct SAN I/O in some cases), but I don't know whether it's supported with Oracle. And it might be more expensive than shared QFS (which is supported with Oracle RAC).

> I still wonder if NFS could be used over the FC
> network in some way similar to how NFS works over
> an ethernet/TCP network.

Possibly. I'm not sure what configurations Sun supports IP-over-FC in.

> Let me know if I am overlooking something. The last
> hope here is to see if GlusterFS can run reliably on
> the Solaris 10 V490s talking to our SAN.

Uh. You'd trust your data to that? It doesn't look very baked.

> Maybe IP over Fibre Channel and just treat the FC as
> if it was a network.

Yes, that's a possibility.

Seriously, though, if you've got terabytes of data being backed up every night, spending the money on QFS, or dedicated disks, or anything else that would give you backup capabilities, sounds like a Really Good Idea. (There's a reason why backups are a major cost of storage ownership. Sigh.)
I'm not sure what you mean.
On Sat, Aug 25, 2007 at 12:36:34 -0700, Matt B wrote:
: I'm not sure what you mean.

I think what he's trying to tell you is that you need to consult a storage expert.

--
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible. We apologise for
the inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
Here is what seems to be the best course of action, assuming IP over FC is supported by the HBAs (which I am pretty sure it is, since this is all brand new equipment):

Mount the shared backup LUN on node 1 via the FC link to the SAN as a non-redundant ZFS volume. On node 1, RMAN (the Oracle backup system) will read/write its data to /backup as local disk. Configure node 1 to NFS-publish /backup over the IP-enabled FC network using the HBA's IP address.

Nodes 2-4 will then NFS mount over the IP-enabled FC connections via the FC switch. They will mount at /backup as well.

So with this we would not be using the GigE network to transfer our backup data, all 4 hosts can fail the backups over to each other, and all our data is stored using ZFS to boot, not to mention not having to buy QFS or physically move hardware at all.

Any foreseeable problems with this configuration? Of course I will destroy the existing ZFS filesystem that is on the disks now prior to setting this up, since it might be corrupted.
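Concretely, I'm picturing something along these lines. The device name, hostnames and the 192.168.50.1 address (node 1's FC-side IP) are placeholders; only node 1 ever imports the pool, and nodes 2-4 never touch the LUN directly.

    # node 1: pool on the shared LUN, exported over NFS
    zpool create backup c4t600A0B8000123456d0           # placeholder device for the shared LUN
    zfs set sharenfs='rw=node2:node3:node4' backup       # share /backup to the other RAC nodes only

    # nodes 2-4: mount across the IP-over-FC link
    mkdir -p /backup
    mount -F nfs 192.168.50.1:/backup /backup            # 192.168.50.1 = node 1's FC-side address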
On Sat, 25 Aug 2007, Matt B wrote:
.... snip ....
> I still wonder if NFS could be used over the FC network in some way similar to how NFS works over an ethernet/TCP network.

If you're running QLogic FC HBAs, you can run a TCP/IP stack over the FC links. That would allow NFS traffic over the FC connections. I'm not necessarily recommending this as a solution - nor have I tried it myself. Just letting you know that the possibility exists.

... snip ...

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
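For what it's worth, Solaris ships an IP-over-FC driver (fcip), so - assuming the HBAs and switch support it - the plumbing would presumably look like any other interface. The instance number and addresses below are placeholders, and I haven't verified this on the poster's hardware:

    # on each node, bring up an IP interface on the FC port
    ifconfig fcip0 plumb
    ifconfig fcip0 192.168.50.1 netmask 255.255.255.0 up
    # NFS mounts can then be pointed at these addresses instead of the GigE ones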
I have tried TCP/IP over FC in the lab; the performance was no different compared to gigabit ethernet.

-----Original Message-----
From: "Al Hopper" <al at logical-approach.com>
To: "Matt B" <mattbreedlove at yahoo.com>
Cc: zfs-discuss at opensolaris.org
Sent: 8/26/2007 9:29 AM
Subject: Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

> If you're running QLogic FC HBAs, you can run a TCP/IP stack over the
> FC links. That would allow NFS traffic over the FC connections. I'm not
> necessarily recommending this as a solution - nor have I tried it
> myself. Just letting you know that the possibility exists.
Rainer J.H. Brandt
2007-Aug-26 14:36 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Sorry, this is a bit off-topic, but anyway:

Ronald Kuehn writes:
> No. You can neither access ZFS nor UFS in that way. Only one
> host can mount the file system at the same time (read/write or
> read-only doesn't matter here).

I can see why you wouldn't recommend trying this with UFS (only one host knows which data has been committed to the disk), but is it really impossible?

I don't see why multiple UFS mounts wouldn't work, if only one of them has write access. Can you elaborate?

Thanks,
Rainer Brandt
On Sunday, August 26, 2007 at 16:36:26 CEST, Rainer J.H. Brandt wrote:
> Ronald Kuehn writes:
> > No. You can neither access ZFS nor UFS in that way. Only one
> > host can mount the file system at the same time (read/write or
> > read-only doesn't matter here).
>
> I can see why you wouldn't recommend trying this with UFS
> (only one host knows which data has been committed to the disk),
> but is it really impossible?
>
> I don't see why multiple UFS mounts wouldn't work, if only one
> of them has write access. Can you elaborate?

Hi,

UFS wasn't designed as a shared file system. The kernel always assumes it is the only party accessing or modifying any on-disk data structures. With that premise it uses caching quite heavily. The view of the file system (cached structures + on-disk state) is consistent. The on-disk state alone isn't while the file system is mounted. Any other system accessing the on-disk state without taking into consideration the data cached on the original host will probably see inconsistencies. This will lead to data corruption and panics. If only one system mounts the file system read/write and other hosts only mount it read-only, the read-only hosts will get an inconsistent view of the file system because they don't know what's in the cache of the r/w host.

These approaches exist to solve this problem:

- Only allow one host to directly access the file system. Other systems access it by talking over the network to this host:
  + NFS
  + the pxfs layer of Sun Cluster (global file system)
- Use a file system designed with some kind of co-ordination for parallel access to the on-disk data structures built in:
  + QFS (shared mode uses a metadata server on one host to manage the right to access certain parts of the on-disk structures. The operation on the data itself then takes place over the storage path. In that case multiple systems can modify on-disk structures directly; they only need to ask the metadata server for permission.)

I hope that helps,
Ronald
Rainer J.H. Brandt
2007-Aug-26 15:47 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Ronald Kuehn writes:
> On Sunday, August 26, 2007 at 16:36:26 CEST, Rainer J.H. Brandt wrote:
> > I don't see why multiple UFS mounts wouldn't work, if only one
> > of them has write access. Can you elaborate?
>
> UFS wasn't designed as a shared file system. The kernel
> always assumes it is the only party accessing or modifying
> any on-disk data structures. With that premise it uses caching
> quite heavily. [...] If only one system mounts the file system
> read/write and other hosts only mount it read-only, the read-only
> hosts will get an inconsistent view of the file system because they
> don't know what's in the cache of the r/w host.
> [...]
> I hope that helps,
> Ronald

Yes, thank you for confirming what I said.

So it is possible, but not recommended, because I must take care not to read from files for which buffers haven't been flushed yet.

Rainer Brandt
Rainer J.H. Brandt
2007-Aug-26 15:52 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Tim, thanks for answering...

...but please don't send HTML, if possible.

> Try this explanation..
>
> Host A mounts UFS file system rw
> Hosts B-C mount same UFS file system read only
>
> In the natural scheme of things hosts B-C read files and *cache
> metadata about the files and file system*.
>
> Host A changes the file system. The metadata that hosts B-C have cached
> is now incorrect. If they go to access the file system and find that
> its state has changed, things can start to go wrong.

Sure, that's why I said it wouldn't be recommended. But for someone who knows what he's doing, e.g. who reads from a transfer directory only after all writing to it is known to be completed, it should be technically possible, right?

Thanks again,
Rainer Brandt
On Sunday, August 26, 2007 at 17:47:32 CEST, Rainer J.H. Brandt wrote:
> Ronald Kuehn writes:
> > UFS wasn't designed as a shared file system. The kernel
> > always assumes it is the only party accessing or modifying
> > any on-disk data structures. [...]
>
> Yes, thank you for confirming what I said.
>
> So it is possible, but not recommended, because I must take care
> not to read from files for which buffers haven't been flushed yet.

It is technically possible to mount the file system on more than one system, but it _will_ lead to data corruption and panics. Just making sure buffers for a file have been flushed to disk on the writer is _not_ enough. So it is not only not recommended, it is practically not possible to do such a configuration in a safe way. There is no way to force the read-only host to only read the data when they are consistent. Even on a lower level, writes to the file system are not atomic. When the read-only host picks up data while the other host's update is not complete, it will get random inconsistencies instead of correct metadata.

So as a summary: no way to do that with current UFS.

Ronald
Casper.Dik at Sun.COM
2007-Aug-26 16:23 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
> Yes, thank you for confirming what I said.
>
> So it is possible, but not recommended, because I must take care
> not to read from files for which buffers haven't been flushed yet.

No, it's much worse than that: UFS will not re-read cached data for the read-only mount, so the read-only mount will continue to use data which is stale; e.g., once a file is opened and as long as its inode is in use (file open) or cached, you will never see a change to the file. You will never notice changes to data unless you no longer cache it.

Panics and random data reads are guaranteed.

Casper
Rainer J.H. Brandt
2007-Aug-26 17:52 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Ronald,

thanks for your comments.

I was thinking about this scenario:

Host w continuously has a UFS mounted with read/write access.
Host w writes to the file f/ff/fff.
Host w ceases to touch anything under f.
Three hours later, host r mounts the file system read-only,
reads f/ff/fff, and unmounts the file system.

My assumption was:

a1) This scenario won't hurt w,
a2) this scenario won't damage the data on the file system,
a3) this scenario won't hurt r, and
a4) the read operation will succeed,

even if w continues with arbitrary I/O, except that it doesn't touch anything under f until after r has unmounted the file system.

Of course everything that you and Tim and Casper said is true, but I'm still inclined to try that scenario.

Rainer
Rainer J.H. Brandt wrote:
> I was thinking about this scenario:
>
> Host w continuously has a UFS mounted with read/write access.
> Host w writes to the file f/ff/fff.
> Host w ceases to touch anything under f.
> Three hours later, host r mounts the file system read-only,
> reads f/ff/fff, and unmounts the file system.
>
> My assumption was:
>
> a1) This scenario won't hurt w,
> a2) this scenario won't damage the data on the file system,
> a3) this scenario won't hurt r, and
> a4) the read operation will succeed,
>
> even if w continues with arbitrary I/O, except that it doesn't
> touch anything under f until after r has unmounted the file system.
>
> Of course everything that you and Tim and Casper said is true,
> but I'm still inclined to try that scenario.

you might get lucky once (note: I said "might"), but there's no guarantee, and sooner or later this approach *will* cause data corruption.

wouldn't it be much simpler to use NFS & the automounter for this scenario (I didn't follow the whole thread, so this may have been discussed already)?

Michael
--
Michael Schuster   Sun Microsystems, Inc.
recursion, n: see 'recursion'
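If it helps, a direct-map automounter setup for that kind of transfer directory could look roughly like this; the host name, export path and mount point are made up for illustration, and the writer simply shares the directory over NFS:

    # on host w (the writer): export the transfer directory read-only
    share -F nfs -o ro /export/transfer

    # on host r: /etc/auto_master -- add a direct map
    /-    auto_direct

    # on host r: /etc/auto_direct -- mount the writer's export on demand
    /transfer    -ro,hard    hostw:/export/transfer

    # on host r: pick up the new maps
    automount -v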
Rainer,

If you are looking for a means to safely "READ" any filesystem, please take a look at Availability Suite. One can safely take point-in-time copies of any Solaris-supported filesystem, including ZFS, at any snapshot interval of one's choosing, and then access the shadow volume on any system within the SAN, be it Fibre Channel or iSCSI. If the node wanting access to the data is distant, Availability Suite also offers remote replication.

http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/

Jim

> I was thinking about this scenario:
>
> Host w continuously has a UFS mounted with read/write access.
> Host w writes to the file f/ff/fff.
> Host w ceases to touch anything under f.
> Three hours later, host r mounts the file system read-only,
> reads f/ff/fff, and unmounts the file system.
> [...]
> Of course everything that you and Tim and Casper said is true,
> but I'm still inclined to try that scenario.
>
> Rainer

Jim Dunham
Solaris, Storage Software Group
Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
Email: James.Dunham at Sun.COM
http://blogs.sun.com/avs
Rainer J.H. Brandt wrote:
> I was thinking about this scenario:
>
> Host w continuously has a UFS mounted with read/write access.
> Host w writes to the file f/ff/fff.
> Host w ceases to touch anything under f.
> Three hours later, host r mounts the file system read-only,
> reads f/ff/fff, and unmounts the file system.
>
> My assumption was:
>
> a1) This scenario won't hurt w,
> a2) this scenario won't damage the data on the file system,
> a3) this scenario won't hurt r, and
> a4) the read operation will succeed,
>
> even if w continues with arbitrary I/O, except that it doesn't
> touch anything under f until after r has unmounted the file system.

If the filesystem is mounted on host w, then host w is entitled to write to it at any time. If you want to reliably ensure that w does not perform any writes, then it must be unmounted on w.

Note also that mounting a filesystem read-only does not guarantee that the disk will not be written, because of atime updates (this is arguably a Unix design flaw, but still has to be taken into account). So r may also write to the disk, unless the filesystem is specifically mounted with options that prevent any physical writes.

> Of course everything that you and Tim and Casper said is true,
> but I'm still inclined to try that scenario.

I don't understand why you would ever want to risk this with valuable data.

--
David Hopwood <david.hopwood at industrial-designers.co.uk>
On 27/08/2007, at 12:36 AM, Rainer J.H. Brandt wrote:
> Sorry, this is a bit off-topic, but anyway:
>
> Ronald Kuehn writes:
>> No. You can neither access ZFS nor UFS in that way. Only one
>> host can mount the file system at the same time (read/write or
>> read-only doesn't matter here).
>
> I can see why you wouldn't recommend trying this with UFS
> (only one host knows which data has been committed to the disk),
> but is it really impossible?
>
> I don't see why multiple UFS mounts wouldn't work, if only one
> of them has write access. Can you elaborate?

Even with a single writer you would need to be concerned with read cache invalidation on the read-only hosts and (probably harder) ensuring that read hosts don't rely on half-written updates (since UFS doesn't do atomic on-disk updates).

Even without explicit caching on the read-only hosts there is some "implicit caching" when, for example, a read host reads a directory entry and then uses that information to access a file. The file may have been unlinked in the meantime. This means that you need atomic reads, as well as writes.

Boyd
Rainer J.H. Brandt
2007-Aug-27 06:36 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
David Hopwood writes:
> Note also that mounting a filesystem read-only does not guarantee that
> the disk will not be written, because of atime updates (this is arguably
> a Unix design flaw, but still has to be taken into account). So r may

I can mount with the noatime option.

> I don't understand why you would ever want to risk this with valuable
> data.

Please don't create the impression that I suggested that, because other readers may believe you. I didn't suggest risking valuable data.

I know about NFS, QFS, VCFS, and other reliable solutions. I asked my question out of technical curiosity, and I hereby apologize for the obvious waste of bandwidth. In the future, I'll go look at the sources instead.

Rainer
> Host w continuously has a UFS mounted with read/write access.
> Host w writes to the file f/ff/fff.
> Host w ceases to touch anything under f.
> Three hours later, host r mounts the file system read-only,
> reads f/ff/fff, and unmounts the file system.

This would probably work for a non-journaled file system, because UFS will flush all of the modified data and inodes in the three hour timespan (actually, much faster than that). I'm not sure that it would work if journaling was enabled, though; I don't recall any timeouts for pushing transactions out of the log. You could flush the journal using 'lockfs -f' before the delay, but it's still not 100% reliable.

> a1) This scenario won't hurt w,
> a2) this scenario won't damage the data on the file system,

True (as long as r never mounts read/write).

> a3) this scenario won't hurt r, and
> a4) the read operation will succeed,

It could conceivably panic r if r sees an inconsistency on the file system. This is actually very unlikely in newer releases of UFS; nearly the only panics left (IIRC) are in cases where UFS is trying to allocate or deallocate a block and discovers an inconsistency in the bitmap. (Corrupt directories and inodes will simply log a warning and return EIO.) I'd be very cautious, though.

QFS in multi-reader mode can solve this problem very easily, as the readers don't cache file system metadata much (the timing is tunable), and will invalidate data caches if the metadata describing the file changes. I don't know if Sun prices multi-reader QFS lower than shared QFS, but it's worth talking to your salesperson.
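Spelled out, the (unsupported) experiment would boil down to something like this; the mount point and device names are placeholders, and none of it makes the setup safe:

    # on host w, after the last write under f:
    lockfs -f /export/shared                          # flush the UFS log and dirty data to disk

    # on host r, three hours later:
    mount -F ufs -o ro,noatime /dev/dsk/c2t1d0s6 /mnt
    cp /mnt/f/ff/fff /var/tmp/fff
    umount /mnt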
> On 27/08/2007, at 12:36 AM, Rainer J.H. Brandt wrote:
>> I don't see why multiple UFS mounts wouldn't work, if only one
>> of them has write access. Can you elaborate?
>
> Even with a single writer you would need to be concerned with read
> cache invalidation on the read-only hosts and (probably harder)
> ensuring that read hosts don't rely on half-written updates (since
> UFS doesn't do atomic on-disk updates).
>
> Even without explicit caching on the read-only hosts there is some
> "implicit caching" when, for example, a read host reads a directory
> entry and then uses that information to access a file. The file may
> have been unlinked in the meantime. This means that you need atomic
> reads, as well as writes.
>
> Boyd

It's worse than this. Consider the read-only clients. When you access a filesystem object (file, directory, etc.), UFS will write metadata to update atime. I believe that there is a noatime option to mount, but I am unsure as to whether this is sufficient.

my 2c.
--Dave
It sounds like you are looking for a shared file system like Sun's QFS? Take a look here:

http://opensolaris.org/os/project/samqfs/What_are_QFS_and_SAM/

Writes from multiple hosts go through the metadata server, basically, and that handles the locking and update problems. I believe there are other open source shared file systems around if you are trying to specifically address the sharing problem.

David Olsen wrote:
> It's worse than this. Consider the read-only clients. When you access a filesystem object (file, directory, etc.), UFS will write metadata to update atime. I believe that there is a noatime option to mount, but I am unsure as to whether this is sufficient.
>
> my 2c.
> --Dave
> It's worse than this. Consider the read-only clients. When you
> access a filesystem object (file, directory, etc.), UFS will write
> metadata to update atime. I believe that there is a noatime option to
> mount, but I am unsure as to whether this is sufficient.

Is this some particular build or version that does this? I can't find a version of UFS that updates atimes (or anything else) when mounted read-only.

--
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Charles DeBardeleben
2007-Aug-28 15:04 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
Are you sure that UFS writes atime on read-only filesystems? I do not think that it is supposed to. If it does, I think that this is a bug. I have mounted read-only media before and not gotten any write errors.

-Charles

David Olsen wrote:
> It's worse than this. Consider the read-only clients. When you access a filesystem object (file, directory, etc.), UFS will write metadata to update atime. I believe that there is a noatime option to mount, but I am unsure as to whether this is sufficient.
>
> my 2c.
> --Dave
Casper.Dik at Sun.COM
2007-Aug-28 15:06 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
>> It's worse than this. Consider the read-only clients. When you
>> access a filesystem object (file, directory, etc.), UFS will write
>> metadata to update atime. I believe that there is a noatime option to
>> mount, but I am unsure as to whether this is sufficient.
>
> Is this some particular build or version that does this? I can't find a
> version of UFS that updates atimes (or anything else) when mounted
> read-only.

No, that is clearly not the case; read-only mounts never write.

They just cache too much, too long, and that is sufficient for them to never see new data, or to see a mix of stale and live data, corrupting the outcome of whatever process is using that data, possibly including the kernel.

Casper
On Tue, 28 Aug 2007, Charles DeBardeleben wrote:
> Are you sure that UFS writes atime on read-only filesystems? I do not
> think that it is supposed to. If it does, I think that this is a bug.
> I have mounted read-only media before and not gotten any write errors.
>
> -Charles

I think what might've been _meant_ here is sharing a UFS filesystem via NFS to different clients, some or all of which mount that 'NFS export' read-only. On the NFS server, you'll still see write activity on the backing filesystem - for access time updates.

That's in the context of this thread - "shared filesystem".

UFS, if mounted read-only, should not write to the medium. Definitely not for atime updates.

FrankH.
On Tue, 28 Aug 2007, David Olsen wrote:
>> On 27/08/2007, at 12:36 AM, Rainer J.H. Brandt wrote:
[ ... ]
>>> I don't see why multiple UFS mounts wouldn't work, if only one
>>> of them has write access. Can you elaborate?
>>
>> Even with a single writer you would need to be concerned with read
>> cache invalidation on the read-only hosts and (probably harder)
>> ensuring that read hosts don't rely on half-written updates (since
>> UFS doesn't do atomic on-disk updates).

That synchronization issue is always there for shared filesystems. For example, the NFS specs mention it explicitly; see sections 4.11 / 4.12 of RFC 1813 for reference. Some quotes:

   4.11 Caching policies

   The NFS version 3 protocol does not define a policy for caching on the
   client or server. In particular, there is no support for strict cache
   consistency between a client and server, nor between different
   clients. See [Kazar] for a discussion of the issues of cache
   synchronization and mechanisms in several distributed file systems.

   4.12 Stable versus unstable writes

   [ ... ] Unfortunately, client A can't tell for sure, so it will need
   to retransmit the buffers, thus overwriting the changes from client B.
   Fortunately, write sharing is rare and the solution matches the
   current write sharing situation.

Without using locking for synchronization, the behaviour will be indeterminate. "Just sharing" a filesystem, even when using something "made to share" like NFS, doesn't solve writer/reader cache consistency issues. There needs to be a locking / arbitration mechanism (which in NFS is provided by rpc.lockd _AND_ the use of flock/fcntl in the applications - and which is done by a QFS-private lockmgr daemon for the "shared writer" case) if the shared resource isn't readonly-for-everyone.

As long as everyone is a reader, or writes are extremely infrequent, "sharing" doesn't cause problems. But if that makes you decide to "simply share the SAN because it [seems to] work", think again. Sometimes, a little strategic planning is advisable.

FrankH.
The following seems much more complicated, much less supported, and much more prone to failure than just setting up Sun Cluster on the nodes and using it just for HA storage and the Global File System. You do not have to put the Oracle RAC instances under Sun Cluster control.

On 8/25/07, Matt B <mattbreedlove at yahoo.com> wrote:
> Here is what seems to be the best course of action, assuming IP over FC is supported by the HBAs (which I am pretty sure it is, since this is all brand new equipment):
>
> Mount the shared backup LUN on node 1 via the FC link to the SAN as a non-redundant ZFS volume. On node 1, RMAN (the Oracle backup system) will read/write its data to /backup as local disk. Configure node 1 to NFS-publish /backup over the IP-enabled FC network using the HBA's IP address.
>
> Nodes 2-4 will then NFS mount over the IP-enabled FC connections via the FC switch. They will mount at /backup as well.
>
> [...]
>
> Any foreseeable problems with this configuration?

--
Paul Kraus
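For what it's worth, my recollection is that once the cluster framework is installed, a global (PxFS) file system is essentially a vfstab entry carrying the 'global' mount option on every node, roughly along these lines (the global device path and mount point are placeholders for whatever DID device the LUN ends up as; check the Sun Cluster docs for the exact setup steps):

    # /etc/vfstab entry, identical on all cluster nodes
    /dev/global/dsk/d10s0  /dev/global/rdsk/d10s0  /global/backup  ufs  2  yes  global,logging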
If you have disks to experiment on & corrupt (and you will!) try this:

System A mounts the SAN disk and formats it with UFS
System A umounts the disk
System B mounts the disk
B runs 'touch x' on the disk
System A mounts the disk
System A and B umount the disk
System B fscks the disk
System A fscks the disk

You *will* find errors.
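Spelled out as commands, on a scratch LUN only - the device names below are placeholders:

    # system A
    newfs /dev/rdsk/c3t0d0s0
    mount -F ufs /dev/dsk/c3t0d0s0 /mnt ; umount /mnt

    # system B
    mount -F ufs /dev/dsk/c3t0d0s0 /mnt
    touch /mnt/x

    # system A, while B still has it mounted
    mount -F ufs /dev/dsk/c3t0d0s0 /mnt

    # both systems umount /mnt, then on each:
    fsck -n /dev/rdsk/c3t0d0s0        # expect inconsistencies to show up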
Casper.Dik at Sun.COM wrote:
>>> It's worse than this. Consider the read-only clients. When you
>>> access a filesystem object (file, directory, etc.), UFS will write
>>> metadata to update atime. I believe that there is a noatime option to
>>> mount, but I am unsure as to whether this is sufficient.
>>
>> Is this some particular build or version that does this? I can't find a
>> version of UFS that updates atimes (or anything else) when mounted
>> read-only.
>
> No, that is clearly not the case; read-only mounts never write.

AFAIK, a read-only UFS mount will unroll the log and thus write to the medium.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Casper.Dik at Sun.COM
2007-Aug-29 11:31 UTC
[zfs-discuss] Single SAN Lun presented to 4 Hosts
> AFAIK, a read-only UFS mount will unroll the log and thus write to
> the medium.

It does not (that's what code inspection suggests).

It will update the in-memory image with the log entries, but the log will not be rolled.

Casper
Casper.Dik at Sun.COM wrote:
>> AFAIK, a read-only UFS mount will unroll the log and thus write to
>> the medium.
>
> It does not (that's what code inspection suggests).
>
> It will update the in-memory image with the log entries, but the
> log will not be rolled.

Why then does fsck mount the fs read-only before starting the fsck task? I thought this was in order to unroll the log first.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
> No. You can neither access ZFS nor UFS in that way.
> Only one host can mount the file system at the same time
> (read/write or read-only doesn't matter here).
[...]
> If you don't want to use NFS, you can use QFS in such a configuration.
> The shared writer approach of QFS allows mounting the same file system
> on different hosts at the same time.

Thank you. We had been using multiple read-only UFS mounts and one R/W mount as a "poor man's" technique to move data between SAN-connected hosts. Based on your discussion, this appears to be a Really Bad Idea[tm].

That said, is there a "HOWTO" anywhere on installing QFS on Solaris 9 (SPARC64) machines? Is that even possible?
Peter L. Thomas wrote:
> That said, is there a "HOWTO" anywhere on installing QFS on Solaris 9 (SPARC64) machines? Is that even possible?

We've been selling SAMFS (which QFS is a part of) for ages, long before S10 ever saw the light, so I'd be *very* surprised if it wasn't documented with the whole QFS wad you get when you acquire (read: buy) the stuff.

Michael
--
Michael Schuster
Recursion, n.: see 'Recursion'