Hans-Joerg Haederli - Sun Switzerland Zurich - Sun Support Services
2006-Sep-18 09:24 UTC
[zfs-discuss] ZFS and HDS ShadowImage
Hi colleagues,

IHAC who wants to use ZFS with his HDS box. He now asks how he can do the following:

- Create a ZFS pool/fs on HDS LUNs
- Create a copy with ShadowImage inside the HDS array
- Disconnect the ShadowImage
- Import the ShadowImage copy with ZFS in addition to the existing ZFS pool/fs

I wonder how ZFS handles this, but it should be no issue. Any suggestions? Please reply to me directly as I'm not on this alias.

TIA
Regards
Joerg

--
Hans-Joerg Haederli, Product Responsible Manager Server Switzerland
hans-joerg.haederli at sun.com
Voice +41 (0)44 908 90 00   Fax +41 (0)44 908 99 01
Sun Microsystems (Schweiz) AG, Javastrasse 2/Hegnau, CH-8604 Volketswil, Switzerland
www.sun.ch
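For concreteness, a rough sketch of the steps being asked about, assuming a pool named mypool on two hypothetical HDS LUN device names; the ShadowImage create/split itself is an array-side operation and is not shown. As the rest of this thread discusses, it is the final import step that needs care:

    # 1. Create the ZFS pool and a filesystem on the HDS LUNs
    zpool create mypool c2t0d0 c2t1d0      # hypothetical device names
    zfs create mypool/data

    # 2./3. Create and split the ShadowImage copy of both LUNs inside the
    #       array (done with the array tools, not ZFS)

    # 4. Present the copied LUNs back to a host and try to bring them in
    zpool import                           # list pools ZFS thinks are importable
    zpool import mypool                    # the step the rest of the thread is about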
Hans-Joerg Haederli - Sun Switzerland Zurich - Sun Support Services wrote:
> IHAC who wants to use ZFS with his HDS box. He now asks how he can do the
> following:
>
> - Create a ZFS pool/fs on HDS LUNs
> - Create a copy with ShadowImage inside the HDS array
> - Disconnect the ShadowImage
> - Import the ShadowImage copy with ZFS in addition to the existing ZFS pool/fs
>
> I wonder how ZFS handles this, but it should be no issue.

This question has been asked a few times with no good answers. There are two underlying issues that I can't square away, as I don't have gear to test with.

1 - ZFS is self consistent, but if you take a LUN snapshot then any transactions in flight might not be completed, and the pool - which you need to snap in its entirety - might not be consistent. The more LUNs you have in the pool, the more problematic this could get. Exporting the pool first would probably get around this issue.

2 - If you import LUNs with the same label or ID as a currently mounted pool, then ZFS will .... no one seems to know. For example: I have a pool on two LUNs, X and Y, called mypool. I take a snapshot of LUNs X & Y (ignoring issue #1 above for now) to LUN X' and LUN Y' and wait a few days. I then present LUNs X' and Y' to the host. What happens? Make it even more complex and present all the LUNs to the host after a reboot. Do you get different parts of the pool from different LUNs? Does ZFS say, "What the hell?!??!"

I'd love to have an answer but, again, no gear to test with at this time.
On Mon, Sep 18, 2006 at 02:20:24PM -0400, Torrey McMahon wrote:
> 1 - ZFS is self consistent but if you take a LUN snapshot then any
> transactions in flight might not be completed and the pool - which you
> need to snap in its entirety - might not be consistent. The more LUNs
> you have in the pool the more problematic this could get. Exporting the
> pool first would probably get around this issue.

This isn't true. The snapshot will be entirely consistent - you will have just lost the last few seconds of non-synchronous writes.

> 2 - If you import LUNs with the same label or ID as a currently mounted
> pool then ZFS will .... no one seems to know. For example: I have a pool
> on two LUNs X and Y called mypool. I take a snapshot of LUNs X & Y,
> ignoring issue #1 above for now, to LUN X' and LUN Y' and wait a few
> days. I then present LUNs X' and Y' to the host. What happens? Make it
> even more complex and present all the LUNs to the host after a reboot.
> Do you get different parts of the pool from different LUNs? Does ZFS
> say, "What the hell?!??!"

ZFS will not allow you to import the second pool (I believe it won't even present the pool as a valid option to import). Each pool is identified by a unique GUID. You cannot have two pools active on the system with the same GUID. If this is really a valid use case, we could invent a way to assign a new GUID on import.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
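To see the GUID in question, the on-disk vdev labels can be dumped straight from the devices with zdb; a small sketch, assuming hypothetical device names (slice 0 of each LUN):

    # the ShadowImage copy carries the same pool name and pool_guid as the original
    zdb -l /dev/dsk/c2t0d0s0     # original LUN: shows name: 'mypool', pool_guid: ...
    zdb -l /dev/dsk/c3t0d0s0     # copied LUN: same pool_guid, which is why ZFS
                                 # refuses to treat it as a second importable pool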
It's a valid use case in the high-end enterprise space. While it probably makes good sense to use ZFS for snapshot creation, there are still cases where array-based snapshots/clones/BCVs make sense (DR/array-based replication, data verification, separate spindle pool, legacy/migration reasons, and a few other scenarios).

In the VxVM world, there are wrappers/utilities that allow you to change the VxVM disk signature to something OTHER than the original DG name, allowing you to import the "cloned diskgroup" back onto the same system with a different name. Something similar for ZFS, while not "pretty" (or likely to be supported :-), would possibly be a good start for some customers while a more supportable method is looked into.

My 2 cents,

-- MikeE

Michael J. Ellis (mike.ellis at fidelity.com)
FISC/UNIX Engineering
400 Puritan Way (M2G), Marlborough, MA 01752
Phone: 508-787-8564
On Sep 18, 2006, at 14:41, Eric Schrock wrote:
> ZFS will not allow you to import the second pool (I believe it won't
> even present the pool as a valid option to import). Each pool is
> identified by a unique GUID. You cannot have two pools active on the
> system with the same GUID. If this is really a valid use case, we could
> invent a way to assign a new GUID on import.

err .. I believe the point is that you will have multiple disks claiming to be the same disk, which can wreak havoc on a system (eg: I've got a 4-disk pool with a unique GUID and 8 disks claiming to be part of that same pool). It's the same problem as on VxVM with storing the identifier in the private region on the disks - when you do bit-level replication it's always blind to the upper-level, host-based, logical volume groupings. If this is the case, you're probably best off using the latest Leadville patch (119130 or 119131) and maintaining blacklists for what should be seen by the system. You can also zone the BCVs or SI copies on the controller port to prevent name collisions, but if you can't modify the portlist (eg: EMC bin file changes) then the host-based blacklist is going to be the way to go.

Jonathan
I'm really not an expert on ZFS, but at least from my point of view, to handle such cases ZFS has to handle at least the following points:

- GUID: a new/different GUID has to be assigned
- LUNs: ZFS has to be aware that the device trees are different, if these are part of some kind of metadata stored on the pools/fs
- FS: has to be mounted somewhere else

It looks as if this has not been implemented yet, nor even tested. For disaster recovery this looks like a useful approach, if it would work ;-) Isn't it?

Regards
Joerg

Jonathan Edwards wrote:
> err .. I believe the point is that you will have multiple disks claiming
> to be the same disk, which can wreak havoc on a system (eg: I've got a
> 4-disk pool with a unique GUID and 8 disks claiming to be part of that
> same pool) ...
On Mon, Sep 18, 2006 at 03:29:49PM -0400, Jonathan Edwards wrote:
> err .. I believe the point is that you will have multiple disks claiming
> to be the same disk, which can wreak havoc on a system (eg: I've got a
> 4-disk pool with a unique GUID and 8 disks claiming to be part of that
> same pool) ...

I don't understand how this changes my explanation at all. If you have multiple disks 'claiming to be the same disk', does this mean that they actually show up as the same /dev/dsk/* path depending on blind luck? If so, that's well below the level of ZFS. If they show up as different paths and/or devids, then ZFS will behave exactly as I described and you will be perfectly safe - you just won't be able to import the pool from the mirrored devices.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Mon, Sep 18, 2006 at 10:06:21PM +0200, Joerg Haederli wrote:
> - GUID: a new/different GUID has to be assigned

As I mentioned previously, ZFS handles this gracefully in the sense that it doesn't allow two pools with the same GUID to exist on the system.

> - LUNs: ZFS has to be aware that the device trees are different, if
>   these are part of some kind of metadata stored on the pools/fs

As long as they appear as separate devices under Solaris, ZFS will handle this today.

> - FS: has to be mounted somewhere else

I don't understand what you're suggesting here.

> It looks as if this has not been implemented yet, nor even tested.

What hasn't been implemented? As far as I can tell, this is a request for the previously mentioned RFE (ability to change GUIDs on import). I'm not sure what you mean by an unimplemented RFE being "nor even tested".

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric Schrock wrote:
> On Mon, Sep 18, 2006 at 10:06:21PM +0200, Joerg Haederli wrote:
>> It looks as if this has not been implemented yet, nor even tested.
>
> What hasn't been implemented? As far as I can tell, this is a request
> for the previously mentioned RFE (ability to change GUIDs on import).
> I'm not sure what you mean by an unimplemented RFE being "nor even
> tested".

From a previous email to the list ...

    ShadowImage takes a snapshot of the LUN and copies all the blocks to a
    new LUN (physical copy). In our case the new LUN is then made available
    on the same host as the original LUN. After the ShadowImage is taken, we
    can see the snapshot using the format(1M) command as an additional disk.
    But when running "zpool import", it only says: "no pools available to
    import".

    I think this is a bug. At least it should say something like "pool with
    the same name already imported". I have only tested this on 10 06/06,
    but I haven't found anything similar in the bug database, so it has to
    be in OpenSolaris as well.

It's not the transport layer. It works fine as the LUN IDs are different and the devices will come up with different /dev/dsk entries. (And if not, then you can fix that on the array in most cases.) The problem is that devices are present with the same GUID and the behavior of ZFS is unknown. Here's an example:

I've three LUNs in a ZFS pool offered from my HW RAID array. I take a snapshot onto three other LUNs. A day later I turn the host off. I go to the array and offer all six LUNs, the pool that was in use as well as the snapshot that I took a day previously, and offer all three LUNs to the host. The host comes up and automagically adds all the LUNs to the host with correct /dev/dsk entries.

What happens?
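For reference, the test quoted above boils down to something like the following (device discovery omitted); the quoted message is the result reported on Solaris 10 6/06:

    format                   # the ShadowImage copy shows up as an additional disk
    zpool import             # reported result: "no pools available to import"
    zpool status mypool      # the original pool remains imported and untouched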
Torrey McMahon wrote:
> A day later I turn the host off. I go to the array and offer all six
> LUNs, the pool that was in use as well as the snapshot that I took a
> day previously, and offer all three LUNs to the host.

Errr....that should be....

A day later I turn the host off. I go to the array and offer all six LUNs, the pool that was in use as well as the snapshot that I took a day previously, to the host.

I so need an editor.
Joerg Haederli wrote:
> I'm really not an expert on ZFS, but at least from my point of view, to
> handle such cases ZFS has to handle at least the following points:
> - GUID: a new/different GUID has to be assigned
> - LUNs: ZFS has to be aware that the device trees are different, if
>   these are part of some kind of metadata stored on the pools/fs
> - FS: has to be mounted somewhere else
> It looks as if this has not been implemented yet, nor even tested.
>
> For disaster recovery this looks like a useful approach, if it would work ;-)

In my experience, we would not normally try to mount two different copies of the same data at the same time on a single host. To avoid confusion, we would especially not want to do this if the data represents two different points in time. I would encourage you to stick with more traditional, tried, and true disaster recovery methods. Remember: disaster recovery is almost entirely a process, not technology.

-- richard
> In my experience, we would not normally try to mount two different
> copies of the same data at the same time on a single host. To avoid
> confusion, we would especially not want to do this if the data represents
> two different points in time. I would encourage you to stick with more
> traditional, tried, and true disaster recovery methods. Remember: disaster
> recovery is almost entirely a process, not technology.

Darn straight.

Of course those administrators keep asking for it anyway. The VxVM list gets a somewhat consistent stream of requests asking about issues similar to this. Until very recently there was no general tool to help with this. The unsupported method of destroying volume information to create new unique volumes wasn't dangerous enough to keep people from using this technique. :-)

ZFS is different enough that the techniques used on VxVM do not apply.

--
Darren Dunham                       ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                 San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
On Mon, Sep 18, 2006 at 06:03:47PM -0400, Torrey McMahon wrote:
> It's not the transport layer. It works fine as the LUN IDs are different
> and the devices will come up with different /dev/dsk entries. (And if
> not, then you can fix that on the array in most cases.) The problem is
> that devices are present with the same GUID and the behavior of ZFS is
> unknown.

It's not unknown, as I've been trying to explain.

> Here's an example: I've three LUNs in a ZFS pool offered from my HW RAID
> array. I take a snapshot onto three other LUNs. A day later I turn the
> host off. I go to the array and offer all six LUNs, the pool that was in
> use as well as the snapshot that I took a day previously, to the host.
> The host comes up and automagically adds all the LUNs to the host with
> correct /dev/dsk entries.
>
> What happens?

ZFS will use the existing pool as defined in the cache file, which in this case will still contain the correct devices. The new mirrored LUNs will not be used. They will not show as available pools to import because the pool GUID is in use. A reasonable bug is to report this inconsistency (ostensibly part of a pool but not present in the current config), though there are some tricky edge conditions. A more complicated RFE would be to detect this as a self-consistent version of the same pool, and have a way to change the GUID on import.

If you export the pool before you power off the host, and then want to import one of the two pools, the version with the most recent uberblock will "win". If they both have the same uberblock (i.e. are really the identical mirror), the results are non-deterministic. Depending on the order in which devices are discovered, you may end up with one pool or the other, or some combination of both.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
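A sketch of the export case described above, using the earlier example pool; which copy comes back is decided by the uberblocks, as noted:

    zpool export mypool     # cleanly close the pool before the host goes down
    # ...both the original LUNs and the ShadowImage LUNs are later presented...
    zpool import            # mypool is listed once; the copy whose devices carry
                            # the most recent uberblock is the one that "wins"
    zpool import mypool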
On Sep 18, 2006, at 23:16, Eric Schrock wrote:
> ZFS will use the existing pool as defined in the cache file, which in
> this case will still contain the correct devices. The new mirrored LUNs
> will not be used. They will not show as available pools to import
> because the pool GUID is in use. A reasonable bug is to report this
> inconsistency (ostensibly part of a pool but not present in the current
> config), though there are some tricky edge conditions. A more
> complicated RFE would be to detect this as a self-consistent version of
> the same pool, and have a way to change the GUID on import.
>
> If you export the pool before you power off the host, and then want to
> import one of the two pools, the version with the most recent uberblock
> will "win". If they both have the same uberblock (i.e. are really the
> identical mirror), the results are non-deterministic. Depending on the
> order in which devices are discovered, you may end up with one pool or
> the other, or some combination of both.

ah .. there we go - so we have an interaction between an uberblock date and prioritization on the import .. very keen. The non-deterministic case is well known in other self-describing pools or diskgroups (eg: vxdg), and is where the 6385531 RFE/bug came from on Leadville, to provide more options for sites that lack flexibility on the SAN and presentation ports to mask out replicated disks.

I guess there are a couple of corner cases that you may have already considered that would be good to explain:

1) If the zpool was imported when the split was done, can the secondary pool be imported by another host if the /dev/dsk entries are different? I'm assuming that you could simply use the -f option .. would the guid change?

2) If the guid does indeed change, could this zpool then be imported back on the first host at the same time by specifying the secondary guid instead of the pool name?

3) Can the same zpool be mounted on two separate hosts at the same time .. in other words, what happens when a second host tries to import -f a zpool that's already mounted by the first host?

Jonathan
On Mon, Sep 18, 2006 at 11:55:27PM -0400, Jonathan Edwards wrote:
> 1) If the zpool was imported when the split was done, can the
> secondary pool be imported by another host if the /dev/dsk entries
> are different? I'm assuming that you could simply use the -f
> option .. would the guid change?

Yes, the pool can be imported on another host. However, you cannot change the GUID (short of writing a custom tool to rewrite the labels).

> 2) If the guid does indeed change, could this zpool then be imported
> back on the first host at the same time by specifying the secondary
> guid instead of the pool name?

Yes, 'zpool import' allows pools to be imported by GUID, and the pool name can be changed as part of the import to not conflict with the existing name.

> 3) Can the same zpool be mounted on two separate hosts at the same
> time .. in other words, what happens when a second host tries to
> import -f a zpool that's already mounted by the first host?

No, ZFS does not support active clustering from multiple hosts. You will end up with corrupted data. See recent discussions about enhancements to prevent this from happening accidentally (preventing auto-open on boot if it's been written to from another host, providing hostname and last write time for 'zpool import', etc).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
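For (1) and (2), a sketch of what the import looks like on a second host where the copied LUNs do not collide with an active pool; the numeric pool ID shown is hypothetical:

    zpool import                                    # lists the pool name and its numeric id
    zpool import -f 5486362481696598726 dr_mypool   # import by id under a new name;
                                                    # -f because it was never cleanly exported
    zpool status dr_mypool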
> This isn't true. The snapshot will be entirely consistent - you will
> have just lost the last few seconds of non-synchronous writes.

Eric,

I don't see how this can be the case for a pool backed by multiple LUNs. Take the simple striped case, with two LUNs, 0 and 1. If I take a snapshot of LUN 0 on Monday, and a snapshot of LUN 1 on Tuesday, those two snapshots will not form a consistent ZFS pool.

For two snapshots taken only a second apart, there's more chance that they will be consistent, but it's still not guaranteed.

I'm not sure whether HDS allows a snapshot of multiple LUNs to be taken atomically, which is required to take a consistent snapshot of a multi-LUN file system like ZFS or QFS (or, for that matter, UFS over SVM). For UFS, 'lockfs -w' allows consistency. QFS is missing this. I don't think it's implemented for ZFS yet either, though it seems it would be fairly simple to implement (simply pause after the current transaction group and don't start another; perhaps writes to the intent log should be paused as well).

Anton
On Tue, Sep 19, 2006 at 10:52:52AM -0700, Anton B. Rang wrote:
> I don't see how this can be the case for a pool backed by multiple
> LUNs. Take the simple striped case, with two LUNs, 0 and 1. If I take
> a snapshot of LUN 0 on Monday, and a snapshot of LUN 1 on Tuesday,
> those two snapshots will not form a consistent ZFS pool.
>
> For two snapshots taken only a second apart, there's more chance that
> they will be consistent, but it's still not guaranteed.
>
> I'm not sure whether HDS allows a snapshot of multiple LUNs to be
> taken atomically, which is required to take a consistent snapshot of a
> multi-LUN file system like ZFS or QFS (or, for that matter, UFS over
> SVM). For UFS, 'lockfs -w' allows consistency. QFS is missing this. I
> don't think it's implemented for ZFS yet either, though it seems it
> would be fairly simple to implement (simply pause after the current
> transaction group and don't start another; perhaps writes to the
> intent log should be paused as well).

Ah. I had assumed that 'taking a LUN snapshot' was an atomic operation across all LUNs in the pool. If this isn't possible, then you are correct - you can easily end up with inconsistent and corrupted state. Taking a snapshot of a single-LUN pool will not lead to inconsistency.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
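Until an atomic multi-LUN snapshot (or a ZFS-level pause such as the one Anton sketches) is available, the conservative workaround already mentioned earlier in the thread is to quiesce ZFS around the array operation; a sketch, assuming the downtime is acceptable:

    zpool export mypool     # all transactions committed, nothing in flight
    # take the ShadowImage copy of every LUN in the pool on the array
    zpool import mypool     # resume service on the original LUNs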
Hey Tony...

When (properly) doing array-based snapshots/BCVs with EMC/Hitachi/what-have-you arrays, you create "lun groups" out of the luns you're interested in snappin'. You then perform snapshot/clone operations on that "lun group", which will make it atomic across all members of that group.

Where things get a lot more interesting is with luns (belonging to the same "snap/clone" group) that live on different arrays. I'm not sure where the vendors are with the concept of federated (still atomic) snapshots, but I suggest avoiding such a configuration entirely, thereby side-stepping the issue.

My 2 cents,

-- MikeE
Eric Schrock wrote:
> On Mon, Sep 18, 2006 at 02:20:24PM -0400, Torrey McMahon wrote:
>> 1 - ZFS is self consistent but if you take a LUN snapshot then any
>> transactions in flight might not be completed and the pool - which you
>> need to snap in its entirety - might not be consistent. The more LUNs
>> you have in the pool the more problematic this could get. Exporting the
>> pool first would probably get around this issue.
>
> This isn't true. The snapshot will be entirely consistent - you will
> have just lost the last few seconds of non-synchronous writes.

When a synchronous write comes in, does it wait for other pending I/O to complete? Which writes are tagged as synchronous? (Checksums and uberblocks?)

If you take a snapshot of the different devices in a pool without using a transaction group of some kind you could still be out of whack, but that would be a bad idea in the first place.
Darren Dunham wrote:
> Darn straight.
>
> Of course those administrators keep asking for it anyway. The VxVM list
> gets a somewhat consistent stream of requests asking about issues
> similar to this.

Think data mining or prod/dev/test environments. I might want to take a snapshot of my Data with a capital D and perform some set of operations on it. Of those you have two general use cases:

* A copy of the data set on the same host but used by a different application. A ZFS snapshot might meet the requirements in some of those cases. However, in a lot of those cases you're going to want the copy of the data set on different physical media so as not to interfere with the performance of your currently in-use application.

* A copy of the data set on a different host. zfs send/recv might meet the requirements in some of these cases (a rough sketch follows below). However, you may run into time issues where customers want the snapshot *now* and don't want to wait for what could be a lengthy send/recv operation.

Since a ZFS pool is, for lack of better terms, the current least common denominator when it comes to snapshots, host connectivity, and performance, people are going to want to use HW RAID arrays and their snapshot mechanisms.

Oh...and for DR too. One point to choke in a data center, so we better figure that one out too. ;)
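For the different-host case, the ZFS-native path mentioned above looks roughly like this; dataset, snapshot, and host names are hypothetical:

    zfs snapshot mypool/data@tuesday
    zfs send mypool/data@tuesday | ssh devhost zfs recv devpool/data
    # devpool must already exist on devhost; the transfer time is the
    # drawback noted above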
still more below...

Torrey McMahon wrote:
> Think data mining or prod/dev/test environments. I might want to take a
> snapshot of my Data with a capital D and perform some set of operations
> on it. Of those you have two general use cases:
>
> * A copy of the data set on the same host but used by a different
>   application. ...
> * A copy of the data set on a different host. zfs send/recv might
>   meet the requirements in some of these cases. ...
>
> Since a ZFS pool is, for lack of better terms, the current least common
> denominator when it comes to snapshots, host connectivity, and
> performance, people are going to want to use HW RAID arrays and their
> snapshot mechanisms.
>
> Oh...and for DR too. One point to choke in a data center, so we better
> figure that one out too. ;)

[caveat: I haven't tried this]

My thought is that once you make a ZFS snapshot, you're golden. The snapshot is read-only and the later changes to the pool won't affect it. Once you import the ShadowImage onto the other (dev/test) machine, you can clone the snapshot and be off to the races.

-- richard
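A sketch of that approach on the dev/test host, assuming the ShadowImage LUNs have been presented there and a ZFS snapshot named @baseline existed before the split:

    zpool import -f mypool                 # bring in the copied LUNs; -f because the
                                           # pool was never exported from the original host
    zfs list -t snapshot                   # the pre-split snapshot comes along with the pool
    zfs clone mypool/data@baseline mypool/scratch   # writable clone to work against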
> My thought is that once you make a ZFS snapshot, you're golden. The
> snapshot is read-only and the later changes to the pool won't affect
> it.

Close. The snapshot is read-only, but the pointers to it are read-write (since they are all descendants of the überblock). If you do get a clean copy of the snapshot, you should be fine; but there's a tiny chance that you won't see the snapshot at all, or it will turn out damaged, if you are snapshotting LUNs non-atomically. Easy to recover from, but probably a manual process. :-(

Anton
Torrey McMahon wrote on 09/19/06 16:29:
> When a synchronous write comes in, does it wait for other pending I/O
> to complete?

No. The ZFS Intent Log (ZIL) writes out a record for the transaction (TX_WRITE, TX_ACL, TX_CREATE, TX_TRUNCATE, etc) and any other transactions that may be dependents.

> Which writes are tagged as synchronous? (Checksums and uberblocks?)

Transactions arrive marked as synchronous with O_DSYNC/O_SYNC/O_RSYNC, or are flushed synchronously as a result of VOP_FSYNC from NFS or fsync.

Neil.