Hans-Joerg Haederli - Sun Switzerland Zurich - Sun Support Services
2006-Sep-18 09:24 UTC
[zfs-discuss] ZFS and HDS ShadowImage
Hi colleagues,

IHAC who wants to use ZFS with his HDS box. He now asks how he can do the following:

- Create a ZFS pool/fs on HDS LUNs
- Create a copy with ShadowImage inside the HDS array
- Disconnect the ShadowImage
- Import the ShadowImage copy with ZFS in addition to the existing ZFS pool/fs

I wonder how ZFS handles this, but it should be no issue. Any suggestions? Please reply to me directly as I'm not on this alias.

TIA
Regards
Joerg

--
Hans-Joerg Haederli, Product Responsible Manager Server Switzerland
hans-joerg.haederli at sun.com
Voice +41 (0)44 908 90 00   Fax +41 (0)44 908 99 01
Sun Microsystems (Schweiz) AG, Javastrasse 2/Hegnau, CH-8604 Volketswil, Switzerland
www.sun.ch
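For concreteness, a rough sketch of the steps being asked about, assuming a pool named mypool on two hypothetical HDS LUN device names; the ShadowImage create/split itself is an array-side operation and is not shown. As the rest of this thread discusses, it is the final import step that needs care:

    # 1. Create the ZFS pool and a filesystem on the HDS LUNs
    zpool create mypool c2t0d0 c2t1d0      # hypothetical device names
    zfs create mypool/data

    # 2./3. Create and split the ShadowImage copy of both LUNs inside the
    #       array (done with the array tools, not ZFS)

    # 4. Present the copied LUNs back to a host and try to bring them in
    zpool import                           # list pools ZFS thinks are importable
    zpool import mypool                    # the step the rest of the thread is about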
Hans-Joerg Haederli - Sun Switzerland Zurich - Sun Support Services wrote:
> IHAC who wants to use ZFS with his HDS box. He now asks how he can do the
> following:
>
> - Create a ZFS pool/fs on HDS LUNs
> - Create a copy with ShadowImage inside the HDS array
> - Disconnect the ShadowImage
> - Import the ShadowImage copy with ZFS in addition to the existing ZFS pool/fs
>
> I wonder how ZFS handles this, but it should be no issue.

This question has been asked a few times with no good answers. There are two underlying issues that I can't square away, as I don't have gear to test with.

1 - ZFS is self consistent, but if you take a LUN snapshot then any transactions in flight might not be completed, and the pool - which you need to snap in its entirety - might not be consistent. The more LUNs you have in the pool, the more problematic this could get. Exporting the pool first would probably get around this issue.

2 - If you import LUNs with the same label or ID as a currently mounted pool, then ZFS will .... no one seems to know. For example: I have a pool on two LUNs, X and Y, called mypool. I take a snapshot of LUNs X & Y (ignoring issue #1 above for now) to LUN X' and LUN Y' and wait a few days. I then present LUNs X' and Y' to the host. What happens? Make it even more complex and present all the LUNs to the host after a reboot. Do you get different parts of the pool from different LUNs? Does ZFS say, "What the hell?!??!"

I'd love to have an answer but, again, no gear to test with at this time.
On Mon, Sep 18, 2006 at 02:20:24PM -0400, Torrey McMahon wrote:
> 1 - ZFS is self consistent but if you take a LUN snapshot then any
> transactions in flight might not be completed and the pool - which you
> need to snap in its entirety - might not be consistent. The more LUNs
> you have in the pool the more problematic this could get. Exporting the
> pool first would probably get around this issue.

This isn't true. The snapshot will be entirely consistent - you will have just lost the last few seconds of non-synchronous writes.

> 2 - If you import LUNs with the same label or ID as a currently mounted
> pool then ZFS will .... no one seems to know. For example: I have a pool
> on two LUNs X and Y called mypool. I take a snapshot of LUNs X & Y,
> ignoring issue #1 above for now, to LUN X' and LUN Y' and wait a few
> days. I then present LUNs X' and Y' to the host. What happens? Make it
> even more complex and present all the LUNs to the host after a reboot.
> Do you get different parts of the pool from different LUNs? Does ZFS
> say, "What the hell?!??!"

ZFS will not allow you to import the second pool (I believe it won't even present the pool as a valid option to import). Each pool is identified by a unique GUID. You cannot have two pools active on the system with the same GUID. If this is really a valid use case, we could invent a way to assign a new GUID on import.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
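To see the GUID in question, the on-disk vdev labels can be dumped straight from the devices with zdb; a small sketch, assuming hypothetical device names (slice 0 of each LUN):

    # the ShadowImage copy carries the same pool name and pool_guid as the original
    zdb -l /dev/dsk/c2t0d0s0     # original LUN: shows name: 'mypool', pool_guid: ...
    zdb -l /dev/dsk/c3t0d0s0     # copied LUN: same pool_guid, which is why ZFS
                                 # refuses to treat it as a second importable pool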
It's a valid use case in the high-end enterprise space. While it probably makes good sense to use ZFS for snapshot creation, there are still cases where array-based snapshots/clones/BCVs make sense (DR/array-based replication, data verification, separate spindle pool, legacy/migration reasons, and a few other scenarios).

In the VxVM world, there are wrappers/utilities that allow you to change the VxVM disk signature to something OTHER than the original DG name, allowing you to import the "cloned diskgroup" back onto the same system with a different name. Something similar for ZFS, while not "pretty" (or likely to be supported :-), would possibly be a good start for some customers while a more supportable method is looked into.

My 2 cents,

-- MikeE

Michael J. Ellis (mike.ellis at fidelity.com)
FISC/UNIX Engineering
400 Puritan Way (M2G), Marlborough, MA 01752
Phone: 508-787-8564
On Sep 18, 2006, at 14:41, Eric Schrock wrote:
> ZFS will not allow you to import the second pool (I believe it won't
> even present the pool as a valid option to import). Each pool is
> identified by a unique GUID. You cannot have two pools active on the
> system with the same GUID. If this is really a valid use case, we could
> invent a way to assign a new GUID on import.

err .. I believe the point is that you will have multiple disks claiming to be the same disk, which can wreak havoc on a system (eg: I've got a 4-disk pool with a unique GUID and 8 disks claiming to be part of that same pool). It's the same problem as on VxVM with storing the identifier in the private region on the disks - when you do bit-level replication it's always blind to the upper-level, host-based, logical volume groupings. If this is the case, you're probably best off using the latest Leadville patch (119130 or 119131) and maintaining blacklists for what should be seen by the system. You can also zone the BCVs or SI copies on the controller port to prevent name collisions, but if you can't modify the portlist (eg: EMC bin file changes) then the host-based blacklist is going to be the way to go.

Jonathan
I'm really not an expert on ZFS, but at least from my point of view, to handle such cases ZFS has to handle at least the following points:

- GUID: a new/different GUID has to be assigned
- LUNs: ZFS has to be aware that the device trees are different, if these are part of some kind of metadata stored on the pools/fs
- FS: has to be mounted somewhere else

It looks as if this has not been implemented yet, nor even tested. For disaster recovery this looks like a useful approach, if it would work ;-) Isn't it?

Regards
Joerg

Jonathan Edwards wrote:
> err .. I believe the point is that you will have multiple disks claiming
> to be the same disk, which can wreak havoc on a system (eg: I've got a
> 4-disk pool with a unique GUID and 8 disks claiming to be part of that
> same pool) ...
On Mon, Sep 18, 2006 at 03:29:49PM -0400, Jonathan Edwards wrote:
> err .. I believe the point is that you will have multiple disks claiming
> to be the same disk, which can wreak havoc on a system (eg: I've got a
> 4-disk pool with a unique GUID and 8 disks claiming to be part of that
> same pool) ...

I don't understand how this changes my explanation at all. If you have multiple disks 'claiming to be the same disk', does this mean that they actually show up as the same /dev/dsk/* path depending on blind luck? If so, that's well below the level of ZFS. If they show up as different paths and/or devids, then ZFS will behave exactly as I described and you will be perfectly safe - you just won't be able to import the pool from the mirrored devices.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Mon, Sep 18, 2006 at 10:06:21PM +0200, Joerg Haederli wrote:
> - GUID: a new/different GUID has to be assigned

As I mentioned previously, ZFS handles this gracefully in the sense that it doesn't allow two pools with the same GUID to exist on the system.

> - LUNs: ZFS has to be aware that the device trees are different, if
>   these are part of some kind of metadata stored on the pools/fs

As long as they appear as separate devices under Solaris, ZFS will handle this today.

> - FS: has to be mounted somewhere else

I don't understand what you're suggesting here.

> It looks as if this has not been implemented yet, nor even tested.

What hasn't been implemented? As far as I can tell, this is a request for the previously mentioned RFE (ability to change GUIDs on import). I'm not sure what you mean by an unimplemented RFE being "nor even tested".

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric Schrock wrote:
> On Mon, Sep 18, 2006 at 10:06:21PM +0200, Joerg Haederli wrote:
>> It looks as if this has not been implemented yet, nor even tested.
>
> What hasn't been implemented? As far as I can tell, this is a request
> for the previously mentioned RFE (ability to change GUIDs on import).
> I'm not sure what you mean by an unimplemented RFE being "nor even
> tested".

From a previous email to the list ...

    ShadowImage takes a snapshot of the LUN and copies all the blocks to a
    new LUN (physical copy). In our case the new LUN is then made available
    on the same host as the original LUN. After the ShadowImage is taken, we
    can see the snapshot using the format(1M) command as an additional disk.
    But when running "zpool import", it only says: "no pools available to
    import".

    I think this is a bug. At least it should say something like "pool with
    the same name already imported". I have only tested this on 10 06/06,
    but I haven't found anything similar in the bug database, so it has to
    be in OpenSolaris as well.

It's not the transport layer. It works fine as the LUN IDs are different and the devices will come up with different /dev/dsk entries. (And if not, then you can fix that on the array in most cases.) The problem is that devices are present with the same GUID and the behavior of ZFS is unknown. Here's an example:

I've three LUNs in a ZFS pool offered from my HW RAID array. I take a snapshot onto three other LUNs. A day later I turn the host off. I go to the array and offer all six LUNs, the pool that was in use as well as the snapshot that I took a day previously, and offer all three LUNs to the host. The host comes up and automagically adds all the LUNs to the host with correct /dev/dsk entries.

What happens?
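For reference, the test quoted above boils down to something like the following (device discovery omitted); the quoted message is the result reported on Solaris 10 6/06:

    format                   # the ShadowImage copy shows up as an additional disk
    zpool import             # reported result: "no pools available to import"
    zpool status mypool      # the original pool remains imported and untouched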
Torrey McMahon wrote:
> A day later I turn the host off. I go to the array and offer all six
> LUNs, the pool that was in use as well as the snapshot that I took a
> day previously, and offer all three LUNs to the host.

Errr....that should be....

A day later I turn the host off. I go to the array and offer all six LUNs, the pool that was in use as well as the snapshot that I took a day previously, to the host.

I so need an editor.
Joerg Haederli wrote:
> I'm really not an expert on ZFS, but at least from my point of view, to
> handle such cases ZFS has to handle at least the following points:
> - GUID: a new/different GUID has to be assigned
> - LUNs: ZFS has to be aware that the device trees are different, if
>   these are part of some kind of metadata stored on the pools/fs
> - FS: has to be mounted somewhere else
> It looks as if this has not been implemented yet, nor even tested.
>
> For disaster recovery this looks like a useful approach, if it would work ;-)

In my experience, we would not normally try to mount two different copies of the same data at the same time on a single host. To avoid confusion, we would especially not want to do this if the data represents two different points in time. I would encourage you to stick with more traditional, tried, and true disaster recovery methods. Remember: disaster recovery is almost entirely a process, not technology.

-- richard
> In my experience, we would not normally try to mount two different
> copies of the same data at the same time on a single host. To avoid
> confusion, we would especially not want to do this if the data represents
> two different points in time. I would encourage you to stick with more
> traditional, tried, and true disaster recovery methods. Remember: disaster
> recovery is almost entirely a process, not technology.

Darn straight.

Of course those administrators keep asking for it anyway. The VxVM list gets a somewhat consistent stream of requests asking about issues similar to this. Until very recently there was no general tool to help with this. The unsupported method of destroying volume information to create new unique volumes wasn't dangerous enough to keep people from using this technique. :-)

ZFS is different enough that the techniques used on VxVM do not apply.

--
Darren Dunham                       ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                 San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
On Mon, Sep 18, 2006 at 06:03:47PM -0400, Torrey McMahon wrote:
> It's not the transport layer. It works fine as the LUN IDs are different
> and the devices will come up with different /dev/dsk entries. (And if
> not, then you can fix that on the array in most cases.) The problem is
> that devices are present with the same GUID and the behavior of ZFS is
> unknown.

It's not unknown, as I've been trying to explain.

> Here's an example: I've three LUNs in a ZFS pool offered from my HW RAID
> array. I take a snapshot onto three other LUNs. A day later I turn the
> host off. I go to the array and offer all six LUNs, the pool that was in
> use as well as the snapshot that I took a day previously, to the host.
> The host comes up and automagically adds all the LUNs to the host with
> correct /dev/dsk entries.
>
> What happens?

ZFS will use the existing pool as defined in the cache file, which in this case will still contain the correct devices. The new mirrored LUNs will not be used. They will not show as available pools to import because the pool GUID is in use. A reasonable bug is to report this inconsistency (ostensibly part of a pool but not present in the current config), though there are some tricky edge conditions. A more complicated RFE would be to detect this as a self-consistent version of the same pool, and have a way to change the GUID on import.

If you export the pool before you power off the host, and then want to import one of the two pools, the version with the most recent uberblock will "win". If they both have the same uberblock (i.e. are really the identical mirror), the results are non-deterministic. Depending on the order in which devices are discovered, you may end up with one pool or the other, or some combination of both.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
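A sketch of the export case described above, using the earlier example pool; which copy comes back is decided by the uberblocks, as noted:

    zpool export mypool     # cleanly close the pool before the host goes down
    # ...both the original LUNs and the ShadowImage LUNs are later presented...
    zpool import            # mypool is listed once; the copy whose devices carry
                            # the most recent uberblock is the one that "wins"
    zpool import mypool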
On Sep 18, 2006, at 23:16, Eric Schrock wrote:
> ZFS will use the existing pool as defined in the cache file, which in
> this case will still contain the correct devices. The new mirrored LUNs
> will not be used. They will not show as available pools to import
> because the pool GUID is in use. A reasonable bug is to report this
> inconsistency (ostensibly part of a pool but not present in the current
> config), though there are some tricky edge conditions. A more
> complicated RFE would be to detect this as a self-consistent version of
> the same pool, and have a way to change the GUID on import.
>
> If you export the pool before you power off the host, and then want to
> import one of the two pools, the version with the most recent uberblock
> will "win". If they both have the same uberblock (i.e. are really the
> identical mirror), the results are non-deterministic. Depending on the
> order in which devices are discovered, you may end up with one pool or
> the other, or some combination of both.

ah .. there we go - so we have an interaction between an uberblock date and prioritization on the import .. very keen. The non-deterministic case is well known in other self-describing pools or diskgroups (eg: vxdg), and is where the 6385531 RFE/bug came from on Leadville, to provide more options for sites that lack flexibility on the SAN and presentation ports to mask out replicated disks.

I guess there are a couple of corner cases that you may have already considered that would be good to explain:

1) If the zpool was imported when the split was done, can the secondary pool be imported by another host if the /dev/dsk entries are different? I'm assuming that you could simply use the -f option .. would the guid change?

2) If the guid does indeed change, could this zpool then be imported back on the first host at the same time by specifying the secondary guid instead of the pool name?

3) Can the same zpool be mounted on two separate hosts at the same time .. in other words, what happens when a second host tries to import -f a zpool that's already mounted by the first host?

Jonathan
On Mon, Sep 18, 2006 at 11:55:27PM -0400, Jonathan Edwards wrote:
> 1) If the zpool was imported when the split was done, can the
> secondary pool be imported by another host if the /dev/dsk entries
> are different? I'm assuming that you could simply use the -f
> option .. would the guid change?

Yes, the pool can be imported on another host. However, you cannot change the GUID (short of writing a custom tool to rewrite the labels).

> 2) If the guid does indeed change, could this zpool then be imported
> back on the first host at the same time by specifying the secondary
> guid instead of the pool name?

Yes, 'zpool import' allows pools to be imported by GUID, and the pool name can be changed as part of the import to not conflict with the existing name.

> 3) Can the same zpool be mounted on two separate hosts at the same
> time .. in other words, what happens when a second host tries to
> import -f a zpool that's already mounted by the first host?

No, ZFS does not support active clustering from multiple hosts. You will end up with corrupted data. See recent discussions about enhancements to prevent this from happening accidentally (preventing auto-open on boot if it's been written to from another host, providing hostname and last write time for 'zpool import', etc).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
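For (1) and (2), a sketch of what the import looks like on a second host where the copied LUNs do not collide with an active pool; the numeric pool ID shown is hypothetical:

    zpool import                                    # lists the pool name and its numeric id
    zpool import -f 5486362481696598726 dr_mypool   # import by id under a new name;
                                                    # -f because it was never cleanly exported
    zpool status dr_mypool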
> This isn't true. The snapshot will be entirely consistent - you will
> have just lost the last few seconds of non-synchronous writes.

Eric,

I don't see how this can be the case for a pool backed by multiple LUNs. Take the simple striped case, with two LUNs, 0 and 1. If I take a snapshot of LUN 0 on Monday, and a snapshot of LUN 1 on Tuesday, those two snapshots will not form a consistent ZFS pool.

For two snapshots taken only a second apart, there's more chance that they will be consistent, but it's still not guaranteed.

I'm not sure whether HDS allows a snapshot of multiple LUNs to be taken atomically, which is required to take a consistent snapshot of a multi-LUN file system like ZFS or QFS (or, for that matter, UFS over SVM). For UFS, 'lockfs -w' allows consistency. QFS is missing this. I don't think it's implemented for ZFS yet either, though it seems it would be fairly simple to implement (simply pause after the current transaction group and don't start another; perhaps writes to the intent log should be paused as well).

Anton
On Tue, Sep 19, 2006 at 10:52:52AM -0700, Anton B. Rang wrote:
> I don't see how this can be the case for a pool backed by multiple
> LUNs. Take the simple striped case, with two LUNs, 0 and 1. If I take
> a snapshot of LUN 0 on Monday, and a snapshot of LUN 1 on Tuesday,
> those two snapshots will not form a consistent ZFS pool.
>
> For two snapshots taken only a second apart, there's more chance that
> they will be consistent, but it's still not guaranteed.
>
> I'm not sure whether HDS allows a snapshot of multiple LUNs to be
> taken atomically, which is required to take a consistent snapshot of a
> multi-LUN file system like ZFS or QFS (or, for that matter, UFS over
> SVM). For UFS, 'lockfs -w' allows consistency. QFS is missing this. I
> don't think it's implemented for ZFS yet either, though it seems it
> would be fairly simple to implement (simply pause after the current
> transaction group and don't start another; perhaps writes to the
> intent log should be paused as well).

Ah. I had assumed that 'taking a LUN snapshot' was an atomic operation across all LUNs in the pool. If this isn't possible, then you are correct - you can easily end up with inconsistent and corrupted state. Taking a snapshot of a single-LUN pool will not lead to inconsistency.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
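Until an atomic multi-LUN snapshot (or a ZFS-level pause such as the one Anton sketches) is available, the conservative workaround already mentioned earlier in the thread is to quiesce ZFS around the array operation; a sketch, assuming the downtime is acceptable:

    zpool export mypool     # all transactions committed, nothing in flight
    # take the ShadowImage copy of every LUN in the pool on the array
    zpool import mypool     # resume service on the original LUNs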
Hey Tony...

When (properly) doing array-based snapshots/BCVs with EMC/Hitachi/what-have-you arrays, you create "lun groups" out of the luns you're interested in snappin'. You then perform snapshot/clone operations on that "lun group", which will make it atomic across all members of that group.

Where things get a lot more interesting is with luns (belonging to the same "snap/clone" group) that live on different arrays. I'm not sure where the vendors are with the concept of federated (still atomic) snapshots, but I suggest avoiding such a configuration entirely, thereby side-stepping the issue.

My 2 cents,

-- MikeE
Eric Schrock wrote:
> On Mon, Sep 18, 2006 at 02:20:24PM -0400, Torrey McMahon wrote:
>> 1 - ZFS is self consistent but if you take a LUN snapshot then any
>> transactions in flight might not be completed and the pool - which you
>> need to snap in its entirety - might not be consistent. The more LUNs
>> you have in the pool the more problematic this could get. Exporting the
>> pool first would probably get around this issue.
>
> This isn't true. The snapshot will be entirely consistent - you will
> have just lost the last few seconds of non-synchronous writes.

When a synchronous write comes in, does it wait for other pending I/O to complete? Which writes are tagged as synchronous? (Checksums and uberblocks?)

If you take a snapshot of the different devices in a pool without using a transaction group of some kind you could still be out of whack, but that would be a bad idea in the first place.
Darren Dunham wrote:
> Darn straight.
>
> Of course those administrators keep asking for it anyway. The VxVM list
> gets a somewhat consistent stream of requests asking about issues
> similar to this.

Think data mining or prod/dev/test environments. I might want to take a snapshot of my Data with a capital D and perform some set of operations on it. Of those you have two general use cases:

* A copy of the data set on the same host but used by a different application. A ZFS snapshot might meet the requirements in some of those cases. However, in a lot of those cases you're going to want the copy of the data set on different physical media so as not to interfere with the performance of your currently in-use application.

* A copy of the data set on a different host. zfs send/recv might meet the requirements in some of these cases (a rough sketch follows below). However, you may run into time issues where customers want the snapshot *now* and don't want to wait for what could be a lengthy send/recv operation.

Since a ZFS pool is, for lack of better terms, the current least common denominator when it comes to snapshots, host connectivity, and performance, people are going to want to use HW RAID arrays and their snapshot mechanisms.

Oh...and for DR too. One point to choke in a data center, so we better figure that one out too. ;)
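For the different-host case, the ZFS-native path mentioned above looks roughly like this; dataset, snapshot, and host names are hypothetical:

    zfs snapshot mypool/data@tuesday
    zfs send mypool/data@tuesday | ssh devhost zfs recv devpool/data
    # devpool must already exist on devhost; the transfer time is the
    # drawback noted above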
still more below...

Torrey McMahon wrote:
> Think data mining or prod/dev/test environments. I might want to take a
> snapshot of my Data with a capital D and perform some set of operations
> on it. Of those you have two general use cases:
>
> * A copy of the data set on the same host but used by a different
>   application. ...
> * A copy of the data set on a different host. zfs send/recv might
>   meet the requirements in some of these cases. ...
>
> Since a ZFS pool is, for lack of better terms, the current least common
> denominator when it comes to snapshots, host connectivity, and
> performance, people are going to want to use HW RAID arrays and their
> snapshot mechanisms.
>
> Oh...and for DR too. One point to choke in a data center, so we better
> figure that one out too. ;)

[caveat: I haven't tried this]

My thought is that once you make a ZFS snapshot, you're golden. The snapshot is read-only and the later changes to the pool won't affect it. Once you import the ShadowImage onto the other (dev/test) machine, you can clone the snapshot and be off to the races.

-- richard
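A sketch of that approach on the dev/test host, assuming the ShadowImage LUNs have been presented there and a ZFS snapshot named @baseline existed before the split:

    zpool import -f mypool                 # bring in the copied LUNs; -f because the
                                           # pool was never exported from the original host
    zfs list -t snapshot                   # the pre-split snapshot comes along with the pool
    zfs clone mypool/data@baseline mypool/scratch   # writable clone to work against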
> My thought is that once you make a ZFS snapshot, you're golden. The
> snapshot is read-only and the later changes to the pool won't affect
> it.

Close. The snapshot is read-only, but the pointers to it are read-write (since they are all descendants of the überblock). If you do get a clean copy of the snapshot, you should be fine; but there's a tiny chance that you won't see the snapshot at all, or it will turn out damaged, if you are snapshotting LUNs non-atomically. Easy to recover from, but probably a manual process. :-(

Anton
Torrey McMahon wrote on 09/19/06 16:29:
> When a synchronous write comes in, does it wait for other pending I/O
> to complete?

No. The ZFS Intent Log (ZIL) writes out a record for the transaction (TX_WRITE, TX_ACL, TX_CREATE, TX_TRUNCATE, etc) and any other transactions that may be dependents.

> Which writes are tagged as synchronous? (Checksums and uberblocks?)

Transactions arrive marked as synchronous with O_DSYNC/O_SYNC/O_RSYNC, or are flushed synchronously as a result of VOP_FSYNC from NFS or fsync.

Neil.