Hello zfs-discuss,

  Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
  can be forced to be locked (and then unlocked) in a way that the image
  on disk is consistent while the fs is locked. Of course you don't have
  to umount the filesystem. This is very useful on high-end arrays (and
  now even on midrange) with functionality like BCV.

  Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
  to take advantage of BCVs, then just before you split them you want to
  be sure that the filesystem laid on these LUNs is consistent - and as
  splitting many BCVs isn't atomic, it would probably harm even ZFS.

  Something like
    zpool lock poolname
    zpool unlock poolname

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
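The intended workflow around a BCV split would then be roughly the following
(a sketch only: 'zpool lock/unlock' is the syntax proposed in this mail, not
an existing command, and 'split_bcv' is a placeholder for the real array
tooling):

  # Proposed, not existing: freeze the pool so no further writes reach the LUNs
  zpool lock test

  # Split each BCV on the array one by one; the splits no longer need to be
  # atomic while the pool is frozen (placeholder command)
  for n in 1 2 3 4 5 6 7 8 9 10; do
          split_bcv "BCV$n"
  done

  # Proposed, not existing: thaw the pool; blocked I/O simply continues
  zpool unlock test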
Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked. Of course you don't have
>   to umount the filesystem. This is very useful on high-end arrays (and
>   now even on midrange) with functionality like BCV.
>
>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>   to take advantage of BCVs, then just before you split them you want to
>   be sure that the filesystem laid on these LUNs is consistent - and as
>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>
>   Something like
>     zpool lock poolname
>     zpool unlock poolname

This sounds like you want to do:

  mount -o ro,remount /zfs/pool/filesystem

and go from having a read-write filesystem to read-only.

Is that what a "locked" filesystem is?

Or am I not quite understanding what you're doing here?

Darren
Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked.

But ZFS is always consistent on disk anyway, so what am I missing here?

What do you mean by "lock" that is different from making all the pools
read-only, or exporting the pool and reimporting it?

-- 
Darren J Moffat
Hello Darren,

Tuesday, April 4, 2006, 10:51:57 AM, you wrote:

DR> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked. Of course you don't have
>>   to umount the filesystem. This is very useful on high-end arrays (and
>>   now even on midrange) with functionality like BCV.
>>
>>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>>   to take advantage of BCVs, then just before you split them you want to
>>   be sure that the filesystem laid on these LUNs is consistent - and as
>>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>>
>>   Something like
>>     zpool lock poolname
>>     zpool unlock poolname

DR> This sounds like you want to do:

DR>   mount -o ro,remount /zfs/pool/filesystem

DR> and go from having a read-write filesystem to read-only.
DR> Is that what a "locked" filesystem is?

DR> Or am I not quite understanding what you're doing here?

That won't work while applications are running.

What I want is that every application 'freezes' in its I/Os to that pool,
and the fs is guaranteed not to make any more changes on the disks in the
pool, so I can detach all BCVs. Then you just 'unlock' the filesystem/pool.
From the point of view of the applications, all I/Os just took a long time -
that's it.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Darren,

Tuesday, April 4, 2006, 11:30:08 AM, you wrote:

DJM> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked.

DJM> But ZFS is always consistent on disk anyway, so what am I missing here?

Well, generally yes. In the scenario I'm describing I guess it's not.

It can take even 10s to split just one BCV, and you do them one by one. So
say you have a pool built on top of 10 disks, and assume that every BCV
splits in 10s. Then 1/10th of the pool is split first, the next part 10s
later, and so on. You end up with a pool of disks where every disk is from a
different point in time. I don't think that will work even with ZFS without
some kind of freezing of the pool for the duration of the split.

However, one could create snapshots and then split the disks. Then at least
the snapshots should be OK - though I'm not sure how ZFS will cope with the
rest of the pool (with a lot of changes).

DJM> What do you mean by "lock" that is different from making all the
DJM> pools read-only, or exporting the pool and reimporting it?

That way I also have to shut down all applications (sometimes that's
necessary after all, sometimes not).

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Robert Milkowski <rmilkowski at task.gda.pl> wrote:

> DR> This sounds like you want to do:
>
> DR>   mount -o ro,remount /zfs/pool/filesystem
>
> DR> and go from having a read-write filesystem to read-only.
> DR> Is that what a "locked" filesystem is?
>
> DR> Or am I not quite understanding what you're doing here?
>
> That won't work while applications are running.
> What I want is that every application 'freezes' in its I/Os to that
> pool, and the fs is guaranteed not to make any more changes on the disks
> in the pool, so I can detach all BCVs. Then you just 'unlock'

Does lockfs(1m) work?

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Joerg Schilling wrote:
> Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>
>> DR> This sounds like you want to do:
>>
>> DR>   mount -o ro,remount /zfs/pool/filesystem
>>
>> DR> and go from having a read-write filesystem to read-only.
>> DR> Is that what a "locked" filesystem is?
>>
>> DR> Or am I not quite understanding what you're doing here?
>>
>> That won't work while applications are running.
>> What I want is that every application 'freezes' in its I/Os to that
>> pool, and the fs is guaranteed not to make any more changes on the disks
>> in the pool, so I can detach all BCVs. Then you just 'unlock'
>
> Does lockfs(1m) work?

No, lockfs, despite the generic name, is UFS only, as per its man page.

I did try it before I read the man page, though; "invalid ioctl for device"
was the reply.

-- 
Darren J Moffat
It's more the "lockfs -w" functionality of UFS. The goal is to have a
consistent image on disk while you start creating a backup/snapshot on the
hardware. (Generally you only need this until you have all the snapshot
processes started, since the arrays usually can tolerate writes that come in
during the snapshot.)

For UFS, this blocks write operations, and prevents access-time updates,
until the unlock.

Anton
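For comparison, the UFS procedure Anton refers to looks roughly like this
(a sketch: the mount point is made up and the split step is a placeholder
for whatever actually splits the BCVs on the array):

  # Write-lock the UFS filesystem; writes block and the on-disk image freezes
  lockfs -w /export/data

  # Split the hardware mirrors/BCVs while nothing is changing on disk
  # (placeholder for the real array command)
  split_the_bcvs

  # Release the write lock; blocked writers simply continue
  lockfs -u /export/data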
Robert Milkowski wrote:
>
> DJM> But ZFS is always consistent on disk anyway, so what am I missing here?
>
> Well, generally yes. In the scenario I'm describing I guess it's not.
>
> It can take even 10s to split just one BCV, and you do them one by one. So
> say you have a pool built on top of 10 disks, and assume that every BCV
> splits in 10s. Then 1/10th of the pool is split first, the next part 10s
> later, and so on. You end up with a pool of disks where every disk is from
> a different point in time. I don't think that will work even with ZFS
> without some kind of freezing of the pool for the duration of the split.
>
> However, one could create snapshots and then split the disks. Then at least
> the snapshots should be OK - though I'm not sure how ZFS will cope with the
> rest of the pool (with a lot of changes).
>
> DJM> What do you mean by "lock" that is different from making all the
> DJM> pools read-only, or exporting the pool and reimporting it?
>
> That way I also have to shut down all applications (sometimes that's
> necessary after all, sometimes not).

This question - how do I lock a ZFS fs to ensure consistency while I {take a
BCV, set up remote replication, etc.} - has come up before. Given that ZFS is
always consistent on disk, I can't see why you can't simply take the snapshot
without having to do anything beforehand. I guess the question could be
rephrased as...

If I have a ZFS pool on a hardware mirror and I split the mirror without
issuing any ZFS commands, can I be assured that the recently split-off mirror
can simply be mounted on another system? Are there any consistency issues?
Any labeling or unique IDs that need to be changed before a second system
sees the split-off section?

You can replace mirror with BCV, SRDF volume, etc. etc.
Hello Torrey,

Tuesday, April 4, 2006, 11:18:26 PM, you wrote:

TM> Robert Milkowski wrote:
>>
>> DJM> But ZFS is always consistent on disk anyway, so what am I missing here?
>>
>> Well, generally yes. In the scenario I'm describing I guess it's not.
>>
>> It can take even 10s to split just one BCV, and you do them one by one. So
>> say you have a pool built on top of 10 disks, and assume that every BCV
>> splits in 10s. Then 1/10th of the pool is split first, the next part 10s
>> later, and so on. You end up with a pool of disks where every disk is from
>> a different point in time. I don't think that will work even with ZFS
>> without some kind of freezing of the pool for the duration of the split.
>>
>> However, one could create snapshots and then split the disks. Then at
>> least the snapshots should be OK - though I'm not sure how ZFS will cope
>> with the rest of the pool (with a lot of changes).
>>
>> DJM> What do you mean by "lock" that is different from making all the
>> DJM> pools read-only, or exporting the pool and reimporting it?
>>
>> That way I also have to shut down all applications (sometimes that's
>> necessary after all, sometimes not).

TM> This question - how do I lock a ZFS fs to ensure consistency while I
TM> {take a BCV, set up remote replication, etc.} - has come up before.
TM> Given that ZFS is always consistent on disk, I can't see why you can't
TM> simply take the snapshot without having to do anything beforehand. I
TM> guess the question could be rephrased as...

TM> If I have a ZFS pool on a hardware mirror and I split the mirror
TM> without issuing any ZFS commands, can I be assured that the recently
TM> split-off mirror can simply be mounted on another system? Are there
TM> any consistency issues? Any labeling or unique IDs that need to be
TM> changed before a second system sees the split-off section?

TM> You can replace mirror with BCV, SRDF volume, etc. etc.

Actually it's not the same - when you split a mirror or any other 2-way
association, you're probably right. But it can't be true that if I split 10
BCVs one by one with a 1-hour delay in between (and there are ten LUNs in
the raidz) that what I get on the BCVs has anything to do with something
even remotely consistent. However, you are probably right in the case where
I take a snapshot before I start the split.

It would be interesting to run such a test - I'll try to do it in my free
time.

But you raise an interesting question - if one splits a mirror (BCV, etc.),
wouldn't ZFS be confused about which disk is which during import, etc.?

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Tue, Apr 04, 2006 at 10:37:06AM +0200, Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked. Of course you don't have
>   to umount the filesystem. This is very useful on high-end arrays (and
>   now even on midrange) with functionality like BCV.
>
>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>   to take advantage of BCVs, then just before you split them you want to
>   be sure that the filesystem laid on these LUNs is consistent - and as
>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>
>   Something like
>     zpool lock poolname
>     zpool unlock poolname

This is barely distinguishable from ZFS snapshots. At most this means that
currently in-flight (between the syscall layer and ZFS) I/O operations will
be atomic with respect to taking a snapshot. For all I know this may already
be the case at snapshot time (ZFS team?).

Given subsequent posts, what you're interested in is making the
*applications* pause and sync their files such that their on-filesystem
state is consistent; then you'd snapshot the filesystem, then you'd allow
the applications to proceed, and you'd start your backup.

The key thing here is that potentially complex applications may be involved,
and nothing the filesystem can do can save the application developer the
work of providing a complex feature in their application. ZFS is orthogonal
to the underlying problem, though it helps that it has constant-time
snapshots, because it means that your applications need not pause for long.

This is an application-specific problem, and each application has to know
what to do. The best the OS can do is deal with applications that are always
consistent between write(2)/writev(2) system calls. Other reliably
detectable simple I/O patterns could be accommodated also, at significant
cost in OS complexity.

Applications know their internal state, while the kernel can barely guess at
the state of the simplest such applications. What you want is really a job
for applications, not the OS, not ZFS. Specifically, applications either
need a synchronous quiesce/pause feature or need to be able to recover from
partially completed sets of I/O operations (rollback).

Nico
--
On 4/4/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> Applications know their internal state, while the kernel can barely
> guess at the state of the simplest such applications. What you want is
> really a job for applications, not the OS, not ZFS. Specifically
> applications either need a synchronous quiesce/pause feature or need to
> be able to recover from partially completed sets of I/O operations
> (rollback).

It is reasonable for an application to keep track of its current state.
However, it is unreasonable to expect an application to keep track of all of
its previous states.

Consider a database that uses several terabytes of storage. In the non-ZFS
world, that storage is typically divided into several file systems for a
variety of reasons:

1) Backup/restore operation
2) Ability to control which spindles different classes of data are on
3) Because multi-terabyte file systems scare people.

Assuming that ZFS doesn't provide me with a way of ensuring that data files
that tend to be hot are not on the same spindles, I don't see that most
people are going to trust that ZFS or any other file system will guess where
the best place for data is. Sure, there is a chance that the file system
will do a good job, but what happens when I/O patterns change over time?
Does it automatically rebalance, even if the hot spots are all reads? Does
it give the administrator the chance to look at what is on a hot spindle and
change it? Assuming the answers to all of those questions are not favorable
to the storage guru or non-technical manager who is asking them, multiple
file systems will be used.

Now you have this multi-terabyte database that is spread across somewhere
between 2 and 200 file systems. With ZFS, achieving the desired
administrative control would mean somewhere between 2 and 200 storage pools
as well. As thousands of transactions are in flight, there is no way that
the database can keep track of its state if each file system is snapshotted
independently. If they are all snapshotted at *exactly* the same time, you
would have a consistent image, just as you would have a consistent image if
the server crashed.

Oracle deals with this scenario using "hot backup mode". By placing the
database in hot backup mode, you could then take the snapshots, take it out
of hot backup mode, and split off your BCV. You could split it off before
taking it out of hot backup mode as well.

Assuming that you are dealing with multiple file systems that need to be
snapshotted to get a consistent image, you may also be able to kill -STOP
your application to pause its activity while you take the snapshots. Once
the snapshots are complete, use kill -CONT to allow it to continue.

If there are benchmarks or customer experience out there indicating that a
multi-terabyte ZFS pool with one big file system is the right way to run
Oracle, this would be good data to publicize. Including details such as how
ZFS automatically eliminates hot spots, boosts throughput, makes
transactions faster, etc., would be important. One particular concern I have
is that if a data file starts out sequential but is updated regularly,
subsequent sequential reads (full table scans) of the file seem like they
will turn into random reads.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
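A minimal sketch of the pause-snapshot-resume sequence Mike describes,
assuming hypothetical pool/filesystem names and a placeholder process ID; an
Oracle hot-backup variant would bracket the same snapshot step with the
database's begin/end backup commands instead of signals:

  # Suspend the application so no new writes are issued while we snapshot
  kill -STOP 12345          # 12345 is a placeholder process ID

  # Snapshot each filesystem that holds the application's data
  zfs snapshot dbpool1/data@bcv
  zfs snapshot dbpool2/data@bcv
  zfs snapshot dbpool3/logs@bcv

  # Resume the application; to it, the I/O simply took a little longer
  kill -CONT 12345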
Hello Nicolas,

Wednesday, April 5, 2006, 12:34:11 AM, you wrote:

NW> On Tue, Apr 04, 2006 at 10:37:06AM +0200, Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked. Of course you don't have
>>   to umount the filesystem. This is very useful on high-end arrays (and
>>   now even on midrange) with functionality like BCV.
>>
>>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>>   to take advantage of BCVs, then just before you split them you want to
>>   be sure that the filesystem laid on these LUNs is consistent - and as
>>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>>
>>   Something like
>>     zpool lock poolname
>>     zpool unlock poolname

NW> This is barely distinguishable from ZFS snapshots. At most this means
NW> that currently in-flight (between the syscall layer and ZFS) I/O
NW> operations will be atomic with respect to taking a snapshot. For all I
NW> know this may already be the case at snapshot time (ZFS team?).

NW> Given subsequent posts, what you're interested in is making the
NW> *applications* pause and sync their files such that their on-filesystem
NW> state is consistent; then you'd snapshot the filesystem, then you'd
NW> allow the applications to proceed, and you'd start your backup.

NW> The key thing here is that potentially complex applications may be
NW> involved, and nothing the filesystem can do can save the application
NW> developer the work of providing a complex feature in their application.
NW> ZFS is orthogonal to the underlying problem, though it helps that it has
NW> constant-time snapshots, because it means that your applications need
NW> not pause for long.

NW> This is an application-specific problem, and each application has to
NW> know what to do.

You misunderstand me - it's my fault, I should have been more specific.

Anyway, it looks like snapshots would probably be a solution. However, if
there are many filesystems in a pool, it's still doable but not that
"user-friendly". And if the main filesystem is in a "strange" state, I
wonder whether rollback would in principle always work (assuming the
snapshot is OK).

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Torrey,
>
> Tuesday, April 4, 2006, 11:18:26 PM, you wrote:
>
> TM> If I have a ZFS pool on a hardware mirror and I split the mirror
> TM> without issuing any ZFS commands, can I be assured that the recently
> TM> split-off mirror can simply be mounted on another system? Are there
> TM> any consistency issues? Any labeling or unique IDs that need to be
> TM> changed before a second system sees the split-off section?
>
> TM> You can replace mirror with BCV, SRDF volume, etc. etc.
>
> Actually it's not the same - when you split a mirror or any other 2-way
> association, you're probably right. But it can't be true that if I split
> 10 BCVs one by one with a 1-hour delay in between (and there are ten LUNs
> in the raidz) that what I get on the BCVs has anything to do with
> something even remotely consistent. However, you are probably right in
> the case where I take a snapshot before I start the split.

Are you saying you want to take a BCV of the entire pool every hour for ten
hours, rinse, wash, and repeat? I'm not too familiar with BCVs, so please
refresh my memory. (I thought they were configurable independent or
dependent snapshots.)

> But you raise an interesting question - if one splits a mirror (BCV,
> etc.), wouldn't ZFS be confused about which disk is which during import,
> etc.?

This is where someone from the ZFS dev team needs to jump in. I can imagine
issues around importing volumes with ZFS on them and unique IDs, but I don't
know if they're fact or fantasy.
On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
> You misunderstand me - it's my fault, I should have been more specific.
>
> Anyway, it looks like snapshots would probably be a solution.
> However, if there are many filesystems in a pool, it's still doable
> but not that "user-friendly".

Ah, I think I see: you want a zpool-level way of taking all the necessary
snapshots [of all the filesystems in that pool].

And presumably you want some sort of zpool-wide atomicity guarantee(?), or
did I misunderstand that too? But I don't think any such atomicity guarantee
could be made that is meaningful vis-a-vis any but the very simplest
applications.

BTW, you could script a zpool snapshot like so:

function zpool_snapshot
{
        # Snapshot every filesystem under the given pool, naming the
        # snapshots after the second argument (default: today's date).
        zfs list -o name -t filesystem | \
                grep "^${1:-pool}" | \
                sed "s/\$/@${2:-$(date +%Y%m%d)}/" | \
                xargs -I{} zfs snapshot {}
}

Nico
--
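For instance, with the pool names used earlier in the thread (hypothetical),
the call below would create test@monday and test/d100@monday - one zfs
snapshot invocation per filesystem, so not atomically across the pool:

  zpool_snapshot test monday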
On Wed, Apr 05, 2006 at 05:20:12PM -0500, Nicolas Williams wrote:
> On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
> > You misunderstand me - it's my fault, I should have been more specific.
> >
> > Anyway, it looks like snapshots would probably be a solution.
> > However, if there are many filesystems in a pool, it's still doable
> > but not that "user-friendly".
>
> Ah, I think I see: you want a zpool-level way of taking all the
> necessary snapshots [of all the filesystems in that pool].
>
> And presumably you want some sort of zpool-wide atomicity guarantee(?),
> or did I misunderstand that too?

Taking lots of snapshots quickly (and perhaps atomically) is definitely on
my radar. Hopefully I will get to it soon after s10u2. See:

  6373978 want to take lots of snapshots quickly ('zfs snapshot -r')

--matt
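A sketch of how the whole-pool case might look once that RFE is delivered
(this is the syntax proposed in the bug synopsis, not something that works
today; pool name taken from earlier in the thread):

  # Recursively snapshot every filesystem in the pool with one command
  zfs snapshot -r test@monday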
Hello Matthew,

Thursday, April 6, 2006, 1:20:07 AM, you wrote:

MA> On Wed, Apr 05, 2006 at 05:20:12PM -0500, Nicolas Williams wrote:
>> On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
>> > You misunderstand me - it's my fault, I should have been more specific.
>> >
>> > Anyway, it looks like snapshots would probably be a solution.
>> > However, if there are many filesystems in a pool, it's still doable
>> > but not that "user-friendly".
>>
>> Ah, I think I see: you want a zpool-level way of taking all the
>> necessary snapshots [of all the filesystems in that pool].
>>
>> And presumably you want some sort of zpool-wide atomicity guarantee(?),
>> or did I misunderstand that too?

MA> Taking lots of snapshots quickly (and perhaps atomically) is definitely
MA> on my radar. Hopefully I will get to it soon after s10u2. See:

MA>   6373978 want to take lots of snapshots quickly ('zfs snapshot -r')

OK, that's it. And I don't think that zpool-wide atomicity is needed.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Torrey,

Wednesday, April 5, 2006, 11:59:59 PM, you wrote:

TM> Robert Milkowski wrote:
>> Hello Torrey,
>>
>> Tuesday, April 4, 2006, 11:18:26 PM, you wrote:
>>
>> TM> If I have a ZFS pool on a hardware mirror and I split the mirror
>> TM> without issuing any ZFS commands, can I be assured that the recently
>> TM> split-off mirror can simply be mounted on another system? Are there
>> TM> any consistency issues? Any labeling or unique IDs that need to be
>> TM> changed before a second system sees the split-off section?
>>
>> TM> You can replace mirror with BCV, SRDF volume, etc. etc.
>>
>> Actually it's not the same - when you split a mirror or any other 2-way
>> association, you're probably right. But it can't be true that if I split
>> 10 BCVs one by one with a 1-hour delay in between (and there are ten LUNs
>> in the raidz) that what I get on the BCVs has anything to do with
>> something even remotely consistent. However, you are probably right in
>> the case where I take a snapshot before I start the split.

TM> Are you saying you want to take a BCV of the entire pool every hour for
TM> ten hours, rinse, wash, and repeat? I'm not too familiar with BCVs, so
TM> please refresh my memory. (I thought they were configurable independent
TM> or dependent snapshots.)

  LUN1 <-> BCV1
  LUN2 <-> BCV2
  LUN3 <-> BCV3

Now, let's assume that all three BCVs are synchronized (so you can think of
them as three mirrors). Now let's do:

  zpool create test raidz lun1 lun2 lun3
  zfs create test/d100

Now run an application which is reading/writing to test/d100. Now let's do:

  zfs snapshot test/d100@monday

[The application is still running - data consistency from the point of view
of the application is not ZFS related, so I don't care; let's say it's an
FTP server and it really doesn't matter.]

Now let's split the first BCV, so we have:

  LUN1 < SPLIT > BCV1
  LUN2 <->       BCV2
  LUN3 <->       BCV3

Now wait 5 minutes and then split BCV2, then wait another 5 minutes and
split BCV3. At the end all three BCVs are split.

Now, I understand I can import zpool test using only the BCVs, right?

If the same host which has access to LUN1-3 and is actively using zpool test
also has access to these three BCVs, is it possible to import these BCVs at
the same time on the same host (with a different pool name, of course)?

What if I didn't import the zpool on the BCVs but exported zpool test on the
normal LUNs? If I then do 'zpool import test', will the pool test be based
only on LUN1-3, only on BCV1-3, or on some mixed configuration? Or will I
get a message that there are two different pools with the same name, so I
have to give the pool ID rather than its name?

Let's assume I can import pool test on these three BCVs on a different host.
The snapshot test/d100@monday is expected to be consistent, but the test and
test/d100 filesystems are in a 'strange' state - and I wonder what the
possible corner cases are here - I hope not a system panic.

ps. I know that in real life there won't be 5-minute intervals in such a
case; intervals like 1-10s will be common.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
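On the name-clash part of the question, a sketch of how import-by-id
generally works (the numeric ID below is made up; whether ZFS copes with BCV
copies that share the original pool's identity is exactly the open question
here, so this only illustrates the duplicate-name case):

  # List pools that are available for import; each entry shows a numeric id
  zpool import

  # If two importable pools carry the name 'test', import one by its id and
  # give it a different name on this host
  zpool import 1234567890123456789 testbcv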
> ps. I know that in real life there won't be 5-minute intervals in such a
> case; intervals like 1-10s will be common.

Maybe, maybe not. It seems to me the process of creating multi-LUN BCVs is
almost identical to shoving a copy of a ZFS pool over to another machine via
'dd' from the live disks, one at a time. If the BCV split works without
issue, then I would expect a multi-volume 'dd' to work just as well.

Even with the ability to "correctly" copy a ZFS system, I expect someone to
try to do it at this layer. That could take significantly longer than 5
minutes.

Unless there's data that is guaranteed synchronized on a disk but not across
the pool as a whole, I think these procedures would have the same effect,
no?

--
Darren
Hi Robert.

Robert Milkowski wrote:
> Hello Torrey,
>
> TM> Are you saying you want to take a BCV of the entire pool every hour
> TM> for ten hours, rinse, wash, and repeat? I'm not too familiar with
> TM> BCVs, so please refresh my memory. (I thought they were configurable
> TM> independent or dependent snapshots.)
>
>   LUN1 <-> BCV1
>   LUN2 <-> BCV2
>   LUN3 <-> BCV3
>
> Now, let's assume that all three BCVs are synchronized (so you can think
> of them as three mirrors). Now let's do:
>
>   zpool create test raidz lun1 lun2 lun3
>   zfs create test/d100
>
> Now run an application which is reading/writing to test/d100. Now let's do:
>
>   zfs snapshot test/d100@monday
>
> [The application is still running - data consistency from the point of
> view of the application is not ZFS related, so I don't care; let's say
> it's an FTP server and it really doesn't matter.]
>
> Now let's split the first BCV, so we have:
>
>   LUN1 < SPLIT > BCV1
>   LUN2 <->       BCV2
>   LUN3 <->       BCV3
>
> Now wait 5 minutes and then split BCV2, then wait another 5 minutes and
> split BCV3. At the end all three BCVs are split.

...and that's where things break down. ZFS is self-consistent, but only when
you have all the LUNs together. You can't snapshot the LUNs at different
times or you'll have very weird results. You'd see the same thing with UFS
or VxFS as well.

> Now, I understand I can import zpool test using only the BCVs, right?

That has yet to be answered by the ZFS team. They haven't taken the bait on
my previous emails, but maybe I need to start a new thread. ;)

> If the same host which has access to LUN1-3 and is actively using zpool
> test also has access to these three BCVs, is it possible to import these
> BCVs at the same time on the same host (with a different pool name, of
> course)?

I don't think so, primarily due to the deltas between the LUN snapshots
causing data weirdness. Also, I think ZFS might get confused if you don't
change some ID information in the BCVs to have a different pool name or
something, as mentioned above.

> What if I didn't import the zpool on the BCVs but exported zpool test on
> the normal LUNs? If I then do 'zpool import test', will the pool test be
> based only on LUN1-3, only on BCV1-3, or on some mixed configuration? Or
> will I get a message that there are two different pools with the same
> name, so I have to give the pool ID rather than its name?
>
> Let's assume I can import pool test on these three BCVs on a different
> host. The snapshot test/d100@monday is expected to be consistent, but the
> test and test/d100 filesystems are in a 'strange' state - and I wonder
> what the possible corner cases are here - I hope not a system panic.

I don't think your snapshot would be consistent either, unless you could
guarantee that ZFS kept the snapshot on one of the three LUNs.

I think in this case you would need to stop I/O to all the LUNs in the zpool
- pick your favorite method - then do the BCV operations, and then, if we
ever find out how to remount the BCVs without confusing ZFS, mount them
someplace else.

Or you could limit yourself to one pool per LUN instead of striping/mirroring
between them. If you had one pool per LUN and split a BCV volume off, it
would be consistent, in theory. (Then we'd run into the "how do you mount
it" issue....)