Hello zfs-discuss,

  Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
  can be forced to be locked (and then unlocked) in a way that the image
  on disk is consistent while the fs is locked. Of course you don't have
  to umount the filesystem. This is very useful on high-end arrays (and
  now even on midrange) with functionality like BCV.

  Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
  to take advantage of BCVs, then just before you split them you want to
  be sure that the filesystem laid on these LUNs is consistent - and as
  splitting many BCVs isn't atomic, it would probably harm even ZFS.

  Something like
    zpool lock poolname
    zpool unlock poolname

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
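The intended workflow around a BCV split would then be roughly the following
(a sketch only: 'zpool lock/unlock' is the syntax proposed in this mail, not
an existing command, and 'split_bcv' is a placeholder for the real array
tooling):

  # Proposed, not existing: freeze the pool so no further writes reach the LUNs
  zpool lock test

  # Split each BCV on the array one by one; the splits no longer need to be
  # atomic while the pool is frozen (placeholder command)
  for n in 1 2 3 4 5 6 7 8 9 10; do
          split_bcv "BCV$n"
  done

  # Proposed, not existing: thaw the pool; blocked I/O simply continues
  zpool unlock test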
Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked. Of course you don't have
>   to umount the filesystem. This is very useful on high-end arrays (and
>   now even on midrange) with functionality like BCV.
>
>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>   to take advantage of BCVs, then just before you split them you want to
>   be sure that the filesystem laid on these LUNs is consistent - and as
>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>
>   Something like
>     zpool lock poolname
>     zpool unlock poolname

This sounds like you want to do:

  mount -o ro,remount /zfs/pool/filesystem

and go from having a read-write filesystem to read-only.

Is that what a "locked" filesystem is?

Or am I not quite understanding what you're doing here?

Darren
Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked.

But ZFS is always consistent on disk anyway, so what am I missing here?

What do you mean by "lock" that is different from making all the pools
read-only, or exporting the pool and reimporting it?

-- 
Darren J Moffat
Hello Darren,

Tuesday, April 4, 2006, 10:51:57 AM, you wrote:

DR> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked. Of course you don't have
>>   to umount the filesystem. This is very useful on high-end arrays (and
>>   now even on midrange) with functionality like BCV.
>>
>>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>>   to take advantage of BCVs, then just before you split them you want to
>>   be sure that the filesystem laid on these LUNs is consistent - and as
>>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>>
>>   Something like
>>     zpool lock poolname
>>     zpool unlock poolname

DR> This sounds like you want to do:

DR>   mount -o ro,remount /zfs/pool/filesystem

DR> and go from having a read-write filesystem to read-only.
DR> Is that what a "locked" filesystem is?

DR> Or am I not quite understanding what you're doing here?

That won't work while applications are running.

What I want is that every application 'freezes' in its I/Os to that pool,
and the fs is guaranteed not to make any more changes on the disks in the
pool, so I can detach all BCVs. Then you just 'unlock' the filesystem/pool.
From the point of view of the applications, all I/Os just took a long time -
that's it.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Darren,

Tuesday, April 4, 2006, 11:30:08 AM, you wrote:

DJM> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked.

DJM> But ZFS is always consistent on disk anyway, so what am I missing here?

Well, generally yes. In the scenario I'm describing I guess it's not.

It can take even 10s to split just one BCV, and you do them one by one. So
say you have a pool built on top of 10 disks, and assume that every BCV
splits in 10s. Then 1/10th of the pool is split first, the next part 10s
later, and so on. You end up with a pool of disks where every disk is from a
different point in time. I don't think that will work even with ZFS without
some kind of freezing of the pool for the duration of the split.

However, one could create snapshots and then split the disks. Then at least
the snapshots should be OK - though I'm not sure how ZFS will cope with the
rest of the pool (with a lot of changes).

DJM> What do you mean by "lock" that is different from making all the
DJM> pools read-only, or exporting the pool and reimporting it?

That way I also have to shut down all applications (sometimes that's
necessary after all, sometimes not).

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Robert Milkowski <rmilkowski at task.gda.pl> wrote:

> DR> This sounds like you want to do:
>
> DR>   mount -o ro,remount /zfs/pool/filesystem
>
> DR> and go from having a read-write filesystem to read-only.
> DR> Is that what a "locked" filesystem is?
>
> DR> Or am I not quite understanding what you're doing here?
>
> That won't work while applications are running.
> What I want is that every application 'freezes' in its I/Os to that
> pool, and the fs is guaranteed not to make any more changes on the disks
> in the pool, so I can detach all BCVs. Then you just 'unlock'

Does lockfs(1m) work?

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Joerg Schilling wrote:
> Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>
>> DR> This sounds like you want to do:
>>
>> DR>   mount -o ro,remount /zfs/pool/filesystem
>>
>> DR> and go from having a read-write filesystem to read-only.
>> DR> Is that what a "locked" filesystem is?
>>
>> DR> Or am I not quite understanding what you're doing here?
>>
>> That won't work while applications are running.
>> What I want is that every application 'freezes' in its I/Os to that
>> pool, and the fs is guaranteed not to make any more changes on the disks
>> in the pool, so I can detach all BCVs. Then you just 'unlock'
>
> Does lockfs(1m) work?

No, lockfs, despite the generic name, is UFS only, as per its man page.

I did try it before I read the man page, though; "invalid ioctl for device"
was the reply.

-- 
Darren J Moffat
It's more the "lockfs -w" functionality of UFS. The goal is to have a
consistent image on disk while you start creating a backup/snapshot on the
hardware. (Generally you only need this until you have all the snapshot
processes started, since the arrays usually can tolerate writes that come in
during the snapshot.)

For UFS, this blocks write operations, and prevents access-time updates,
until the unlock.

Anton
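For comparison, the UFS procedure Anton refers to looks roughly like this
(a sketch: the mount point is made up and the split step is a placeholder
for whatever actually splits the BCVs on the array):

  # Write-lock the UFS filesystem; writes block and the on-disk image freezes
  lockfs -w /export/data

  # Split the hardware mirrors/BCVs while nothing is changing on disk
  # (placeholder for the real array command)
  split_the_bcvs

  # Release the write lock; blocked writers simply continue
  lockfs -u /export/data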
Robert Milkowski wrote:
>
> DJM> But ZFS is always consistent on disk anyway, so what am I missing here?
>
> Well, generally yes. In the scenario I'm describing I guess it's not.
>
> It can take even 10s to split just one BCV, and you do them one by one. So
> say you have a pool built on top of 10 disks, and assume that every BCV
> splits in 10s. Then 1/10th of the pool is split first, the next part 10s
> later, and so on. You end up with a pool of disks where every disk is from
> a different point in time. I don't think that will work even with ZFS
> without some kind of freezing of the pool for the duration of the split.
>
> However, one could create snapshots and then split the disks. Then at least
> the snapshots should be OK - though I'm not sure how ZFS will cope with the
> rest of the pool (with a lot of changes).
>
> DJM> What do you mean by "lock" that is different from making all the
> DJM> pools read-only, or exporting the pool and reimporting it?
>
> That way I also have to shut down all applications (sometimes that's
> necessary after all, sometimes not).

This question - how do I lock a ZFS fs to ensure consistency while I {take a
BCV, set up remote replication, etc.} - has come up before. Given that ZFS is
always consistent on disk, I can't see why you can't simply take the snapshot
without having to do anything beforehand. I guess the question could be
rephrased as...

If I have a ZFS pool on a hardware mirror and I split the mirror without
issuing any ZFS commands, can I be assured that the recently split-off mirror
can simply be mounted on another system? Are there any consistency issues?
Any labeling or unique IDs that need to be changed before a second system
sees the split-off section?

You can replace mirror with BCV, SRDF volume, etc. etc.
Hello Torrey,

Tuesday, April 4, 2006, 11:18:26 PM, you wrote:

TM> Robert Milkowski wrote:
>>
>> DJM> But ZFS is always consistent on disk anyway, so what am I missing here?
>>
>> Well, generally yes. In the scenario I'm describing I guess it's not.
>>
>> It can take even 10s to split just one BCV, and you do them one by one. So
>> say you have a pool built on top of 10 disks, and assume that every BCV
>> splits in 10s. Then 1/10th of the pool is split first, the next part 10s
>> later, and so on. You end up with a pool of disks where every disk is from
>> a different point in time. I don't think that will work even with ZFS
>> without some kind of freezing of the pool for the duration of the split.
>>
>> However, one could create snapshots and then split the disks. Then at
>> least the snapshots should be OK - though I'm not sure how ZFS will cope
>> with the rest of the pool (with a lot of changes).
>>
>> DJM> What do you mean by "lock" that is different from making all the
>> DJM> pools read-only, or exporting the pool and reimporting it?
>>
>> That way I also have to shut down all applications (sometimes that's
>> necessary after all, sometimes not).

TM> This question - how do I lock a ZFS fs to ensure consistency while I
TM> {take a BCV, set up remote replication, etc.} - has come up before.
TM> Given that ZFS is always consistent on disk, I can't see why you can't
TM> simply take the snapshot without having to do anything beforehand. I
TM> guess the question could be rephrased as...

TM> If I have a ZFS pool on a hardware mirror and I split the mirror
TM> without issuing any ZFS commands, can I be assured that the recently
TM> split-off mirror can simply be mounted on another system? Are there
TM> any consistency issues? Any labeling or unique IDs that need to be
TM> changed before a second system sees the split-off section?

TM> You can replace mirror with BCV, SRDF volume, etc. etc.

Actually it's not the same - when you split a mirror or any other 2-way
association, you're probably right. But it can't be true that if I split 10
BCVs one by one with a 1-hour delay in between (and there are ten LUNs in
the raidz) that what I get on the BCVs has anything to do with something
even remotely consistent. However, you are probably right in the case where
I take a snapshot before I start the split.

It would be interesting to run such a test - I'll try to do it in my free
time.

But you raise an interesting question - if one splits a mirror (BCV, etc.),
wouldn't ZFS be confused about which disk is which during import, etc.?

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Tue, Apr 04, 2006 at 10:37:06AM +0200, Robert Milkowski wrote:
> Hello zfs-discuss,
>
>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>   can be forced to be locked (and then unlocked) in a way that the image
>   on disk is consistent while the fs is locked. Of course you don't have
>   to umount the filesystem. This is very useful on high-end arrays (and
>   now even on midrange) with functionality like BCV.
>
>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>   to take advantage of BCVs, then just before you split them you want to
>   be sure that the filesystem laid on these LUNs is consistent - and as
>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>
>   Something like
>     zpool lock poolname
>     zpool unlock poolname

This is barely distinguishable from ZFS snapshots. At most this means that
currently in-flight (between the syscall layer and ZFS) I/O operations will
be atomic with respect to taking a snapshot. For all I know this may already
be the case at snapshot time (ZFS team?).

Given subsequent posts, what you're interested in is making the
*applications* pause and sync their files such that their on-filesystem
state is consistent; then you'd snapshot the filesystem, then you'd allow
the applications to proceed, and you'd start your backup.

The key thing here is that potentially complex applications may be involved,
and nothing the filesystem can do can save the application developer the
work of providing a complex feature in their application. ZFS is orthogonal
to the underlying problem, though it helps that it has constant-time
snapshots, because it means that your applications need not pause for long.

This is an application-specific problem, and each application has to know
what to do. The best the OS can do is deal with applications that are always
consistent between write(2)/writev(2) system calls. Other reliably
detectable simple I/O patterns could be accommodated also, at significant
cost in OS complexity.

Applications know their internal state, while the kernel can barely guess at
the state of the simplest such applications. What you want is really a job
for applications, not the OS, not ZFS. Specifically, applications either
need a synchronous quiesce/pause feature or need to be able to recover from
partially completed sets of I/O operations (rollback).

Nico
--
On 4/4/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> Applications know their internal state, while the kernel can barely
> guess at the state of the simplest such applications. What you want is
> really a job for applications, not the OS, not ZFS. Specifically
> applications either need a synchronous quiesce/pause feature or need to
> be able to recover from partially completed sets of I/O operations
> (rollback).

It is reasonable for an application to keep track of its current state.
However, it is unreasonable to expect an application to keep track of all of
its previous states.

Consider a database that uses several terabytes of storage. In the non-ZFS
world, that storage is typically divided into several file systems for a
variety of reasons:

1) Backup/restore operation
2) Ability to control which spindles different classes of data are on
3) Because multi-terabyte file systems scare people.

Assuming that ZFS doesn't provide me with a way of ensuring that data files
that tend to be hot are not on the same spindles, I don't see that most
people are going to trust that ZFS or any other file system will guess where
the best place for data is. Sure, there is a chance that the file system
will do a good job, but what happens when I/O patterns change over time?
Does it automatically rebalance, even if the hot spots are all reads? Does
it give the administrator the chance to look at what is on a hot spindle and
change it? Assuming the answers to all of those questions are not favorable
to the storage guru or non-technical manager who is asking them, multiple
file systems will be used.

Now you have this multi-terabyte database that is spread across somewhere
between 2 and 200 file systems. With ZFS, achieving the desired
administrative control would mean somewhere between 2 and 200 storage pools
as well. As thousands of transactions are in flight, there is no way that
the database can keep track of its state if each file system is snapshotted
independently. If they are all snapshotted at *exactly* the same time, you
would have a consistent image, just as you would have a consistent image if
the server crashed.

Oracle deals with this scenario using "hot backup mode". By placing the
database in hot backup mode, you could then take the snapshots, take it out
of hot backup mode, and split off your BCV. You could split it off before
taking it out of hot backup mode as well.

Assuming that you are dealing with multiple file systems that need to be
snapshotted to get a consistent image, you may also be able to kill -STOP
your application to pause its activity while you take the snapshots. Once
the snapshots are complete, use kill -CONT to allow it to continue.

If there are benchmarks or customer experience out there indicating that a
multi-terabyte ZFS pool with one big file system is the right way to run
Oracle, this would be good data to publicize. Including details such as how
ZFS automatically eliminates hot spots, boosts throughput, makes
transactions faster, etc., would be important. One particular concern I have
is that if a data file starts out sequential but is updated regularly,
subsequent sequential reads (full table scans) of the file seem like they
will turn into random reads.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
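A minimal sketch of the pause-snapshot-resume sequence Mike describes,
assuming hypothetical pool/filesystem names and a placeholder process ID; an
Oracle hot-backup variant would bracket the same snapshot step with the
database's begin/end backup commands instead of signals:

  # Suspend the application so no new writes are issued while we snapshot
  kill -STOP 12345          # 12345 is a placeholder process ID

  # Snapshot each filesystem that holds the application's data
  zfs snapshot dbpool1/data@bcv
  zfs snapshot dbpool2/data@bcv
  zfs snapshot dbpool3/logs@bcv

  # Resume the application; to it, the I/O simply took a little longer
  kill -CONT 12345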
Hello Nicolas,

Wednesday, April 5, 2006, 12:34:11 AM, you wrote:

NW> On Tue, Apr 04, 2006 at 10:37:06AM +0200, Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>>   Some filesystems (like the fs on EMC Celerra and, AFAIK, XFS on Linux)
>>   can be forced to be locked (and then unlocked) in a way that the image
>>   on disk is consistent while the fs is locked. Of course you don't have
>>   to umount the filesystem. This is very useful on high-end arrays (and
>>   now even on midrange) with functionality like BCV.
>>
>>   Imagine a zpool built on 10 LUNs provided by a Symmetrix. If you want
>>   to take advantage of BCVs, then just before you split them you want to
>>   be sure that the filesystem laid on these LUNs is consistent - and as
>>   splitting many BCVs isn't atomic, it would probably harm even ZFS.
>>
>>   Something like
>>     zpool lock poolname
>>     zpool unlock poolname

NW> This is barely distinguishable from ZFS snapshots. At most this means
NW> that currently in-flight (between the syscall layer and ZFS) I/O
NW> operations will be atomic with respect to taking a snapshot. For all I
NW> know this may already be the case at snapshot time (ZFS team?).

NW> Given subsequent posts, what you're interested in is making the
NW> *applications* pause and sync their files such that their on-filesystem
NW> state is consistent; then you'd snapshot the filesystem, then you'd
NW> allow the applications to proceed, and you'd start your backup.

NW> The key thing here is that potentially complex applications may be
NW> involved, and nothing the filesystem can do can save the application
NW> developer the work of providing a complex feature in their application.
NW> ZFS is orthogonal to the underlying problem, though it helps that it has
NW> constant-time snapshots, because it means that your applications need
NW> not pause for long.

NW> This is an application-specific problem, and each application has to
NW> know what to do.

You misunderstand me - it's my fault, I should have been more specific.

Anyway, it looks like snapshots would probably be a solution. However, if
there are many filesystems in a pool, it's still doable but not that
"user-friendly". And if the main filesystem is in a "strange" state, I
wonder whether rollback would in principle always work (assuming the
snapshot is OK).

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Torrey,
>
> Tuesday, April 4, 2006, 11:18:26 PM, you wrote:
>
> TM> If I have a ZFS pool on a hardware mirror and I split the mirror
> TM> without issuing any ZFS commands, can I be assured that the recently
> TM> split-off mirror can simply be mounted on another system? Are there
> TM> any consistency issues? Any labeling or unique IDs that need to be
> TM> changed before a second system sees the split-off section?
>
> TM> You can replace mirror with BCV, SRDF volume, etc. etc.
>
> Actually it's not the same - when you split a mirror or any other 2-way
> association, you're probably right. But it can't be true that if I split
> 10 BCVs one by one with a 1-hour delay in between (and there are ten LUNs
> in the raidz) that what I get on the BCVs has anything to do with
> something even remotely consistent. However, you are probably right in
> the case where I take a snapshot before I start the split.

Are you saying you want to take a BCV of the entire pool every hour for ten
hours, rinse, wash, and repeat? I'm not too familiar with BCVs, so please
refresh my memory. (I thought they were configurable independent or
dependent snapshots.)

> But you raise an interesting question - if one splits a mirror (BCV,
> etc.), wouldn't ZFS be confused about which disk is which during import,
> etc.?

This is where someone from the ZFS dev team needs to jump in. I can imagine
issues around importing volumes with ZFS on them and unique IDs, but I don't
know if they're fact or fantasy.
On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
> You misunderstand me - it's my fault, I should have been more specific.
>
> Anyway, it looks like snapshots would probably be a solution.
> However, if there are many filesystems in a pool, it's still doable
> but not that "user-friendly".

Ah, I think I see: you want a zpool-level way of taking all the necessary
snapshots [of all the filesystems in that pool].

And presumably you want some sort of zpool-wide atomicity guarantee(?), or
did I misunderstand that too? But I don't think any such atomicity guarantee
could be made that is meaningful vis-a-vis any but the very simplest
applications.

BTW, you could script a zpool snapshot like so:

function zpool_snapshot
{
        # Snapshot every filesystem under the given pool, naming the
        # snapshots after the second argument (default: today's date).
        zfs list -o name -t filesystem | \
                grep "^${1:-pool}" | \
                sed "s/\$/@${2:-$(date +%Y%m%d)}/" | \
                xargs -I{} zfs snapshot {}
}

Nico
--
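For instance, with the pool names used earlier in the thread (hypothetical),
the call below would create test@monday and test/d100@monday - one zfs
snapshot invocation per filesystem, so not atomically across the pool:

  zpool_snapshot test monday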
On Wed, Apr 05, 2006 at 05:20:12PM -0500, Nicolas Williams wrote:
> On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
> > You misunderstand me - it's my fault, I should have been more specific.
> >
> > Anyway, it looks like snapshots would probably be a solution.
> > However, if there are many filesystems in a pool, it's still doable
> > but not that "user-friendly".
>
> Ah, I think I see: you want a zpool-level way of taking all the
> necessary snapshots [of all the filesystems in that pool].
>
> And presumably you want some sort of zpool-wide atomicity guarantee(?),
> or did I misunderstand that too?

Taking lots of snapshots quickly (and perhaps atomically) is definitely on
my radar. Hopefully I will get to it soon after s10u2. See:

  6373978 want to take lots of snapshots quickly ('zfs snapshot -r')

--matt
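A sketch of how the whole-pool case might look once that RFE is delivered
(this is the syntax proposed in the bug synopsis, not something that works
today; pool name taken from earlier in the thread):

  # Recursively snapshot every filesystem in the pool with one command
  zfs snapshot -r test@monday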
Hello Matthew,

Thursday, April 6, 2006, 1:20:07 AM, you wrote:

MA> On Wed, Apr 05, 2006 at 05:20:12PM -0500, Nicolas Williams wrote:
>> On Wed, Apr 05, 2006 at 09:38:24AM +0200, Robert Milkowski wrote:
>> > You misunderstand me - it's my fault, I should have been more specific.
>> >
>> > Anyway, it looks like snapshots would probably be a solution.
>> > However, if there are many filesystems in a pool, it's still doable
>> > but not that "user-friendly".
>>
>> Ah, I think I see: you want a zpool-level way of taking all the
>> necessary snapshots [of all the filesystems in that pool].
>>
>> And presumably you want some sort of zpool-wide atomicity guarantee(?),
>> or did I misunderstand that too?

MA> Taking lots of snapshots quickly (and perhaps atomically) is definitely
MA> on my radar. Hopefully I will get to it soon after s10u2. See:

MA>   6373978 want to take lots of snapshots quickly ('zfs snapshot -r')

OK, that's it. And I don't think that zpool-wide atomicity is needed.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Torrey,

Wednesday, April 5, 2006, 11:59:59 PM, you wrote:

TM> Robert Milkowski wrote:
>> Hello Torrey,
>>
>> Tuesday, April 4, 2006, 11:18:26 PM, you wrote:
>>
>> TM> If I have a ZFS pool on a hardware mirror and I split the mirror
>> TM> without issuing any ZFS commands, can I be assured that the recently
>> TM> split-off mirror can simply be mounted on another system? Are there
>> TM> any consistency issues? Any labeling or unique IDs that need to be
>> TM> changed before a second system sees the split-off section?
>>
>> TM> You can replace mirror with BCV, SRDF volume, etc. etc.
>>
>> Actually it's not the same - when you split a mirror or any other 2-way
>> association, you're probably right. But it can't be true that if I split
>> 10 BCVs one by one with a 1-hour delay in between (and there are ten LUNs
>> in the raidz) that what I get on the BCVs has anything to do with
>> something even remotely consistent. However, you are probably right in
>> the case where I take a snapshot before I start the split.

TM> Are you saying you want to take a BCV of the entire pool every hour for
TM> ten hours, rinse, wash, and repeat? I'm not too familiar with BCVs, so
TM> please refresh my memory. (I thought they were configurable independent
TM> or dependent snapshots.)

  LUN1 <-> BCV1
  LUN2 <-> BCV2
  LUN3 <-> BCV3

Now, let's assume that all three BCVs are synchronized (so you can think of
them as three mirrors). Now let's do:

  zpool create test raidz lun1 lun2 lun3
  zfs create test/d100

Now run an application which is reading/writing to test/d100. Now let's do:

  zfs snapshot test/d100@monday

[The application is still running - data consistency from the point of view
of the application is not ZFS related, so I don't care; let's say it's an
FTP server and it really doesn't matter.]

Now let's split the first BCV, so we have:

  LUN1 < SPLIT > BCV1
  LUN2 <->       BCV2
  LUN3 <->       BCV3

Now wait 5 minutes and then split BCV2, then wait another 5 minutes and
split BCV3. At the end all three BCVs are split.

Now, I understand I can import zpool test using only the BCVs, right?

If the same host which has access to LUN1-3 and is actively using zpool test
also has access to these three BCVs, is it possible to import these BCVs at
the same time on the same host (with a different pool name, of course)?

What if I didn't import the zpool on the BCVs but exported zpool test on the
normal LUNs? If I then do 'zpool import test', will the pool test be based
only on LUN1-3, only on BCV1-3, or on some mixed configuration? Or will I
get a message that there are two different pools with the same name, so I
have to give the pool ID rather than its name?

Let's assume I can import pool test on these three BCVs on a different host.
The snapshot test/d100@monday is expected to be consistent, but the test and
test/d100 filesystems are in a 'strange' state - and I wonder what the
possible corner cases are here - I hope not a system panic.

ps. I know that in real life there won't be 5-minute intervals in such a
case; intervals like 1-10s will be common.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
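On the name-clash part of the question, a sketch of how import-by-id
generally works (the numeric ID below is made up; whether ZFS copes with BCV
copies that share the original pool's identity is exactly the open question
here, so this only illustrates the duplicate-name case):

  # List pools that are available for import; each entry shows a numeric id
  zpool import

  # If two importable pools carry the name 'test', import one by its id and
  # give it a different name on this host
  zpool import 1234567890123456789 testbcv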
> ps. I know that in real life there won't be 5-minute intervals in such a
> case; intervals like 1-10s will be common.

Maybe, maybe not. It seems to me the process of creating multi-LUN BCVs is
almost identical to shoving a copy of a ZFS pool over to another machine via
'dd' from the live disks, one at a time. If the BCV split works without
issue, then I would expect a multi-volume 'dd' to work just as well.

Even with the ability to "correctly" copy a ZFS system, I expect someone to
try to do it at this layer. That could take significantly longer than 5
minutes.

Unless there's data that is guaranteed synchronized on a disk but not across
the pool as a whole, I think these procedures would have the same effect,
no?

--
Darren
Hi Robert.

Robert Milkowski wrote:
> Hello Torrey,
>
> TM> Are you saying you want to take a BCV of the entire pool every hour
> TM> for ten hours, rinse, wash, and repeat? I'm not too familiar with
> TM> BCVs, so please refresh my memory. (I thought they were configurable
> TM> independent or dependent snapshots.)
>
>   LUN1 <-> BCV1
>   LUN2 <-> BCV2
>   LUN3 <-> BCV3
>
> Now, let's assume that all three BCVs are synchronized (so you can think
> of them as three mirrors). Now let's do:
>
>   zpool create test raidz lun1 lun2 lun3
>   zfs create test/d100
>
> Now run an application which is reading/writing to test/d100. Now let's do:
>
>   zfs snapshot test/d100@monday
>
> [The application is still running - data consistency from the point of
> view of the application is not ZFS related, so I don't care; let's say
> it's an FTP server and it really doesn't matter.]
>
> Now let's split the first BCV, so we have:
>
>   LUN1 < SPLIT > BCV1
>   LUN2 <->       BCV2
>   LUN3 <->       BCV3
>
> Now wait 5 minutes and then split BCV2, then wait another 5 minutes and
> split BCV3. At the end all three BCVs are split.

...and that's where things break down. ZFS is self-consistent, but only when
you have all the LUNs together. You can't snapshot the LUNs at different
times or you'll have very weird results. You'd see the same thing with UFS
or VxFS as well.

> Now, I understand I can import zpool test using only the BCVs, right?

That has yet to be answered by the ZFS team. They haven't taken the bait on
my previous emails, but maybe I need to start a new thread. ;)

> If the same host which has access to LUN1-3 and is actively using zpool
> test also has access to these three BCVs, is it possible to import these
> BCVs at the same time on the same host (with a different pool name, of
> course)?

I don't think so, primarily due to the deltas between the LUN snapshots
causing data weirdness. Also, I think ZFS might get confused if you don't
change some ID information in the BCVs to have a different pool name or
something, as mentioned above.

> What if I didn't import the zpool on the BCVs but exported zpool test on
> the normal LUNs? If I then do 'zpool import test', will the pool test be
> based only on LUN1-3, only on BCV1-3, or on some mixed configuration? Or
> will I get a message that there are two different pools with the same
> name, so I have to give the pool ID rather than its name?
>
> Let's assume I can import pool test on these three BCVs on a different
> host. The snapshot test/d100@monday is expected to be consistent, but the
> test and test/d100 filesystems are in a 'strange' state - and I wonder
> what the possible corner cases are here - I hope not a system panic.

I don't think your snapshot would be consistent either, unless you could
guarantee that ZFS kept the snapshot on one of the three LUNs.

I think in this case you would need to stop I/O to all the LUNs in the zpool
- pick your favorite method - then do the BCV operations, and then, if we
ever find out how to remount the BCVs without confusing ZFS, mount them
someplace else.

Or you could limit yourself to one pool per LUN instead of striping/mirroring
between them. If you had one pool per LUN and split a BCV volume off, it
would be consistent, in theory. (Then we'd run into the "how do you mount
it" issue....)