Project Overview:

I propose the creation of a project on opensolaris.org to bring to the community two Solaris host-based data services: volume snapshot and volume replication. These two data services exist today as the Sun StorageTek Availability Suite, an unbundled product set for Solaris 8, 9 & 10, consisting of Instant Image (II) and Network Data Replicator (SNDR).

Project Description:

Although Availability Suite is typically known as just two data services (II & SNDR), there is an underlying Solaris I/O filter driver framework which supports them. This framework provides the means to stack one or more block-based pseudo device drivers onto any pre-provisioned cb_ops structure [ http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs ], thereby shunting all cb_ops I/O into the top of a developed filter driver (for driver-specific processing), then out the bottom of the filter driver, back into the original cb_ops entry points.

Availability Suite was developed to interpose itself on the I/O stack of a block device, providing a filter driver framework with the means to intercept any I/O originating from an upstream file system, database, or application layer. This framework provided the means for Availability Suite to support snapshot and remote replication data services for UFS, QFS, VxFS, and more recently the ZFS file system, plus various databases like Oracle, Sybase and PostgreSQL, and application I/Os as well. A filter driver at this point in the Solaris I/O stack allows any number of data services to be implemented, without regard to the underlying block storage they will be configured on. Today, as a snapshot and/or replication solution, the framework allows the source and destination block storage devices to differ not only in physical characteristics (DAS, Fibre Channel, iSCSI, etc.), but also in logical characteristics such as RAID type, volume-managed storage (e.g., SVM, VxVM), lofi, zvols, even ram disks.

Community Involvement:

By providing this filter-driver framework, two working filter drivers (II & SNDR), and an extensive collection of supporting software and utilities, it is envisioned that individuals and companies that adopt OpenSolaris as a viable storage platform will utilize and enhance the existing II & SNDR data services, and will also gain the means to develop their own block-based filter driver(s), further enhancing the use and adoption of OpenSolaris.

A timely example, very applicable to Availability Suite and the OpenSolaris community, is the recent announcement of the Project Proposal: lofi [ compression & encryption ] - http://www.opensolaris.org/jive/click.jspa&messageID=26841. By leveraging both the Availability Suite and the lofi OpenSolaris projects, it would be highly probable not only to offer compression & encryption to lofi devices (as already proposed), but, by leveraging the two projects together, to create the means to support file systems, databases and applications across all block-based storage devices.
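For a flavor of the administrative model, enabling an SNDR replica and an independent II shadow looks roughly like the following sketch (the host, device and bitmap names here are hypothetical; see the administration guides below for the authoritative syntax):

    # replicate a primary volume to a secondary host, synchronously
    # (primary host, volume, bitmap; secondary host, volume, bitmap)
    sndradm -e primhost /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
            sechost /dev/rdsk/c2t0d0s0 /dev/rdsk/c2t0d0s1 ip sync

    # take an independent point-in-time copy (master, shadow, bitmap)
    iiadm -e ind /dev/rdsk/c1t0d0s0 /dev/rdsk/c3t0d0s0 /dev/rdsk/c3t0d0s1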
Since Availability Suite has strong technical ties to storage, please look for email discussion for this project at: <storage-discuss at opensolaris dot org>

A complete set of Availability Suite administration guides can be found at: http://docs.sun.com/app/docs?p=coll%2FAVS4.0

Project Lead:

Jim Dunham http://www.opensolaris.org/viewProfile.jspa?username=jdunham

Availability Suite - New Solaris Storage Group
Jason J. W. Williams
2007-Jan-27 00:15 UTC
[zfs-discuss] Project Proposal: Availability Suite
Could the replication engine eventually be integrated more tightly with ZFS? That would be a slick alternative to send/recv.

Best Regards,
Jason

On 1/26/07, Jim Dunham <James.Dunham at sun.com> wrote:
> Project Overview:
>
> I propose the creation of a project on opensolaris.org, to bring to
> the community two Solaris host-based data services [...]
Jason J. W. Williams wrote:
> Could the replication engine eventually be integrated more tightly
> with ZFS?

Not in its present form. The architecture and implementation of Availability Suite is driven off block-based replication at the device level (/dev/rdsk/...), something that allows the product to replicate any Solaris file system, database, etc., without any knowledge of what it is actually replicating.

To pursue ZFS replication in the manner of Availability Suite, one needs to see what replication looks like from an abstract point of view. Simplistically, remote replication is like the letter 'h' (sketched below): the left side of the letter is the complete I/O path on the primary node, the horizontal part of the letter is the remote replication network link, and the right side of the letter is only the bottom half of the complete I/O path on the secondary node.

ZFS would first have to have its functional I/O path split into two halves, a top and a bottom piece. We would then configure replication, the letter 'h', between two given nodes, running both the top and bottom pieces of ZFS on the source node, and just the bottom half of ZFS on the secondary node.

Today, the SNDR component of Availability Suite works like the letter 'h', where we split the Solaris I/O stack into a top and bottom half. The top half is the software (file system, database or application I/O) that directs its I/Os to the bottom half (raw device, volume manager or block device). So all that needs to be done is to design and build a new variant of the letter 'h', and find the place to separate ZFS into two pieces.
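Pictorially (the labels here are illustrative only):

        primary node                        secondary node

    file system / database
             |
             v
       [ replicator ] ----- network -----> [ replicator ]
             |                                    |
             v                                    v
     local block device                  remote block device

- Jim Dunham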
Jason J. W. Williams
2007-Jan-29 19:50 UTC
[zfs-discuss] Project Proposal: Availability Suite
Thank you for the detailed explanation. It is very helpful for understanding the issue. Is anyone successfully using SNDR with ZFS yet?

Best Regards,
Jason

On 1/26/07, Jim Dunham <James.Dunham at sun.com> wrote:
> Jason J. W. Williams wrote:
> > Could the replication engine eventually be integrated more tightly
> > with ZFS?
> Not in its present form. [...]
Jason,

> Thank you for the detailed explanation. It is very helpful for
> understanding the issue. Is anyone successfully using SNDR with ZFS yet?

Of the opportunities I've been involved with, the answer is yes, but so far I've not seen SNDR with ZFS in a production environment. That does not mean such deployments don't exist. It was not until late June '06 that AVS 4.0, Solaris 10 and ZFS were all generally available, and to date AVS has not been made available for the Solaris Express, Community Release, but it will be real soon.

While I have your attention, there are two issues between ZFS and AVS that need mentioning.

1). When ZFS is given an entire LUN to place in a ZFS storage pool, ZFS detects this, enabling SCSI write caching on the LUN, and also opens the LUN with exclusive access, preventing other data services (like AVS) from accessing the device. The work-around is to manually format the LUN, typically placing all the blocks into a single partition, and then place just this partition into the ZFS storage pool. ZFS detects that it does not own the entire LUN and doesn't enable write caching, which means it also doesn't open the LUN with exclusive access, and therefore AVS and ZFS can share the same LUN.

I thought about submitting an RFE to have ZFS provide a means to override this restriction, but I am not 100% certain that a ZFS filesystem directly accessing a write-cache-enabled LUN is the same thing as a replicated ZFS filesystem accessing a write-cache-enabled LUN. Even though AVS is write-order consistent, there are disaster recovery scenarios, when enacted, where block-order, versus write-order, I/Os are issued.

2). One has to be very cautious in using "zpool import -f .... " (forced import), especially on a LUN or LUNs into which SNDR is actively replicating. If ZFS complains that the storage pool was not cleanly exported when issuing a "zpool import ...", and one attempts a "zpool import -f ...." without checking the active replication state, they are sure to panic Solaris. Of course this failure scenario is no different from accessing a LUN or LUNs on dual-ported or SAN-based storage when another Solaris host is still accessing the ZFS filesystem, or from controller-based replication; these are all just different operational scenarios of the same issue: data blocks changing out from underneath the ZFS filesystem and its checksumming mechanisms. (A sketch of both work-arounds appears at the end of this note.)
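A minimal sketch of both work-arounds, assuming hypothetical device and pool names:

    # 1). give ZFS a slice rather than the whole LUN, so that it neither
    #     enables write caching nor opens the device exclusively
    format                        # label the LUN, all blocks in slice 0
    zpool create tank c1t0d0s0

    # 2). before forcing an import on the secondary, make sure SNDR is
    #     no longer replicating into the LUN(s)
    sndradm -P                    # show the state of the SNDR set(s)
    sndradm -l                    # place the set(s) into logging mode
    zpool import -f tank

Jim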
Jason J. W. Williams
2007-Jan-30 06:10 UTC
[zfs-discuss] Project Proposal: Availability Suite
Hi Jim,

Thank you very much for the heads up. Unfortunately, we need the write cache enabled for the application I was thinking of combining this with. Sounds like SNDR and ZFS need some more soak time before you can use both to their full potential together?

Best Regards,
Jason

On 1/29/07, Jim Dunham <James.Dunham at sun.com> wrote:
> Jason,
> > Thank you for the detailed explanation. It is very helpful for
> > understanding the issue. Is anyone successfully using SNDR with ZFS yet?
> Of the opportunities I've been involved with, the answer is yes [...]
Jason J. W. Williams wrote:
> Thank you very much for the heads up. Unfortunately, we need the
> write cache enabled for the application I was thinking of combining
> this with. Sounds like SNDR and ZFS need some more soak time before
> you can use both to their full potential together?

Well... there is the fact that SNDR works with filesystems other than ZFS. (Yes, I know this is the ZFS list.) Working around architectural issues for ZFS, and ZFS alone, might cause issues for the others.

I think the best-of-both-worlds approach would be to let SNDR plug in to ZFS along the same lines the crypto stuff will be able to plug in, different compression types, etc. There once was a slide that showed how that worked... or I'm hallucinating again.
On Fri, Jan 26, 2007 at 05:15:28PM -0700, Jason J. W. Williams wrote:
> Could the replication engine eventually be integrated more tightly
> with ZFS? That would be a slick alternative to send/recv.

But a continuous zfs send/recv would be cool too. In fact, I think ZFS tightly integrated with SNDR wouldn't be that much different from a continuous zfs send/recv.
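As a crude approximation of what such a continuous send/recv might automate, imagine a loop along these lines (the dataset and host names are hypothetical):

    # ship an initial full stream, then a delta every minute
    PREV=snap-0
    zfs snapshot tank/fs@$PREV
    zfs send tank/fs@$PREV | ssh remotehost zfs recv backup/fs
    while true; do
        NOW=snap-`date +%s`
        zfs snapshot tank/fs@$NOW
        zfs send -i tank/fs@$PREV tank/fs@$NOW | \
            ssh remotehost zfs recv backup/fs
        PREV=$NOW
        sleep 60
    done

Nico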
Nicolas Williams wrote:
> But a continuous zfs send/recv would be cool too. In fact, I think ZFS
> tightly integrated with SNDR wouldn't be that much different from a
> continuous zfs send/recv.

Even better with snapshots, and scoreboarding, and synch vs asynch and and and and .....
On Fri, Feb 02, 2007 at 03:17:17PM -0500, Torrey McMahon wrote:
> Nicolas Williams wrote:
> > But a continuous zfs send/recv would be cool too. In fact, I think ZFS
> > tightly integrated with SNDR wouldn't be that much different from a
> > continuous zfs send/recv.
>
> Even better with snapshots, and scoreboarding, and synch vs asynch and
> and and and .....

Right. I hadn't thought of that. A replication system that is well integrated with ZFS should have very similar properties whether designed as a journalling scheme or as a scoreboarding scheme. A continuous zfs send/recv as I imagine it would be like journalling, while ZFS+SNDR would be more like scoreboarding.

Unlike traditional journalling replication, a continuous ZFS send/recv scheme could deal with resource constraints by taking a snapshot and throttling replication until resources become available again. Replication throttling would mean losing some transaction history, but since we don't expose that right now, nothing would be lost.

Scoreboarding (what SNDR does) should perform better in general, but in the case of COW filesystems and databases ISTM that it should be a wash unless it's properly integrated with the COW system, and that's what makes me think scoreboarding and journalling approach each other at the limit when integrated with ZFS. In general I would expect journalling to have better reliability semantics (since you always know exactly the last transaction that was successfully replicated).

Nico
Nicolas Williams wrote:
> Right. I hadn't thought of that. A replication system that is well
> integrated with ZFS should have very similar properties whether designed
> as a journalling scheme or as a scoreboarding scheme.

Here's another thing to think about: ZFS is a COW filesystem. Even if I'm changing one piece of data over and over, which in the past might be a set of blocks, I'm going to be writing out new blocks on disk. Many replication strategies take into account the fact that even though your data is changing quite a bit, the actual block-storage-level changes are much smaller.
On Feb 2, 2007, at 15:35, Nicolas Williams wrote:
> Unlike traditional journalling replication, a continuous ZFS send/recv
> scheme could deal with resource constraints by taking a snapshot and
> throttling replication until resources become available again.
> Replication throttling would mean losing some transaction history, but
> since we don't expose that right now, nothing would be lost.
>
> Scoreboarding (what SNDR does) should perform better in general, but in
> the case of COW filesystems and databases ISTM that it should be a wash
> unless it's properly integrated with the COW system, and that's what
> makes me think scoreboarding and journalling approach each other at the
> limit when integrated with ZFS.

hmm .. a COW scoreboard .. visions of Clustra with the notion of "each node is an atomic failure unit" spring to mind .. of course in this light, there's not much of a difference between just replication and global synchronization .. very interesting ..

---
.je
Jonathan Edwards wrote:
> hmm .. a COW scoreboard .. visions of Clustra with the notion of "each
> node is an atomic failure unit" spring to mind .. of course in this
> light, there's not much of a difference between just replication and
> global synchronization ..

But would you want a COW scoreboard or a transaction log? Or would there be a difference?

Is it Friday yet? I think we need to start drinking on this one. ;)
My two (everyman's) cents - could something like this be modeled after MySQL replication, or even something like DRBD (drbd.org)? Seems like possibly the same idea.

On 1/26/07, Jim Dunham <James.Dunham at sun.com> wrote:
> Project Overview:
> ...
Hello Nicolas,

Friday, February 2, 2007, 9:01:20 PM, you wrote:

NW> On Fri, Jan 26, 2007 at 05:15:28PM -0700, Jason J. W. Williams wrote:
>> Could the replication engine eventually be integrated more tightly
>> with ZFS? That would be a slick alternative to send/recv.

NW> But a continuous zfs send/recv would be cool too. In fact, I think ZFS
NW> tightly integrated with SNDR wouldn't be that much different from a
NW> continuous zfs send/recv.

It would. zfs send/recv is more flexible in what it expects on the remote side. You can have an uncompressed file system on the sending side and a compressed one on the remote, not to mention other properties, and a different raid type and pool/fs size.
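For example, the receiving side can look entirely different (the pool and device names here are hypothetical):

    # different RAID layout and properties on the receiving pool
    zpool create backup raidz c2t0d0 c2t1d0 c2t2d0
    zfs set compression=on backup
    ssh primhost zfs send tank/fs@snap | zfs recv backup/fs

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com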
On Fri, 2 Feb 2007, Torrey McMahon wrote:
> Well... there is the fact that SNDR works with filesystems other than
> ZFS. (Yes, I know this is the ZFS list.) Working around architectural
> issues for ZFS, and ZFS alone, might cause issues for the others.

SNDR has some issues with logging UFS as well. If you start a SNDR live copy on an active logging UFS (not _writelocked_), the UFS log state may not be copied consistently.

If you want a live remote replication facility, it _NEEDS_ to talk to the filesystem somehow. There must be a callback mechanism that the filesystem could use to tell the replicator "and from exactly now on you start replicating". The only entity which can truly give this signal is the filesystem itself.

And no, that's _not_ when the filesystem does a "flush write cache" ioctl. Or when the user has just issued a "sync" command or similar. For ZFS, it'd be when a ZIL transaction is closed (as I understand it); for UFS it'd be when the UFS log is fully rolled. There's no notification to external entities when these two events happen. SNDR tries its best to achieve this detection, but without actually _stopping_ all I/O (on UFS: writelocking), there's a window of vulnerability still open. And SNDR/II don't stop filesystem I/O - by basic principle. That's how they're sold/advertised/intended to be used.

I'm all willing to see SNDR/II go open - we could finally work these issues!

FrankH.
Frank,

> SNDR has some issues with logging UFS as well. If you start a SNDR
> live copy on an active logging UFS (not _writelocked_), the UFS log
> state may not be copied consistently.

Treading "very" carefully, UFS logging may have issues with being replicated, not the other way around. SNDR replication (after synchronizing) maintains a write-order consistent volume; thus if there is an issue with UFS logging being able to access an SNDR secondary, then UFS logging will also have issues with accessing a volume after Solaris crashes. The end result of Solaris crashing, or of SNDR replication stopping, is a write-ordered, crash-consistent volume.

Given that both UFS logging and SNDR are (near) perfect (or there would be a flood of escalations), this issue, in all cases I've seen to date, is that the SNDR primary volume being replicated is mounted with UFS logging enabled, but the SNDR secondary is not mounted with UFS logging enabled. Once this condition happens, the problem can be resolved by fixing /etc/vfstab to correct the inconsistent mount options, and then performing an SNDR update sync (see the sketch at the end of this note).

> If you want a live remote replication facility, it _NEEDS_ to talk to
> the filesystem somehow. There must be a callback mechanism that the
> filesystem could use to tell the replicator "and from exactly now on
> you start replicating". The only entity which can truly give this
> signal is the filesystem itself.

There is an RFE against SNDR for something called "in-line PIT". I hope that this work will get done soon.

> And no, that's _not_ when the filesystem does a "flush write cache"
> ioctl. Or when the user has just issued a "sync" command or similar.
> For ZFS, it'd be when a ZIL transaction is closed (as I understand
> it); for UFS it'd be when the UFS log is fully rolled. There's no
> notification to external entities when these two events happen.

Because ZFS is always on-disk consistent, this is not an issue. So far in ALL my testing of replicating ZFS with SNDR, I have not seen ZFS fail!

Of course, be careful not to confuse my stated position with another closely related scenario: that of accessing ZFS on the remote node via a forced import "zpool import -f <name>" with active SNDR replication, as ZFS is sure to panic the system. ZFS, unlike other filesystems, has 0% tolerance for corrupted metadata.
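Roughly, with a hypothetical device and mount point:

    # make the mount options match on both nodes, e.g. in /etc/vfstab:
    /dev/dsk/c1t0d0s0  /dev/rdsk/c1t0d0s0  /data  ufs  2  yes  logging

    # then refresh the secondary with an update sync
    sndradm -u

Jim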
On Mon, 5 Feb 2007, Jim Dunham wrote:
> Treading "very" carefully, UFS logging may have issues with being
> replicated, not the other way around. SNDR replication (after
> synchronizing) maintains a write-order consistent volume; thus if
> there is an issue with UFS logging being able to access an SNDR
> secondary, then UFS logging will also have issues with accessing a
> volume after Solaris crashes. The end result of Solaris crashing, or
> of SNDR replication stopping, is a write-ordered, crash-consistent
> volume.

Except that you're not getting user data consistency - because UFS logging only does the write-ordered crash consistency for metadata. In other words, it's possible with UFS logging to see metadata changes (file growth/shrink, filling of holes in sparse files) that do not match the file contents - AFTER crash recovery.

To get full consistency of data and metadata across crashes / replication termination, with a replicator underneath, the filesystem needs a way of telling the replicator "and now start/stop replicating please". For the filesystem to barrier.

I'm not saying SNDR isn't doing a good job. I'm just saying it could do a perfect job if it integrated in this way with the filesystem on top. If there were 'start/stop' hooks.

II is a different matter again. It had, for some time - I don't know if that's still true - a window where it would EIO writes when enabling the image. Neither UFS logging nor ZFS very much like being told "this critical write of yours errored out".

FrankH.