thr3ads.net - zfs discuss - [zfs-discuss] drbd using zfs send/receive? [Sep 2006]

If this information is useful, please help other people find it:
Share via:

Jakob Praher

2006-Sep-18 15:45 UTC

[zfs-discuss] drbd using zfs send/receive?

hi everyone,

I am planning on creating a local SAN via NFS(v4) and several redundant 
nodes.

I have been using DRBD on linux before and now am asking whether some of 
  you have experience on on-demand network filesystem mirrors.

I have yet little Solaris sysadmin know how, but i am interesting 
whether there is an on-demand support for sending snapshots. I.e. not 
via a cron job, but via a kind of filesystem change notification system.

Is this mere a hack or can it be used to create some sort of failover.

E.g. DRBD has the master/slave option, which can be configured easily. 
Something like this would be nice out of the box. So in case of failure 
another node is the master and if the former master is back again, it is 
simply the slave, so that both have the current data available again.

Any pointers to solutions in that area are greatly appreaciated.

-- Jakob

Frank Cusack

2006-Sep-18 23:28 UTC

head link

[zfs-discuss] drbd using zfs send/receive?

On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at hapra.at>
wrote:> hi everyone,
>
> I am planning on creating a local SAN via NFS(v4) and several redundant
nodes.
huh.  How do you create a SAN with NFS?
> I have been using DRBD on linux before and now am asking whether some of  
you have experience on
> on-demand network filesystem mirrors.
>
> I have yet little Solaris sysadmin know how, but i am interesting whether
there is an on-demand
> support for sending snapshots. I.e. not via a cron job, but via a kind of
filesystem change
> notification system.
AFAIK, Solaris does not export file change notification to userland in
any way that would be useful for on-demand filesystem replication.  From
looking at drbd for 5 minutes, it looks like the kind of notification
that windows/linux/macos provides isn''t what drbd uses; it does BLOCK
LEVEL replication, and part of the software is a kernel module to export
that data to userspace.  It sounds like that distinction doesn''t matter
for what you are trying to achieve, and I believe that this block-by-block
duplication isn''t a great idea for zfs anyway.  It might be neat if zfs
could inform userland of each new txg.
> Is this mere a hack or can it be used to create some sort of failover.
>
> E.g. DRBD has the master/slave option, which can be configured easily.
Something like this would
> be nice out of the box. So in case of failure another node is the master
and if the former master
> is back again, it is simply the slave, so that both have the current data
available again.
>
> Any pointers to solutions in that area are greatly appreaciated.
See if <http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_now_with>
comes close.

I have 2 setups, one using SC 3.2 with a SAN (both systems can access
the same filesystem, yes it''s not as redundant as a remote node and
remote filesystem, but it''s for HA not DR.  I could add another JBOD
to the SAN and configure zfs to mirror between the two enclosures to
get rid of the SPoF of the JBOD backplane/midplane, but it''s not
worth it.

The other setup is using my own cron script (zfs send | zfs recv) to
send snapshots to a "remote" (just another server in the same rack)
host.  This is for a service that also has very high availability
requirements but where I can''t afford shared storage.  I do a homegrown
heartbeat and failover thing.  I''m looking at replacing the cron script
with the SMF service linked above, but I''m in no rush since the cron
job
works quite well.

If zfs is otherwise a good solution for you, you might want to consider
if you really need true on-demand replication.  Maybe 5-minute or even
1-minute recency is good enough.  I would imagine that you don''t
actually
get too much better than 30s with drbd anyway, since outside of fsync()
data doesn''t actually make it to disk (and then replicated by drbd)
more frequently than that for some generic application.

-frank

Torrey McMahon

2006-Sep-19 19:15 UTC

head link

[zfs-discuss] drbd using zfs send/receive?

Frank Cusack wrote:> On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at hapra.at>
wrote:
>> hi everyone,
>>
>> I am planning on creating a local SAN via NFS(v4) and several 
>> redundant nodes.
>
> huh. How do you create a SAN with NFS?

Not to get into a semantic holy way on acronyms but, in the past, I have 
seen NFS grouped under the SAN umbrella. Most people hear SAN and think 
FC block but recall that the "S" stands for Storage. Not very common
but
it happens.

-- 
Torrey McMahon
Sun Microsystems Inc.

Jakob Praher

2006-Sep-21 08:48 UTC

head link

[zfs-discuss] Re: drbd using zfs send/receive?

Frank Cusack wrote:> On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at hapra.at>
wrote:
> huh.  How do you create a SAN with NFS?Sorry. Okay it would be Network Attached Sotrage not the other way round 
. I guess you are right.

BUT if we are at discussing NFS for distributed stroage: What are your 
guys performance data for NFSv4 as a storage node. How well does the 
current Solaris NFSv4 stack interoperate with the Linux stack?
Would you go for that?

What about iSCSI on top of ZFS? is that an option. I did a research on 
iSCSI vs NFSv4 once and I found out that the overhead for transproting 
the fs metadata (in the NFSv4 case) is not the real problem for many 
szenarios. Especially the COMPOUND messages should help here.
> 
>> I have been using DRBD on linux before and now am asking whether some 
>> of   you have experience on
>> on-demand network filesystem mirrors.
>>
> 
> AFAIK, Solaris does not export file change notification to userland in
> any way that would be useful for on-demand filesystem replication.  From
> looking at drbd for 5 minutes, it looks like the kind of notification
> that windows/linux/macos provides isn''t what drbd uses; it does
BLOCK
> LEVEL replication, and part of the software is a kernel module to export
> that data to userspace.  It sounds like that distinction doesn''t
matter
> for what you are trying to achieve, and I believe that this block-by-block
> duplication isn''t a great idea for zfs anyway.  It might be neat
if zfs
> could inform userland of each new txg.
> yes. exactly. It is a block device driver and that replicates. So it 
sits right underneath Linux''s VFS.
Okay that is something i wanted to know. Are there any good heartbeat 
control apps for Solaris out there? I mean if i want to have failover 
(even if it is a little bit cheap) it should detect failures and react 
accordingly. Switching from Sender to Receiver should not be difficult 
given that all you need is to make ZFS snapshots. (and that is really 
cheap in ZFS).

>> Is this mere a hack or can it be used to create some sort of failover.
>>
>> E.g. DRBD has the master/slave option, which can be configured easily. 
>> Something like this would
>> be nice out of the box. So in case of failure another node is the 
>> master and if the former master
>> is back again, it is simply the slave, so that both have the current 
>> data available again.
>>
>> Any pointers to solutions in that area are greatly appreaciated.
> 
> See if
<http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_now_with>
> comes close.
> 
> I have 2 setups, one using SC 3.2 with a SAN (both systems can access
> the same filesystem, yes it''s not as redundant as a remote node
and
> remote filesystem, but it''s for HA not DR.  I could add another
JBOD
> to the SAN and configure zfs to mirror between the two enclosures to
> get rid of the SPoF of the JBOD backplane/midplane, but it''s not
> worth it.
> JBOD, SPoF - what are these things?
> The other setup is using my own cron script (zfs send | zfs recv) to
> send snapshots to a "remote" (just another server in the same
rack)
> host.  This is for a service that also has very high availability
> requirements but where I can''t afford shared storage.  I do a
homegrown
> heartbeat and failover thing.  I''m looking at replacing the cron
script
> with the SMF service linked above, but I''m in no rush since the
cron job
> works quite well.
> 
> If zfs is otherwise a good solution for you, you might want to consider
> if you really need true on-demand replication.  Maybe 5-minute or even
> 1-minute recency is good enough.  I would imagine that you don''t
actually
> get too much better than 30s with drbd anyway, since outside of fsync()
> data doesn''t actually make it to disk (and then replicated by
drbd)
> more frequently than that for some generic application.
Okay. I think zfs is nice. I am using xfs+lvm2 on my linux boxes so far. 
This works nice too.

SMF is the init.d replacement of solaris, right? What would that look 
like. What would SMF do, but restart your app if it fails? Would you 
like to have a background task running instead of kicking it on with cron?

Thanks
-- Jakob

eric kustarz

2006-Sep-21 17:35 UTC

head link

[zfs-discuss] Re: drbd using zfs send/receive?

Jakob Praher wrote:> Frank Cusack wrote:
>> On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at
hapra.at> wrote:
> 
>> huh.  How do you create a SAN with NFS?
> Sorry. Okay it would be Network Attached Sotrage not the other way round 
> . I guess you are right.
> 
> BUT if we are at discussing NFS for distributed stroage: What are your 
> guys performance data for NFSv4 as a storage node. How well does the 
> current Solaris NFSv4 stack interoperate with the Linux stack?
> Would you go for that?
Depends on what your application is for NFS performance.  The Solaris 
NFS stack can easily saturate a 1Gbe link doing straight I/O.  Doing 
heavy metadata operations obviously won''t be the case.

If you follow the nfsv4 IETF working group, you''ll see that the NFSv4 
people have been meeting about every 4 months for over 6 years to do 
interopability testing.  So anyone who has a serious NFSv4 stack (Sun, 
Netapp, Linux, IBM, Hummingbird, etc) interoperates great with others 
with a serious stack.

You''re best bet as always is try your specific application and see if
it
performs to what you want.
> 
> What about iSCSI on top of ZFS? is that an option. I did a research on 
> iSCSI vs NFSv4 once and I found out that the overhead for transproting 
> the fs metadata (in the NFSv4 case) is not the real problem for many 
> szenarios. Especially the COMPOUND messages should help here.
Compound messages may help in the future but i don''t think anyone has 
fully taken advantage of them yet - most VFS''s are the same, and its a 
little tricky given the historical part of the kernel.

We''ve integrated some things in the Solaris kernel to take advantage
and
have thrown around other ideas that haven''t made it quite yet.

eric
> 
>>
>>> I have been using DRBD on linux before and now am asking whether
some
>>> of   you have experience on
>>> on-demand network filesystem mirrors.
>>>
>>
>> AFAIK, Solaris does not export file change notification to userland in
>> any way that would be useful for on-demand filesystem replication. 
From
>> looking at drbd for 5 minutes, it looks like the kind of notification
>> that windows/linux/macos provides isn''t what drbd uses; it
does BLOCK
>> LEVEL replication, and part of the software is a kernel module to
export
>> that data to userspace.  It sounds like that distinction
doesn''t matter
>> for what you are trying to achieve, and I believe that this 
>> block-by-block
>> duplication isn''t a great idea for zfs anyway.  It might be
neat if zfs
>> could inform userland of each new txg.
>>
> yes. exactly. It is a block device driver and that replicates. So it 
> sits right underneath Linux''s VFS.
> Okay that is something i wanted to know. Are there any good heartbeat 
> control apps for Solaris out there? I mean if i want to have failover 
> (even if it is a little bit cheap) it should detect failures and react 
> accordingly. Switching from Sender to Receiver should not be difficult 
> given that all you need is to make ZFS snapshots. (and that is really 
> cheap in ZFS).
> 
> 
>>> Is this mere a hack or can it be used to create some sort of
failover.
>>>
>>> E.g. DRBD has the master/slave option, which can be configured 
>>> easily. Something like this would
>>> be nice out of the box. So in case of failure another node is the 
>>> master and if the former master
>>> is back again, it is simply the slave, so that both have the
current
>>> data available again.
>>>
>>> Any pointers to solutions in that area are greatly appreaciated.
>>
>> See if
<http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_now_with>
>> comes close.
>>
>> I have 2 setups, one using SC 3.2 with a SAN (both systems can access
>> the same filesystem, yes it''s not as redundant as a remote
node and
>> remote filesystem, but it''s for HA not DR.  I could add
another JBOD
>> to the SAN and configure zfs to mirror between the two enclosures to
>> get rid of the SPoF of the JBOD backplane/midplane, but it''s
not
>> worth it.
>>
> JBOD, SPoF - what are these things?
> 
>> The other setup is using my own cron script (zfs send | zfs recv) to
>> send snapshots to a "remote" (just another server in the same
rack)
>> host.  This is for a service that also has very high availability
>> requirements but where I can''t afford shared storage.  I do a
homegrown
>> heartbeat and failover thing.  I''m looking at replacing the
cron script
>> with the SMF service linked above, but I''m in no rush since
the cron job
>> works quite well.
>>
>> If zfs is otherwise a good solution for you, you might want to consider
>> if you really need true on-demand replication.  Maybe 5-minute or even
>> 1-minute recency is good enough.  I would imagine that you
don''t actually
>> get too much better than 30s with drbd anyway, since outside of fsync()
>> data doesn''t actually make it to disk (and then replicated by
drbd)
>> more frequently than that for some generic application.
> 
> Okay. I think zfs is nice. I am using xfs+lvm2 on my linux boxes so far. 
> This works nice too.
> 
> SMF is the init.d replacement of solaris, right? What would that look 
> like. What would SMF do, but restart your app if it fails? Would you 
> like to have a background task running instead of kicking it on with cron?
> 
> Thanks
> -- Jakob
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Frank Cusack

2006-Sep-22 01:12 UTC

head link

[zfs-discuss] Re: drbd using zfs send/receive?

On September 21, 2006 10:48:34 AM +0200 Jakob Praher <jp at hapra.at>
wrote:> Frank Cusack wrote:
>> On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at
hapra.at> wrote:
>
> BUT if we are at discussing NFS for distributed stroage: What are your
> guys performance data for NFSv4 as a storage node. How well does the
> current Solaris NFSv4 stack interoperate with the Linux stack?
> Would you go for that?
My last knowledge of Linux NFSv4 vs. Solaris NFSv4 is that they don''t
interoperate.  This was about a year ago.  I''ve always had to force
Linux to v3.
>  Are there any good heartbeat
> control apps for Solaris out there?
Sun Cluster and Veritas VCS come to mind.  I use ucarp for a homegrown
solution.
> JBOD, SPoF - what are these things?
Wow.  Just a Bunch of Disks.  Single Point of Failure.
> SMF is the init.d replacement of solaris, right? What would that look
> like. What would SMF do, but restart your app if it fails? Would you like
> to have a background task running instead of kicking it on with cron?
<http://opensolaris.org/os/community/smf/>

-frank

Eric Kustarz

2006-Sep-22 01:24 UTC

head link

[zfs-discuss] Re: drbd using zfs send/receive?

Frank Cusack wrote:
> On September 21, 2006 10:48:34 AM +0200 Jakob Praher <jp at hapra.at>
wrote:
>
>> Frank Cusack wrote:
>>
>>> On September 18, 2006 5:45:08 PM +0200 Jakob Praher <jp at
hapra.at>
>>> wrote:
>>
>>
>> BUT if we are at discussing NFS for distributed stroage: What are your
>> guys performance data for NFSv4 as a storage node. How well does the
>> current Solaris NFSv4 stack interoperate with the Linux stack?
>> Would you go for that?
>
>
> My last knowledge of Linux NFSv4 vs. Solaris NFSv4 is that they
don''t
> interoperate.  This was about a year ago.  I''ve always had to
force
> Linux to v3.

They interoprate just fine.  The only weird thing is how the linux 
people implemented their pseudo-filesystem - make sure to add "fsid=0"
to your exports via ''exportfs''.  So to transition from v3 to
v4, they
require you administratively to make a change or otherwise your 
Opensolaris clients won''t be able to mount the linux server.  And yes 
they are planning on fixing it.

http://wiki.linux-nfs.org/index.php/Nfsv4_configuration
http://blogs.sun.com/macrbg/date/20051020

If you see something that doesn''t work, let us or the linux people
know.

eric

Torrey McMahon

2006-Sep-24 23:17 UTC

head link

[zfs-discuss] Re: drbd using zfs send/receive?

Jakob Praher wrote:>
>
> What about iSCSI on top of ZFS? is that an option.
It will be shortly. Check out the iSCSI target project on the 
opensolaris site.

Maybe Matching Threads

Search for more reasonably related threads

zfs discuss - Sep 2006 - drbd using zfs send/receive?

[zfs-discuss] drbd using zfs send/receive?

[zfs-discuss] drbd using zfs send/receive?

[zfs-discuss] drbd using zfs send/receive?

[zfs-discuss] Re: drbd using zfs send/receive?

[zfs-discuss] Re: drbd using zfs send/receive?

[zfs-discuss] Re: drbd using zfs send/receive?

[zfs-discuss] Re: drbd using zfs send/receive?

[zfs-discuss] Re: drbd using zfs send/receive?

Maybe Matching Threads