Someone suggested an idea, which the more I think about it the less insane it sounds. I thought I would ask the assembled masses to see if anyone had tried anything like this, and how successful they had been.

I'll start with the simplest variant of the solution, but there are potentially subtleties which could be applied.

Take 3 machines, for the sake of argument 2 x4500s and an x4100 as a head unit.

Export the storage from each of the x4500s by making it an iSCSI target. Import the storage onto the x4100 by making it an iSCSI initiator.

Using ZFS (and I assume this is preferable to Solaris Volume Manager), set up a mirror between the two sets of storage.

Assuming that works, one of the two servers can be moved to a different site, and you now have real-time, cross-site mirroring of data.

For added tweaks I believe that I can arrange to have two head units so that I can do something resembling failover of data, if not necessarily instantaneously.

The only issue I haven't yet been able to come up with a solution for in this thought experiment is how to recover quickly from one half of the mirror going down. As far as I can tell I need to resilver the entire half of the mirror, which could take some time. Am I missing some clever trick here?

I'm interested in any input as to why this does or doesn't work, and I'm especially interested to hear from anyone that has actually done something like this already.

Cheers,

Julian
-- 
Julian King
Computer Officer, University of Cambridge, Unix Support
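For concreteness, a minimal sketch of the commands such a setup would involve, assuming Solaris with the iSCSI target and initiator packages installed; the pool names, target names, addresses and device names below are made up for illustration:

    # On each x4500: create a zvol and export it as an iSCSI target
    zpool create bigpool raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
    zfs create -V 500g bigpool/export
    zfs set shareiscsi=on bigpool/export

    # On the x4100 head: import both targets using static discovery
    iscsiadm modify discovery --static enable
    iscsiadm add static-config iqn.1986-03.com.sun:02:target-a,192.168.1.10:3260
    iscsiadm add static-config iqn.1986-03.com.sun:02:target-b,192.168.1.11:3260
    devfsadm -i iscsi

    # Mirror the two imported LUNs (use the device names reported by format(1M))
    zpool create tank mirror c2t01d0 c3t01d0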
I think I have heard something called dirty time logging being implemented in ZFS.

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at Sun.COM
J.P. King wrote:
> Take 3 machines, for the sake of argument 2 x4500s and an x4100 as a
> head unit.
>
> Export the storage from each of the x4500s by making it an iSCSI target.
> Import the storage onto the x4100 by making it an iSCSI initiator.
>
> Using ZFS (and I assume this is preferable to Solaris Volume Manager),
> set up a mirror between the two sets of storage.

Remember to also deploy IPsec to protect the iSCSI traffic. You want at least IPsec with AH to get integrity protection on the wire, and for cross-site you likely want ESP+Auth as well.

-- 
Darren J Moffat
> Remember to also deploy IPsec to protect the iSCSI traffic. You want at
> least IPsec with AH to get integrity protection on the wire, and for
> cross-site you likely want ESP+Auth as well.

How will this help given dark fibre between the sites? I'm not doing this over a public internet!

> Darren J Moffat

Julian
-- 
Julian King
Computer Officer, University of Cambridge, Unix Support
> I think I have heard something called dirty time logging being
> implemented in ZFS.

Thanks for the pointer. Certainly interesting, but according to the talks/emails I've found from a month or so ago, ZFS "will offer" this, so I am guessing it isn't there yet, and certainly not in a released version of Solaris.

Knowing that it is (probably) on the way is still useful.

> Mertol Ozyoney

Julian
-- 
Julian King
Computer Officer, University of Cambridge, Unix Support
J.P. King wrote:
>> I think I have heard something called dirty time logging being
>> implemented in ZFS.
>
> Thanks for the pointer. Certainly interesting, but according to the
> talks/emails I've found from a month or so ago, ZFS "will offer" this,
> so I am guessing it isn't there yet, and certainly not in a released
> version of Solaris.
>
> Knowing that it is (probably) on the way is still useful.

It is already there, see here

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/vdev_impl.h#130

and try a full-text search for dtl in usr/src/uts/common/fs/zfs/ as well.

hth,
victor
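A quick way to see whether a resilver really only covers the outage window (a rough sketch, assuming a simple two-disk mirror called tank; the disk name is a placeholder):

    zpool offline tank c1t1d0     # simulate losing one side of the mirror
    # ... write some data into the pool while it is degraded ...
    zpool online tank c1t1d0      # bring it back; only data written meanwhile should need resilvering
    zpool status tank             # watch the resilver - it should finish far faster than a full copy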
What is the procedure for enabling DTL?

PS: I am no unix guru

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at Sun.COM
J.P. King wrote:
>> Remember to also deploy IPsec to protect the iSCSI traffic. You want at
>> least IPsec with AH to get integrity protection on the wire, and for
>> cross-site you likely want ESP+Auth as well.
>
> How will this help given dark fibre between the sites? I'm not doing
> this over a public internet!

The IPsec AH is to ensure that you don't get corruption on the wire - this is especially important if the iSCSI targets are not ZVOLs, but even then I'd highly recommend it. If you are happy with the physical security of your cable then you don't need the ESP.

-- 
Darren J Moffat
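For illustration, a minimal AH-only policy on the head node might look roughly like this (a sketch using the ipsecconf(1M) policy file format with made-up addresses; key management via IKE or manual keys still has to be configured separately):

    # /etc/inet/ipsecinit.conf on the iSCSI initiator
    # AH only: integrity protection for all traffic to each iSCSI target
    {raddr 192.168.1.10} ipsec {auth_algs sha1}
    {raddr 192.168.1.11} ipsec {auth_algs sha1}
    # for the cross-site link, ESP with authentication instead:
    # {raddr 192.168.2.10} ipsec {encr_algs aes encr_auth_algs sha1}

    # activate the policy
    ipsecconf -a /etc/inet/ipsecinit.conf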
Heh, it might have been me who suggested that. I'm testing the idea out at the moment, but being new to Solaris it's taking some time.

So far I've confirmed that you can import iSCSI volumes to ZFS fine, but you need to use static discovery. If you use sendtargets, it breaks when devices go offline (hangs iSCSI and ZFS, and then Solaris won't boot). I've also got a basic cluster running with HA-ZFS mirroring a pair of iSCSI disks, with HA-NFS running on top of that. That appears to work fine too, and is pretty reliable.

In terms of recovery time after one half of the mirror going down, I thought ZFS already had that feature - it was one of the things I read that gave me this idea in the first place. Have a look at page 15 of this presentation, it specifically says "a 5 second outage takes 5 seconds to repair":
http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf

I read that to mean that if the iSCSI server breaks but is repairable, you will only need to re-sync the data that has changed. Of course, if the whole thing dies you have rather a lot of data to shift around, but if you're running ZFS with dual-parity RAID on the x4500s, the chances are you'll only need to do that when hell freezes over :)

I'm doing my level best to kill our setup at the moment. I've been pulling the (virtual) power on the iSCSI servers, resilvering ZFS, and swopping ZFS between the two cluster nodes. So far I've had a few teething problems but it's always come back online and I've never lost any data. Even swopping active nodes in the cluster while iSCSI devices are offline isn't a problem, but I do have a lot more stress testing to do.

The latest trick is that I've now got 5 Solaris boxes running under VMware (2x iSCSI servers, 2x cluster, 1x client), and I'm about to test:

VMware -> Solaris -> ZFS pool -> iSCSI -> Solaris Cluster -> HA-ZFS -> HA-NFS -> VMware

Yes, VMware is quite happy accessing an NFS store hosted within itself, although I'm yet to test how it handles a cluster node failure. I'm going to test that, and then host an XP desktop on the NFS share and see how performance compares to a desktop on native storage. I figure that will give me a reasonable idea as to how much overhead this is adding :)

One of the main reasons I'm testing with VMware is that I plan to access the iSCSI storage on the Thumpers via a Solaris machine hosted under VMware. That way I can connect directly to it from other virtual servers and take advantage of the 64Gbps speed and low latency of the virtual network. It means mirroring the Thumpers shouldn't add any noticeable latency to the traffic.

That's about the extent of my progress so far. Would love to hear your feedback if you're testing this too.
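In case it helps anyone else, the switch from sendtargets to static discovery looks roughly like this (a sketch; the target name and address are placeholders):

    # drop dynamic (sendtargets) discovery of the target portal
    iscsiadm remove discovery-address 192.168.1.10:3260
    iscsiadm modify discovery --sendtargets disable

    # add the same target statically instead
    iscsiadm add static-config iqn.1986-03.com.sun:02:target-a,192.168.1.10:3260
    iscsiadm modify discovery --static enable
    devfsadm -i iscsi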
Found my first problems with this today. The ZFS mirror appears to work fine, but if you disconnect one of the iSCSI targets it hangs for 5 minutes or more. I'm also seeing very concerning behaviour when attempting to re-attach the missing disk.

My test scenario is:
- Two 35GB iSCSI targets are being shared using ZFS shareiscsi=on
- They are imported to a 3rd Solaris box and used to create a mirrored ZFS pool
- I use that to mount an NFS share, and connected to that with VMware ESX server

My first test was to clone a virtual machine onto the new volume. That appeared to work fine, so I decided to test the mirroring. I started another clone operation, then powered down one of the iSCSI targets. Well, the clone operation seemed to hang as soon as I did that, so I ran "zpool status" to see what was going on. The news wasn't good: that hung too.

Nothing happened in either window for a good 5 minutes, then ESX popped up with an error saying "the virtual disk is either corrupted or not a supported format", and at the exact same time the zpool status command completed, but showing that all the drives were still ONLINE. I immediately re-ran zpool status; now it reported that one iSCSI disk was offline and the pool was running in a degraded state.

So, for some reason it took 5 minutes for the iSCSI device to go offline, it locked up ZFS for that entire time, and ZFS reported the wrong status the first time around too. The only good news is that now that ZFS is in a degraded state I can start the clone operation again and it completes fine with just half of the mirror available.

Next, I powered on the missing server, checked "format < /dev/null" to ensure the drives had re-connected, and used "zpool online" to re-attach the missing disk. So far it's taken over an hour to attempt to resilver files from a 10 minute copy, and the progress report is up and down like a yo-yo. The progress reporting from ZFS so far has been:

- 2.25% done, 0h13m to go
- 7.20% done, 0h12m to go
- 6.14% done, 0h8m to go (odd, how does it go down?)
...
- 78.50% done, 0h2m to go
- 41.67% done, 0h8m to go (huh?)
...
- 72.45% done, 0h3m to go
- 42.42% done, 0h9m to go

Getting concerned now, I'm actually wondering if this is ever going to complete, and I have no idea if these problems are ZFS or iSCSI related.
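A few commands that may help tell the iSCSI layer and the ZFS layer apart during the hang (a sketch; tank and the device name are placeholders):

    iscsiadm list target -v       # connection state of each configured target
    iscsiadm list target -S       # LUNs and OS device names behind each target
    zpool status -x               # ZFS's own view of pool health
    zpool online tank c2t01d0     # re-attach the device once the target is reachable again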
Well, 5 minutes after posting that the resilver completed. However, despite it saying "resilver completed with 0 errors" ten minutes ago, the device still shows as unavailable, and my pool is still degraded.
Well, I got it working, but not in a tidy way. I'm running HA-ZFS here, so I moved the ZFS pool over to the other node in the cluster. That had exactly the same problem, however: the iSCSI disks were unavailable.

Then I found an article from November 2006 (http://web.ivy.net/~carton/oneNightOfWork/20061119-carton.html) saying that the iSCSI initiator won't reconnect until you reboot. I rebooted one node of the cluster, then swopped ZFS back over to there and voila! Fully working mirrored storage again.

So I guess it's an iSCSI initiator problem in that it doesn't reconnect properly to a rebooted target, but it's not a particularly stable solution at this stage.
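For reference, the failover itself is just a resource group switch (a sketch, assuming Sun Cluster with a hypothetical resource group nfs-rg and nodes node1/node2):

    # move the HA-ZFS/HA-NFS resource group to the surviving node
    scswitch -z -g nfs-rg -h node2

    # after rebooting node1 so its iSCSI initiator reconnects, switch back
    scswitch -z -g nfs-rg -h node1
    zpool status tank             # the mirror should come back ONLINE and resilver the delta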