thr3ads.net - zfs discuss - [zfs-discuss] ZFS over multiple iSCSI targets [Sep 2008]

If this information is useful, please help other people find it:
Share via:

Matthew Plumb

2008-Sep-07 02:25 UTC

[zfs-discuss] ZFS over multiple iSCSI targets

I currently use a Linux server w/ 8 disks (approx. 3Tb) as an NFS/SAMBA/netatalk
fileserver. I have a matching Linux server as a backup, using nightly rsync jobs
to keep the backup current. I use LVM, so replacing a disk is doable, but kind
of a pain. Also, I really wanted to use encrypted disks, but found that even NFS
speeds were reduced on my hardware.

So, the idea I have now is to use my Linux machines to encrypt the disks and
export them as iSCSI LUNs, and use a third server running opensolaris to create
a ZFS pool from the iSCSI LUNs. Assuming the iSCSI traffic is on a separate
network, this seems like it should accomplish my goals of making adding disk
easier and ensuring everything on disk is always encrypted. Does this plan make
sense? Any recommendations on how best to use the disk I have to ensure I have a
safe backup strategy?
--
This message posted from opensolaris.org

Miles Nordin

2008-Sep-07 06:29 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

>>>>> "mp" == Matthew Plumb <solaris at
reality-based.com> writes:
    mp> how best to use the disk I have to ensure I have a safe backup
    mp> strategy?

continue using rsync between a ZFS pool and an LVM2 pool.  At the very
least, have two ZFS pools.

For ZFS over iSCSI, have some zpool-layer redundancy because ZFS seems
to be far more vulnerable to corruption if the redundancy is below the
iSCSI layer, especially when the iSCSI targets reboot and ZFS does not.

I get some strange livelock-ish behavior with heavily-loaded Linux IET
targets, so set up something that you can test, but something you can
back out of if, after a month or two, you find it''s not stable.

let me know how it goes.  I want to try dm_crypt under iSCSI, too, as
soon as my VIA board finally arrives.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 304 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080907/1f937808/attachment.bin>

Ross

2008-Sep-07 08:52 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

I was looking into something like that last year, I was mirroring two iSCSI
drives using ZFS.  The only real problem I found was that ZFS hangs the pool for
3 minutes if an iSCSI device gets disconnected,  unfortunately ZFS waits for the
iSCSI timeout before it realises something has happened.  After the 3 minutes it
did offline the device and carry on working with the remaining one.

So if you don''t mind a 3 minute wait if something goes wrong I think it
will work fine.

Also, since you can add mirrors at any stage with zpool attach, you can create
the ZFS pool on your backup server, transfer the data over from your live
machine, and once it''s working, reformat your live machine as an iscsi
volume and attach it to the pool.

I think the idea of doing this as separate disks is a good one if you want to
add disks later.  Just bear in mind that you won''t be able to have any
kind of raid on the individual servers, your only protection will be the
mirroring between the devices.

Exporting them as one huge iSCSI volume is good if you''re paranoid
about data loss.  You can use raid5 or 6 on the Linux servers, and then mirror
those large volumes with ZFS.  The downside is that it''s much harder to
add storage.  I don''t know if iSCSI volumes can be expanded, so you
might have to break the mirror, create a larger iSCSI volume and resync all your
data with that approach.
--
This message posted from opensolaris.org

Peter Schuller

2008-Sep-07 09:53 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

> Exporting them as one huge iSCSI volume is good if you''re paranoid
about data loss.  You can use raid5 or 6 on the Linux servers, and then mirror
those large volumes with ZFS.  The downside is that it''s much harder to
add storage.  I don''t know if iSCSI volumes can be expanded, so you
might have to break the mirror, create a larger iSCSI volume and resync all your
data with that approach.
Just be careful with respect to writer barriers. The software raid in
Linux does not support them with raid5/raid6, so you loose the
correctness aspect of ZFS you otherwise get even without hw raid
controllers.

(Speaking of this, can someone speak to the general state of affairs
with iSCSI with respect to write barriers? I assume Solaris does it
correctly; what about the bsd/linux stuff? Can one trust that the
iSCSI targets correctly implement cache flushing/writer barriers?)

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or ''Peter Schuller <peter.schuller at
infidyne.com>''
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080907/ac336253/attachment.bin>

Miles Nordin

2008-Sep-08 17:35 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

>>>>> "ps" == Peter Schuller <peter.schuller at
infidyne.com> writes:
    ps> The software raid in Linux does not support [write barriers]
    ps> with raid5/raid6,

yeah i read this warning also and think it''s a good argument for not
using it.  

 http://lwn.net/Articles/283161/

With RAID5 or RAID6 there is of course the write hole.  But the way I
read it, just making soft partitions or mirrors with LVM2 breaks write
barriers too.  From the comments:

-----8<-----
Q. is there any work going on towards making the barriers work on lvm
   volumes?

A. Yes, but only single disk DM targets (e.g. linear), see:
   http://lkml.org/lkml/2008/2/15/125

   Unfortunately, this patch hasn''t been pushed upstream and the DM
   maintainer (agk) hasn''t really commented on when it might.
-----8<-----

The downside is that if you do raidz2 above iscsi, I think iSCSI makes
one TCP circuit for each target, so the congestion avoidance will work
less well.  maybe RED on the switch can help, and probably needs lots
[more than i have done] performance testing / comparison.

    ps> iSCSI with respect to write barriers?

+1.  

Does anyone even know of a good way to actually test it?  So far it
seems the only way to know if your OS is breaking write barriers is to
trade gossip and guess.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 304 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080908/3c336bb8/attachment.bin>

Tuomas Leikola

2008-Sep-08 18:55 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

On Mon, Sep 8, 2008 at 8:35 PM, Miles Nordin <carton at ivy.net>
wrote:>    ps> iSCSI with respect to write barriers?
>
> +1.
>
> Does anyone even know of a good way to actually test it?  So far it
> seems the only way to know if your OS is breaking write barriers is to
> trade gossip and guess.
>
Write a program that writes backwards (every other block to avoid
write merges) with and without O_DSYNC, measure speed.

I think you can also deduce driver and drive cache flush correctness
by calculating the best theoretical correct speed (which should be
really slow, one write per disc spin)

this has been on my TODO list for ages.. :(

James Andrewartha

2008-Sep-10 07:46 UTC

head link

[zfs-discuss] ZFS over multiple iSCSI targets

Tuomas Leikola wrote:> On Mon, Sep 8, 2008 at 8:35 PM, Miles Nordin <carton at ivy.net>
wrote:
>>    ps> iSCSI with respect to write barriers?
>>
>> +1.
>>
>> Does anyone even know of a good way to actually test it?  So far it
>> seems the only way to know if your OS is breaking write barriers is to
>> trade gossip and guess.
> 
> Write a program that writes backwards (every other block to avoid
> write merges) with and without O_DSYNC, measure speed.
> 
> I think you can also deduce driver and drive cache flush correctness
> by calculating the best theoretical correct speed (which should be
> really slow, one write per disc spin)
> 
> this has been on my TODO list for ages.. :(
Does the perl script at http://brad.livejournal.com/2116715.html do what you
want?

-- 
James Andrewartha

zfs discuss - Sep 2008 - ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets

[zfs-discuss] ZFS over multiple iSCSI targets