thr3ads.net - zfs discuss - [zfs-discuss] Two-level ZFS [Feb 2009]

Gary Mills

2009-Feb-01 22:26 UTC

[zfs-discuss] Two-level ZFS

I realize that this configuration is not supported.  What's required
to make it work?  Consider a file server running ZFS that exports a
volume with iSCSI.  Consider also an application server that imports
the LUN with iSCSI and runs a ZFS filesystem on that LUN.  All of the
redundancy and disk management takes place on the file server, but
end-to-end error detection takes place on the application server.
This is a reasonable configuration, is it not?

When the application server detects a checksum error, what information
does it have to return to the file server so that it can correct the
error?  The file server could then retry the read from its redundant
source, which might be a mirror or might be synthetic data from
RAID-5.  It might also indicate that a disk must be replaced.

Must any information accompany each block of data sent to the
application server so that the file server can identify the source
of the data in the event of an error?

Does this additional exchange of information fit into the iSCSI
protocol, or does it have to flow out of band somehow?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Nicolas Williams

2009-Feb-01 22:37 UTC

[zfs-discuss] Two-level ZFS

On Sun, Feb 01, 2009 at 04:26:13PM -0600, Gary Mills wrote:
> I realize that this configuration is not supported.  What's required
It would be silly for ZFS to support zvols as iSCSI LUNs and then say
"you can put anything but ZFS on them."  I'm pretty sure there's no such
restriction.

(That said, I can't speak for the ZFS team, and it's remotely imaginable
-barely- that there could be such a limitation, but it would be strange
indeed.)
> to make it work?  Consider a file server running ZFS that exports a
> volume with iSCSI.  Consider also an application server that imports
> the LUN with iSCSI and runs a ZFS filesystem on that LUN.  All of the
> redundancy and disk management takes place on the file server, but
> end-to-end error detection takes place on the application server.
> This is a reasonable configuration, is it not?
IMO it's no more and no less reasonable than putting a DB on an iSCSI
LUN backed by ZFS.
> When the application server detects a checksum error, what information
> does it have to return to the file server so that it can correct the
> error?  The file server could then retry the read from its redundant
> source, which might be a mirror or might be synthetic data from
> RAID-5.  It might also indicate that a disk must be replaced.
ZFS relies on doing all the mirroring and/or RAID-Z work itself in order
to be able to recover from bad blocks.  If you put the redundancy below
ZFS (e.g., using HW RAID) then you typically lose that capability, BUT
with ZFS on the target side the target-side ZFS will be able to do the
recovery while still retaining end-to-end integrity protection.  Neat,
eh?
> Must any information accompany each block of data sent to the
> application server so that the file server can identify the source
> of the data in the event of an error?
In this case there's no need: the target will know which blocks are bad
and automatically correct them (assuming enough redundancy exists).
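
The target-side recovery described here can be illustrated with a minimal
Python sketch of a self-healing mirrored read. This is a toy model, not ZFS
code: the class name MirroredPool, its fields, and the choice of SHA-256 are
assumptions made purely for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum kept apart from the data it protects."""
    return hashlib.sha256(data).hexdigest()


class MirroredPool:
    """Toy two-way mirror that records a checksum for every block (hypothetical)."""

    def __init__(self):
        self.disks = [{}, {}]  # two copies of every block
        self.sums = {}         # block number -> expected checksum
        self.repairs = 0       # how many bad copies were rewritten

    def write(self, blkno: int, data: bytes) -> None:
        self.sums[blkno] = checksum(data)
        for disk in self.disks:
            disk[blkno] = data

    def read(self, blkno: int) -> bytes:
        expected = self.sums[blkno]
        # Look for any copy whose contents still match the checksum.
        good = None
        for disk in self.disks:
            if checksum(disk[blkno]) == expected:
                good = disk[blkno]
                break
        if good is None:
            raise IOError(f"block {blkno}: no valid copy left")
        # Rewrite every copy that differs from the verified one.
        for disk in self.disks:
            if disk[blkno] != good:
                disk[blkno] = good
                self.repairs += 1
        return good


pool = MirroredPool()
pool.write(7, b"payload")
pool.disks[0][7] = b"pay1oad"  # silent corruption on one side of the mirror
data = pool.read(7)
print(data, pool.repairs)
```

The reader of the block never has to say which copy was bad: the layer that
owns both the checksums and the redundant copies works that out by itself.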

Nico
--

Jim Dunham

2009-Feb-02 04:44 UTC

[zfs-discuss] Two-level ZFS

Gary,
> I realize that this configuration is not supported.
The configuration is supported, but not in the manner mentioned below.

If there are two (or more) instances of ZFS in the end-to-end data
path, each instance is responsible for its own redundancy and error
recovery. There is no in-band communication between one instance of
ZFS and other instances of ZFS located elsewhere in the same
end-to-end data path.

A key point is that a ZVOL provides the same block I/O
semantics as any other Solaris block device. Therefore, when a ZVOL is
configured as an iSCSI Target and the target is accessed by an iSCSI
Initiator LU, the initiator has no awareness that a ZVOL is the
backing store of this LU.

Although not quite the same situation, this ZFS discussion list raises
questions about configuring ZFS on RAID-enabled storage arrays, and how
using simple JBODs might be a better solution.

Jim Dunham
Engineering Manager
Sun Microsystems, Inc.
Storage Platform Software Group

Gary Mills

2009-Feb-02 14:22 UTC

[zfs-discuss] Two-level ZFS

On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> I wrote:
> 
> >I realize that this configuration is not supported.
> 
> The configuration is supported, but not in the manner mentioned below.
> 
> If there are two (or more) instances of ZFS in the end-to-end data
> path, each instance is responsible for its own redundancy and error
> recovery. There is no in-band communication between one instance of
> ZFS and other instances of ZFS located elsewhere in the same
> end-to-end data path.
I must have been unclear when I stated my question.  The
configuration, with ZFS on both systems, redundancy only on the
file server, and end-to-end error detection and correction, does
not exist.  What additions to ZFS are required to make this work?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Fajar A. Nugraha

2009-Feb-02 14:53 UTC

[zfs-discuss] Two-level ZFS

On Mon, Feb 2, 2009 at 9:22 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:
> On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
>> If there are two (or more) instances of ZFS in the end-to-end data
>> path, each instance is responsible for its own redundancy and error
>> recovery. There is no in-band communication between one instance of
>> ZFS and other instances of ZFS located elsewhere in the same
>> end-to-end data path.
>
> I must have been unclear when I stated my question.  The
> configuration, with ZFS on both systems, redundancy only on the
> file server, and end-to-end error detection and correction, does
> not exist.  What additions to ZFS are required to make this work?
None. It's simply not possible.

I believe Jim already stated that, but let me give some additional
comments that might be helpful.

(1) zfs can provide end-to-end protection ONLY if you use it end-to-end.
This means:
- no other filesystem on top of it (e.g. do not use UFS on zvol or
something similar)
- no RAID/MIRROR under it (i.e. it must have access to the disk as JBOD)

(2) When (1) is not fulfilled, you get limited protection. For example:
- when using ufs on top of zvol, or exporting zvol as iscsi, zfs can
only provide protection from zvol downwards. It cannot manage
protection for whatever runs on top of it.
- when using zfs on top of HW/SW raid or iscsi, zfs can provide SOME
protection, but if certain errors occur on the HW/SW raid or iscsi it
MIGHT be unable to recover from it.

Here's a scenario:
(1) file server (or in this case iscsi server) exports a redundant
zvol to app server
(2) app server uses the iscsi LUN to create zpool (this would be a
single-vdev pool)
(3) app server has bad memory/mobo
(4) after some writes, app server will show some files have checksum errors

In this scenario, app server can NOT correct the error (it doesn't
have enough redundancy), and file server can NOT detect the error
(because the error is not under its control).

Now consider a second scenario:
(1) file server exports several raw disks to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy
(either mirror, raidz, or raidz2)
(3) app server has bad memory/mobo
(4) after some writes, app server will show some files have checksum errors

In this scenario, app server SHOULD be able to detect and correct the
errors properly, but it might be hard to find which one is at fault:
app server, file server, or the disks.

Third scenario:
(1) file server exports several raw disks to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy
(either mirror, raidz, or raidz2)
(3) file server has a bad disk
(4) after some writes, app server will show some files have checksum
errors, or it shows that a disk is bad

In this scenario, app server SHOULD be able to detect and correct the
errors properly, and it should be able to identify which iscsi LUN
(and consequently, which disk on file server) is broken.

Fourth scenario:
(1) file server exports several redundant zvols to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy
(either mirror, raidz, or raidz2)
(3) file server has a bad disk, or app server has memory errors

In this scenario, app server or file server SHOULD be able to detect
and correct the errors properly, so you get end-to-end protection.
Sort of.

The fourth scenario requires redundancy on both file and app server, while
you mentioned that you only want redundancy on file server while
running zfs on both file and app server. That's why I said it's not
possible.

Hope this helps.

Regards,

Fajar

Gary Mills

2009-Feb-02 23:27 UTC

[zfs-discuss] Two-level ZFS

On Mon, Feb 02, 2009 at 09:53:15PM +0700, Fajar A. Nugraha wrote:
> On Mon, Feb 2, 2009 at 9:22 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:
> > On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> >> If there are two (or more) instances of ZFS in the end-to-end data
> >> path, each instance is responsible for its own redundancy and error
> >> recovery. There is no in-band communication between one instance of
> >> ZFS and other instances of ZFS located elsewhere in the same
> >> end-to-end data path.
> >
> > I must have been unclear when I stated my question.  The
> > configuration, with ZFS on both systems, redundancy only on the
> > file server, and end-to-end error detection and correction, does
> > not exist.  What additions to ZFS are required to make this work?
> 
> None. It's simply not possible.
You're talking about the existing ZFS implementation; I'm not!
Is ZFS now frozen in time, with only bugs being fixed?  I have
difficulty believing that.  Putting a wire between two layers
of ZFS should indeed be possible.  Think about the Amber Road
products, from the Fishworks team.  They run ZFS and export iSCSI
and FC-AL.  Redundancy and disk management are already present in
these products.  Should they be implemented again in each of the
servers that imports LUNs from these products?  I think not.
> I believe Jim already stated that, but let me give some additional
> comments that might be helpful.
> 
> (1) zfs can provide end-to-end protection ONLY if you use it end-to-end.
> This means:
> - no other filesystem on top of it (e.g. do not use UFS on zvol or
> something similar)
> - no RAID/MIRROR under it (i.e. it must have access to the disk as JBOD)
Exactly!  That leads to my question.  What information needs to be
exchanged between ZFS on the file server and ZFS on the application
server so that end-to-end protection can be maintained with redundancy
and disk management only on the file server?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-

Nicolas Williams

2009-Feb-02 23:43 UTC

[zfs-discuss] Two-level ZFS

On Mon, Feb 02, 2009 at 08:22:13AM -0600, Gary Mills wrote:
> On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> > I wrote:
> > 
> > >I realize that this configuration is not supported.
> > 
> > The configuration is supported, but not in the manner mentioned below.
> > 
> > If there are two (or more) instances of ZFS in the end-to-end data
> > path, each instance is responsible for its own redundancy and error
> > recovery. There is no in-band communication between one instance of
> > ZFS and other instances of ZFS located elsewhere in the same
> > end-to-end data path.
> 
> I must have been unclear when I stated my question.  The
> configuration, with ZFS on both systems, redundancy only on the
> file server, and end-to-end error detection and correction, does
> not exist.  What additions to ZFS are required to make this work?
This is a variant of the HW RAID thread that recurs every so often.

When redundancy happens below ZFS then ZFS cannot provide end-to-end
error correction other than by using ditto blocks.  But people using HW
RAID typically don't want to dedicate even more space to redundancy by
using ditto blocks for data.  You still get end-to-end error detection,
of course.

ZFS layered atop ZFS across iSCSI, with the lower layer providing
redundancy, exhibits the same result.  You get end-to-end error
detection, but not end-to-end error correction.
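
The gap between detection and correction can be sketched in Python. This is
a toy model under stated assumptions, not ZFS code: Lun, UpperPool,
ChecksumError, demo, and the replica layout are hypothetical names invented
for illustration, and the copies parameter only loosely stands in for ditto
blocks. The LUN is opaque: it offers reads and writes by block number and
nothing else, so the upper layer has no way to ask for "the other copy".

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChecksumError(Exception):
    """Raised when no replica of a block matches its checksum."""


class Lun:
    """Opaque block device, e.g. an iSCSI LUN: read/write by block number only."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr: int, data: bytes) -> None:
        self.blocks[addr] = data

    def read(self, addr: int) -> bytes:
        return self.blocks[addr]


class UpperPool:
    """Toy upper-layer pool on a single LUN, keeping `copies` replicas per block."""

    def __init__(self, lun: Lun, copies: int = 1):
        self.lun = lun
        self.copies = copies
        self.sums = {}  # block number -> expected checksum

    def addrs(self, blkno: int):
        # Hypothetical layout: replica i of block b lives at LUN address b*copies + i.
        return [blkno * self.copies + i for i in range(self.copies)]

    def write(self, blkno: int, data: bytes) -> None:
        self.sums[blkno] = checksum(data)
        for addr in self.addrs(blkno):
            self.lun.write(addr, data)

    def read(self, blkno: int) -> bytes:
        for addr in self.addrs(blkno):
            data = self.lun.read(addr)
            if checksum(data) == self.sums[blkno]:
                return data
        raise ChecksumError(f"block {blkno}: all {self.copies} replica(s) bad")


def demo(copies: int):
    lun = Lun()
    pool = UpperPool(lun, copies)
    pool.write(3, b"payload")
    # Damage that the lower layer does not see as an error on its side.
    lun.blocks[pool.addrs(3)[0]] = b"pay1oad"
    try:
        return pool.read(3)
    except ChecksumError as err:
        return f"detected, not corrected: {err}"


result_single = demo(1)  # no redundancy in the upper layer
result_ditto = demo(2)   # extra replica in the upper layer
print(result_single)
print(result_ditto)
```

With one replica the upper pool can only report the bad block; with two it
can return good data, at the cost of the extra space mentioned above.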

Nico
--
