I realize that this configuration is not supported. What's required to make it work?

Consider a file server running ZFS that exports a volume with iSCSI. Consider also an application server that imports the LUN with iSCSI and runs a ZFS filesystem on that LUN. All of the redundancy and disk management takes place on the file server, but end-to-end error detection takes place on the application server. This is a reasonable configuration, is it not?

When the application server detects a checksum error, what information does it have to return to the file server so that the file server can correct the error? The file server could then retry the read from its redundant source, which might be a mirror or might be synthetic data from RAID-5. It might also indicate that a disk must be replaced.

Must any information accompany each block of data sent to the application server so that the file server can identify the source of the data in the event of an error?

Does this additional exchange of information fit into the iSCSI protocol, or does it have to flow out of band somehow?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
On Sun, Feb 01, 2009 at 04:26:13PM -0600, Gary Mills wrote:
> I realize that this configuration is not supported. What's required

It would be silly for ZFS to support zvols as iSCSI LUNs and then say "you can put anything but ZFS on them." I'm pretty sure there's no such restriction.

(That said, I can't speak for the ZFS team, and it's remotely imaginable -barely- that there could be such a limitation, but it would be strange indeed.)

> to make it work? Consider a file server running ZFS that exports a
> volume with iSCSI. Consider also an application server that imports
> the LUN with iSCSI and runs a ZFS filesystem on that LUN. All of the
> redundancy and disk management takes place on the file server, but
> end-to-end error detection takes place on the application server.
> This is a reasonable configuration, is it not?

IMO it's no more and no less reasonable than putting a DB on an iSCSI LUN backed by ZFS.

> When the application server detects a checksum error, what information
> does it have to return to the file server so that it can correct the
> error? The file server could then retry the read from its redundant
> source, which might be a mirror or might be synthetic data from
> RAID-5. It might also indicate that a disk must be replaced.

ZFS relies on doing all the mirroring and/or RAID-Z work itself in order to be able to recover from bad blocks. If you put the redundancy below ZFS (e.g., using HW RAID) then you typically lose that capability, BUT with ZFS on the target side, the target-side ZFS will be able to do the recovery while still retaining end-to-end integrity protection. Neat, eh?

> Must any information accompany each block of data sent to the
> application server so that the file server can identify the source
> of the data in the event of an error?

In this case there's no need: the target will know which blocks are bad and automatically correct them (assuming enough redundancy exists).

Nico
-- 
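Editor's sketch: the claim that the target "will know which blocks are bad and automatically correct" them can be illustrated with a toy model. It is a simplification under stated assumptions, not ZFS code: the helper names (`write_stripe`, `read_stripe`) are hypothetical, each data block gets its own SHA-256 checksum, and a single XOR parity block stands in for RAID-5 style redundancy. Real RAID-Z differs in its layout and in how it locates the damaged column.

```python
import hashlib
from functools import reduce


def digest(block: bytes) -> str:
    """Checksum kept apart from the data it protects."""
    return hashlib.sha256(block).hexdigest()


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return (columns, sums): the data blocks plus one XOR parity block,
    and one checksum per data block."""
    columns = list(data_blocks) + [xor_blocks(data_blocks)]
    sums = [digest(block) for block in data_blocks]
    return columns, sums


def read_stripe(columns, sums):
    """Return the data blocks, rebuilding at most one that fails its checksum."""
    data = list(columns[:-1])
    parity = columns[-1]
    bad = [i for i, block in enumerate(data) if digest(block) != sums[i]]
    if len(bad) > 1:
        raise IOError("more damage than single parity can repair")
    if bad:
        i = bad[0]
        survivors = data[:i] + data[i + 1:]
        rebuilt = xor_blocks(survivors + [parity])
        if digest(rebuilt) != sums[i]:
            raise IOError("parity block is damaged as well")
        data[i] = rebuilt
    return data


blocks = [b"AAAA", b"BBBB", b"CCCC"]
columns, sums = write_stripe(blocks)
columns[1] = b"BxBB"               # simulate silent corruption on one disk
recovered = read_stripe(columns, sums)
```

The checksum tells the reader which block is wrong, and the parity supplies the data to rebuild it. Both pieces live on the same side of the wire here, which is why no extra information has to travel to the initiator.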
Gary,

> I realize that this configuration is not supported.

The configuration is supported, but not in the manner mentioned below.

If there are two (or more) instances of ZFS in the end-to-end data path, each instance is responsible for its own redundancy and error recovery. There is no in-band communication between one instance of ZFS and other instances of ZFS located elsewhere in the same end-to-end data path.

A key understanding is that a ZVOL provides the same block I/O semantics as any other Solaris block device. Therefore, when a ZVOL is configured as an iSCSI Target and the target is accessed by an iSCSI Initiator LU, the initiator has no awareness that a ZVOL is the backing store of this LU.

Although not quite the same, this ZFS discussion list raises questions about configuring ZFS on RAID-enabled storage arrays, and how using simple JBODs might be a better solution.

Jim Dunham
Engineering Manager
Sun Microsystems, Inc.
Storage Platform Software Group

> What's required
> to make it work? Consider a file server running ZFS that exports a
> volume with iSCSI. Consider also an application server that imports
> the LUN with iSCSI and runs a ZFS filesystem on that LUN. All of the
> redundancy and disk management takes place on the file server, but
> end-to-end error detection takes place on the application server.
> This is a reasonable configuration, is it not?
>
> When the application server detects a checksum error, what information
> does it have to return to the file server so that it can correct the
> error? The file server could then retry the read from its redundant
> source, which might be a mirror or might be synthetic data from
> RAID-5. It might also indicate that a disk must be replaced.
>
> Must any information accompany each block of data sent to the
> application server so that the file server can identify the source
> of the data in the event of an error?
>
> Does this additional exchange of information fit into the iSCSI
> protocol, or does it have to flow out of band somehow?
On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> I wrote:
>
> > I realize that this configuration is not supported.
>
> The configuration is supported, but not in the manner mentioned below.
>
> If there are two (or more) instances of ZFS in the end-to-end data
> path, each instance is responsible for its own redundancy and error
> recovery. There is no in-band communication between one instance of
> ZFS and other instances of ZFS located elsewhere in the same
> end-to-end data path.

I must have been unclear when I stated my question. The configuration, with ZFS on both systems, redundancy only on the file server, and end-to-end error detection and correction, does not exist. What additions to ZFS are required to make this work?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
On Mon, Feb 2, 2009 at 9:22 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:
> On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
>> If there are two (or more) instances of ZFS in the end-to-end data
>> path, each instance is responsible for its own redundancy and error
>> recovery. There is no in-band communication between one instance of
>> ZFS and other instances of ZFS located elsewhere in the same
>> end-to-end data path.
>
> I must have been unclear when I stated my question. The
> configuration, with ZFS on both systems, redundancy only on the
> file server, and end-to-end error detection and correction, does
> not exist.

> What additions to ZFS are required to make this work?

None. It's simply not possible. I believe Jim already stated that, but let me give some additional comments that might be helpful.

(1) zfs can provide end-to-end protection ONLY if you use it end-to-end. This means:
- no other filesystem on top of it (e.g. do not use UFS on zvol or something similar)
- no RAID/MIRROR under it (i.e. it must have access to the disks as JBOD)

(2) When (1) is not fulfilled, you get limited protection. For example:
- when using ufs on top of zvol, or exporting zvol as iscsi, zfs can only provide protection from zvol downwards. It cannot manage protection for whatever runs on top of it.
- when using zfs on top of HW/SW raid or iscsi, zfs can provide SOME protection, but if certain errors occur on the HW/SW raid or iscsi it MIGHT be unable to recover from them.

Here's a scenario:
(1) file server (or in this case iscsi server) exports a redundant zvol to app server
(2) app server uses the iscsi LUN to create zpool (this would be a single-vdev pool)
(3) app server has bad memory/mobo
(4) after some writes, app server will show some files have checksum errors

In this scenario, app server can NOT correct the error (it doesn't have enough redundancy), and file server can NOT detect the error (because the error is not under its control).
Now consider a second scenario:
(1) file server exports several raw disks to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy (either mirror, raidz, or raidz2)
(3) app server has bad memory/mobo
(4) after some writes, app server will show some files have checksum errors

In this scenario, app server SHOULD be able to detect and correct the errors properly, but it might be hard to find which one is at fault: app server, file server, or the disks.

Third scenario:
(1) file server exports several raw disks to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy (either mirror, raidz, or raidz2)
(3) file server has a bad disk
(4) after some writes, app server will show some files have checksum errors, or it shows that a disk is bad

In this scenario, app server SHOULD be able to detect and correct the errors properly, and it should be able to identify which iscsi LUN (and consequently, which disk on file server) is broken.

Fourth scenario:
(1) file server exports several redundant zvols to app server
(2) app server uses the iscsi LUNs to create zpool with redundancy (either mirror, raidz, or raidz2)
(3) file server has a bad disk, or app server has memory errors

In this scenario, app server or file server SHOULD be able to detect and correct the errors properly, so you get end-to-end protection. Sort of.

The fourth scenario requires redundancy on both file and app server, while you mentioned that you only want redundancy on file server while running zfs on both file and app server. That's why I said it's not possible.

Hope this helps.

Regards,
Fajar
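Editor's sketch: the difference between a single-LUN pool and a pool with redundancy across several LUNs can be shown with a small two-layer toy model. Everything here is an assumption made for illustration: `Lun`, `AppPool`, and the `damage_lun` parameter are hypothetical names, and the damage is modelled as a single bit flip that happens after the app server computed its checksum but before the data reached one LUN. It is not how ZFS or iSCSI are implemented.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def flip_first_bit(data: bytes) -> bytes:
    """Simulate a one-bit error (data must be non-empty)."""
    return bytes([data[0] ^ 0x01]) + data[1:]


class Lun:
    """File-server side: stores the bytes it receives under its own checksum."""

    def __init__(self):
        self.blocks = {}

    def write(self, blkno, data):
        self.blocks[blkno] = (data, digest(data))

    def read(self, blkno):
        data, expected = self.blocks[blkno]
        # The file server can only verify what it stored. Bytes that were
        # already damaged on arrival still pass this check.
        if digest(data) != expected:
            raise IOError("file server detected a bad block")
        return data


class AppPool:
    """App-server side: one checksum per block, a full copy on every LUN."""

    def __init__(self, luns):
        self.luns = luns
        self.sums = {}

    def write(self, blkno, data, damage_lun=None):
        self.sums[blkno] = digest(data)
        for i, lun in enumerate(self.luns):
            payload = flip_first_bit(data) if i == damage_lun else data
            lun.write(blkno, payload)

    def read(self, blkno):
        good, bad = None, []
        for lun in self.luns:
            data = lun.read(blkno)
            if digest(data) == self.sums[blkno]:
                good = data
            else:
                bad.append(lun)
        if good is None:
            return None, "unrecoverable"   # detected, but no intact copy
        for lun in bad:
            lun.write(blkno, good)         # rewrite the damaged copy
        status = "repaired" if bad else "ok"
        return good, status


single = AppPool([Lun()])
single.write(0, b"payload", damage_lun=0)
single_result = single.read(0)

mirror = AppPool([Lun(), Lun()])
mirror.write(0, b"payload", damage_lun=1)
mirror_result = mirror.read(0)
```

In the single-LUN pool the app server notices the mismatch but has nothing to repair from, and the file server never raises an error because the bytes match what it was given. With two LUNs the app server finds an intact copy and rewrites the damaged one.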
On Mon, Feb 02, 2009 at 09:53:15PM +0700, Fajar A. Nugraha wrote:
> On Mon, Feb 2, 2009 at 9:22 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:
> > On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> >> If there are two (or more) instances of ZFS in the end-to-end data
> >> path, each instance is responsible for its own redundancy and error
> >> recovery. There is no in-band communication between one instance of
> >> ZFS and other instances of ZFS located elsewhere in the same
> >> end-to-end data path.
> >
> > I must have been unclear when I stated my question. The
> > configuration, with ZFS on both systems, redundancy only on the
> > file server, and end-to-end error detection and correction, does
> > not exist.
>
> > What additions to ZFS are required to make this work?
>
> None. It's simply not possible.

You're talking about the existing ZFS implementation; I'm not! Is ZFS now frozen in time, with only bugs being fixed? I have difficulty believing that. Putting a wire between two layers of ZFS should indeed be possible. Think about the Amber Road products, from the Fishworks team. They run ZFS and export iSCSI and FC-AL. Redundancy and disk management are already present in these products. Should they be implemented again in each of the servers that imports LUNs from these products? I think not.

> I believe Jim already stated that, but let me give some additional
> comments that might be helpful.
>
> (1) zfs can provide end-to-end protection ONLY if you use it end-to-end.
> This means:
> - no other filesystem on top of it (e.g. do not use UFS on zvol or
> something similar)
> - no RAID/MIRROR under it (i.e. it must have access to the disks as JBOD)

Exactly! That leads to my question. What information needs to be exchanged between ZFS on the file server and ZFS on the application server so that end-to-end protection can be maintained with redundancy and disk management only on the file server?
-- -Gary Mills- -Unix Support- -U of M Academic Computing and Networking-
On Mon, Feb 02, 2009 at 08:22:13AM -0600, Gary Mills wrote:
> On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
> > I wrote:
> >
> > > I realize that this configuration is not supported.
> >
> > The configuration is supported, but not in the manner mentioned below.
> >
> > If there are two (or more) instances of ZFS in the end-to-end data
> > path, each instance is responsible for its own redundancy and error
> > recovery. There is no in-band communication between one instance of
> > ZFS and other instances of ZFS located elsewhere in the same
> > end-to-end data path.
>
> I must have been unclear when I stated my question. The
> configuration, with ZFS on both systems, redundancy only on the
> file server, and end-to-end error detection and correction, does
> not exist. What additions to ZFS are required to make this work?

This is a variant of the HW RAID thread that recurs every so often.

When redundancy happens below ZFS, ZFS cannot provide end-to-end error correction other than by using ditto blocks. But people using HW RAID typically don't want to dedicate even more space to redundancy by using ditto blocks for data. You still get end-to-end error detection, of course.

ZFS layered atop ZFS across iSCSI, with the lower layer providing redundancy, gives the same result: you get end-to-end error detection, but not end-to-end error correction.

Nico
-- 
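Editor's sketch: the space objection to ditto blocks is simple arithmetic, worked out below. The function names are hypothetical, and the model rests on assumptions: extra data copies in the upper pool divide usable capacity by the number of copies, the lower layer is either a two-way mirror or single-parity RAID-5, and metadata overhead and compression are ignored.

```python
from fractions import Fraction


def raid5_efficiency(disks: int) -> Fraction:
    """Fraction of raw capacity left after one disk's worth of parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return Fraction(disks - 1, disks)


def usable_fraction(lower_efficiency: Fraction, copies: int) -> Fraction:
    """Raw capacity that ends up holding unique data when the upper layer
    stores `copies` copies of every data block on top of the lower layer."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return lower_efficiency / copies


five_disk_raid5 = raid5_efficiency(5)                    # 4/5 of raw capacity
with_ditto = usable_fraction(five_disk_raid5, 2)         # 2/5 of raw capacity
mirror_with_ditto = usable_fraction(Fraction(1, 2), 2)   # 1/4 of raw capacity
```

Under these assumptions, a five-disk RAID-5 set that would otherwise offer four fifths of its raw capacity drops to two fifths once the upper pool keeps two copies of each data block, and a mirrored lower layer drops to one quarter. That is the cost of getting correction, not just detection, at the upper layer.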