I've set up an iSCSI volume on OpenSolaris (snv_134) with these commands:

sh-4.0# zfs create rpool/iscsi
sh-4.0# zfs set shareiscsi=on rpool/iscsi
sh-4.0# zfs create -s -V 10g rpool/iscsi/test

The underlying zpool is a mirror of two SATA drives. I'm connecting from a Mac client with globalSAN initiator software, connected via Gigabit LAN. It connects fine, and I've initialised a Mac-format volume on that iSCSI volume.

Performance, however, is terribly slow: about 10 times slower than an SMB share on the same pool. I expected it to be very similar to, if not faster than, SMB.

Here are my test results copying 3GB of data:

iSCSI:        44m01s    1.185MB/s
SMB share:     4m27s    11.73MB/s

Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:

iSCSI:         4m36s    11.34MB/s
SMB share:     1m45s    29.81MB/s

Is there something obvious I've missed here?
iSCSI writes require a sync to disk for every write. SMB writes get cached in memory, therefore are much faster. I am not sure why it is so slow for reads.

Have you tried COMSTAR iSCSI? I have read in these forums that it is faster.

-Scott
On May 26, 2010, at 5:08 AM, Matt Connolly wrote:
> I've set up an iSCSI volume on OpenSolaris (snv_134) with these commands:
>
> sh-4.0# zfs create rpool/iscsi
> sh-4.0# zfs set shareiscsi=on rpool/iscsi
> sh-4.0# zfs create -s -V 10g rpool/iscsi/test

NB: shareiscsi uses the legacy iSCSI target and is deprecated after b134. For more information see project COMSTAR.

> Here are my test results copying 3GB of data:
>
> iSCSI:        44m01s    1.185MB/s
> SMB share:     4m27s    11.73MB/s

Is the Nagle algorithm enabled? It is perhaps the most common cause of really slow iSCSI performance.

Next, is the write cache enabled? You can enable or disable it on both the target and initiator side. They should agree, and best performance is with the write cache enabled.
 -- richard

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
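A quick way to check both of those suspects on the OpenSolaris side is sketched below. The Nagle tunable is the stock Solaris TCP one and is not persistent across reboots; treat the exact values as assumptions to verify on your build.

    # Effectively disable Nagle for the TCP stack (the default tcp_naglim_def
    # is 4095; setting it to 1 stops small iSCSI PDUs from being coalesced).
    ndd -set /dev/tcp tcp_naglim_def 1

    # Inspect the write cache setting of the backing SATA disks interactively:
    # run format -e, pick a disk, then the cache -> write_cache menu shows and
    # toggles it.
    format -e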
-----Original Message-----
From: Matt Connolly
Sent: Wednesday, May 26, 2010 5:08 AM

> I've set up an iSCSI volume on OpenSolaris (snv_134) with these commands:
> [...]
> Performance, however, is terribly slow, about 10 times slower than an SMB share on the same pool.
> [...]
> Is there something obvious I've missed here?

Hi Matt, here is a decent post on how to set up COMSTAR and disable the old iscsitgt service:

http://toic.org/2009/11/08/opensolaris-server-with-comstar-and-zfs/

Have a great day!

Geoff
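For reference, a COMSTAR setup along those lines boils down to roughly the commands below. This is only a sketch: the package name, the zvol path and the GUID are assumptions to adapt to your system, and the view/target steps are shown in their simplest all-initiators form.

    # Stop the legacy target and bring up the COMSTAR framework and iSCSI target.
    svcadm disable iscsitgt
    pkg install SUNWiscsit                  # COMSTAR iSCSI target port provider, if missing
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default

    # Register the zvol as a logical unit and expose it to all initiators.
    sbdadm create-lu /dev/zvol/rdsk/rpool/iscsi/test
    stmfadm add-view 600144f0XXXXXXXXXXXXXXXXXXXXXXXX   # GUID printed by create-lu
    itadm create-target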
> Hi Matt, here is a decent post on how to set up COMSTAR and disable
> the old iscsitgt service.
>
> http://toic.org/2009/11/08/opensolaris-server-with-comstar-and-zfs/
>
> Geoff

You should also look into block alignment, to prevent unnecessary hits on the drives during reads/writes.

--
iMx
imx at streamvia.net
www.slashdevslashnull.com
On Wed, May 26, 2010 at 5:08 AM, Matt Connolly <matt.connolly.au at gmail.com> wrote:
> Performance, however, is terribly slow, about 10 times slower than an SMB share on the same pool.
>
> iSCSI:        44m01s    1.185MB/s
> SMB share:     4m27s    11.73MB/s

Try jumbo frames, and make sure flow control is enabled on your iSCSI switches and all network cards.

--
Brent Jones
brent at servuhome.net
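On the OpenSolaris side, jumbo frames are a per-link property. The interface name below is an assumption (substitute your own), the link may need to be unplumbed before the MTU change is accepted on some builds, and the Mac, the switch and the server all have to agree on the MTU.

    # Check the current MTU and raise it to 9000 on the server's data link.
    dladm show-linkprop -p mtu e1000g0
    dladm set-linkprop -p mtu=9000 e1000g0

    # Verify end to end from the Mac with a large, non-fragmenting ping
    # (-D sets the don't-fragment bit in the BSD/Mac OS X ping).
    ping -D -s 8184 <server-ip>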
On 27 May 2010, at 07:03, Brent Jones wrote:
> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly <matt.connolly.au at gmail.com> wrote:
>> Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:
>>
>> iSCSI:        4m36s    11.34MB/s
>> SMB share:    1m45s    29.81MB/s

<cleaning up some old mail>

Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request; iSCSI only responds to the request. Put a filesystem on top of iSCSI and try again.

For writes, iSCSI is synchronous and SMB is not.

-r
On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais <roch.bourbonnais at sun.com> wrote:
> Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request; iSCSI only responds to the request.
> Put a filesystem on top of iSCSI and try again.
>
> For writes, iSCSI is synchronous and SMB is not.

It may be with ZFS, but iSCSI is neither synchronous nor asynchronous; it is simply SCSI over IP.

It is the application using the iSCSI protocol that determines whether it is synchronous (issue a flush after write) or asynchronous (wait until the target flushes).

I think the ZFS developers didn't quite understand that and wanted strict guidelines like NFS has, but iSCSI doesn't have those; it is a lower-level protocol than NFS, so they forced guidelines on it and violated the standard.

-Ross
On 03/08/2010 22:49, Ross Walker wrote:
> It is the application using the iSCSI protocol that determines whether it is synchronous (issue a flush after write) or asynchronous (wait until the target flushes).
>
> I think the ZFS developers didn't quite understand that and wanted strict guidelines like NFS has, but iSCSI doesn't have those; it is a lower-level protocol than NFS, so they forced guidelines on it and violated the standard.

Nothing has been violated here.

Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous). Additionally, in recent builds you have zfs set sync={disabled|default|always}, which also works with zvols.

So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis. It is just that the default is synchronous.

--
Robert Milkowski
http://milek.blogspot.com
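A minimal sketch of those two knobs, assuming a COMSTAR logical unit already exists (see the setup sketch earlier in the thread). The GUID is a placeholder, and the property name should be checked against stmfadm(1M) on your build.

    # Per-dataset sync policy (builds that have the sync property; the valid
    # values are standard, always and disabled).
    zfs set sync=disabled rpool/iscsi/test   # asynchronous: fastest, least safe
    zfs set sync=always rpool/iscsi/test     # treat every write as synchronous

    # Per-LU write cache in COMSTAR: wcd=false enables the write cache (WCE),
    # wcd=true disables it, which is the safe default for zvol-backed LUs.
    stmfadm modify-lu -p wcd=false 600144f0XXXXXXXXXXXXXXXXXXXXXXXX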
On Aug 3, 2010, at 5:56 PM, Robert Milkowski <milek at task.gda.pl> wrote:
> Nothing has been violated here.
> Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous).
>
> So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis.
> It is just that the default is synchronous.

Ah, OK, my experience has been with Solaris and the iscsitgt, which, correct me if I am wrong, is still synchronous only.

-Ross
> -----Original Message-----
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Robert Milkowski
> Sent: Tuesday, August 03, 2010 5:57 PM
> Subject: Re: [zfs-discuss] iScsi slow
>
> So you do have control over how it is supposed to behave, and to make
> it nice it is even on a per-zvol basis.
> It is just that the default is synchronous.

And if it's synchronous, you can still accelerate performance by using L2ARC and SLOG devices, just like you can with NFS, correct?

-Will
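For completeness, adding those devices is one command each; the pool name and device names below are placeholders for illustration.

    # A dedicated log (slog) device absorbs the synchronous writes that iSCSI
    # and NFS generate; a cache (L2ARC) device helps the read side.
    zpool add tank log c2t0d0
    zpool add tank cache c2t1d0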
On Aug 3, 2010, at 5:56 PM, Robert Milkowski <milek at task.gda.pl> wrote:
> So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis.
> It is just that the default is synchronous.

I forgot to ask: if the zvol is set async with WCE, will it still honor a flush command from the initiator and flush those TXGs held for the zvol?

-Ross
On 03/08/2010 23:20, Ross Walker wrote:
> Ah, OK, my experience has been with Solaris and the iscsitgt, which, correct me if I am wrong, is still synchronous only.

I don't remember whether or not it offered the ability to manipulate a zvol's WCE flag, but if it didn't you can do it anyway, as it is a zvol property. For an example see

http://milek.blogspot.com/2010/02/zvols-write-cache.html

--
Robert Milkowski
http://milek.blogspot.com
On 04/08/2010, at 2:13, Roch Bourbonnais <roch.bourbonnais at sun.com> wrote:
> Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request; iSCSI only responds to the request.
> Put a filesystem on top of iSCSI and try again.

As I indicated above, there is a Mac filesystem on the iSCSI volume.

Matt.
Ross Walker writes:
> It may be with ZFS, but iSCSI is neither synchronous nor asynchronous; it is simply SCSI over IP.

Hey Ross,

Nothing to do with ZFS here, but you're right to point out that iSCSI is neither. It was just that in the context of this test (and 99+% of iSCSI usage) it will be. SMB is not. Thus a large discrepancy on the write test.

Resilient storage, by default, should expose iSCSI channels with write caches disabled.

> It is the application using the iSCSI protocol that determines whether it is synchronous (issue a flush after write) or asynchronous (wait until the target flushes).

True.

> I think the ZFS developers didn't quite understand that and wanted strict guidelines like NFS has, but iSCSI doesn't have those; it is a lower-level protocol than NFS, so they forced guidelines on it and violated the standard.

Not true.

ZFS exposes LUNs (or zvols), and while at first we didn't support DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you need it to be.

Now, in the context of iSCSI LUNs hosted by a resilient storage system, enabling write caches is to be used only in very specific circumstances. The situation is not symmetrical with WCE in the disks of a JBOD, since that can be set up with enough redundancy to deal with potential data loss. When using resilient storage, you need to trust the storage for persistence of SCSI commands, and building a resilient system on top of write-cache-enabled SCSI channels is not trivial.

Then Matt points out:

> As I indicated above, there is a Mac filesystem on the iSCSI volume.

On the read side, single-threaded performance is very much controlled by the readahead. Each filesystem will implement something different; the fact that you got 3X more throughput with SMB than with the Mac filesystem (HFS+?) simply means that SMB had a 3X larger readahead buffer than HFS+.

-r
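One way to separate raw iSCSI latency from filesystem readahead is to compare a sequential read of the raw iSCSI disk against one through the mounted filesystem; the device and file names below are made up for illustration.

    # On the Mac, read 1GB straight off the raw iSCSI disk (no filesystem readahead).
    dd if=/dev/rdisk2 of=/dev/null bs=128k count=8192

    # Then read the same amount through the HFS+ filesystem on that disk.
    dd if=/Volumes/iscsitest/bigfile of=/dev/null bs=128k count=8192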
On Aug 4, 2010, at 3:52 AM, Roch <Roch.Bourbonnais at Sun.COM> wrote:
> Resilient storage, by default, should expose iSCSI channels with write caches disabled.

So on that note, ZFS should disable the disks' write cache, not enable it, despite ZFS's COW properties, because it should be resilient.

> ZFS exposes LUNs (or zvols), and while at first we didn't support DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you need it to be.

I asked this question earlier, but got no answer: while an iSCSI target is presented WCE, does it respect the flush command?

> Now, in the context of iSCSI LUNs hosted by a resilient storage system, enabling write caches is to be used only in very specific circumstances. The situation is not symmetrical with WCE in the disks of a JBOD, since that can be set up with enough redundancy to deal with potential data loss. When using resilient storage, you need to trust the storage for persistence of SCSI commands, and building a resilient system on top of write-cache-enabled SCSI channels is not trivial.

Not true: advertise WCE, support flush and tagged command queuing, and the initiator will be able to use the resilient storage appropriately for its needs.

-Ross
Ross Walker writes:
> So on that note, ZFS should disable the disks' write cache, not enable it, despite ZFS's COW properties, because it should be resilient.

No, because ZFS builds resiliency on top of unreliable parts. It's able to deal with contained failures (lost state) of the disk write cache.

It can then export LUNs that have WC enabled or disabled. But if we enable the WC on the exported LUNs, then the consumer of those LUNs must be able to say the same. The discussion at that level then needs to focus on failure groups.

Ross also said:
> I asked this question earlier, but got no answer: while an iSCSI target is presented WCE, does it respect the flush command?

Yes. I would like to say "obviously" but it's been anything but.

-r
On Aug 4, 2010, at 9:20 AM, Roch <Roch.Bourbonnais at Sun.COM> wrote:
> Ross also said:
>> I asked this question earlier, but got no answer: while an iSCSI target is presented WCE, does it respect the flush command?
>
> Yes. I would like to say "obviously" but it's been anything but.

Sorry to probe further, but can you expand on that "but"?

Just this: if we had a bunch of zvols exported via iSCSI to another Solaris box, which used them to form another zpool, and had WCE turned on, would it be reliable?

-Ross
Ross Walker writes:
> Just this: if we had a bunch of zvols exported via iSCSI to another Solaris box, which used them to form another zpool, and had WCE turned on, would it be reliable?

Nope. That's because all the iSCSI LUNs are in the same fault domain, as they share a unified back-end cache. What works, in principle, is mirroring SCSI channels hosted on different storage controllers (or N SCSI channels on N controllers in a raid group).

Which is why keeping the WC set to the default is really better in general.

-r
On Aug 4, 2010, at 12:04 PM, Roch <Roch.Bourbonnais at Sun.COM> wrote:
> Nope. That's because all the iSCSI LUNs are in the same fault domain, as they share a unified back-end cache. What works, in principle, is mirroring SCSI channels hosted on different storage controllers (or N SCSI channels on N controllers in a raid group).
>
> Which is why keeping the WC set to the default is really better in general.

Well, I was actually talking about two back-end Solaris storage servers serving up storage over iSCSI to a front-end Solaris server serving ZFS over NFS. So I have redundancy there, but I want the storage to be performant, so I want the iSCSI to have WCE, yet I want it to be reliable and have it honor cache flush requests from the front-end NFS server.

Does that make sense? Is it possible?

-Ross
Ross Walker writes:
> Well, I was actually talking about two back-end Solaris storage servers serving up storage over iSCSI to a front-end Solaris server serving ZFS over NFS. So I have redundancy there, but I want the storage to be performant, so I want the iSCSI to have WCE, yet I want it to be reliable and have it honor cache flush requests from the front-end NFS server.
>
> Does that make sense? Is it possible?

Well, in response to a commit (say, after a file creation) the front-end server will end up sending write cache flushes to both sides of the iSCSI mirror, which will reach the back-end servers, which will flush the disk write caches. This will all work, but probably not unleash performance the way you would like it to.

If you set up the back-end servers to not honor the back-end disk cache flushes, then the two back-end pools become at risk of corruption, mostly because of the ordering of I/Os around the ueberblock updates. If you have faith, then you could consider that you won't hit corruption of both back-end pools together, and rely on front-end resilvering to rebuild the corrupted back-end. I wouldn't know how to calculate the MTTDL.

-r
On Aug 5, 2010, at 11:10 AM, Roch <Roch.Bourbonnais at Sun.COM> wrote:
> If you set up the back-end servers to not honor the back-end disk cache flushes, then the two back-end pools become at risk of corruption, mostly because of the ordering of I/Os around the ueberblock updates. If you have faith, then you could consider that you won't hit corruption of both back-end pools together, and rely on front-end resilvering to rebuild the corrupted back-end.

So you are saying setting WCE disables cache flush on the target, and setting WCD forces a flush for every WRITE?

How about a way to enable WCE on the target, yet still perform a cache flush when the initiator requests one, like a real SCSI target should do? Or is that just not possible with zvols today?

-Ross
On 5 August 2010, at 19:49, Ross Walker wrote:
> So you are saying setting WCE disables cache flush on the target, and setting WCD forces a flush for every WRITE?

Nope. Setting WC either way has no implication on the response to a flush request. We flush the cache in response to a request to do so, unless one sets the unsupported zfs_nocacheflush; if that is set then the pool is at risk.

> How about a way to enable WCE on the target, yet still perform a cache flush when the initiator requests one, like a real SCSI target should do? Or is that just not possible with zvols today?

I hope I've cleared that up. Not sure what I said that implied otherwise.

But if you honor the flush write cache request all the way to the disk device, then 1, 2 or 3 layers of ZFS won't make a dent in the performance of an NFS "tar x". Only a device accepting low-latency writes which survives a power outage can do that.

-r
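For the record, the unsupported zfs_nocacheflush tunable mentioned above is a system-wide /etc/system setting; it is shown here only so it can be recognized, not recommended, and Roch's warning about pool risk applies.

    # /etc/system fragment -- stops ZFS from issuing cache flushes to devices.
    # Unsupported and dangerous: on power loss the pool is at risk.
    set zfs:zfs_nocacheflush = 1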
On Aug 5, 2010, at 2:24 PM, Roch Bourbonnais <Roch.Bourbonnais at Sun.COM> wrote:
> Nope. Setting WC either way has no implication on the response to a flush request. We flush the cache in response to a request to do so, unless one sets the unsupported zfs_nocacheflush; if that is set then the pool is at risk.
>
> But if you honor the flush write cache request all the way to the disk device, then 1, 2 or 3 layers of ZFS won't make a dent in the performance of an NFS "tar x". Only a device accepting low-latency writes which survives a power outage can do that.

Understood, and thanks for the clarification. If the NFS synchronicity has too much of a negative impact, then that can be alleviated with an SSD or NVRAM slog device on the head server.

-Ross
hi

as already said above, the zfs property shareiscsi is obsolete and slow. use COMSTAR instead!

be careful: if you switch to COMSTAR, your current iSCSI target is no longer available. save the data first.

and if you want to have it more user-friendly, you could also try napp-it, my free web-gui for opensolaris/nexenta. there is full COMSTAR support.

try the COMSTAR setup at
http://www.napp-it.org/pop_en.html

gea