Hi, I work on a support team for the Sun StorEdge 6920 and have a question about the use of the SCSI SYNCHRONIZE CACHE command in Solaris and ZFS. We have a bug in our 6920 software that exposes us to a memory leak when we receive the SCSI SYNCHRONIZE CACHE command:

  6456312 - SCSI Synchronize Cache Command is flawed

It will take some time for this bug fix to roll out to the field, so we need to understand our exposure here. I have been informed that ZFS may use this in S10 through a new sd/ssd ioctl. Can anyone confirm that, as well as whether there is a config option to disable this command?

Thanks,
Steve
Yes, ZFS uses this command very frequently. However, it only does this if the whole disk is under the control of ZFS, I believe; so a workaround could be to use slices rather than whole disks when creating a ZFS pool on a buggy device.

This message posted from opensolaris.org
On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
> Yes, ZFS uses this command very frequently. However, it only does this
> if the whole disk is under the control of ZFS, I believe; so a
> workaround could be to use slices rather than whole disks when
> creating a ZFS pool on a buggy device.

Actually, we issue the command no matter if we are using a whole disk or just a slice. Short of an mdb script, there is not a way to disable it. We are trying to figure out ways to allow users to specify workarounds for broken hardware without getting the ZFS code all messy as a result.


--Bill
On 22/08/06, Bill Moore <Bill.Moore at sun.com> wrote:
> On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
> > Yes, ZFS uses this command very frequently. However, it only does this
> > if the whole disk is under the control of ZFS, I believe; so a
> > workaround could be to use slices rather than whole disks when
> > creating a ZFS pool on a buggy device.
>
> Actually, we issue the command no matter if we are using a whole disk or
> just a slice. Short of an mdb script, there is not a way to disable it.
> We are trying to figure out ways to allow users to specify workarounds
> for broken hardware without getting the ZFS code all messy as a result.

Has that behaviour changed then? I was definitely told (on list) that write cache was only enabled for a 'full ZFS disk'. Am I wrong in thinking this could be risky for UFS slices on the same disk (or does UFS journalling mitigate that)?

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Dick Davies writes:
> On 22/08/06, Bill Moore <Bill.Moore at sun.com> wrote:
> > On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
> > > Yes, ZFS uses this command very frequently. However, it only does this
> > > if the whole disk is under the control of ZFS, I believe; so a
> > > workaround could be to use slices rather than whole disks when
> > > creating a ZFS pool on a buggy device.
> >
> > Actually, we issue the command no matter if we are using a whole disk or
> > just a slice. Short of an mdb script, there is not a way to disable it.
> > We are trying to figure out ways to allow users to specify workarounds
> > for broken hardware without getting the ZFS code all messy as a result.
>
> Has that behaviour changed then? I was definitely told (on list) that
> write cache was only enabled for a 'full ZFS disk'. Am I wrong in
> thinking this could be risky for UFS slices on the same disk
> (or does UFS journalling mitigate that)?

There are two things: enabling the write cache (done once on disk open) and flushing the write cache every time it's required (say, after an O_DSYNC write).

ZFS does a WCE if it owns a whole disk.

And it issues DKIOCFLUSHWRITECACHE when needed on all disks, whether or not it enabled the cache (as somebody else may have). If the device responds that this DKIOC is unsupported, ZFS stops issuing the requests.

-r

> --
> Rasputin :: Jack of All Trades - Master of Nuns
> http://number9.hellooperator.net/
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
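[Editor's note: the policy Roch describes — enable WCE only when ZFS owns the whole disk, attempt the flush on every disk regardless, and stop issuing flushes once the device reports the ioctl unsupported — can be sketched in portable C. The struct and function names below are hypothetical, for illustration only; the real logic lives in ZFS's vdev disk code.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one disk's cache-flush state. */
struct vdev_cache_state {
    bool whole_disk;       /* ZFS was given the whole disk, not a slice */
    bool flush_supported;  /* cleared after the device rejects the ioctl */
};

/* WCE is enabled only when ZFS owns the whole disk. */
bool should_enable_write_cache(const struct vdev_cache_state *v)
{
    return v->whole_disk;
}

/*
 * The flush (DKIOCFLUSHWRITECACHE on Solaris) is attempted on every
 * disk, whole or sliced, because someone else may have enabled the
 * cache.  If the device reports the ioctl unsupported, remember that
 * and stop issuing it.  Returns true if a flush was issued.
 */
bool issue_flush(struct vdev_cache_state *v, bool device_says_unsupported)
{
    if (!v->flush_supported)
        return false;               /* gave up earlier */
    if (device_says_unsupported) {
        v->flush_supported = false; /* don't ask this device again */
        return false;
    }
    return true;
}
```

A pool built on a slice would still see flushes under this model, which matches Bill's earlier point that using slices does not avoid the SYNCHRONIZE CACHE command.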
Roch wrote:
> Dick Davies writes:
> > On 22/08/06, Bill Moore <Bill.Moore at sun.com> wrote:
> > > On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
> > > > Yes, ZFS uses this command very frequently. However, it only does this
> > > > if the whole disk is under the control of ZFS, I believe; so a
> > > > workaround could be to use slices rather than whole disks when
> > > > creating a ZFS pool on a buggy device.
> > >
> > > Actually, we issue the command no matter if we are using a whole disk or
> > > just a slice. Short of an mdb script, there is not a way to disable it.
> > > We are trying to figure out ways to allow users to specify workarounds
> > > for broken hardware without getting the ZFS code all messy as a result.
> >
> > Has that behaviour changed then? I was definitely told (on list) that
> > write cache was only enabled for a 'full ZFS disk'. Am I wrong in
> > thinking this could be risky for UFS slices on the same disk
> > (or does UFS journalling mitigate that)?
>
> There are two things: enabling the write cache (done once on disk open)
> and flushing the write cache every time it's required (say, after an
> O_DSYNC write).
>
> ZFS does a WCE if it owns a whole disk.
>
> And it issues DKIOCFLUSHWRITECACHE when needed on all disks,
> whether or not it enabled the cache (as somebody else may have).
> If the device responds that this DKIOC is unsupported, ZFS stops
> issuing the requests.

Also worth noting that some devices may have the write cache enabled by default (like SATA/IDE), so we issue the DKIOCFLUSHWRITECACHE even if we didn't enable it. Safety first.

Thanks,
George
Hi folks, thanks for the responses.

We've noticed a couple of switches in this code: un_f_write_cache_enabled, loaded in sd_get_write_cache_enabled() after looking at sense data, and un_f_sync_cache_supported, referenced in sdioctl:

    case DKIOCFLUSHWRITECACHE:
    {
        struct dk_callback *dkc = (struct dk_callback *)arg;

        mutex_enter(SD_MUTEX(un));
        if (!un->un_f_sync_cache_supported ||
            !un->un_f_write_cache_enabled) {

Is there a configuration switch that might toggle un_f_sync_cache_supported?

- Steve
Bill,

I realized just now that we're actually sending the wrong variant of SYNCHRONIZE CACHE, at least for SCSI devices which support SBC-2.

SBC-2 (or possibly even SBC-1, I don't have it handy) added the SYNC_NV bit to the command. If SYNC_NV is set to 0, the device is required to flush data from any non-volatile cache to the storage medium. If SYNC_NV is 1, however, the device is only required to ensure that data in any volatile caches is flushed to non-volatile cache if present.

We should be setting SYNC_NV for devices which support SBC-2. This should solve the problem where the larger array systems are flushing their non-volatile cache to disk when ZFS sends the cache flush; though, of course, it assumes that vendors actually check this bit. ;-)

Should I file a bug on this?

Anton
On Tue, Aug 22, 2006 at 11:46:30AM -0700, Anton B. Rang wrote:
> I realized just now that we're actually sending the wrong variant of
> SYNCHRONIZE CACHE, at least for SCSI devices which support SBC-2.
>
> SBC-2 (or possibly even SBC-1, I don't have it handy) added the
> SYNC_NV bit to the command. If SYNC_NV is set to 0, the device is
> required to flush data from any non-volatile cache to the storage
> medium. If SYNC_NV is 1, however, the device is only required to
> ensure that data in any volatile caches is flushed to non-volatile
> cache if present.
>
> We should be setting SYNC_NV for devices which support SBC-2. This
> should solve the problem where the larger array systems are flushing
> their non-volatile cache to disk when ZFS sends the cache flush;
> though, of course, it assumes that vendors actually check this bit.
> ;-)

Dang, I must have missed this when I initially read the specs. Either that, or I was looking at the original SBC document, which doesn't have that bit.

We should probably change this immediately to behave as you suggest. My only reservation would be if older devices puke on the command if that bit is set.

> Should I file a bug on this?

Yes, please.

Thanks for catching this.


--Bill
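[Editor's note: for readers who want to see the change concretely, here is a sketch in C of building a SYNCHRONIZE CACHE (10) CDB with and without SYNC_NV. The bit position used here (byte 1, bit 2) is from my reading of SBC-2 and should be verified against the spec; the function name is made up for illustration.]

```c
#include <stdint.h>
#include <string.h>

#define SYNCHRONIZE_CACHE_10  0x35
#define SYNC_NV               0x04   /* byte 1, bit 2 (SBC-2) */
#define IMMED                 0x02   /* byte 1, bit 1 */

/*
 * Build a SYNCHRONIZE CACHE (10) CDB.  An LBA/number-of-blocks pair of
 * 0/0 asks the device to synchronize the whole medium.  When sync_nv
 * is set, an SBC-2 device need only get the data as far as its
 * non-volatile cache rather than all the way to the medium.
 */
void build_sync_cache_cdb(uint8_t cdb[10], uint32_t lba,
                          uint16_t nblocks, int sync_nv)
{
    memset(cdb, 0, 10);
    cdb[0] = SYNCHRONIZE_CACHE_10;
    if (sync_nv)
        cdb[1] |= SYNC_NV;
    cdb[2] = (uint8_t)(lba >> 24);       /* LBA, big-endian */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);    /* number of blocks */
    cdb[8] = (uint8_t)nblocks;
}
```

Bill's reservation applies directly here: on pre-SBC-2 devices that treat byte 1, bit 2 as reserved, setting it may draw a check condition, so the caller must gate sync_nv on what the device reports supporting.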
Filed as 6462690.

If our storage qualification test suite doesn't yet check for support of this bit, we might want to get that added; it would be useful to know (and gently nudge vendors who don't yet support it).
Robert Milkowski
2006-Aug-22 22:48 UTC
[zfs-discuss] Re: Re: Re: SCSI synchronize cache cmd
Hello Anton,

Tuesday, August 22, 2006, 9:53:57 PM, you wrote:

ABR> Filed as 6462690.
ABR> If our storage qualification test suite doesn't yet check for
ABR> support of this bit, we might want to get that added; it would be
ABR> useful to know (and gently nudge vendors who don't yet support it).

Does it affect the 3510? If it does, is there any workaround (mdb?)?

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Richard L. Hamilton
2006-Aug-23 05:42 UTC
[zfs-discuss] Re: Re: Re: SCSI synchronize cache cmd
> Filed as 6462690.
>
> If our storage qualification test suite doesn't yet
> check for support of this bit, we might want to get
> that added; it would be useful to know (and gently
> nudge vendors who don't yet support it).

Is either the test suite, or at least a list of what it tests (which it looks like may more or less track what Solaris requires), publicly available, or could it be made so? Seems to me that if people can independently discover problem hardware, that might make your job easier, insofar as they're smarter before they start asking you questions; even more so if they feed back what they find (not unlike the do-it-yourself x86 compatibility testing).
Hello,

The change to add the SYNC_NV bit came with SBC-2 rev. 14 (May 2004). In SBC-2 rev. 13 (March 2004) the bit was reserved. It looks like devices that don't support this bit should continue to sync to media and ignore the fact that the bit is set, but it was a reserved bit before rev. 14, and there could be devices that will post CHECK CONDITION status when any reserved bit is set.

The SPC-3 spec rev. 19 (May 2004) is the first SPC-3 spec that added the NV_SUP (NV supported) bit to Inquiry Vital Product Data page 86h, byte 6, bit 1. Before setting the SYNC_NV bit in commands, you should check that the device reports supporting it by checking this Inquiry VPD page.

Also, the SYNCHRONIZE CACHE command is not the only command that changed with the NV_SUP change; the FUA_NV bit was also added to the following commands, so that non-volatile cache is considered when this bit is set in those commands:

  READ(10) and larger
  WRITE(10) and larger
  XDWRITE (all)
  XDWRITEREAD (all)
  XPWRITE (all)

-Dave

==============

Bill Moore wrote:
> On Tue, Aug 22, 2006 at 11:46:30AM -0700, Anton B. Rang wrote:
> > I realized just now that we're actually sending the wrong variant of
> > SYNCHRONIZE CACHE, at least for SCSI devices which support SBC-2.
> >
> > SBC-2 (or possibly even SBC-1, I don't have it handy) added the
> > SYNC_NV bit to the command. If SYNC_NV is set to 0, the device is
> > required to flush data from any non-volatile cache to the storage
> > medium. If SYNC_NV is 1, however, the device is only required to
> > ensure that data in any volatile caches is flushed to non-volatile
> > cache if present.
> >
> > We should be setting SYNC_NV for devices which support SBC-2. This
> > should solve the problem where the larger array systems are flushing
> > their non-volatile cache to disk when ZFS sends the cache flush;
> > though, of course, it assumes that vendors actually check this bit.
> > ;-)
>
> Dang, I must have missed this when I initially read the specs. Either
> that, or I was looking at the original SBC document, which doesn't have
> that bit.
>
> We should probably change this immediately to behave as you suggest. My
> only reservation would be if older devices puke on the command if that
> bit is set.
>
> > Should I file a bug on this?
>
> Yes, please.
>
> Thanks for catching this.
>
> --Bill
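[Editor's note: Dave's check — read the Extended INQUIRY Data VPD page (86h) and test byte 6, bit 1 before ever setting SYNC_NV — can be sketched as below. The function name is made up for illustration; the byte/bit position is taken from Dave's message, and the page-code check assumes the standard VPD page layout where byte 1 holds the page code.]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Check the NV_SUP bit in the Extended INQUIRY Data VPD page (86h).
 * `vpd` is the raw page as returned by INQUIRY with EVPD=1 and page
 * code 0x86; `len` is the number of bytes actually returned.
 * Returns 1 if the device reports a non-volatile cache (so SYNC_NV
 * may safely be set in SYNCHRONIZE CACHE), 0 otherwise.
 */
int device_supports_sync_nv(const uint8_t *vpd, size_t len)
{
    if (len < 7)
        return 0;                   /* too short to contain byte 6 */
    if (vpd[1] != 0x86)
        return 0;                   /* not the Extended INQUIRY page */
    return (vpd[6] & 0x02) != 0;    /* NV_SUP: byte 6, bit 1 */
}
```

A conservative initiator would default to SYNC_NV=0 whenever this returns 0 (including on devices that fail the INQUIRY for page 86h entirely), which preserves the old flush-to-media behavior on pre-SBC-2 hardware.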