Moving this discussion to the alias. See comments inline. The answers
have been provided by Mark Maybee.
Gehr, Chuck R wrote:
>Gehr, Chuck R wrote:
>
>
>>I have a few more questions:
>>
>>1) If the application is using open and write (i.e. not fopen and
>>fwrite), does the zfs snapshot command cause a flush (sync). For
>>example, if we have only one writer to the file system and we stop
>>writing to issue the snapshot command, can we be assured that the last
>>
>>
>
>
>
>>write issued will be in the snapshot? and will it be on disk? If not
>>what is the best way (performance) for us to force the flush.
>>
>>
>
>The snapshot command is executed in "syncing context", so yes, if you
>issue the snapshot command after the last write, the write data is
>guaranteed to be in the snapshot.
>
>CRG: Ok, great. That is consistent with the behavior that I've
>observed in test programs. However, what guarantee do we have in terms
>of consistency on disk? Does the snapshot also force all associated
>data buffers to disk? Can I be guaranteed that if I lose power
>immediately after the snapshot command completes, that snapshot will be
>available and accurate when the system comes back up? Also, what
>happens if I lose power in the middle of creating a snapshot? Is it
>safe to assume that once/if the snapshot name appears in the file system
>(under the .zfs/snapshot directory), it's guaranteed to be an accurate
>representation of the file system at the time the snapshot command was
>issued?
>
>
ZFS is *always* consistent on disk. All data associated with the
transaction group that the snapshot was issued from will be part of
the snapshot. If the snapshot shows up in the namespace following a
crash, the snapshot is complete/accurate/consistent.
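
To make that concrete, here is a minimal sketch (the file path, pool,
and dataset names are made up for illustration): the application writes
with open(2)/write(2), then creates the snapshot. Per the above, any
write that completed before the snapshot command is in the snapshot and
on disk once the command returns.

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int
  main(void)
  {
          const char *buf = "last record before snapshot\n";

          /* Hypothetical file on a ZFS dataset mounted at /tank/fs. */
          int fd = open("/tank/fs/data.log",
              O_WRONLY | O_CREAT | O_APPEND, 0644);
          if (fd == -1) {
                  perror("open");
                  return (1);
          }
          if (write(fd, buf, strlen(buf)) == -1) {
                  perror("write");
                  return (1);
          }
          (void) close(fd);

          /*
           * The snapshot is created in syncing context, so the write
           * above is part of the snapshot, and on disk, once this
           * command completes successfully.
           */
          if (system("zfs snapshot tank/fs@after-last-write") != 0) {
                  (void) fprintf(stderr, "zfs snapshot failed\n");
                  return (1);
          }
          return (0);
  }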
>
>
>>2) This may be related to question number 1. I noticed that the numbers
>>reported on Thumper are very good if write cache is turned on. How
>>does one turn write cache on and off? Is this a Solaris-wide
>>parameter or can it be turned on/off by file system? If write cache
>>is on, is there still a way an application can be assured that writes
>>have made it to disk?
>>
>
>ZFS uses an ioctl, DKIOCSETWCE, to enable the drive write cache when the
>disk device is opened. This is implemented internally in the scsi (and
>sata) drivers via mode sense and mode select. It is functionally
>equivalent to enabling the cache with format -e.
>
>At the moment, there isn't a Solaris-wide way to turn this feature on.
>If an application is writing directly to the device (say, a db writing
>directly to a raw device), then the application would use
>DKIOCFLUSHWRITECACHE to, well, force any cached writes through. This is
>exactly what zfs does (as needed).
>
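
For a raw-device writer, that looks roughly like the sketch below (the
device path is hypothetical, and the DKIOCSETWCE call is included only
to illustrate what ZFS itself does when it opens the device; an
application flushing its own writes needs only DKIOCFLUSHWRITECACHE):

  #include <sys/dkio.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <stropts.h>
  #include <unistd.h>

  int
  main(void)
  {
          int wce = 1;

          /* Hypothetical raw device path. */
          int fd = open("/dev/rdsk/c0t0d0s0", O_RDWR);
          if (fd == -1) {
                  perror("open");
                  return (1);
          }

          /* What ZFS does at open time: enable the drive write cache. */
          if (ioctl(fd, DKIOCSETWCE, &wce) == -1)
                  perror("DKIOCSETWCE");

          /* ... issue writes to the raw device here ... */

          /*
           * Force any writes sitting in the drive's write cache out to
           * stable storage; this is the flush ZFS issues internally as
           * needed. Passing NULL makes the flush synchronous.
           */
          if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) == -1) {
                  perror("DKIOCFLUSHWRITECACHE");
                  (void) close(fd);
                  return (1);
          }
          (void) close(fd);
          return (0);
  }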
>CRG: So if I understand you correctly, zfs always turns write cache on
>at the device level? However, I'm still puzzled by the Thumper
>performance results. Were two different ZFS code bases used to produce
>the results, one that issues the DKIOCSETWCE and DKIOCFLUSHWRITECACHE
>ioctls, and one that does not? Also, if ZFS has write cache turned on,
>when/how can an application be assured that data written has been
>committed in such a way that it is guaranteed to be retrievable after a
>power failure? That's a related question to my question about snapshots
>above.
>
>
ZFS is *always* consistent on disk. If you want to guarantee that a
write is on disk, use synchronous write semantics.
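
"Synchronous semantics" means, for example, opening the file with
O_DSYNC, or calling fsync(3C) after the writes. A minimal sketch (the
path is hypothetical):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int
  main(void)
  {
          const char *buf = "must survive power loss\n";

          /*
           * O_DSYNC makes each write(2) return only after the data has
           * reached stable storage (ZFS commits it via the intent log
           * and flushes the drive cache as needed).
           */
          int fd = open("/tank/fs/journal",
              O_WRONLY | O_CREAT | O_DSYNC, 0644);
          if (fd == -1) {
                  perror("open");
                  return (1);
          }
          if (write(fd, buf, strlen(buf)) == -1) {
                  perror("write");
                  return (1);
          }
          /* Alternatively: open without O_DSYNC and call fsync(fd) here. */
          (void) close(fd);
          return (0);
  }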
-Sanjay