Christoph Hellwig
2018-Oct-15 09:27 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:> From: Changpeng Liu <changpeng.liu at intel.com> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtioThere is some issues in this spec. For one using the multiple ranges also for write zeroes is rather inefficient. Write zeroes really should use the same format as read and write. Second the unmap flag isn't properly specified at all, as nothing says the device may not unmap without the unmap flag. Please take a look at the SCSI or NVMe ?pec for some guidance.> +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > + bool unmap)Why is this an inline function?
Daniel Verkamp
2018-Oct-15 23:16 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <hch at infradead.org> wrote:> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > From: Changpeng Liu <changpeng.liu at intel.com> > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > There is some issues in this spec. For one using the multiple ranges > also for write zeroes is rather inefficient. Write zeroes really should > use the same format as read and write.I wasn't involved in the writing of the spec, so I'll defer to Michael and Changpeng here, but I'm not sure how "set in stone" the virtio specification is, or if it can be updated somehow without breaking compatibility. I agree that Write Zeroes would be simpler to implement as a single LBA + length rather than a list. However, it's not really possible to use the same format as the regular virtio block read/write commands (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do not specify a length explicitly; length is implied by the length of the data buffer as defined by the virtio descriptor, but a Write Zeroes command does not require a data buffer. At best, this could be a separate command mirroring the layout of struct virtio_blk_req but with data replaced with a length field; I'm not sure that buys much in the way of consistency.> Second the unmap flag isn't properly specified at all, as nothing > says the device may not unmap without the unmap flag. Please take > a look at the SCSI or NVMe ?pec for some guidance.This could probably use some clarifying text in the specification, but given that there is nothing in the spec describing what the device needs to do when unmap = 0, I would assume that the device can do whatever it likes, as long as the blocks read back as 0s afterwards. Reading back 0s is required by the definition of the Write Zeroes command in the same virtio spec change. It would probably be good to clarify this and explicitly define what the device is allowed to do in response to both settings of the unmap bit. My understanding of the corresponding feature in NVMe (the Deallocate bit in the Write Zeroes command) is that the only difference between Deallocate = 1 and 0 is that the device "should" versus "may" (no "shall" on either side) deallocate the corresponding blocks, but only if the device supports reading 0s back after blocks are deallocated. If the device does not support reading 0s after deallocation, it is not allowed to deallocate blocks as part of a Write Zeroes command regardless of the setting of the Deallocate bit. Some similar wording could probably be added to the virtio spec to clarify the meaning of unmap, although I would prefer something that makes it a little clearer that the bit is only intended as a hint from the driver to indicate whether the device should attempt to keep storage allocated for the zeroed blocks, if that is indeed the intended behavior. Is there some in-kernel doc that describes what behavior the Linux block layer needs from a write zeroes command?> > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > + bool unmap) > > Why is this an inline function?I don't think there's any reason it needs to be inline; I can drop the inline in the next revision. Given (as far as I can tell) your concerns seem to apply to the Write Zeroes command specifically, would it be reasonable to start with a patch that just adds support for the Discard command (along with fixes for Ming's feedback)? This would be sufficient for my particular use case (although I can't speak for Changpeng), and we can revisit Write Zeroes once the spec concerns are worked out. Thanks, -- Daniel
Liu, Changpeng
2018-Oct-16 01:40 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
> -----Original Message----- > From: Christoph Hellwig [mailto:hch at infradead.org] > Sent: Monday, October 15, 2018 5:28 PM > To: Daniel Verkamp <dverkamp at chromium.org> > Cc: virtualization at lists.linux-foundation.org; linux-block at vger.kernel.org; > Michael S. Tsirkin <mst at redhat.com>; Jason Wang <jasowang at redhat.com>; > Jens Axboe <axboe at kernel.dk>; Stefan Hajnoczi <stefanha at redhat.com>; Liu, > Changpeng <changpeng.liu at intel.com> > Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > From: Changpeng Liu <changpeng.liu at intel.com> > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > There is some issues in this spec. For one using the multiple ranges > also for write zeroes is rather inefficient. Write zeroes really should > use the same format as read and write.Because there is no length parameter for virtio block specification, adding the two extra commands will not break the existing specification and driver implementation. Also existing Linux implementation for write zeroes will not use multiple segment at all so there is always one range in practice.> > Second the unmap flag isn't properly specified at all, as nothing > says the device may not unmap without the unmap flag. Please take > a look at the SCSI or NVMe ?pec for some guidance.The unmap flag is only used for write zeroes command, as discard command will not guarantee the spaces will be zeroed, so adding this flag means (Discard + Write Zeroes), so this definitely is backend related, the backend implementation can use same code to implement discard and write zeroes commands.> > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > + bool unmap) > > Why is this an inline function?
Liu, Changpeng
2018-Oct-16 01:45 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
> -----Original Message----- > From: Daniel Verkamp [mailto:dverkamp at chromium.org] > Sent: Tuesday, October 16, 2018 7:16 AM > To: hch at infradead.org > Cc: virtualization at lists.linux-foundation.org; linux-block at vger.kernel.org; > mst at redhat.com; jasowang at redhat.com; axboe at kernel.dk; > stefanha at redhat.com; Liu, Changpeng <changpeng.liu at intel.com> > Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support > > On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <hch at infradead.org> wrote: > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > > From: Changpeng Liu <changpeng.liu at intel.com> > > > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > > > There is some issues in this spec. For one using the multiple ranges > > also for write zeroes is rather inefficient. Write zeroes really should > > use the same format as read and write. > > I wasn't involved in the writing of the spec, so I'll defer to Michael > and Changpeng here, but I'm not sure how "set in stone" the virtio > specification is, or if it can be updated somehow without breaking > compatibility. > > I agree that Write Zeroes would be simpler to implement as a single > LBA + length rather than a list. However, it's not really possible to > use the same format as the regular virtio block read/write commands > (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do > not specify a length explicitly; length is implied by the length of > the data buffer as defined by the virtio descriptor, but a Write > Zeroes command does not require a data buffer. At best, this could be > a separate command mirroring the layout of struct virtio_blk_req but > with data replaced with a length field; I'm not sure that buys much in > the way of consistency.Yeah, that's the consideration here.> > > Second the unmap flag isn't properly specified at all, as nothing > > says the device may not unmap without the unmap flag. Please take > > a look at the SCSI or NVMe ?pec for some guidance. > > This could probably use some clarifying text in the specification, but > given that there is nothing in the spec describing what the device > needs to do when unmap = 0, I would assume that the device can do > whatever it likes, as long as the blocks read back as 0s afterwards. > Reading back 0s is required by the definition of the Write Zeroes > command in the same virtio spec change. It would probably be good to > clarify this and explicitly define what the device is allowed to do in > response to both settings of the unmap bit. > > My understanding of the corresponding feature in NVMe (the Deallocate > bit in the Write Zeroes command) is that the only difference between > Deallocate = 1 and 0 is that the device "should" versus "may" (no > "shall" on either side) deallocate the corresponding blocks, but only > if the device supports reading 0s back after blocks are deallocated. > If the device does not support reading 0s after deallocation, it is > not allowed to deallocate blocks as part of a Write Zeroes command > regardless of the setting of the Deallocate bit. > > Some similar wording could probably be added to the virtio spec to > clarify the meaning of unmap, although I would prefer something that > makes it a little clearer that the bit is only intended as a hint from > the driver to indicate whether the device should attempt to keep > storage allocated for the zeroed blocks, if that is indeed the > intended behavior.Yes, that's the original idea. Adding a clear description to the specification may be better.> > Is there some in-kernel doc that describes what behavior the Linux > block layer needs from a write zeroes command? > > > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > > + bool unmap) > > > > Why is this an inline function? > > I don't think there's any reason it needs to be inline; I can drop the > inline in the next revision. > > Given (as far as I can tell) your concerns seem to apply to the Write > Zeroes command specifically, would it be reasonable to start with a > patch that just adds support for the Discard command (along with fixes > for Ming's feedback)? This would be sufficient for my particular use > case (although I can't speak for Changpeng), and we can revisit Write > Zeroes once the spec concerns are worked out. > > Thanks, > -- Daniel
Possibly Parallel Threads
- [PATCH v8] virtio_blk: add discard and write zeroes support
- [PATCH v8] virtio_blk: add discard and write zeroes support
- [PATCH v8] virtio_blk: add discard and write zeroes support
- [PATCH v9] virtio_blk: add discard and write zeroes support
- [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support