Changpeng Liu
2018-Jun-06 04:19 UTC
[PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands support, this will impact the performance when using SSD backend over file systems. Commit 88c85538 "virtio-blk: add discard and write zeroes features to specification"(see https://github.com/oasis-tcs/virtio-spec) extended existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES commands support. While here, using 16 bytes descriptor to describe one segment of DISCARD or WRITE ZEROES commands, each command may contain one or more decriptors. The following data structure shows the definition of one descriptor: struct virtio_blk_discard_write_zeroes { le64 sector; le32 num_sectors; le32 unmap; }; Field 'sector' means the start sector for DISCARD and WRITE ZEROES, filed 'num_sectors' means the number of sectors for DISCARD and WRITE ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap' maybe used for WRITE ZEROES command with DISCARD enabled. We also extended the virtio-blk configuration space to let backend device put DISCARD and WRITE ZEROES configuration parameters. struct virtio_blk_config { [...] le32 max_discard_sectors; le32 max_discard_seg; le32 discard_sector_alignment; le32 max_write_zeroes_sectors; le32 max_write_zeroes_seg; u8 write_zeroes_may_unmap; } New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard command, maximum discard sectors size in field 'max_discard_sectors' and maximum discard segment number in field 'max_discard_seg'. New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write zeroes command, maximum write zeroes sectors size in field 'max_write_zeroes_sectors' and maximum write zeroes segment number in field 'max_write_zeroes_seg'. The parameters in the configuration space of the device field 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The field 'max_write_zeroes_sectors' is expressed in 512-byte units if the VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> --- CHANGELOG: v6: don't set T_OUT bit to discard and write zeroes commands. v5: use new block layer API: blk_queue_flag_set. v4: several optimizations based on MST's comments, remove bit field usage for command descriptor. v3: define the virtio-blk protocol to add discard and write zeroes support, first version implementation based on proposed specification. v2: add write zeroes command support. v1: initial proposal implementation for discard command. --- drivers/block/virtio_blk.c | 89 ++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/virtio_blk.h | 43 ++++++++++++++++++++ 2 files changed, 130 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 4a07593c..5aabc63 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -172,10 +172,45 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); } +static inline int virtblk_setup_discard_write_zeroes(struct request *req, + bool unmap) +{ + unsigned short segments = blk_rq_nr_discard_segments(req); + unsigned short n = 0; + struct virtio_blk_discard_write_zeroes *range; + struct bio *bio; + + range = kmalloc_array(segments, sizeof(*range), GFP_KERNEL); + if (!range) + return -ENOMEM; + + __rq_for_each_bio(bio, req) { + u64 sector = bio->bi_iter.bi_sector; + u32 num_sectors = bio->bi_iter.bi_size >> 9; + + range[n].unmap = cpu_to_le32(unmap); + range[n].num_sectors = cpu_to_le32(num_sectors); + range[n].sector = cpu_to_le64(sector); + n++; + } + + req->special_vec.bv_page = virt_to_page(range); + req->special_vec.bv_offset = offset_in_page(range); + req->special_vec.bv_len = sizeof(*range) * segments; + req->rq_flags |= RQF_SPECIAL_PAYLOAD; + + return 0; +} + static inline void virtblk_request_done(struct request *req) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { + kfree(page_address(req->special_vec.bv_page) + + req->special_vec.bv_offset); + } + switch (req_op(req)) { case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, int qid = hctx->queue_num; int err; bool notify = false; + bool unmap = false; u32 type; BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, case REQ_OP_FLUSH: type = VIRTIO_BLK_T_FLUSH; break; + case REQ_OP_DISCARD: + type = VIRTIO_BLK_T_DISCARD; + break; + case REQ_OP_WRITE_ZEROES: + type = VIRTIO_BLK_T_WRITE_ZEROES; + unmap = !(req->cmd_flags & REQ_NOUNMAP); + break; case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: type = VIRTIO_BLK_T_SCSI_CMD; @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, blk_mq_start_request(req); + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { + err = virtblk_setup_discard_write_zeroes(req, unmap); + if (err) + return BLK_STS_RESOURCE; + } + num = blk_rq_map_sg(hctx->queue, req, vbr->sg); if (num) { if (rq_data_dir(req) == WRITE) @@ -777,6 +826,42 @@ static int virtblk_probe(struct virtio_device *vdev) if (!err && opt_io_size) blk_queue_io_opt(q, blk_size * opt_io_size); + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { + q->limits.discard_granularity = blk_size; + + virtio_cread(vdev, struct virtio_blk_config, + discard_sector_alignment, &v); + if (v) + q->limits.discard_alignment = v << 9; + else + q->limits.discard_alignment = 0; + + virtio_cread(vdev, struct virtio_blk_config, + max_discard_sectors, &v); + if (v) + blk_queue_max_discard_sectors(q, v); + else + blk_queue_max_discard_sectors(q, UINT_MAX); + + virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, + &v); + if (v) + blk_queue_max_discard_segments(q, v); + else + blk_queue_max_discard_segments(q, USHRT_MAX); + + blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); + } + + if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) { + virtio_cread(vdev, struct virtio_blk_config, + max_write_zeroes_sectors, &v); + if (v) + blk_queue_max_write_zeroes_sectors(q, v); + else + blk_queue_max_write_zeroes_sectors(q, UINT_MAX); + } + virtblk_update_capacity(vblk, false); virtio_device_ready(vdev); @@ -885,14 +970,14 @@ static int virtblk_restore(struct virtio_device *vdev) VIRTIO_BLK_F_SCSI, #endif VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, } ; static unsigned int features[] = { VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, }; static struct virtio_driver virtio_blk = { diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h index 9ebe4d9..8e7a015 100644 --- a/include/uapi/linux/virtio_blk.h +++ b/include/uapi/linux/virtio_blk.h @@ -38,6 +38,8 @@ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ +#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ +#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ /* Legacy feature bits */ #ifndef VIRTIO_BLK_NO_LEGACY @@ -86,6 +88,29 @@ struct virtio_blk_config { /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */ __u16 num_queues; + /* The maximum discard sectors (in 512-byte sectors) for + * one segment (if VIRTIO_BLK_F_DISCARD) + */ + __u32 max_discard_sectors; + /* The maximum number of discard segments in a + * discard command (if VIRTIO_BLK_F_DISCARD) + */ + __u32 max_discard_seg; + /* The alignment sectors for discard (if VIRTIO_BLK_F_DISCARD) */ + __u32 discard_sector_alignment; + /* The maximum number of write zeroes sectors (in 512-byte sectors) in + * one segment (if VIRTIO_BLK_F_WRITE_ZEROES) + */ + __u32 max_write_zeroes_sectors; + /* The maximum number of segments in a write zeroes + * command (if VIRTIO_BLK_F_WRITE_ZEROES) + */ + __u32 max_write_zeroes_seg; + /* Device clear this bit when write zeroes command can't result in + * unmapping sectors (if VIRITO_BLK_F_WRITE_ZEROES and with unmap) + */ + __u8 write_zeroes_may_unmap; + __u8 unused1[3]; } __attribute__((packed)); /* @@ -114,6 +139,12 @@ struct virtio_blk_config { /* Get device ID command */ #define VIRTIO_BLK_T_GET_ID 8 +/* Discard command */ +#define VIRTIO_BLK_T_DISCARD 11 + +/* Write zeroes command */ +#define VIRTIO_BLK_T_WRITE_ZEROES 13 + #ifndef VIRTIO_BLK_NO_LEGACY /* Barrier before this op. */ #define VIRTIO_BLK_T_BARRIER 0x80000000 @@ -133,6 +164,18 @@ struct virtio_blk_outhdr { __virtio64 sector; }; +/* + * discard/write zeroes range for each request. + */ +struct virtio_blk_discard_write_zeroes { + /* discard/write zeroes start sector */ + __virtio64 sector; + /* number of discard/write zeroes sectors */ + __virtio32 num_sectors; + /* valid for write zeroes command */ + __virtio32 unmap; +}; + #ifndef VIRTIO_BLK_NO_LEGACY struct virtio_scsi_inhdr { __virtio32 errors; -- 1.9.3
Stefan Hajnoczi
2018-Jun-07 13:10 UTC
[PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:> Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands > support, this will impact the performance when using SSD backend over > file systems. > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to > specification"(see https://github.com/oasis-tcs/virtio-spec) extended > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES > commands support. > > While here, using 16 bytes descriptor to describe one segment of DISCARD > or WRITE ZEROES commands, each command may contain one or more decriptors. > > The following data structure shows the definition of one descriptor: > > struct virtio_blk_discard_write_zeroes { > le64 sector; > le32 num_sectors; > le32 unmap; > }; > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES, > filed 'num_sectors' means the number of sectors for DISCARD and WRITE > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap' > maybe used for WRITE ZEROES command with DISCARD enabled. > > We also extended the virtio-blk configuration space to let backend > device put DISCARD and WRITE ZEROES configuration parameters. > > struct virtio_blk_config { > [...] > > le32 max_discard_sectors; > le32 max_discard_seg; > le32 discard_sector_alignment; > le32 max_write_zeroes_sectors; > le32 max_write_zeroes_seg; > u8 write_zeroes_may_unmap; > } > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard > command, maximum discard sectors size in field 'max_discard_sectors' and > maximum discard segment number in field 'max_discard_seg'. > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write > zeroes command, maximum write zeroes sectors size in field > 'max_write_zeroes_sectors' and maximum write zeroes segment number in > field 'max_write_zeroes_seg'. > > The parameters in the configuration space of the device field > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. > > Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> > --- > CHANGELOG: > v6: don't set T_OUT bit to discard and write zeroes commands.I don't see this in the patch...> @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > int qid = hctx->queue_num; > int err; > bool notify = false; > + bool unmap = false; > u32 type; > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > case REQ_OP_FLUSH: > type = VIRTIO_BLK_T_FLUSH; > break; > + case REQ_OP_DISCARD: > + type = VIRTIO_BLK_T_DISCARD; > + break; > + case REQ_OP_WRITE_ZEROES: > + type = VIRTIO_BLK_T_WRITE_ZEROES; > + unmap = !(req->cmd_flags & REQ_NOUNMAP); > + break; > case REQ_OP_SCSI_IN: > case REQ_OP_SCSI_OUT: > type = VIRTIO_BLK_T_SCSI_CMD; > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > > blk_mq_start_request(req); > > + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { > + err = virtblk_setup_discard_write_zeroes(req, unmap); > + if (err) > + return BLK_STS_RESOURCE; > + } > + > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > if (num) { > if (rq_data_dir(req) == WRITE)...since we still do blk_rq_map_sg() here and num should be != 0. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 455 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20180607/4f9d11fd/attachment.sig>
Liu, Changpeng
2018-Jun-07 23:07 UTC
[PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -----Original Message----- > From: Stefan Hajnoczi [mailto:stefanha at redhat.com] > Sent: Thursday, June 7, 2018 9:10 PM > To: Liu, Changpeng <changpeng.liu at intel.com> > Cc: virtualization at lists.linux-foundation.org; cavery at redhat.com; > jasowang at redhat.com; pbonzini at redhat.com; Wang, Wei W > <wei.w.wang at intel.com> > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote: > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands > > support, this will impact the performance when using SSD backend over > > file systems. > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES > > commands support. > > > > While here, using 16 bytes descriptor to describe one segment of DISCARD > > or WRITE ZEROES commands, each command may contain one or more > decriptors. > > > > The following data structure shows the definition of one descriptor: > > > > struct virtio_blk_discard_write_zeroes { > > le64 sector; > > le32 num_sectors; > > le32 unmap; > > }; > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES, > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap' > > maybe used for WRITE ZEROES command with DISCARD enabled. > > > > We also extended the virtio-blk configuration space to let backend > > device put DISCARD and WRITE ZEROES configuration parameters. > > > > struct virtio_blk_config { > > [...] > > > > le32 max_discard_sectors; > > le32 max_discard_seg; > > le32 discard_sector_alignment; > > le32 max_write_zeroes_sectors; > > le32 max_write_zeroes_seg; > > u8 write_zeroes_may_unmap; > > } > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard > > command, maximum discard sectors size in field 'max_discard_sectors' and > > maximum discard segment number in field 'max_discard_seg'. > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write > > zeroes command, maximum write zeroes sectors size in field > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in > > field 'max_write_zeroes_seg'. > > > > The parameters in the configuration space of the device field > > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. > > > > Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> > > --- > > CHANGELOG: > > v6: don't set T_OUT bit to discard and write zeroes commands. > > I don't see this in the patch...Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT again.> > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > int qid = hctx->queue_num; > > int err; > > bool notify = false; > > + bool unmap = false; > > u32 type; > > > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > case REQ_OP_FLUSH: > > type = VIRTIO_BLK_T_FLUSH; > > break; > > + case REQ_OP_DISCARD: > > + type = VIRTIO_BLK_T_DISCARD; > > + break; > > + case REQ_OP_WRITE_ZEROES: > > + type = VIRTIO_BLK_T_WRITE_ZEROES; > > + unmap = !(req->cmd_flags & REQ_NOUNMAP); > > + break; > > case REQ_OP_SCSI_IN: > > case REQ_OP_SCSI_OUT: > > type = VIRTIO_BLK_T_SCSI_CMD; > > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > > > blk_mq_start_request(req); > > > > + if (type == VIRTIO_BLK_T_DISCARD || type => VIRTIO_BLK_T_WRITE_ZEROES) { > > + err = virtblk_setup_discard_write_zeroes(req, unmap); > > + if (err) > > + return BLK_STS_RESOURCE; > > + } > > + > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > if (num) { > > if (rq_data_dir(req) == WRITE) > > ...since we still do blk_rq_map_sg() here and num should be != 0.No, while here, we should keep the original logic for READ/WRITE commands.
Daniel Verkamp
2018-Oct-12 21:06 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
From: Changpeng Liu <changpeng.liu at intel.com> In commit 88c85538, "virtio-blk: add discard and write zeroes features to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio block specification has been extended to add VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for discard and write zeroes in the virtio-blk driver when the device advertises the corresponding features, VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES. Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> Signed-off-by: Daniel Verkamp <dverkamp at chromium.org> --- dverkamp: I've picked up this patch and made a few minor changes (as listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, since it can be called from a context where sleeping is not allowed. To prevent large allocations, I've also clamped the maximum number of discard segments to 256; this results in a 4K allocation and should be plenty of descriptors for most use cases. I also removed most of the description from the commit message, since it was duplicating the comments from virtio_blk.h and quoting parts of the spec without adding any extra information. I have tested this iteration of the patch using crosvm with modifications to enable the new features: https://chromium.googlesource.com/chromiumos/platform/crosvm/ CHANGELOG: v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify descriptor flags field; comment wording cleanups. v6: don't set T_OUT bit to discard and write zeroes commands. v5: use new block layer API: blk_queue_flag_set. v4: several optimizations based on MST's comments, remove bit field usage for command descriptor. v3: define the virtio-blk protocol to add discard and write zeroes support, first version implementation based on proposed specification. v2: add write zeroes command support. v1: initial proposal implementation for discard command. --- drivers/block/virtio_blk.c | 95 ++++++++++++++++++++++++++++++++- include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++ 2 files changed, 147 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 23752dc99b00..04a7ae602e2f 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -18,6 +18,7 @@ #define PART_BITS 4 #define VQ_NAME_LEN 16 +#define MAX_DISCARD_SEGMENTS 256 static int major; static DEFINE_IDA(vd_index_ida); @@ -172,10 +173,50 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); } + +static inline int virtblk_setup_discard_write_zeroes(struct request *req, + bool unmap) +{ + unsigned short segments = blk_rq_nr_discard_segments(req); + unsigned short n = 0; + struct virtio_blk_discard_write_zeroes *range; + struct bio *bio; + u32 flags = 0; + + if (unmap) + flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP; + + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); + if (!range) + return -ENOMEM; + + __rq_for_each_bio(bio, req) { + u64 sector = bio->bi_iter.bi_sector; + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; + + range[n].flags = cpu_to_le32(flags); + range[n].num_sectors = cpu_to_le32(num_sectors); + range[n].sector = cpu_to_le64(sector); + n++; + } + + req->special_vec.bv_page = virt_to_page(range); + req->special_vec.bv_offset = offset_in_page(range); + req->special_vec.bv_len = sizeof(*range) * segments; + req->rq_flags |= RQF_SPECIAL_PAYLOAD; + + return 0; +} + static inline void virtblk_request_done(struct request *req) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { + kfree(page_address(req->special_vec.bv_page) + + req->special_vec.bv_offset); + } + switch (req_op(req)) { case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: @@ -225,6 +266,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, int qid = hctx->queue_num; int err; bool notify = false; + bool unmap = false; u32 type; BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); @@ -237,6 +279,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, case REQ_OP_FLUSH: type = VIRTIO_BLK_T_FLUSH; break; + case REQ_OP_DISCARD: + type = VIRTIO_BLK_T_DISCARD; + break; + case REQ_OP_WRITE_ZEROES: + type = VIRTIO_BLK_T_WRITE_ZEROES; + unmap = !(req->cmd_flags & REQ_NOUNMAP); + break; case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: type = VIRTIO_BLK_T_SCSI_CMD; @@ -256,6 +305,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, blk_mq_start_request(req); + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { + err = virtblk_setup_discard_write_zeroes(req, unmap); + if (err) + return BLK_STS_RESOURCE; + } + num = blk_rq_map_sg(hctx->queue, req, vbr->sg); if (num) { if (rq_data_dir(req) == WRITE) @@ -777,6 +832,42 @@ static int virtblk_probe(struct virtio_device *vdev) if (!err && opt_io_size) blk_queue_io_opt(q, blk_size * opt_io_size); + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { + q->limits.discard_granularity = blk_size; + + virtio_cread(vdev, struct virtio_blk_config, + discard_sector_alignment, &v); + if (v) + q->limits.discard_alignment = v << SECTOR_SHIFT; + else + q->limits.discard_alignment = 0; + + virtio_cread(vdev, struct virtio_blk_config, + max_discard_sectors, &v); + if (v) + blk_queue_max_discard_sectors(q, v); + else + blk_queue_max_discard_sectors(q, UINT_MAX); + + virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, + &v); + if (v && v <= MAX_DISCARD_SEGMENTS) + blk_queue_max_discard_segments(q, v); + else + blk_queue_max_discard_segments(q, MAX_DISCARD_SEGMENTS); + + blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); + } + + if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) { + virtio_cread(vdev, struct virtio_blk_config, + max_write_zeroes_sectors, &v); + if (v) + blk_queue_max_write_zeroes_sectors(q, v); + else + blk_queue_max_write_zeroes_sectors(q, UINT_MAX); + } + virtblk_update_capacity(vblk, false); virtio_device_ready(vdev); @@ -885,14 +976,14 @@ static unsigned int features_legacy[] = { VIRTIO_BLK_F_SCSI, #endif VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, } ; static unsigned int features[] = { VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, }; static struct virtio_driver virtio_blk = { diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h index 9ebe4d968dd5..682afbfe3aa4 100644 --- a/include/uapi/linux/virtio_blk.h +++ b/include/uapi/linux/virtio_blk.h @@ -38,6 +38,8 @@ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ +#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ +#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ /* Legacy feature bits */ #ifndef VIRTIO_BLK_NO_LEGACY @@ -86,6 +88,39 @@ struct virtio_blk_config { /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */ __u16 num_queues; + + /* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */ + /* + * The maximum discard sectors (in 512-byte sectors) for + * one segment. + */ + __u32 max_discard_sectors; + /* + * The maximum number of discard segments in a + * discard command. + */ + __u32 max_discard_seg; + /* Discard commands must be aligned to this number of sectors. */ + __u32 discard_sector_alignment; + + /* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */ + /* + * The maximum number of write zeroes sectors (in 512-byte sectors) in + * one segment. + */ + __u32 max_write_zeroes_sectors; + /* + * The maximum number of segments in a write zeroes + * command. + */ + __u32 max_write_zeroes_seg; + /* + * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the + * deallocation of one or more of the sectors. + */ + __u8 write_zeroes_may_unmap; + + __u8 unused1[3]; } __attribute__((packed)); /* @@ -114,6 +149,12 @@ struct virtio_blk_config { /* Get device ID command */ #define VIRTIO_BLK_T_GET_ID 8 +/* Discard command */ +#define VIRTIO_BLK_T_DISCARD 11 + +/* Write zeroes command */ +#define VIRTIO_BLK_T_WRITE_ZEROES 13 + #ifndef VIRTIO_BLK_NO_LEGACY /* Barrier before this op. */ #define VIRTIO_BLK_T_BARRIER 0x80000000 @@ -133,6 +174,19 @@ struct virtio_blk_outhdr { __virtio64 sector; }; +/* Unmap this range (only valid for write zeroes command) */ +#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001 + +/* Discard/write zeroes range for each request. */ +struct virtio_blk_discard_write_zeroes { + /* discard/write zeroes start sector */ + __virtio64 sector; + /* number of discard/write zeroes sectors */ + __virtio32 num_sectors; + /* flags for this range */ + __virtio32 flags; +}; + #ifndef VIRTIO_BLK_NO_LEGACY struct virtio_scsi_inhdr { __virtio32 errors; -- 2.19.0.605.g01d371f741-goog
Michael S. Tsirkin
2018-Oct-15 00:54 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:> From: Changpeng Liu <changpeng.liu at intel.com> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > block specification has been extended to add VIRTIO_BLK_T_DISCARD and > VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for > discard and write zeroes in the virtio-blk driver when the device > advertises the corresponding features, VIRTIO_BLK_F_DISCARD and > VIRTIO_BLK_F_WRITE_ZEROES. > > Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> > Signed-off-by: Daniel Verkamp <dverkamp at chromium.org>Cc Paolo as well.> --- > dverkamp: I've picked up this patch and made a few minor changes (as > listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, > since it can be called from a context where sleeping is not allowed. > To prevent large allocations, I've also clamped the maximum number of > discard segments to 256; this results in a 4K allocation and should be > plenty of descriptors for most use cases. > > I also removed most of the description from the commit message, since it > was duplicating the comments from virtio_blk.h and quoting parts of the > spec without adding any extra information. I have tested this iteration > of the patch using crosvm with modifications to enable the new features: > https://chromium.googlesource.com/chromiumos/platform/crosvm/ > > CHANGELOG: > v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant > v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify > descriptor flags field; comment wording cleanups. > v6: don't set T_OUT bit to discard and write zeroes commands. > v5: use new block layer API: blk_queue_flag_set. > v4: several optimizations based on MST's comments, remove bit field > usage for command descriptor. > v3: define the virtio-blk protocol to add discard and write zeroes > support, first version implementation based on proposed specification. > v2: add write zeroes command support. > v1: initial proposal implementation for discard command. > --- > drivers/block/virtio_blk.c | 95 ++++++++++++++++++++++++++++++++- > include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++ > 2 files changed, 147 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 23752dc99b00..04a7ae602e2f 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -18,6 +18,7 @@ > > #define PART_BITS 4 > #define VQ_NAME_LEN 16 > +#define MAX_DISCARD_SEGMENTS 256 > > static int major; > static DEFINE_IDA(vd_index_ida); > @@ -172,10 +173,50 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, > return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); > } > > + > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > + bool unmap) > +{ > + unsigned short segments = blk_rq_nr_discard_segments(req); > + unsigned short n = 0; > + struct virtio_blk_discard_write_zeroes *range; > + struct bio *bio; > + u32 flags = 0; > + > + if (unmap) > + flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP; > + > + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); > + if (!range) > + return -ENOMEM; > + > + __rq_for_each_bio(bio, req) { > + u64 sector = bio->bi_iter.bi_sector; > + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; > + > + range[n].flags = cpu_to_le32(flags); > + range[n].num_sectors = cpu_to_le32(num_sectors); > + range[n].sector = cpu_to_le64(sector); > + n++; > + } > + > + req->special_vec.bv_page = virt_to_page(range); > + req->special_vec.bv_offset = offset_in_page(range); > + req->special_vec.bv_len = sizeof(*range) * segments; > + req->rq_flags |= RQF_SPECIAL_PAYLOAD; > + > + return 0; > +} > + > static inline void virtblk_request_done(struct request *req) > { > struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); > > + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { > + kfree(page_address(req->special_vec.bv_page) + > + req->special_vec.bv_offset); > + } > + > switch (req_op(req)) { > case REQ_OP_SCSI_IN: > case REQ_OP_SCSI_OUT: > @@ -225,6 +266,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > int qid = hctx->queue_num; > int err; > bool notify = false; > + bool unmap = false; > u32 type; > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); > @@ -237,6 +279,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > case REQ_OP_FLUSH: > type = VIRTIO_BLK_T_FLUSH; > break; > + case REQ_OP_DISCARD: > + type = VIRTIO_BLK_T_DISCARD; > + break; > + case REQ_OP_WRITE_ZEROES: > + type = VIRTIO_BLK_T_WRITE_ZEROES; > + unmap = !(req->cmd_flags & REQ_NOUNMAP); > + break; > case REQ_OP_SCSI_IN: > case REQ_OP_SCSI_OUT: > type = VIRTIO_BLK_T_SCSI_CMD; > @@ -256,6 +305,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > > blk_mq_start_request(req); > > + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { > + err = virtblk_setup_discard_write_zeroes(req, unmap); > + if (err) > + return BLK_STS_RESOURCE; > + } > + > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > if (num) { > if (rq_data_dir(req) == WRITE) > @@ -777,6 +832,42 @@ static int virtblk_probe(struct virtio_device *vdev) > if (!err && opt_io_size) > blk_queue_io_opt(q, blk_size * opt_io_size); > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { > + q->limits.discard_granularity = blk_size; > + > + virtio_cread(vdev, struct virtio_blk_config, > + discard_sector_alignment, &v); > + if (v) > + q->limits.discard_alignment = v << SECTOR_SHIFT; > + else > + q->limits.discard_alignment = 0; > + > + virtio_cread(vdev, struct virtio_blk_config, > + max_discard_sectors, &v); > + if (v) > + blk_queue_max_discard_sectors(q, v); > + else > + blk_queue_max_discard_sectors(q, UINT_MAX); > + > + virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, > + &v); > + if (v && v <= MAX_DISCARD_SEGMENTS) > + blk_queue_max_discard_segments(q, v); > + else > + blk_queue_max_discard_segments(q, MAX_DISCARD_SEGMENTS); > + > + blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); > + } > + > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) { > + virtio_cread(vdev, struct virtio_blk_config, > + max_write_zeroes_sectors, &v); > + if (v) > + blk_queue_max_write_zeroes_sectors(q, v); > + else > + blk_queue_max_write_zeroes_sectors(q, UINT_MAX); > + } > + > virtblk_update_capacity(vblk, false); > virtio_device_ready(vdev); > > @@ -885,14 +976,14 @@ static unsigned int features_legacy[] = { > VIRTIO_BLK_F_SCSI, > #endif > VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, > - VIRTIO_BLK_F_MQ, > + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, > } > ; > static unsigned int features[] = { > VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, > VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, > VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, > - VIRTIO_BLK_F_MQ, > + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, > }; > > static struct virtio_driver virtio_blk = { > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h > index 9ebe4d968dd5..682afbfe3aa4 100644 > --- a/include/uapi/linux/virtio_blk.h > +++ b/include/uapi/linux/virtio_blk.h > @@ -38,6 +38,8 @@ > #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ > #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ > #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ > +#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ > +#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ > > /* Legacy feature bits */ > #ifndef VIRTIO_BLK_NO_LEGACY > @@ -86,6 +88,39 @@ struct virtio_blk_config { > > /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */ > __u16 num_queues; > + > + /* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */ > + /* > + * The maximum discard sectors (in 512-byte sectors) for > + * one segment. > + */ > + __u32 max_discard_sectors; > + /* > + * The maximum number of discard segments in a > + * discard command. > + */ > + __u32 max_discard_seg; > + /* Discard commands must be aligned to this number of sectors. */ > + __u32 discard_sector_alignment; > + > + /* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */ > + /* > + * The maximum number of write zeroes sectors (in 512-byte sectors) in > + * one segment. > + */ > + __u32 max_write_zeroes_sectors; > + /* > + * The maximum number of segments in a write zeroes > + * command. > + */ > + __u32 max_write_zeroes_seg; > + /* > + * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the > + * deallocation of one or more of the sectors. > + */ > + __u8 write_zeroes_may_unmap; > + > + __u8 unused1[3]; > } __attribute__((packed)); > > /* > @@ -114,6 +149,12 @@ struct virtio_blk_config { > /* Get device ID command */ > #define VIRTIO_BLK_T_GET_ID 8 > > +/* Discard command */ > +#define VIRTIO_BLK_T_DISCARD 11 > + > +/* Write zeroes command */ > +#define VIRTIO_BLK_T_WRITE_ZEROES 13 > + > #ifndef VIRTIO_BLK_NO_LEGACY > /* Barrier before this op. */ > #define VIRTIO_BLK_T_BARRIER 0x80000000 > @@ -133,6 +174,19 @@ struct virtio_blk_outhdr { > __virtio64 sector; > }; > > +/* Unmap this range (only valid for write zeroes command) */ > +#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001 > + > +/* Discard/write zeroes range for each request. */ > +struct virtio_blk_discard_write_zeroes { > + /* discard/write zeroes start sector */ > + __virtio64 sector; > + /* number of discard/write zeroes sectors */ > + __virtio32 num_sectors; > + /* flags for this range */ > + __virtio32 flags; > +}; > + > #ifndef VIRTIO_BLK_NO_LEGACY > struct virtio_scsi_inhdr { > __virtio32 errors; > -- > 2.19.0.605.g01d371f741-goog
Christoph Hellwig
2018-Oct-15 09:27 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:> From: Changpeng Liu <changpeng.liu at intel.com> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtioThere is some issues in this spec. For one using the multiple ranges also for write zeroes is rather inefficient. Write zeroes really should use the same format as read and write. Second the unmap flag isn't properly specified at all, as nothing says the device may not unmap without the unmap flag. Please take a look at the SCSI or NVMe ?pec for some guidance.> +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > + bool unmap)Why is this an inline function?
Michael S. Tsirkin
2018-Nov-01 21:25 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Mon, Oct 29, 2018 at 05:05:21AM +0000, Stefan Hajnoczi wrote:> On Fri, Oct 26, 2018 at 10:47:16AM -0400, Michael S. Tsirkin wrote: > > On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote: > > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > > > + range[n].flags = cpu_to_le32(flags); > > > > + range[n].num_sectors = cpu_to_le32(num_sectors); > > > > + range[n].sector = cpu_to_le64(sector); > > > ... > > > > +/* Discard/write zeroes range for each request. */ > > > > +struct virtio_blk_discard_write_zeroes { > > > > + /* discard/write zeroes start sector */ > > > > + __virtio64 sector; > > > > + /* number of discard/write zeroes sectors */ > > > > + __virtio32 num_sectors; > > > > + /* flags for this range */ > > > > + __virtio32 flags; > > > > > > cpu_to_le32() is being used on __virtio32 fields instead of cpu_to_virtio32(). > > > > > > From include/uapi/linux/virtio_types.h: > > > > > > /* > > > * __virtio{16,32,64} have the following meaning: > > > * - __u{16,32,64} for virtio devices in legacy mode, accessed in native endian > > > * - __le{16,32,64} for standard-compliant virtio devices > > > */ > > > > > > From the VIRTIO specification: > > > > > > struct virtio_blk_discard_write_zeroes { > > > le64 sector; > > > le32 num_sectors; > > > struct { > > > le32 unmap:1; > > > le32 reserved:31; > > > } flags; > > > }; > > > > > > > > > Since the VIRTIO spec says these fields are little-endian, I think these > > > fields should be declared just __u32 and __u64 instead of __virtio32 and > > > __virtio64. > > > > > > Stefan > > > > > > __le32/__le64 rather? > > Yes. > > StefanI agree. And further using bitfields for this is questionable - it is preferable to set bits in a full 32 bit field using "|". -- MST
Daniel Verkamp
2018-Nov-01 22:18 UTC
[PATCH v8] virtio_blk: add discard and write zeroes support
On Thu, Nov 1, 2018 at 2:25 PM Michael S. Tsirkin <mst at redhat.com> wrote:> > On Mon, Oct 29, 2018 at 05:05:21AM +0000, Stefan Hajnoczi wrote: > > On Fri, Oct 26, 2018 at 10:47:16AM -0400, Michael S. Tsirkin wrote: > > > On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote: > > > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > > > > + range[n].flags = cpu_to_le32(flags); > > > > > + range[n].num_sectors = cpu_to_le32(num_sectors); > > > > > + range[n].sector = cpu_to_le64(sector); > > > > ... > > > > > +/* Discard/write zeroes range for each request. */ > > > > > +struct virtio_blk_discard_write_zeroes { > > > > > + /* discard/write zeroes start sector */ > > > > > + __virtio64 sector; > > > > > + /* number of discard/write zeroes sectors */ > > > > > + __virtio32 num_sectors; > > > > > + /* flags for this range */ > > > > > + __virtio32 flags; > > > > > > > > cpu_to_le32() is being used on __virtio32 fields instead of cpu_to_virtio32(). > > > > > > > > From include/uapi/linux/virtio_types.h: > > > > > > > > /* > > > > * __virtio{16,32,64} have the following meaning: > > > > * - __u{16,32,64} for virtio devices in legacy mode, accessed in native endian > > > > * - __le{16,32,64} for standard-compliant virtio devices > > > > */ > > > > > > > > From the VIRTIO specification: > > > > > > > > struct virtio_blk_discard_write_zeroes { > > > > le64 sector; > > > > le32 num_sectors; > > > > struct { > > > > le32 unmap:1; > > > > le32 reserved:31; > > > > } flags; > > > > }; > > > > > > > > > > > > Since the VIRTIO spec says these fields are little-endian, I think these > > > > fields should be declared just __u32 and __u64 instead of __virtio32 and > > > > __virtio64. > > > > > > > > Stefan > > > > > > > > > __le32/__le64 rather? > > > > Yes. > > > > Stefan > > I agree. And further using bitfields for this is questionable - > it is preferable to set bits in a full 32 bit field using "|".The bitfield is only in the specification, not the code - the actual implementation in this patch (quoted above from earlier in the thread) uses a 32-bit field with a flag #define. I did misunderstand the meaning of __virtio32 vs __le32 - I'll fix that up (I think the spec definition and code for these is already correct; the structure definition just needs to change to match). Thanks, -- Daniel
Daniel Verkamp
2018-Nov-01 22:40 UTC
[PATCH v9] virtio_blk: add discard and write zeroes support
From: Changpeng Liu <changpeng.liu at intel.com> In commit 88c85538, "virtio-blk: add discard and write zeroes features to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio block specification has been extended to add VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for discard and write zeroes in the virtio-blk driver when the device advertises the corresponding features, VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES. Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> Signed-off-by: Daniel Verkamp <dverkamp at chromium.org> --- dverkamp: I've picked up this patch and made a few minor changes (as listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, since it can be called from a context where sleeping is not allowed. To prevent large allocations, I've also clamped the maximum number of discard segments to 256; this results in a 4K allocation and should be plenty of descriptors for most use cases. I also removed most of the description from the commit message, since it was duplicating the comments from virtio_blk.h and quoting parts of the spec without adding any extra information. I have tested this iteration of the patch using crosvm with modifications to enable the new features: https://chromium.googlesource.com/chromiumos/platform/crosvm/ v9 fixes a number of review issues; I didn't attempt to optimize the single-element write zeroes case, so it still does an allocation per request (I did not see any easy place to put the payload that would avoid the allocation). CHANGELOG: v9: [dverkamp] fix LE types in discard struct; cleanups from Ming Lei v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify descriptor flags field; comment wording cleanups. v6: don't set T_OUT bit to discard and write zeroes commands. v5: use new block layer API: blk_queue_flag_set. v4: several optimizations based on MST's comments, remove bit field usage for command descriptor. v3: define the virtio-blk protocol to add discard and write zeroes support, first version implementation based on proposed specification. v2: add write zeroes command support. v1: initial proposal implementation for discard command. --- drivers/block/virtio_blk.c | 83 ++++++++++++++++++++++++++++++++- include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++++ 2 files changed, 135 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 086c6bb12baa..0f39efb4b3aa 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -18,6 +18,7 @@ #define PART_BITS 4 #define VQ_NAME_LEN 16 +#define MAX_DISCARD_SEGMENTS 256u static int major; static DEFINE_IDA(vd_index_ida); @@ -172,10 +173,48 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); } +static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap) +{ + unsigned short segments = blk_rq_nr_discard_segments(req); + unsigned short n = 0; + struct virtio_blk_discard_write_zeroes *range; + struct bio *bio; + u32 flags = 0; + + if (unmap) + flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP; + + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); + if (!range) + return -ENOMEM; + + __rq_for_each_bio(bio, req) { + u64 sector = bio->bi_iter.bi_sector; + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; + + range[n].flags = cpu_to_le32(flags); + range[n].num_sectors = cpu_to_le32(num_sectors); + range[n].sector = cpu_to_le64(sector); + n++; + } + + req->special_vec.bv_page = virt_to_page(range); + req->special_vec.bv_offset = offset_in_page(range); + req->special_vec.bv_len = sizeof(*range) * segments; + req->rq_flags |= RQF_SPECIAL_PAYLOAD; + + return 0; +} + static inline void virtblk_request_done(struct request *req) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { + kfree(page_address(req->special_vec.bv_page) + + req->special_vec.bv_offset); + } + switch (req_op(req)) { case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: @@ -225,6 +264,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, int qid = hctx->queue_num; int err; bool notify = false; + bool unmap = false; u32 type; BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); @@ -237,6 +277,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, case REQ_OP_FLUSH: type = VIRTIO_BLK_T_FLUSH; break; + case REQ_OP_DISCARD: + type = VIRTIO_BLK_T_DISCARD; + break; + case REQ_OP_WRITE_ZEROES: + type = VIRTIO_BLK_T_WRITE_ZEROES; + unmap = !(req->cmd_flags & REQ_NOUNMAP); + break; case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: type = VIRTIO_BLK_T_SCSI_CMD; @@ -256,6 +303,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, blk_mq_start_request(req); + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { + err = virtblk_setup_discard_write_zeroes(req, unmap); + if (err) + return BLK_STS_RESOURCE; + } + num = blk_rq_map_sg(hctx->queue, req, vbr->sg); if (num) { if (rq_data_dir(req) == WRITE) @@ -802,6 +855,32 @@ static int virtblk_probe(struct virtio_device *vdev) if (!err && opt_io_size) blk_queue_io_opt(q, blk_size * opt_io_size); + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { + q->limits.discard_granularity = blk_size; + + virtio_cread(vdev, struct virtio_blk_config, + discard_sector_alignment, &v); + q->limits.discard_alignment = v ? v << SECTOR_SHIFT : 0; + + virtio_cread(vdev, struct virtio_blk_config, + max_discard_sectors, &v); + blk_queue_max_discard_sectors(q, v ? v : UINT_MAX); + + virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, + &v); + blk_queue_max_discard_segments(q, + min_not_zero(v, + MAX_DISCARD_SEGMENTS)); + + blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); + } + + if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) { + virtio_cread(vdev, struct virtio_blk_config, + max_write_zeroes_sectors, &v); + blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX); + } + virtblk_update_capacity(vblk, false); virtio_device_ready(vdev); @@ -895,14 +974,14 @@ static unsigned int features_legacy[] = { VIRTIO_BLK_F_SCSI, #endif VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, } ; static unsigned int features[] = { VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, - VIRTIO_BLK_F_MQ, + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, }; static struct virtio_driver virtio_blk = { diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h index 9ebe4d968dd5..0f99d7b49ede 100644 --- a/include/uapi/linux/virtio_blk.h +++ b/include/uapi/linux/virtio_blk.h @@ -38,6 +38,8 @@ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ +#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ +#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ /* Legacy feature bits */ #ifndef VIRTIO_BLK_NO_LEGACY @@ -86,6 +88,39 @@ struct virtio_blk_config { /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */ __u16 num_queues; + + /* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */ + /* + * The maximum discard sectors (in 512-byte sectors) for + * one segment. + */ + __u32 max_discard_sectors; + /* + * The maximum number of discard segments in a + * discard command. + */ + __u32 max_discard_seg; + /* Discard commands must be aligned to this number of sectors. */ + __u32 discard_sector_alignment; + + /* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */ + /* + * The maximum number of write zeroes sectors (in 512-byte sectors) in + * one segment. + */ + __u32 max_write_zeroes_sectors; + /* + * The maximum number of segments in a write zeroes + * command. + */ + __u32 max_write_zeroes_seg; + /* + * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the + * deallocation of one or more of the sectors. + */ + __u8 write_zeroes_may_unmap; + + __u8 unused1[3]; } __attribute__((packed)); /* @@ -114,6 +149,12 @@ struct virtio_blk_config { /* Get device ID command */ #define VIRTIO_BLK_T_GET_ID 8 +/* Discard command */ +#define VIRTIO_BLK_T_DISCARD 11 + +/* Write zeroes command */ +#define VIRTIO_BLK_T_WRITE_ZEROES 13 + #ifndef VIRTIO_BLK_NO_LEGACY /* Barrier before this op. */ #define VIRTIO_BLK_T_BARRIER 0x80000000 @@ -133,6 +174,19 @@ struct virtio_blk_outhdr { __virtio64 sector; }; +/* Unmap this range (only valid for write zeroes command) */ +#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001 + +/* Discard/write zeroes range for each request. */ +struct virtio_blk_discard_write_zeroes { + /* discard/write zeroes start sector */ + __le64 sector; + /* number of discard/write zeroes sectors */ + __le32 num_sectors; + /* flags for this range */ + __le32 flags; +}; + #ifndef VIRTIO_BLK_NO_LEGACY struct virtio_scsi_inhdr { __virtio32 errors; -- 2.19.1.568.g152ad8e336-goog
Dongli Zhang
2018-Nov-01 23:43 UTC
[PATCH v9] virtio_blk: add discard and write zeroes support
Hi Daniel, Other than crosvm, is there any version of qemu (e.g., repositories developed in progress on github) where I can try with this feature? Thank you very much! Dongli Zhang On 11/02/2018 06:40 AM, Daniel Verkamp wrote:> From: Changpeng Liu <changpeng.liu at intel.com> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > block specification has been extended to add VIRTIO_BLK_T_DISCARD and > VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for > discard and write zeroes in the virtio-blk driver when the device > advertises the corresponding features, VIRTIO_BLK_F_DISCARD and > VIRTIO_BLK_F_WRITE_ZEROES. > > Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> > Signed-off-by: Daniel Verkamp <dverkamp at chromium.org> > --- > dverkamp: I've picked up this patch and made a few minor changes (as > listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, > since it can be called from a context where sleeping is not allowed. > To prevent large allocations, I've also clamped the maximum number of > discard segments to 256; this results in a 4K allocation and should be > plenty of descriptors for most use cases. > > I also removed most of the description from the commit message, since it > was duplicating the comments from virtio_blk.h and quoting parts of the > spec without adding any extra information. I have tested this iteration > of the patch using crosvm with modifications to enable the new features: > https://chromium.googlesource.com/chromiumos/platform/crosvm/ > > v9 fixes a number of review issues; I didn't attempt to optimize the > single-element write zeroes case, so it still does an allocation per > request (I did not see any easy place to put the payload that would > avoid the allocation). > > CHANGELOG: > v9: [dverkamp] fix LE types in discard struct; cleanups from Ming Lei > v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant > v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify > descriptor flags field; comment wording cleanups. > v6: don't set T_OUT bit to discard and write zeroes commands. > v5: use new block layer API: blk_queue_flag_set. > v4: several optimizations based on MST's comments, remove bit field > usage for command descriptor. > v3: define the virtio-blk protocol to add discard and write zeroes > support, first version implementation based on proposed specification. > v2: add write zeroes command support. > v1: initial proposal implementation for discard command. > --- > drivers/block/virtio_blk.c | 83 ++++++++++++++++++++++++++++++++- > include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++++ > 2 files changed, 135 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 086c6bb12baa..0f39efb4b3aa 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -18,6 +18,7 @@ > > #define PART_BITS 4 > #define VQ_NAME_LEN 16 > +#define MAX_DISCARD_SEGMENTS 256u > > static int major; > static DEFINE_IDA(vd_index_ida); > @@ -172,10 +173,48 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, > return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); > } > > +static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap) > +{ > + unsigned short segments = blk_rq_nr_discard_segments(req); > + unsigned short n = 0; > + struct virtio_blk_discard_write_zeroes *range; > + struct bio *bio; > + u32 flags = 0; > + > + if (unmap) > + flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP; > + > + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); > + if (!range) > + return -ENOMEM; > + > + __rq_for_each_bio(bio, req) { > + u64 sector = bio->bi_iter.bi_sector; > + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; > + > + range[n].flags = cpu_to_le32(flags); > + range[n].num_sectors = cpu_to_le32(num_sectors); > + range[n].sector = cpu_to_le64(sector); > + n++; > + } > + > + req->special_vec.bv_page = virt_to_page(range); > + req->special_vec.bv_offset = offset_in_page(range); > + req->special_vec.bv_len = sizeof(*range) * segments; > + req->rq_flags |= RQF_SPECIAL_PAYLOAD; > + > + return 0; > +} > + > static inline void virtblk_request_done(struct request *req) > { > struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); > > + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { > + kfree(page_address(req->special_vec.bv_page) + > + req->special_vec.bv_offset); > + } > + > switch (req_op(req)) { > case REQ_OP_SCSI_IN: > case REQ_OP_SCSI_OUT: > @@ -225,6 +264,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > int qid = hctx->queue_num; > int err; > bool notify = false; > + bool unmap = false; > u32 type; > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); > @@ -237,6 +277,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > case REQ_OP_FLUSH: > type = VIRTIO_BLK_T_FLUSH; > break; > + case REQ_OP_DISCARD: > + type = VIRTIO_BLK_T_DISCARD; > + break; > + case REQ_OP_WRITE_ZEROES: > + type = VIRTIO_BLK_T_WRITE_ZEROES; > + unmap = !(req->cmd_flags & REQ_NOUNMAP); > + break; > case REQ_OP_SCSI_IN: > case REQ_OP_SCSI_OUT: > type = VIRTIO_BLK_T_SCSI_CMD; > @@ -256,6 +303,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > > blk_mq_start_request(req); > > + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) { > + err = virtblk_setup_discard_write_zeroes(req, unmap); > + if (err) > + return BLK_STS_RESOURCE; > + } > + > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > if (num) { > if (rq_data_dir(req) == WRITE) > @@ -802,6 +855,32 @@ static int virtblk_probe(struct virtio_device *vdev) > if (!err && opt_io_size) > blk_queue_io_opt(q, blk_size * opt_io_size); > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { > + q->limits.discard_granularity = blk_size; > + > + virtio_cread(vdev, struct virtio_blk_config, > + discard_sector_alignment, &v); > + q->limits.discard_alignment = v ? v << SECTOR_SHIFT : 0; > + > + virtio_cread(vdev, struct virtio_blk_config, > + max_discard_sectors, &v); > + blk_queue_max_discard_sectors(q, v ? v : UINT_MAX); > + > + virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, > + &v); > + blk_queue_max_discard_segments(q, > + min_not_zero(v, > + MAX_DISCARD_SEGMENTS)); > + > + blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); > + } > + > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) { > + virtio_cread(vdev, struct virtio_blk_config, > + max_write_zeroes_sectors, &v); > + blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX); > + } > + > virtblk_update_capacity(vblk, false); > virtio_device_ready(vdev); > > @@ -895,14 +974,14 @@ static unsigned int features_legacy[] = { > VIRTIO_BLK_F_SCSI, > #endif > VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, > - VIRTIO_BLK_F_MQ, > + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, > } > ; > static unsigned int features[] = { > VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, > VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, > VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, > - VIRTIO_BLK_F_MQ, > + VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, > }; > > static struct virtio_driver virtio_blk = { > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h > index 9ebe4d968dd5..0f99d7b49ede 100644 > --- a/include/uapi/linux/virtio_blk.h > +++ b/include/uapi/linux/virtio_blk.h > @@ -38,6 +38,8 @@ > #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ > #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ > #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ > +#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ > +#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ > > /* Legacy feature bits */ > #ifndef VIRTIO_BLK_NO_LEGACY > @@ -86,6 +88,39 @@ struct virtio_blk_config { > > /* number of vqs, only available when VIRTIO_BLK_F_MQ is set */ > __u16 num_queues; > + > + /* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */ > + /* > + * The maximum discard sectors (in 512-byte sectors) for > + * one segment. > + */ > + __u32 max_discard_sectors; > + /* > + * The maximum number of discard segments in a > + * discard command. > + */ > + __u32 max_discard_seg; > + /* Discard commands must be aligned to this number of sectors. */ > + __u32 discard_sector_alignment; > + > + /* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */ > + /* > + * The maximum number of write zeroes sectors (in 512-byte sectors) in > + * one segment. > + */ > + __u32 max_write_zeroes_sectors; > + /* > + * The maximum number of segments in a write zeroes > + * command. > + */ > + __u32 max_write_zeroes_seg; > + /* > + * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the > + * deallocation of one or more of the sectors. > + */ > + __u8 write_zeroes_may_unmap; > + > + __u8 unused1[3]; > } __attribute__((packed)); > > /* > @@ -114,6 +149,12 @@ struct virtio_blk_config { > /* Get device ID command */ > #define VIRTIO_BLK_T_GET_ID 8 > > +/* Discard command */ > +#define VIRTIO_BLK_T_DISCARD 11 > + > +/* Write zeroes command */ > +#define VIRTIO_BLK_T_WRITE_ZEROES 13 > + > #ifndef VIRTIO_BLK_NO_LEGACY > /* Barrier before this op. */ > #define VIRTIO_BLK_T_BARRIER 0x80000000 > @@ -133,6 +174,19 @@ struct virtio_blk_outhdr { > __virtio64 sector; > }; > > +/* Unmap this range (only valid for write zeroes command) */ > +#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x00000001 > + > +/* Discard/write zeroes range for each request. */ > +struct virtio_blk_discard_write_zeroes { > + /* discard/write zeroes start sector */ > + __le64 sector; > + /* number of discard/write zeroes sectors */ > + __le32 num_sectors; > + /* flags for this range */ > + __le32 flags; > +}; > + > #ifndef VIRTIO_BLK_NO_LEGACY > struct virtio_scsi_inhdr { > __virtio32 errors; >
Stefan Hajnoczi
2018-Nov-02 04:17 UTC
[PATCH v9] virtio_blk: add discard and write zeroes support
On Thu, Nov 01, 2018 at 03:40:35PM -0700, Daniel Verkamp wrote:> From: Changpeng Liu <changpeng.liu at intel.com> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > block specification has been extended to add VIRTIO_BLK_T_DISCARD and > VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for > discard and write zeroes in the virtio-blk driver when the device > advertises the corresponding features, VIRTIO_BLK_F_DISCARD and > VIRTIO_BLK_F_WRITE_ZEROES. > > Signed-off-by: Changpeng Liu <changpeng.liu at intel.com> > Signed-off-by: Daniel Verkamp <dverkamp at chromium.org> > --- > dverkamp: I've picked up this patch and made a few minor changes (as > listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, > since it can be called from a context where sleeping is not allowed. > To prevent large allocations, I've also clamped the maximum number of > discard segments to 256; this results in a 4K allocation and should be > plenty of descriptors for most use cases. > > I also removed most of the description from the commit message, since it > was duplicating the comments from virtio_blk.h and quoting parts of the > spec without adding any extra information. I have tested this iteration > of the patch using crosvm with modifications to enable the new features: > https://chromium.googlesource.com/chromiumos/platform/crosvm/ > > v9 fixes a number of review issues; I didn't attempt to optimize the > single-element write zeroes case, so it still does an allocation per > request (I did not see any easy place to put the payload that would > avoid the allocation). > > CHANGELOG: > v9: [dverkamp] fix LE types in discard struct; cleanups from Ming Lei > v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant > v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify > descriptor flags field; comment wording cleanups. > v6: don't set T_OUT bit to discard and write zeroes commands. > v5: use new block layer API: blk_queue_flag_set. > v4: several optimizations based on MST's comments, remove bit field > usage for command descriptor. > v3: define the virtio-blk protocol to add discard and write zeroes > support, first version implementation based on proposed specification. > v2: add write zeroes command support. > v1: initial proposal implementation for discard command. > --- > drivers/block/virtio_blk.c | 83 ++++++++++++++++++++++++++++++++- > include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++++ > 2 files changed, 135 insertions(+), 2 deletions(-)Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 455 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20181102/989d2c4e/attachment.sig>