On 2020/7/30 4:30 PM, Jeffle Xu wrote:
> Before commit eded341c085b ("block: don't decrement nr_phys_segments for
> physically contiguous segments") was applied, the generic block layer did
> not guarantee that @req->nr_phys_segments equals the number of bios in the
> request. When limits.max_discard_segments == 1 and the I/O scheduler is
> set to anything other than "none" (e.g. mq-deadline, kyber or bfq),
> the request merge routine may be called when enqueuing a DISCARD bio.
> When merging two requests, @req->nr_phys_segments is decremented by one if
> the two bios at the merge boundary can be merged into one physical segment,
> leaving @req->nr_phys_segments one less than the number of bios in the
> DISCARD request.
>
> In this case, we are at risk of overrunning the virtio_blk_discard_write_zeroes
> buffer. Though this issue has been fixed by commit eded341c085b
> ("block: don't decrement nr_phys_segments for physically contiguous segments"),
> it could recur if future refactoring of the generic block layer breaks
> that guarantee again.
>
> Commit 8cb6af7b3a6d ("nvme: Fix discard buffer overrun") fixed a similar
> issue in the nvme driver; this patch is inspired by that commit.
>
> Signed-off-by: Jeffle Xu <jefflexu at linux.alibaba.com>
> Reviewed-by: Joseph Qi <joseph.qi at linux.alibaba.com>
> ---
> drivers/block/virtio_blk.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 980df853ee49..248c8f46b51c 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -130,12 +130,19 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
> u64 sector = bio->bi_iter.bi_sector;
> u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
>
> - range[n].flags = cpu_to_le32(flags);
> - range[n].num_sectors = cpu_to_le32(num_sectors);
> - range[n].sector = cpu_to_le64(sector);
> + if (n < segments) {
> + range[n].flags = cpu_to_le32(flags);
> + range[n].num_sectors = cpu_to_le32(num_sectors);
> + range[n].sector = cpu_to_le64(sector);
> + }
I'm not familiar with the block layer, but if we start duplicating checks
like this in drivers, it's a hint that the check should be moved into the core.
> n++;
> }
>
> + if (WARN_ON_ONCE(n != segments)) {
> + kfree(range);
> + return -EIO;
> + }
> +
> req->special_vec.bv_page = virt_to_page(range);
> req->special_vec.bv_offset = offset_in_page(range);
> req->special_vec.bv_len = sizeof(*range) * segments;
> @@ -246,8 +253,14 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>
> if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {
> err = virtblk_setup_discard_write_zeroes(req, unmap);
> - if (err)
> - return BLK_STS_RESOURCE;
> + if (err) {
> + switch (err) {
> + case -ENOMEM:
> + return BLK_STS_RESOURCE;
> + default:
> + return BLK_STS_IOERR;
> + }
> + }
This looks inelegant; why not a simple
if (err == -ENOMEM) ... else if (err) ... ?
Thanks
> }
>
> num = blk_rq_map_sg(hctx->queue, req, vbr->sg);