Li Dongyang
2011-Sep-01 10:39 UTC
[Xen-devel] [PATCH V4 0/3] xen-blkfront/blkback discard support
Dear list,

This is V4 of the trim support for xen-blkfront/blkback. We have now moved
from BLKIF_OP_TRIM to BLKIF_OP_DISCARD and dropped all the "trim" naming in
the patches, using "discard" instead. We also updated the
blkif_x86_{32|64}_request helpers, otherwise we would run into problems when
using a non-native protocol. The patches have been tested with both an SSD
and a raw file: with the SSD we forward the discard command to the device,
and with the raw file the disk usage shrinks as we send discard requests
from the guest.

Changelog V4:
    switch from BLKIF_OP_TRIM to BLKIF_OP_DISCARD
    make blkback work with non-native protocols
    do not abort the connection in blkback if we cannot set up discard in xenstore

Changelog V3:
    rebased on Linus's tree
    enum backend types in blkif instead of flags in the interface header
    more reasonable names in xenstore
    move trim request handling to a separate function
    do not re-enable interrupts unconditionally when handling responses
    set info->feature-trim only when we have all info needed for the request queue

Changelog V2:
    rebased on Jeremy's tree
    fixes according to Jan Beulich's comments

Li Dongyang (3):
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  xen-blkfront: teach blkfront driver to handle discard requests
  xen-blkback: discard requests handling in blkback driver

 drivers/block/xen-blkback/blkback.c |   87 +++++++++++++++++++++++-----
 drivers/block/xen-blkback/common.h  |   93 ++++++++++++++++++++++++-----
 drivers/block/xen-blkback/xenbus.c  |   58 ++++++++++++++++++
 drivers/block/xen-blkfront.c        |  111 +++++++++++++++++++++++++++-------
 include/xen/interface/io/blkif.h    |   36 +++++++++++
 5 files changed, 331 insertions(+), 54 deletions(-)

--
1.7.6
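For reference, the guest-side testing described in the cover letter can be
reproduced with a small userspace helper that issues a discard through the
BLKDISCARD ioctl, the same path a filesystem's discard support ends up using.
This is only an illustrative sketch and not part of the series; the device
name is a placeholder and the whole-device range is chosen arbitrarily.

/* discard-test.c: issue a BLKDISCARD against a scratch block device.
 * Illustrative only; /dev/xvdb1 is a placeholder. Use an unused device,
 * since the contents of the discarded range become undefined.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD, BLKGETSIZE64 */

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/xvdb1";
	uint64_t range[2];      /* { offset, length } in bytes */
	uint64_t size;
	int fd = open(dev, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKGETSIZE64, &size)) {
		perror("BLKGETSIZE64");
		return 1;
	}
	range[0] = 0;
	range[1] = size;        /* discard the whole device */
	if (ioctl(fd, BLKDISCARD, &range))
		perror("BLKDISCARD");   /* EOPNOTSUPP if discard is not enabled */
	else
		printf("discarded %llu bytes on %s\n",
		       (unsigned long long)range[1], dev);
	close(fd);
	return 0;
}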
Li Dongyang
2011-Sep-01 10:39 UTC
[Xen-devel] [PATCH V4 1/3] xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
Now we use BLKIF_OP_DISCARD and add blkif_request_discard to the
blkif_request union. The patch is taken from Owen Smith and Konrad, thanks.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 include/xen/interface/io/blkif.h |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index 3d5d6db..9324488 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -57,6 +57,36 @@ typedef uint64_t blkif_sector_t;
  * "feature-flush-cache" node!
  */
 #define BLKIF_OP_FLUSH_DISKCACHE   3
+
+/*
+ * Recognised only if "feature-discard" is present in backend xenbus info.
+ * The "feature-discard" node contains a boolean indicating whether trim
+ * (ATA) or unmap (SCSI) - conviently called discard requests are likely
+ * to succeed or fail. Either way, a discard request
+ * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
+ * the underlying block-device hardware. The boolean simply indicates whether
+ * or not it is worthwhile for the frontend to attempt discard requests.
+ * If a backend does not recognise BLKIF_OP_DISCARD, it should *not*
+ * create the "feature-discard" node!
+ *
+ * Discard operation is a request for the underlying block device to mark
+ * extents to be erased. However, discard does not guarantee that the blocks
+ * will be erased from the device - it is just a hint to the device
+ * controller that these blocks are no longer in use. What the device
+ * controller does with that information is left to the controller.
+ * Discard operations are passed with sector_number as the
+ * sector index to begin discard operations at and nr_sectors as the number of
+ * sectors to be discarded. The specified sectors should be discarded if the
+ * underlying block device supports trim (ATA) or unmap (SCSI) operations,
+ * or a BLKIF_RSP_EOPNOTSUPP should be returned.
+ * More information about trim/unmap operations at:
+ * http://t13.org/Documents/UploadedDocuments/docs2008/
+ *     e07154r6-Data_Set_Management_Proposal_for_ATA-ACS2.doc
+ * http://www.seagate.com/staticfiles/support/disc/manuals/
+ *     Interface%20manuals/100293068c.pdf
+ */
+#define BLKIF_OP_DISCARD           5
+
 /*
  * Maximum scatter/gather segments per request.
  * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE.
@@ -74,6 +104,11 @@ struct blkif_request_rw {
 	} seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 };
 
+struct blkif_request_discard {
+	blkif_sector_t sector_number;
+	uint64_t nr_sectors;
+};
+
 struct blkif_request {
 	uint8_t operation;    /* BLKIF_OP_???                         */
 	uint8_t nr_segments;  /* number of segments                   */
@@ -81,6 +116,7 @@ struct blkif_request {
 	uint64_t id;          /* private guest value, echoed in resp  */
 	union {
 		struct blkif_request_rw rw;
+		struct blkif_request_discard discard;
 	} u;
 };
 
--
1.7.6
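Because blkif_request is a shared-memory ABI, it may help to see that folding
the new discard member into the union does not grow the request: the rw
member, with its array of segments, dominates the union size. The standalone
program below only mirrors the layout for illustration; it re-declares
simplified stand-ins rather than including the real header, so the exact
types and packing are assumptions, not the canonical definitions.

/* blkif-size.c: rough, self-contained mirror of the request union, used
 * only to compare member sizes. These are simplified stand-ins, NOT the
 * canonical definitions from include/xen/interface/io/blkif.h.
 */
#include <stdio.h>
#include <stdint.h>

typedef uint64_t blkif_sector_t;
typedef uint16_t blkif_vdev_t;
#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11

struct blkif_request_segment {
	uint32_t gref;
	uint8_t  first_sect, last_sect;
};

struct blkif_request_rw {
	blkif_sector_t sector_number;
	struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};

struct blkif_request_discard {
	blkif_sector_t sector_number;
	uint64_t nr_sectors;
};

struct blkif_request {
	uint8_t operation;
	uint8_t nr_segments;
	blkif_vdev_t handle;
	uint64_t id;
	union {
		struct blkif_request_rw rw;
		struct blkif_request_discard discard;
	} u;
};

int main(void)
{
	/* The discard member is far smaller than the rw member, so the
	 * union (and the whole request) keeps the size of the rw case. */
	printf("rw member:      %zu bytes\n", sizeof(struct blkif_request_rw));
	printf("discard member: %zu bytes\n", sizeof(struct blkif_request_discard));
	printf("whole request:  %zu bytes\n", sizeof(struct blkif_request));
	return 0;
}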
Li Dongyang
2011-Sep-01 10:39 UTC
[Xen-devel] [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
The blkfront driver now will read discard related nodes from xenstore,
and set up the request queue, then we can forward the
discard requests to backend driver.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 drivers/block/xen-blkfront.c |  111 +++++++++++++++++++++++++++++++++---------
 1 files changed, 88 insertions(+), 23 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9ea8c25..86e2c63 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -98,6 +98,9 @@ struct blkfront_info
 	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int flush_op;
+	unsigned int feature_discard;
+	unsigned int discard_granularity;
+	unsigned int discard_alignment;
 	int is_ready;
 };
 
@@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req)
 		ring_req->operation = info->flush_op;
 	}
 
-	ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
-	BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	if (unlikely(req->cmd_flags & REQ_DISCARD)) {
+		/* id, sector_number and handle are set above. */
+		ring_req->operation = BLKIF_OP_DISCARD;
+		ring_req->nr_segments = 0;
+		ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+	} else {
+		ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
+		BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 
-	for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
-		buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
-		fsect = sg->offset >> 9;
-		lsect = fsect + (sg->length >> 9) - 1;
-		/* install a grant reference. */
-		ref = gnttab_claim_grant_reference(&gref_head);
-		BUG_ON(ref == -ENOSPC);
+		for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
+			buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
+			fsect = sg->offset >> 9;
+			lsect = fsect + (sg->length >> 9) - 1;
+			/* install a grant reference. */
+			ref = gnttab_claim_grant_reference(&gref_head);
+			BUG_ON(ref == -ENOSPC);
 
-		gnttab_grant_foreign_access_ref(
-				ref,
-				info->xbdev->otherend_id,
-				buffer_mfn,
-				rq_data_dir(req) );
-
-		info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
-		ring_req->u.rw.seg[i] =
-				(struct blkif_request_segment) {
-					.gref = ref,
-					.first_sect = fsect,
-					.last_sect = lsect };
+			gnttab_grant_foreign_access_ref(
+					ref,
+					info->xbdev->otherend_id,
+					buffer_mfn,
+					rq_data_dir(req));
+
+			info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
+			ring_req->u.rw.seg[i] =
+					(struct blkif_request_segment) {
+						.gref = ref,
+						.first_sect = fsect,
+						.last_sect = lsect };
+		}
 	}
 
 	info->ring.req_prod_pvt++;
@@ -399,6 +409,7 @@ wait:
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
 {
 	struct request_queue *rq;
+	struct blkfront_info *info = gd->private_data;
 
 	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
 	if (rq == NULL)
@@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
+	if (info->feature_discard) {
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
+		blk_queue_max_discard_sectors(rq, get_capacity(gd));
+		rq->limits.discard_granularity = info->discard_granularity;
+		rq->limits.discard_alignment = info->discard_alignment;
+	}
+
 	/* Hard sector size and max sectors impersonate the equiv. hardware. */
 	blk_queue_logical_block_size(rq, sector_size);
 	blk_queue_max_hw_sectors(rq, 512);
@@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
 		switch (bret->operation) {
+		case BLKIF_OP_DISCARD:
+			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+				struct request_queue *rq = info->rq;
+				printk(KERN_WARNING "blkfront: %s: discard op failed\n",
+					   info->gd->disk_name);
+				error = -EOPNOTSUPP;
+				info->feature_discard = 0;
+				spin_lock(rq->queue_lock);
+				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
+				spin_unlock(rq->queue_lock);
+			}
+			__blk_end_request_all(req, error);
+			break;
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
@@ -1098,6 +1129,33 @@ blkfront_closing(struct blkfront_info *info)
 	bdput(bdev);
 }
 
+static void blkfront_setup_discard(struct blkfront_info *info)
+{
+	int err;
+	char *type;
+	unsigned int discard_granularity;
+	unsigned int discard_alignment;
+
+	type = xenbus_read(XBT_NIL, info->xbdev->otherend, "type", NULL);
+	if (IS_ERR(type))
+		return;
+
+	if (strncmp(type, "phy", 3) == 0) {
+		err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			"discard-granularity", "%u", &discard_granularity,
+			"discard-alignment", "%u", &discard_alignment,
+			NULL);
+		if (!err) {
+			info->feature_discard = 1;
+			info->discard_granularity = discard_granularity;
+			info->discard_alignment = discard_alignment;
+		}
+	} else if (strncmp(type, "file", 4) == 0)
+		info->feature_discard = 1;
+
+	kfree(type);
+}
+
 /*
  * Invoked when the backend is finally 'ready' (and has told produced
  * the details about the physical device - #sectors, size, etc).
@@ -1108,7 +1166,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	unsigned long sector_size;
 	unsigned int binfo;
 	int err;
-	int barrier, flush;
+	int barrier, flush, discard;
 
 	switch (info->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -1178,7 +1236,14 @@ static void blkfront_connect(struct blkfront_info *info)
 		info->feature_flush = REQ_FLUSH;
 		info->flush_op = BLKIF_OP_FLUSH_DISKCACHE;
 	}
-
+
+	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			    "feature-discard", "%d", &discard,
+			    NULL);
+
+	if (!err && discard)
+		blkfront_setup_discard(info);
+
 	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
 	if (err) {
 		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
--
1.7.6
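Once the frontend has connected with 'feature-discard', the values it
programs into the request queue are visible from inside the guest under
/sys/block/<dev>/queue/. A small, purely illustrative checker (the device
name "xvda" is a placeholder) might look like this:

/* discard-info.c: print the discard parameters the guest block layer
 * exposes for a device; a zero discard_max_bytes means discard is off.
 * Illustrative sketch only.
 */
#include <stdio.h>

static long read_queue_val(const char *dev, const char *attr)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "xvda";

	printf("%s: discard_granularity=%ld discard_max_bytes=%ld\n", dev,
	       read_queue_val(dev, "discard_granularity"),
	       read_queue_val(dev, "discard_max_bytes"));
	return 0;
}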
Li Dongyang
2011-Sep-01 10:39 UTC
[Xen-devel] [PATCH V4 3/3] xen-blkback: discard requests handling in blkback driver
Now blkback driver can handle the discard request from guest, we will
forward the request to phy device if it really has discard support, or
we'll punch a hole on the image file.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 drivers/block/xen-blkback/blkback.c |   87 +++++++++++++++++++++++-----
 drivers/block/xen-blkback/common.h  |   93 ++++++++++++++++++++++++------
 drivers/block/xen-blkback/xenbus.c  |   58 ++++++++++++++++++++
 3 files changed, 207 insertions(+), 31 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 2330a9a..a3b50d0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -39,6 +39,9 @@
 #include <linux/list.h>
 #include <linux/delay.h>
 #include <linux/freezer.h>
+#include <linux/loop.h>
+#include <linux/falloc.h>
+#include <linux/fs.h>
 
 #include <xen/events.h>
 #include <xen/page.h>
@@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
 
 static void print_stats(struct xen_blkif *blkif)
 {
-	pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n",
+	pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d"
+		" | ds %4d\n",
 		 current->comm, blkif->st_oo_req,
-		 blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
+		 blkif->st_rd_req, blkif->st_wr_req,
+		 blkif->st_f_req, blkif->st_ds_req);
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
 	blkif->st_wr_req = 0;
 	blkif->st_oo_req = 0;
+	blkif->st_ds_req = 0;
 }
 
 int xen_blkif_schedule(void *arg)
@@ -410,6 +416,43 @@ static int xen_blkbk_map(struct blkif_request *req,
 	return ret;
 }
 
+static void xen_blk_discard(struct xen_blkif *blkif, struct blkif_request *req)
+{
+	int err = 0;
+	int status = BLKIF_RSP_OKAY;
+	struct block_device *bdev = blkif->vbd.bdev;
+
+	if (blkif->blk_backend_type == BLKIF_BACKEND_PHY)
+		/* just forward the discard request */
+		err = blkdev_issue_discard(bdev,
+				req->u.discard.sector_number,
+				req->u.discard.nr_sectors,
+				GFP_KERNEL, 0);
+	else if (blkif->blk_backend_type == BLKIF_BACKEND_FILE) {
+		/* punch a hole in the backing file */
+		struct loop_device *lo = bdev->bd_disk->private_data;
+		struct file *file = lo->lo_backing_file;
+
+		if (file->f_op->fallocate)
+			err = file->f_op->fallocate(file,
+				FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+				req->u.discard.sector_number << 9,
+				req->u.discard.nr_sectors << 9);
+		else
+			err = -EOPNOTSUPP;
+	} else
+		err = -EOPNOTSUPP;
+
+	if (err == -EOPNOTSUPP) {
+		pr_debug(DRV_PFX "blkback: discard op failed, "
+			 "not supported\n");
+		status = BLKIF_RSP_EOPNOTSUPP;
+	} else if (err)
+		status = BLKIF_RSP_ERROR;
+
+	make_response(blkif, req->id, req->operation, status);
+}
+
 /*
  * Completion callback on the bio's. Called as bh->b_end_io()
  */
@@ -563,6 +606,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
+	case BLKIF_OP_DISCARD:
+		blkif->st_ds_req++;
+		operation = REQ_DISCARD;
+		break;
 	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
@@ -572,7 +619,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	/* Check that the number of segments is sane. */
 	nseg = req->nr_segments;
-	if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
+	if (unlikely(nseg == 0 && operation != WRITE_FLUSH &&
+				operation != REQ_DISCARD) ||
 	    unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
 		pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
 			 nseg);
@@ -627,10 +675,14 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * the hypercall to unmap the grants - that is all done in
 	 * xen_blkbk_unmap.
 	 */
-	if (xen_blkbk_map(req, pending_req, seg))
+	if (operation != BLKIF_OP_DISCARD &&
+	    xen_blkbk_map(req, pending_req, seg))
 		goto fail_flush;
 
-	/* This corresponding xen_blkif_put is done in __end_block_io_op */
+	/*
+	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
+	 * below if we are handling a BLKIF_OP_DISCARD.
+	 */
 	xen_blkif_get(blkif);
 
 	for (i = 0; i < nseg; i++) {
@@ -654,18 +706,25 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		preq.sector_number += seg[i].nsec;
 	}
 
-	/* This will be hit if the operation was a flush. */
+	/* This will be hit if the operation was a flush or discard. */
 	if (!bio) {
-		BUG_ON(operation != WRITE_FLUSH);
+		BUG_ON(operation != WRITE_FLUSH && operation != REQ_DISCARD);
 
-		bio = bio_alloc(GFP_KERNEL, 0);
-		if (unlikely(bio == NULL))
-			goto fail_put_bio;
+		if (operation == WRITE_FLUSH) {
+			bio = bio_alloc(GFP_KERNEL, 0);
+			if (unlikely(bio == NULL))
+				goto fail_put_bio;
 
-		biolist[nbio++] = bio;
-		bio->bi_bdev    = preq.bdev;
-		bio->bi_private = pending_req;
-		bio->bi_end_io  = end_block_io_op;
+			biolist[nbio++] = bio;
+			bio->bi_bdev    = preq.bdev;
+			bio->bi_private = pending_req;
+			bio->bi_end_io  = end_block_io_op;
+		} else if (operation == REQ_DISCARD) {
+			xen_blk_discard(blkif, req);
+			xen_blkif_put(blkif);
+			free_req(pending_req);
+			return 0;
+		}
 	}
 
 	/*
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 9e40b28..bfb532e 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -63,13 +63,26 @@ struct blkif_common_response {
 
 /* i386 protocol version */
 #pragma pack(push, 4)
+
+struct blkif_x86_32_request_rw {
+	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
+	struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+struct blkif_x86_32_request_discard {
+	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
+	uint64_t nr_sectors;
+};
+
 struct blkif_x86_32_request {
 	uint8_t        operation;    /* BLKIF_OP_???                         */
 	uint8_t        nr_segments;  /* number of segments                   */
 	blkif_vdev_t   handle;       /* only for read/write requests         */
 	uint64_t       id;           /* private guest value, echoed in resp  */
-	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
-	struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	union {
+		struct blkif_x86_32_request_rw rw;
+		struct blkif_x86_32_request_discard discard;
+	} u;
 };
 struct blkif_x86_32_response {
 	uint64_t        id;              /* copied from request */
@@ -79,13 +92,26 @@ struct blkif_x86_32_response {
 #pragma pack(pop)
 
 /* x86_64 protocol version */
+
+struct blkif_x86_64_request_rw {
+	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
+	struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+struct blkif_x86_64_request_discard {
+	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
+	uint64_t nr_sectors;
+};
+
 struct blkif_x86_64_request {
 	uint8_t        operation;    /* BLKIF_OP_???                         */
 	uint8_t        nr_segments;  /* number of segments                   */
 	blkif_vdev_t   handle;       /* only for read/write requests         */
 	uint64_t       __attribute__((__aligned__(8))) id;
-	blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
-	struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	union {
+		struct blkif_x86_64_request_rw rw;
+		struct blkif_x86_64_request_discard discard;
+	} u;
 };
 struct blkif_x86_64_response {
 	uint64_t       __attribute__((__aligned__(8))) id;
@@ -113,6 +139,11 @@ enum blkif_protocol {
 	BLKIF_PROTOCOL_X86_64 = 3,
 };
 
+enum blkif_backend_type {
+	BLKIF_BACKEND_PHY  = 1,
+	BLKIF_BACKEND_FILE = 2,
+};
+
 struct xen_vbd {
 	/* What the domain refers to this vbd as. */
 	blkif_vdev_t   handle;
@@ -138,6 +169,7 @@ struct xen_blkif {
 	unsigned int      irq;
 	/* Comms information. */
 	enum blkif_protocol blk_protocol;
+	enum blkif_backend_type blk_backend_type;
 	union blkif_back_rings blk_rings;
 	struct vm_struct *blk_ring_area;
 	/* The VBD attached to this interface. */
@@ -159,6 +191,7 @@ struct xen_blkif {
 	int               st_wr_req;
 	int               st_oo_req;
 	int               st_f_req;
+	int               st_ds_req;
 	int               st_rd_sect;
 	int               st_wr_sect;
 
@@ -182,7 +215,7 @@ struct xen_blkif {
 
 struct phys_req {
 	unsigned short       dev;
-	unsigned short       nr_sects;
+	blkif_sector_t       nr_sects;
 	struct block_device *bdev;
 	blkif_sector_t       sector_number;
 };
@@ -206,12 +239,25 @@ static inline void blkif_get_x86_32_req(struct blkif_request *dst,
 	dst->nr_segments = src->nr_segments;
 	dst->handle = src->handle;
 	dst->id = src->id;
-	dst->u.rw.sector_number = src->sector_number;
-	barrier();
-	if (n > dst->nr_segments)
-		n = dst->nr_segments;
-	for (i = 0; i < n; i++)
-		dst->u.rw.seg[i] = src->seg[i];
+	switch (src->operation) {
+	case BLKIF_OP_READ:
+	case BLKIF_OP_WRITE:
+	case BLKIF_OP_WRITE_BARRIER:
+	case BLKIF_OP_FLUSH_DISKCACHE:
+		dst->u.rw.sector_number = src->u.rw.sector_number;
+		barrier();
+		if (n > dst->nr_segments)
+			n = dst->nr_segments;
+		for (i = 0; i < n; i++)
+			dst->u.rw.seg[i] = src->u.rw.seg[i];
+		break;
+	case BLKIF_OP_DISCARD:
+		dst->u.discard.sector_number = src->u.discard.sector_number;
+		dst->u.discard.nr_sectors = src->u.discard.nr_sectors;
+		break;
+	default:
+		break;
+	}
 }
 
 static inline void blkif_get_x86_64_req(struct blkif_request *dst,
@@ -222,12 +268,25 @@ static inline void blkif_get_x86_64_req(struct blkif_request *dst,
 	dst->nr_segments = src->nr_segments;
 	dst->handle = src->handle;
 	dst->id = src->id;
-	dst->u.rw.sector_number = src->sector_number;
-	barrier();
-	if (n > dst->nr_segments)
-		n = dst->nr_segments;
-	for (i = 0; i < n; i++)
-		dst->u.rw.seg[i] = src->seg[i];
+	switch (src->operation) {
+	case BLKIF_OP_READ:
+	case BLKIF_OP_WRITE:
+	case BLKIF_OP_WRITE_BARRIER:
+	case BLKIF_OP_FLUSH_DISKCACHE:
+		dst->u.rw.sector_number = src->u.rw.sector_number;
+		barrier();
+		if (n > dst->nr_segments)
+			n = dst->nr_segments;
+		for (i = 0; i < n; i++)
+			dst->u.rw.seg[i] = src->u.rw.seg[i];
+		break;
+	case BLKIF_OP_DISCARD:
+		dst->u.discard.sector_number = src->u.discard.sector_number;
+		dst->u.discard.nr_sectors = src->u.discard.nr_sectors;
+		break;
+	default:
+		break;
+	}
 }
 
 #endif /* __XEN_BLKIF__BACKEND__COMMON_H__ */
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 3f129b4..2b3aef0 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -272,6 +272,7 @@ VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
 VBD_SHOW(f_req,  "%d\n", be->blkif->st_f_req);
+VBD_SHOW(ds_req,  "%d\n", be->blkif->st_ds_req);
 VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect);
 VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect);
 
@@ -280,6 +281,7 @@ static struct attribute *xen_vbdstat_attrs[] = {
 	&dev_attr_rd_req.attr,
 	&dev_attr_wr_req.attr,
 	&dev_attr_f_req.attr,
+	&dev_attr_ds_req.attr,
 	&dev_attr_rd_sect.attr,
 	&dev_attr_wr_sect.attr,
 	NULL
@@ -419,6 +421,60 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
+{
+	struct xenbus_device *dev = be->dev;
+	struct xen_blkif *blkif = be->blkif;
+	char *type;
+	int err;
+	int state = 0;
+
+	type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL);
+	if (!IS_ERR(type)) {
+		if (strncmp(type, "file", 4) == 0) {
+			state = 1;
+			blkif->blk_backend_type = BLKIF_BACKEND_FILE;
+		}
+		if (strncmp(type, "phy", 3) == 0) {
+			struct block_device *bdev = be->blkif->vbd.bdev;
+			struct request_queue *q = bdev_get_queue(bdev);
+			if (blk_queue_discard(q)) {
+				err = xenbus_printf(xbt, dev->nodename,
+					"discard-granularity", "%u",
+					q->limits.discard_granularity);
+				if (err) {
+					xenbus_dev_fatal(dev, err,
+						"writing discard-granularity");
+					goto kfree;
+				}
+				err = xenbus_printf(xbt, dev->nodename,
+					"discard-alignment", "%u",
+					q->limits.discard_alignment);
+				if (err) {
+					xenbus_dev_fatal(dev, err,
+						"writing discard-alignment");
+					goto kfree;
+				}
+				state = 1;
+				blkif->blk_backend_type = BLKIF_BACKEND_PHY;
+			}
+		}
+	} else {
+		err = PTR_ERR(type);
+		xenbus_dev_fatal(dev, err, "reading type");
+		goto out;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-discard",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-discard");
+kfree:
+	kfree(type);
+out:
+	return err;
+}
+
 /*
  * Entry point to this code when a new device is created. Allocate the basic
  * structures, and watch the store waiting for the hotplug scripts to tell us
@@ -650,6 +706,8 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_discard(xbt, be);
+
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
 			    (unsigned long long)vbd_sz(&be->blkif->vbd));
 	if (err) {
--
1.7.6
Konrad Rzeszutek Wilk
2011-Sep-01 15:25 UTC
[Xen-devel] Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
On Thu, Sep 01, 2011 at 06:39:09PM +0800, Li Dongyang wrote:
> The blkfront driver now will read discard related nodes from xenstore,
> and set up the request queue, then we can forward the
> discard requests to backend driver.

A better description is as follows:

xen-blkfront: Handle discard requests.

If the backend advertises 'feature-discard', then interrogate
the backend for alignment, granularity, and max discard block size.
Setup the request queue with the appropriate values and send the
discard operation as required.

.. snip ..
> @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
>
>  	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
>
> +	if (info->feature_discard) {
> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
> +		blk_queue_max_discard_sectors(rq, get_capacity(gd));

This is not correct. I took a look at the SCSI support for this
('sd_config_discard') and if you look there carefully you will see that
the value can be 64KB for example.

You need to provide a mechanism similar to 'discard-*' to fetch that data
from the backend.

> +		rq->limits.discard_granularity = info->discard_granularity;
> +		rq->limits.discard_alignment = info->discard_alignment;
> +	}
> +
>  	/* Hard sector size and max sectors impersonate the equiv. hardware. */
>  	blk_queue_logical_block_size(rq, sector_size);
>  	blk_queue_max_hw_sectors(rq, 512);
.. snip ..
Konrad Rzeszutek Wilk
2011-Sep-01 15:28 UTC
[Xen-devel] Re: [PATCH V4 3/3] xen-blkback: discard requests handling in blkback driver
On Thu, Sep 01, 2011 at 06:39:10PM +0800, Li Dongyang wrote:
> Now blkback driver can handle the discard request from guest, we will
> forward the request to phy device if it really has discard support, or we'll
> punch a hole on the image file.

The description should read:

xen-blkback: Implement discard requests handling

If the backend device (or loopback file) supports discard requests then
advertise it to the frontend via 'feature-discard'. Implementation wise -
if the backend is 'phy' - use blkdev_issue_discard, while if it is 'file',
then punch a hole in the image file.

Signed-off-by: Li Dongyang <lidongyang@novell.com>

.. snip..
> +	if (err == -EOPNOTSUPP) {
> +		pr_debug(DRV_PFX "blkback: discard op failed, "
                                  ^^^^^^^ - get rid of that. The DRV_PFX
already provides the 'xen-blkback' string.
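The "punch a hole in the image file" part of the description corresponds to
fallocate() with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE on the loop
device's backing file, which deallocates the range while keeping the file
size. The effect is easy to observe from plain userspace with a sketch like
the one below; the file name is a placeholder and the filesystem must
support hole punching (e.g. ext4 or XFS on a sufficiently recent kernel).

/* punch-test.c: punch a hole in a file and show the drop in allocated
 * blocks, mirroring what blkback does for a file/loop-backed VBD on
 * discard. Illustrative sketch only; "disk.img" is a placeholder path.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <linux/falloc.h>       /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

static void report(int fd, const char *when)
{
	struct stat st;

	if (fstat(fd, &st) == 0)
		printf("%s: size=%lld bytes, allocated=%lld bytes\n", when,
		       (long long)st.st_size, (long long)st.st_blocks * 512);
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "disk.img";
	int fd = open(path, O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	report(fd, "before");
	/* Drop the first 1 MiB of data without changing the file length,
	 * much like a 2048-sector discard starting at sector 0 would. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, 1 << 20))
		perror("fallocate");    /* EOPNOTSUPP if fs can't punch holes */
	report(fd, "after");
	close(fd);
	return 0;
}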
Li Dongyang
2011-Sep-02 06:30 UTC
[Xen-devel] Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
On Thu, Sep 1, 2011 at 11:25 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Sep 01, 2011 at 06:39:09PM +0800, Li Dongyang wrote:
>> The blkfront driver now will read discard related nodes from xenstore,
>> and set up the request queue, then we can forward the
>> discard requests to backend driver.
>
> A better description is as follows:
>
> xen-blkfront: Handle discard requests.
>
> If the backend advertises 'feature-discard', then interrogate
> the backend for alignment, granularity, and max discard block size.
> Setup the request queue with the appropriate values and send the
> discard operation as required.
>
thanks for the description, way better than mine :-)

.. snip ..
>> +	if (info->feature_discard) {
>> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
>> +		blk_queue_max_discard_sectors(rq, get_capacity(gd));
>
> This is not correct. I took a look at the SCSI support for this
> ('sd_config_discard') and if you look there carefully you will see that
> the value can be 64KB for example.
well I don't think so: max_discard_sectors is used to split the bio so
that it does not exceed max_discard_sectors, see blkdev_issue_discard,
and it is fine to use the capacity as max_discard_sectors in the guest.
For the file backend there is no need for max_discard_sectors, and for
the phy backend we make blkfront send out a single big discard request,
and when we deal with that in blkback by calling blkdev_issue_discard,
the real max_discard_sectors of the phy device will be used and take
care of splitting the bios.
Thanks
>
> You need to provide a mechanism similar to 'discard-*' to fetch that data
> from the backend.
>
.. snip ..
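To make the splitting argument concrete, the loop below mimics, in
userspace and with made-up numbers, how one huge discard gets chopped into
max_discard_sectors-sized pieces the way blkdev_issue_discard does it. It
only illustrates the arithmetic being discussed; it is not the kernel code,
and the device size and per-bio limit are arbitrary assumptions.

/* split-sim.c: simulate splitting one large discard request into chunks
 * bounded by max_discard_sectors. Purely illustrative; the numbers are
 * arbitrary, not taken from any real device.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t sector = 0;                        /* start of the request  */
	uint64_t nr_sects = 41943040;               /* e.g. a whole 20 GiB vbd */
	const uint64_t max_discard_sectors = 65535; /* per-bio limit         */
	unsigned int bios = 0;

	while (nr_sects) {
		uint64_t this_chunk = nr_sects > max_discard_sectors ?
				      max_discard_sectors : nr_sects;

		/* the kernel would allocate and submit one bio here */
		bios++;
		sector   += this_chunk;
		nr_sects -= this_chunk;
	}
	printf("issued %u bios, ended at sector %llu\n",
	       bios, (unsigned long long)sector);
	return 0;
}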
Konrad Rzeszutek Wilk
2011-Sep-02 13:28 UTC
[Xen-devel] Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
On Fri, Sep 02, 2011 at 02:30:03PM +0800, Li Dongyang wrote:
> On Thu, Sep 1, 2011 at 11:25 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 01, 2011 at 06:39:09PM +0800, Li Dongyang wrote:
> >> The blkfront driver now will read discard related nodes from xenstore,
> >> and set up the request queue, then we can forward the
> >> discard requests to backend driver.
> >
> > A better description is as follows:
> >
> > xen-blkfront: Handle discard requests.
> >
> > If the backend advertises 'feature-discard', then interrogate
> > the backend for alignment, granularity, and max discard block size.
> > Setup the request queue with the appropriate values and send the
> > discard operation as required.
> >
> thanks for the description, way better than mine :-)

OK. Let's use that.
.. snip..
> >> +	if (info->feature_discard) {
> >> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
> >> +		blk_queue_max_discard_sectors(rq, get_capacity(gd));
> >
> > This is not correct. I took a look at the SCSI support for this
> > ('sd_config_discard') and if you look there carefully you will see that
> > the value can be 64KB for example.
> well I don't think so: max_discard_sectors is used to split the bio so
> that it does not exceed max_discard_sectors, see blkdev_issue_discard,
> and it is fine to use the capacity as max_discard_sectors in the guest.
> For the file backend there is no need for max_discard_sectors, and for
> the phy backend we make blkfront send out a single big discard request,

Right, one sector for the whole disk size.

> and when we deal with that in blkback by calling blkdev_issue_discard,
> the real max_discard_sectors of the phy device will be used and take
> care of splitting the bios.

Ah, you are right:

	if (nr_sects > max_discard_sectors) {
		bio->bi_size = max_discard_sectors << 9;
		nr_sects -= max_discard_sectors;
		sector += max_discard_sectors;
	} else {

handles when that value (nr_sects == get_capacity(x)) is extremely large.

OK, that was the only issue I had - and that has been resolved. So I
stuck your patchset series in my tree - but they are not yet visible
at kernel.org.
.. snip ..
Li Dongyang
2011-Sep-13 08:33 UTC
[Xen-devel] Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
Hi, Konrad

On Thu, Sep 1, 2011 at 11:25 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
.. snip ..
>> @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>
>>  		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
>>  		switch (bret->operation) {
>> +		case BLKIF_OP_DISCARD:
>> +			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
>> +				struct request_queue *rq = info->rq;
>> +				printk(KERN_WARNING "blkfront: %s: discard op failed\n",
>> +					   info->gd->disk_name);
>> +				error = -EOPNOTSUPP;
>> +				info->feature_discard = 0;
>> +				spin_lock(rq->queue_lock);
We should not take the spin_lock here: rq->queue_lock is actually
blkif_io_lock, because we pass blkif_io_lock to blk_init_queue when we
init the queue, and blkif_io_lock has already been acquired above in
blkif_interrupt(), so we will deadlock here.

I'm so sorry if this has already landed in your tree; I tested with
BLKIF_RSP_ERROR but forgot the EOPNOTSUPP case.
Could you get rid of the spin_lock and the spin_unlock below?
Thanks
Li Dongyang
>> +				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
>> +				spin_unlock(rq->queue_lock);
>> +			}
>> +			__blk_end_request_all(req, error);
>> +			break;
>>  		case BLKIF_OP_FLUSH_DISKCACHE:
>>  		case BLKIF_OP_WRITE_BARRIER:
>>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
.. snip ..
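The failure mode described here is the classic self-deadlock on a
non-recursive lock: blkif_interrupt() already holds blkif_io_lock, and
rq->queue_lock is the very same lock, so the nested spin_lock() can never
succeed. The userspace analogy below shows the same pattern with an
error-checking pthread mutex, which reports EDEADLK instead of hanging; it
is only an analogy, not kernel code.

/* self-deadlock.c: re-acquire a non-recursive lock that the same thread
 * already holds. An error-checking mutex returns EDEADLK; a plain kernel
 * spinlock in the same situation would simply spin forever.
 */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

int main(void)
{
	pthread_mutex_t lock;
	pthread_mutexattr_t attr;
	int err;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);

	pthread_mutex_lock(&lock);      /* "blkif_interrupt" takes the lock */

	/* "spin_lock(rq->queue_lock)" - but it is the same lock object */
	err = pthread_mutex_lock(&lock);
	if (err)
		printf("second lock attempt failed: %s\n", strerror(err));
	else
		puts("second lock attempt succeeded (recursive mutex?)");

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	pthread_mutexattr_destroy(&attr);
	return 0;
}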
Konrad Rzeszutek Wilk
2011-Sep-13 13:15 UTC
[Xen-devel] Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
On Tue, Sep 13, 2011 at 04:33:27PM +0800, Li Dongyang wrote:
> Hi, Konrad
.. snip ..
> >> +		case BLKIF_OP_DISCARD:
> >> +			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> >> +				struct request_queue *rq = info->rq;
> >> +				printk(KERN_WARNING "blkfront: %s: discard op failed\n",
> >> +					   info->gd->disk_name);
> >> +				error = -EOPNOTSUPP;
> >> +				info->feature_discard = 0;
> >> +				spin_lock(rq->queue_lock);
> We should not take the spin_lock here: rq->queue_lock is actually
> blkif_io_lock, because we pass blkif_io_lock to blk_init_queue when we
> init the queue, and blkif_io_lock has already been acquired above in
> blkif_interrupt(), so we will deadlock here.
>
> I'm so sorry if this has already landed in your tree; I tested with
> BLKIF_RSP_ERROR but forgot the EOPNOTSUPP case.
> Could you get rid of the spin_lock and the spin_unlock below?
> Thanks

Just send me a patch with the change and I will apply it to the tree.

Thanks for testing it even further!
.. snip ..