Li Dongyang
2011-Aug-18 09:34 UTC
[Xen-devel] [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
Hi, List Here is V2 of the trim support for blkfront/blkback, it''s now rebased on Jeremy''s next-3.1 tree and has some adjustments according to the comments from Jan Beulich, Thanks _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li Dongyang
2011-Aug-18 09:34 UTC
[Xen-devel] [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling us the type of the backend, used in blkback to determine what to do when we see a trim request. Part of the patch is just taken from Owen Smith, Thanks Signed-off-by: Owen Smith <owen.smith@citrix.com> Signed-off-by: Li Dongyang <lidongyang@novell.com> --- include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h index 3d5d6db..b92cf23 100644 --- a/include/xen/interface/io/blkif.h +++ b/include/xen/interface/io/blkif.h @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; * "feature-flush-cache" node! */ #define BLKIF_OP_FLUSH_DISKCACHE 3 + +/* + * Recognised only if "feature-trim" is present in backend xenbus info. + * The "feature-trim" node contains a boolean indicating whether barrier + * requests are likely to succeed or fail. Either way, a trim request + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by + * the underlying block-device hardware. The boolean simply indicates whether + * or not it is worthwhile for the frontend to attempt trim requests. + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* + * create the "feature-trim" node! + */ +#define BLKIF_OP_TRIM 5 + /* * Maximum scatter/gather segments per request. * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. @@ -74,6 +87,11 @@ struct blkif_request_rw { } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; }; +struct blkif_request_trim { + blkif_sector_t sector_number; + uint64_t nr_sectors; +}; + struct blkif_request { uint8_t operation; /* BLKIF_OP_??? */ uint8_t nr_segments; /* number of segments */ @@ -81,6 +99,7 @@ struct blkif_request { uint64_t id; /* private guest value, echoed in resp */ union { struct blkif_request_rw rw; + struct blkif_request_trim trim; } u; }; @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); #define VDISK_CDROM 0x1 #define VDISK_REMOVABLE 0x2 #define VDISK_READONLY 0x4 +#define VDISK_FILE_BACKEND 0x8 +#define VDISK_PHY_BACKEND 0x10 /* Xen-defined major numbers for virtual disks, they look strangely * familiar */ -- 1.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li Dongyang
2011-Aug-18 09:34 UTC
[Xen-devel] [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request
The blkfront driver now will read feature-trim from xenstore, and set up the request queue with trim params, then we can forward the discard requests to backend driver. Signed-off-by: Li Dongyang <lidongyang@novell.com> --- drivers/block/xen-blkfront.c | 96 ++++++++++++++++++++++++++++++++---------- 1 files changed, 73 insertions(+), 23 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index b536a9c..c22fb07 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -98,6 +98,9 @@ struct blkfront_info unsigned long shadow_free; unsigned int feature_flush; unsigned int flush_op; + unsigned int feature_trim; + unsigned int discard_granularity; + unsigned int discard_alignment; int is_ready; }; @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req) ring_req->operation = info->flush_op; } - ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); - BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); + if (unlikely(req->cmd_flags & REQ_DISCARD)) { + /* id, sector_number and handle are set above. */ + ring_req->operation = BLKIF_OP_TRIM; + ring_req->nr_segments = 0; + ring_req->u.trim.nr_sectors = blk_rq_sectors(req); + } else { + ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); + BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); - for_each_sg(info->sg, sg, ring_req->nr_segments, i) { - buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); - fsect = sg->offset >> 9; - lsect = fsect + (sg->length >> 9) - 1; - /* install a grant reference. */ - ref = gnttab_claim_grant_reference(&gref_head); - BUG_ON(ref == -ENOSPC); + for_each_sg(info->sg, sg, ring_req->nr_segments, i) { + buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); + fsect = sg->offset >> 9; + lsect = fsect + (sg->length >> 9) - 1; + /* install a grant reference. */ + ref = gnttab_claim_grant_reference(&gref_head); + BUG_ON(ref == -ENOSPC); - gnttab_grant_foreign_access_ref( - ref, - info->xbdev->otherend_id, - buffer_mfn, - rq_data_dir(req) ); - - info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); - ring_req->u.rw.seg[i] - (struct blkif_request_segment) { - .gref = ref, - .first_sect = fsect, - .last_sect = lsect }; + gnttab_grant_foreign_access_ref( + ref, + info->xbdev->otherend_id, + buffer_mfn, + rq_data_dir(req)); + + info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); + ring_req->u.rw.seg[i] + (struct blkif_request_segment) { + .gref = ref, + .first_sect = fsect, + .last_sect = lsect }; + } } info->ring.req_prod_pvt++; @@ -399,6 +409,7 @@ wait: static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) { struct request_queue *rq; + struct blkfront_info *info = gd->private_data; rq = blk_init_queue(do_blkif_request, &blkif_io_lock); if (rq == NULL) @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); + if (info->feature_trim) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq); + blk_queue_max_discard_sectors(rq, get_capacity(gd)); + rq->limits.discard_granularity = info->discard_granularity; + rq->limits.discard_alignment = info->discard_alignment; + } + /* Hard sector size and max sectors impersonate the equiv. hardware. */ blk_queue_logical_block_size(rq, sector_size); blk_queue_max_hw_sectors(rq, 512); @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; switch (bret->operation) { + case BLKIF_OP_TRIM: + if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { + struct request_queue *rq = info->rq; + printk(KERN_WARNING "blkfront: %s: trim op failed\n", + info->gd->disk_name); + error = -EOPNOTSUPP; + info->feature_trim = 0; + spin_lock_irq(rq->queue_lock); + queue_flag_clear(QUEUE_FLAG_DISCARD, rq); + spin_unlock_irq(rq->queue_lock); + } + __blk_end_request_all(req, error); + break; case BLKIF_OP_FLUSH_DISKCACHE: case BLKIF_OP_WRITE_BARRIER: if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info) unsigned long sector_size; unsigned int binfo; int err; - int barrier, flush; + unsigned int discard_granularity; + unsigned int discard_alignment; + int barrier, flush, trim; switch (info->connected) { case BLKIF_STATE_CONNECTED: @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info) info->feature_flush = REQ_FLUSH; info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; } - + + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, + "feature-trim", "%d", &trim, + NULL); + + if (!err && trim) { + info->feature_trim = 1; + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, + "discard_granularity", "%u", &discard_granularity, + "discard_alignment", "%u", &discard_alignment, + NULL); + if (!err) { + info->discard_granularity = discard_granularity; + info->discard_alignment = discard_alignment; + } + } else + info->feature_trim = 0; + err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size); if (err) { xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s", -- 1.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li Dongyang
2011-Aug-18 09:34 UTC
[Xen-devel] [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
Now blkback driver can handle the trim request from guest, we will forward the request to phy device if it really has trim support, or we''ll punch a hole on the image file. Signed-off-by: Li Dongyang <lidongyang@novell.com> --- drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ drivers/block/xen-blkback/common.h | 4 +- drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ 3 files changed, 135 insertions(+), 15 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 2330a9a..5acc37a 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -39,6 +39,9 @@ #include <linux/list.h> #include <linux/delay.h> #include <linux/freezer.h> +#include <linux/loop.h> +#include <linux/falloc.h> +#include <linux/fs.h> #include <xen/events.h> #include <xen/page.h> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) static void print_stats(struct xen_blkif *blkif) { - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" + " | tr %4d\n", current->comm, blkif->st_oo_req, - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); + blkif->st_rd_req, blkif->st_wr_req, + blkif->st_f_req, blkif->st_tr_req); blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); blkif->st_rd_req = 0; blkif->st_wr_req = 0; blkif->st_oo_req = 0; + blkif->st_tr_req = 0; } int xen_blkif_schedule(void *arg) @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, blkif->st_f_req++; operation = WRITE_FLUSH; break; + case BLKIF_OP_TRIM: + blkif->st_tr_req++; + operation = REQ_DISCARD; + break; case BLKIF_OP_WRITE_BARRIER: default: operation = 0; /* make gcc happy */ @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, /* Check that the number of segments is sane. */ nseg = req->nr_segments; - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", nseg); @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, * the hypercall to unmap the grants - that is all done in * xen_blkbk_unmap. */ - if (xen_blkbk_map(req, pending_req, seg)) + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) goto fail_flush; - /* This corresponding xen_blkif_put is done in __end_block_io_op */ + /* + * This corresponding xen_blkif_put is done in __end_block_io_op, or + * below if we are handling a BLKIF_OP_TRIM. + */ xen_blkif_get(blkif); for (i = 0; i < nseg; i++) { @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, preq.sector_number += seg[i].nsec; } - /* This will be hit if the operation was a flush. */ + /* This will be hit if the operation was a flush or trim. */ if (!bio) { - BUG_ON(operation != WRITE_FLUSH); + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); - bio = bio_alloc(GFP_KERNEL, 0); - if (unlikely(bio == NULL)) - goto fail_put_bio; + if (operation == WRITE_FLUSH) { + bio = bio_alloc(GFP_KERNEL, 0); + if (unlikely(bio == NULL)) + goto fail_put_bio; - biolist[nbio++] = bio; - bio->bi_bdev = preq.bdev; - bio->bi_private = pending_req; - bio->bi_end_io = end_block_io_op; + biolist[nbio++] = bio; + bio->bi_bdev = preq.bdev; + bio->bi_private = pending_req; + bio->bi_end_io = end_block_io_op; + } else if (operation == REQ_DISCARD) { + int err = 0; + int status = BLKIF_RSP_OKAY; + struct block_device *bdev = blkif->vbd.bdev; + + preq.nr_sects = req->u.trim.nr_sectors; + if (blkif->vbd.type & VDISK_PHY_BACKEND) + /* just forward the trim request */ + err = blkdev_issue_discard(bdev, + preq.sector_number, + preq.nr_sects, + GFP_KERNEL, 0); + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { + /* punch a hole in the backing file */ + struct loop_device *lo + bdev->bd_disk->private_data; + struct file *file = lo->lo_backing_file; + + if (file->f_op->fallocate) + err = file->f_op->fallocate(file, + FALLOC_FL_KEEP_SIZE | + FALLOC_FL_PUNCH_HOLE, + preq.sector_number << 9, + preq.nr_sects << 9); + else + err = -EOPNOTSUPP; + } else + status = BLKIF_RSP_EOPNOTSUPP; + + if (err == -EOPNOTSUPP) { + DPRINTK("blkback: discard op failed, " + "not supported\n"); + status = BLKIF_RSP_EOPNOTSUPP; + } else if (err) + status = BLKIF_RSP_ERROR; + + if (status == BLKIF_RSP_OKAY) + blkif->st_tr_sect += preq.nr_sects; + make_response(blkif, req->id, req->operation, status); + xen_blkif_put(blkif); + free_req(pending_req); + return 0; + } } /* diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 9e40b28..1fef727 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -159,8 +159,10 @@ struct xen_blkif { int st_wr_req; int st_oo_req; int st_f_req; + int st_tr_req; int st_rd_sect; int st_wr_sect; + int st_tr_sect; wait_queue_head_t waiting_to_free; @@ -182,7 +184,7 @@ struct xen_blkif { struct phys_req { unsigned short dev; - unsigned short nr_sects; + blkif_sector_t nr_sects; struct block_device *bdev; blkif_sector_t sector_number; }; diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 3f129b4..05ea8e0 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); static struct attribute *xen_vbdstat_attrs[] = { &dev_attr_oo_req.attr, &dev_attr_rd_req.attr, &dev_attr_wr_req.attr, &dev_attr_f_req.attr, + &dev_attr_tr_req.attr, &dev_attr_rd_sect.attr, &dev_attr_wr_sect.attr, + &dev_attr_tr_sect.attr, NULL }; @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, return err; } +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) +{ + struct xenbus_device *dev = be->dev; + struct xen_vbd *vbd = &be->blkif->vbd; + char *type; + int err; + int state = 0; + + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); + if (!IS_ERR(type)) { + if (strcmp(type, "file") == 0) + state = 1; + vbd->type |= VDISK_FILE_BACKEND; + if (strcmp(type, "phy") == 0) { + struct block_device *bdev = be->blkif->vbd.bdev; + struct request_queue *q = bdev_get_queue(bdev); + if (blk_queue_discard(q)) { + err = xenbus_printf(xbt, dev->nodename, + "discard_granularity", "%u", + q->limits.discard_granularity); + if (err) { + xenbus_dev_fatal(dev, err, + "writing discard_granularity"); + goto kfree; + } + err = xenbus_printf(xbt, dev->nodename, + "discard_alignment", "%u", + q->limits.discard_alignment); + if (err) { + xenbus_dev_fatal(dev, err, + "writing discard_alignment"); + goto kfree; + } + state = 1; + vbd->type |= VDISK_PHY_BACKEND; + } + } + } else { + err = PTR_ERR(type); + xenbus_dev_fatal(dev, err, "reading type"); + goto out; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-trim", + "%d", state); + if (err) + xenbus_dev_fatal(dev, err, "writing feature-trim"); +kfree: + kfree(type); +out: + return err; +} + /* * Entry point to this code when a new device is created. Allocate the basic * structures, and watch the store waiting for the hotplug scripts to tell us @@ -650,6 +707,10 @@ again: if (err) goto abort; + err = xen_blkbk_trim(xbt, be); + if (err) + goto abort; + err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", (unsigned long long)vbd_sz(&be->blkif->vbd)); if (err) { -- 1.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-18 14:56 UTC
Re: [Xen-devel] [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote:> Now blkback driver can handle the trim request from guest, we will > forward the request to phy device if it really has trim support, or we''ll > punch a hole on the image file. > > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ > drivers/block/xen-blkback/common.h | 4 +- > drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ > 3 files changed, 135 insertions(+), 15 deletions(-) > > diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > index 2330a9a..5acc37a 100644 > --- a/drivers/block/xen-blkback/blkback.c > +++ b/drivers/block/xen-blkback/blkback.c > @@ -39,6 +39,9 @@ > #include <linux/list.h> > #include <linux/delay.h> > #include <linux/freezer.h> > +#include <linux/loop.h> > +#include <linux/falloc.h> > +#include <linux/fs.h> > > #include <xen/events.h> > #include <xen/page.h> > @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) > > static void print_stats(struct xen_blkif *blkif) > { > - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", > + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" > + " | tr %4d\n", > current->comm, blkif->st_oo_req, > - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); > + blkif->st_rd_req, blkif->st_wr_req, > + blkif->st_f_req, blkif->st_tr_req); > blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); > blkif->st_rd_req = 0; > blkif->st_wr_req = 0; > blkif->st_oo_req = 0; > + blkif->st_tr_req = 0; > } > > int xen_blkif_schedule(void *arg) > @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > blkif->st_f_req++; > operation = WRITE_FLUSH; > break; > + case BLKIF_OP_TRIM: > + blkif->st_tr_req++; > + operation = REQ_DISCARD; > + break; > case BLKIF_OP_WRITE_BARRIER: > default: > operation = 0; /* make gcc happy */ > @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > > /* Check that the number of segments is sane. */ > nseg = req->nr_segments; > - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || > + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || > unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { > pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", > nseg); > @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > * the hypercall to unmap the grants - that is all done in > * xen_blkbk_unmap. > */ > - if (xen_blkbk_map(req, pending_req, seg)) > + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) > goto fail_flush; > > - /* This corresponding xen_blkif_put is done in __end_block_io_op */ > + /* > + * This corresponding xen_blkif_put is done in __end_block_io_op, or > + * below if we are handling a BLKIF_OP_TRIM. > + */ > xen_blkif_get(blkif); > > for (i = 0; i < nseg; i++) { > @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > preq.sector_number += seg[i].nsec; > } > > - /* This will be hit if the operation was a flush. */ > + /* This will be hit if the operation was a flush or trim. */ > if (!bio) { > - BUG_ON(operation != WRITE_FLUSH); > + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); > > - bio = bio_alloc(GFP_KERNEL, 0); > - if (unlikely(bio == NULL)) > - goto fail_put_bio; > + if (operation == WRITE_FLUSH) { > + bio = bio_alloc(GFP_KERNEL, 0); > + if (unlikely(bio == NULL)) > + goto fail_put_bio; > > - biolist[nbio++] = bio; > - bio->bi_bdev = preq.bdev; > - bio->bi_private = pending_req; > - bio->bi_end_io = end_block_io_op; > + biolist[nbio++] = bio; > + bio->bi_bdev = preq.bdev; > + bio->bi_private = pending_req; > + bio->bi_end_io = end_block_io_op; > + } else if (operation == REQ_DISCARD) { > + int err = 0; > + int status = BLKIF_RSP_OKAY; > + struct block_device *bdev = blkif->vbd.bdev; > + > + preq.nr_sects = req->u.trim.nr_sectors; > + if (blkif->vbd.type & VDISK_PHY_BACKEND) > + /* just forward the trim request */ > + err = blkdev_issue_discard(bdev, > + preq.sector_number, > + preq.nr_sects, > + GFP_KERNEL, 0); > + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { > + /* punch a hole in the backing file */ > + struct loop_device *lo > + bdev->bd_disk->private_data; > + struct file *file = lo->lo_backing_file; > + > + if (file->f_op->fallocate) > + err = file->f_op->fallocate(file, > + FALLOC_FL_KEEP_SIZE | > + FALLOC_FL_PUNCH_HOLE, > + preq.sector_number << 9, > + preq.nr_sects << 9); > + else > + err = -EOPNOTSUPP; > + } else > + status = BLKIF_RSP_EOPNOTSUPP; > + > + if (err == -EOPNOTSUPP) { > + DPRINTK("blkback: discard op failed, " > + "not supported\n");Use pr_debug like the rest of the file does.> + status = BLKIF_RSP_EOPNOTSUPP; > + } else if (err) > + status = BLKIF_RSP_ERROR; > + > + if (status == BLKIF_RSP_OKAY) > + blkif->st_tr_sect += preq.nr_sects; > + make_response(blkif, req->id, req->operation, status); > + xen_blkif_put(blkif); > + free_req(pending_req); > + return 0; > + }All of this should really be moved to its own function.> } > > /* > diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h > index 9e40b28..1fef727 100644 > --- a/drivers/block/xen-blkback/common.h > +++ b/drivers/block/xen-blkback/common.h > @@ -159,8 +159,10 @@ struct xen_blkif { > int st_wr_req; > int st_oo_req; > int st_f_req; > + int st_tr_req; > int st_rd_sect; > int st_wr_sect; > + int st_tr_sect; > > wait_queue_head_t waiting_to_free; > > @@ -182,7 +184,7 @@ struct xen_blkif { > > struct phys_req { > unsigned short dev; > - unsigned short nr_sects; > + blkif_sector_t nr_sects; > struct block_device *bdev; > blkif_sector_t sector_number; > }; > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c > index 3f129b4..05ea8e0 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); > VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); > VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); > VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); > +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); > VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); > VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); > +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); > > static struct attribute *xen_vbdstat_attrs[] = { > &dev_attr_oo_req.attr, > &dev_attr_rd_req.attr, > &dev_attr_wr_req.attr, > &dev_attr_f_req.attr, > + &dev_attr_tr_req.attr, > &dev_attr_rd_sect.attr, > &dev_attr_wr_sect.attr, > + &dev_attr_tr_sect.attr, > NULL > }; > > @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, > return err; > } > > +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) > +{ > + struct xenbus_device *dev = be->dev; > + struct xen_vbd *vbd = &be->blkif->vbd; > + char *type; > + int err; > + int state = 0; > + > + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); > + if (!IS_ERR(type)) { > + if (strcmp(type, "file") == 0) > + state = 1; > + vbd->type |= VDISK_FILE_BACKEND; > + if (strcmp(type, "phy") == 0) {Use ''strncmp'' please.> + struct block_device *bdev = be->blkif->vbd.bdev; > + struct request_queue *q = bdev_get_queue(bdev); > + if (blk_queue_discard(q)) { > + err = xenbus_printf(xbt, dev->nodename, > + "discard_granularity", "%u",Hm, most of the items written to the Xenbus use ''-'', not ''_''. Any particular reason for using ''_''?> + q->limits.discard_granularity); > + if (err) { > + xenbus_dev_fatal(dev, err, > + "writing discard_granularity"); > + goto kfree; > + } > + err = xenbus_printf(xbt, dev->nodename, > + "discard_alignment", "%u",Ditto here.> + q->limits.discard_alignment); > + if (err) { > + xenbus_dev_fatal(dev, err, > + "writing discard_alignment"); > + goto kfree; > + } > + state = 1; > + vbd->type |= VDISK_PHY_BACKEND; > + } > + } > + } else { > + err = PTR_ERR(type); > + xenbus_dev_fatal(dev, err, "reading type"); > + goto out; > + } > + > + err = xenbus_printf(xbt, dev->nodename, "feature-trim", > + "%d", state); > + if (err) > + xenbus_dev_fatal(dev, err, "writing feature-trim"); > +kfree: > + kfree(type); > +out: > + return err; > +} > + > /* > * Entry point to this code when a new device is created. Allocate the basic > * structures, and watch the store waiting for the hotplug scripts to tell us > @@ -650,6 +707,10 @@ again: > if (err) > goto abort; > > + err = xen_blkbk_trim(xbt, be); > + if (err) > + goto abort; > + > err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", > (unsigned long long)vbd_sz(&be->blkif->vbd)); > if (err) { > -- > 1.7.6 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-18 14:56 UTC
Re: [Xen-devel] [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote:> Hi, List > Here is V2 of the trim support for blkfront/blkback, > it''s now rebased on Jeremy''s next-3.1 tree and has some adjustments > according to the comments from Jan Beulich, ThanksGreat! Thanks for posting them!> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-18 15:05 UTC
Re: [Xen-devel] [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request
On Thu, Aug 18, 2011 at 05:34:30PM +0800, Li Dongyang wrote:> The blkfront driver now will read feature-trim from xenstore, > and set up the request queue with trim params, then we can forward the > discard requests to backend driver. > > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > drivers/block/xen-blkfront.c | 96 ++++++++++++++++++++++++++++++++---------- > 1 files changed, 73 insertions(+), 23 deletions(-) > > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > index b536a9c..c22fb07 100644 > --- a/drivers/block/xen-blkfront.c > +++ b/drivers/block/xen-blkfront.c > @@ -98,6 +98,9 @@ struct blkfront_info > unsigned long shadow_free; > unsigned int feature_flush; > unsigned int flush_op; > + unsigned int feature_trim; > + unsigned int discard_granularity; > + unsigned int discard_alignment; > int is_ready; > }; > > @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req) > ring_req->operation = info->flush_op; > } > > - ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); > - BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); > + if (unlikely(req->cmd_flags & REQ_DISCARD)) { > + /* id, sector_number and handle are set above. */ > + ring_req->operation = BLKIF_OP_TRIM; > + ring_req->nr_segments = 0; > + ring_req->u.trim.nr_sectors = blk_rq_sectors(req); > + } else { > + ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); > + BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); > > - for_each_sg(info->sg, sg, ring_req->nr_segments, i) { > - buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); > - fsect = sg->offset >> 9; > - lsect = fsect + (sg->length >> 9) - 1; > - /* install a grant reference. */ > - ref = gnttab_claim_grant_reference(&gref_head); > - BUG_ON(ref == -ENOSPC); > + for_each_sg(info->sg, sg, ring_req->nr_segments, i) { > + buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); > + fsect = sg->offset >> 9; > + lsect = fsect + (sg->length >> 9) - 1; > + /* install a grant reference. */ > + ref = gnttab_claim_grant_reference(&gref_head); > + BUG_ON(ref == -ENOSPC); > > - gnttab_grant_foreign_access_ref( > - ref, > - info->xbdev->otherend_id, > - buffer_mfn, > - rq_data_dir(req) ); > - > - info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); > - ring_req->u.rw.seg[i] > - (struct blkif_request_segment) { > - .gref = ref, > - .first_sect = fsect, > - .last_sect = lsect }; > + gnttab_grant_foreign_access_ref( > + ref, > + info->xbdev->otherend_id, > + buffer_mfn, > + rq_data_dir(req)); > + > + info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); > + ring_req->u.rw.seg[i] > + (struct blkif_request_segment) { > + .gref = ref, > + .first_sect = fsect, > + .last_sect = lsect }; > + } > } > > info->ring.req_prod_pvt++; > @@ -399,6 +409,7 @@ wait: > static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) > { > struct request_queue *rq; > + struct blkfront_info *info = gd->private_data; > > rq = blk_init_queue(do_blkif_request, &blkif_io_lock); > if (rq == NULL) > @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) > > queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); > > + if (info->feature_trim) { > + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq); > + blk_queue_max_discard_sectors(rq, get_capacity(gd)); > + rq->limits.discard_granularity = info->discard_granularity; > + rq->limits.discard_alignment = info->discard_alignment; > + } > + > /* Hard sector size and max sectors impersonate the equiv. hardware. */ > blk_queue_logical_block_size(rq, sector_size); > blk_queue_max_hw_sectors(rq, 512); > @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) > > error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; > switch (bret->operation) { > + case BLKIF_OP_TRIM: > + if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { > + struct request_queue *rq = info->rq; > + printk(KERN_WARNING "blkfront: %s: trim op failed\n", > + info->gd->disk_name); > + error = -EOPNOTSUPP; > + info->feature_trim = 0; > + spin_lock_irq(rq->queue_lock); > + queue_flag_clear(QUEUE_FLAG_DISCARD, rq); > + spin_unlock_irq(rq->queue_lock); > + } > + __blk_end_request_all(req, error); > + break; > case BLKIF_OP_FLUSH_DISKCACHE: > case BLKIF_OP_WRITE_BARRIER: > if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { > @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info) > unsigned long sector_size; > unsigned int binfo; > int err; > - int barrier, flush; > + unsigned int discard_granularity; > + unsigned int discard_alignment; > + int barrier, flush, trim; > > switch (info->connected) { > case BLKIF_STATE_CONNECTED: > @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info) > info->feature_flush = REQ_FLUSH; > info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; > } > - > + > + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, > + "feature-trim", "%d", &trim, > + NULL); > + > + if (!err && trim) { > + info->feature_trim = 1; > + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, > + "discard_granularity", "%u", &discard_granularity, > + "discard_alignment", "%u", &discard_alignment,I am just not sure why the ''_'' was choosen. It seems so odds with the rest of what is put in XenBus.> + NULL); > + if (!err) { > + info->discard_granularity = discard_granularity; > + info->discard_alignment = discard_alignment; > + } > + } else > + info->feature_trim = 0; > + > err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size); > if (err) { > xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s", > -- > 1.7.6 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-18 15:06 UTC
Re: [Xen-devel] [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote:> Hi, List > Here is V2 of the trim support for blkfront/blkback, > it''s now rebased on Jeremy''s next-3.1 tree and has some adjustmentsIn the future, please base it on top of a Linus''s kernel - say 3.1-rc2.> according to the comments from Jan Beulich, Thanks > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Aug-20 00:38 UTC
Re: [Xen-devel] [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
On 08/18/2011 02:34 AM, Li Dongyang wrote:> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling > us the type of the backend, used in blkback to determine what to do when we > see a trim request. > Part of the patch is just taken from Owen Smith, Thanks > > Signed-off-by: Owen Smith <owen.smith@citrix.com> > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ > 1 files changed, 21 insertions(+), 0 deletions(-) > > diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h > index 3d5d6db..b92cf23 100644 > --- a/include/xen/interface/io/blkif.h > +++ b/include/xen/interface/io/blkif.h > @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; > * "feature-flush-cache" node! > */ > #define BLKIF_OP_FLUSH_DISKCACHE 3 > + > +/* > + * Recognised only if "feature-trim" is present in backend xenbus info. > + * The "feature-trim" node contains a boolean indicating whether barrier > + * requests are likely to succeed or fail. Either way, a trim requestBarrier requests?> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by > + * the underlying block-device hardware. The boolean simply indicates whether > + * or not it is worthwhile for the frontend to attempt trim requests. > + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* > + * create the "feature-trim" node!Is all this necessary? What happens if guests just send OP_TRIM requests, and if the host doesn''t understand them then it will fails them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint to the backend that certain blocks are no longer needed?> + */ > +#define BLKIF_OP_TRIM 5 > + > /* > * Maximum scatter/gather segments per request. > * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. > @@ -74,6 +87,11 @@ struct blkif_request_rw { > } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; > }; > > +struct blkif_request_trim { > + blkif_sector_t sector_number; > + uint64_t nr_sectors; > +}; > + > struct blkif_request { > uint8_t operation; /* BLKIF_OP_??? */ > uint8_t nr_segments; /* number of segments */ > @@ -81,6 +99,7 @@ struct blkif_request { > uint64_t id; /* private guest value, echoed in resp */ > union { > struct blkif_request_rw rw; > + struct blkif_request_trim trim; > } u; > }; > > @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); > #define VDISK_CDROM 0x1 > #define VDISK_REMOVABLE 0x2 > #define VDISK_READONLY 0x4 > +#define VDISK_FILE_BACKEND 0x8 > +#define VDISK_PHY_BACKEND 0x10What are these for? Why does a frontend care about these? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Aug-20 00:41 UTC
Re: [Xen-devel] [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
On 08/18/2011 02:34 AM, Li Dongyang wrote:> Hi, List > Here is V2 of the trim support for blkfront/blkback, > it''s now rebased on Jeremy''s next-3.1 tree and has some adjustmentsPlease don''t use that branch; its really just for my personal use, and I don''t intend it for general use. New patches should be based on Linus''s linux.git. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li Dongyang
2011-Aug-22 09:36 UTC
Re: [Xen-devel] [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:> On 08/18/2011 02:34 AM, Li Dongyang wrote: >> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling >> us the type of the backend, used in blkback to determine what to do when we >> see a trim request. >> Part of the patch is just taken from Owen Smith, Thanks >> >> Signed-off-by: Owen Smith <owen.smith@citrix.com> >> Signed-off-by: Li Dongyang <lidongyang@novell.com> >> --- >> include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ >> 1 files changed, 21 insertions(+), 0 deletions(-) >> >> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h >> index 3d5d6db..b92cf23 100644 >> --- a/include/xen/interface/io/blkif.h >> +++ b/include/xen/interface/io/blkif.h >> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; >> * "feature-flush-cache" node! >> */ >> #define BLKIF_OP_FLUSH_DISKCACHE 3 >> + >> +/* >> + * Recognised only if "feature-trim" is present in backend xenbus info. >> + * The "feature-trim" node contains a boolean indicating whether barrier >> + * requests are likely to succeed or fail. Either way, a trim request > > Barrier requests?hm, I wonder the same, seems it''s a copy & paste mistake, the BLKIF_OP_TRIM part is taken from Owen''s patch back in Jan 2011: http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html> >> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by >> + * the underlying block-device hardware. The boolean simply indicates whether >> + * or not it is worthwhile for the frontend to attempt trim requests. >> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* >> + * create the "feature-trim" node! > > Is all this necessary? What happens if guests just send OP_TRIM > requests, and if the host doesn''t understand them then it will fails > them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint > to the backend that certain blocks are no longer needed?that won''t happen: we only mark the queue in the guest has TRIM if blkback tells blkfront via xenstore. if we don''t init the queue with TRIM in guest, if guest send OP_TRIM, it gonna fail with ENONOTSUPP in the guest''s block layer, see blkdev_issue_discard. and yes, trim is just a hint, the basic idea is forward the hint to phy dev if it has trim, or punch a hole to reduce disk usage if the backend is a file. and this comment is taken from Owen, I think he could give sth here.> >> + */ >> +#define BLKIF_OP_TRIM 5 >> + >> /* >> * Maximum scatter/gather segments per request. >> * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. >> @@ -74,6 +87,11 @@ struct blkif_request_rw { >> } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; >> }; >> >> +struct blkif_request_trim { >> + blkif_sector_t sector_number; >> + uint64_t nr_sectors; >> +}; >> + >> struct blkif_request { >> uint8_t operation; /* BLKIF_OP_??? */ >> uint8_t nr_segments; /* number of segments */ >> @@ -81,6 +99,7 @@ struct blkif_request { >> uint64_t id; /* private guest value, echoed in resp */ >> union { >> struct blkif_request_rw rw; >> + struct blkif_request_trim trim; >> } u; >> }; >> >> @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); >> #define VDISK_CDROM 0x1 >> #define VDISK_REMOVABLE 0x2 >> #define VDISK_READONLY 0x4 >> +#define VDISK_FILE_BACKEND 0x8 >> +#define VDISK_PHY_BACKEND 0x10 > > What are these for? Why does a frontend care about these?they are used for the backend driver to decide what to do when we get a OP_TRIM, if it''s phy then forward the trim, or punch a hole in the file, Jan also mentioned this is not good, gonna find another place for the flags, thanks> > J >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li Dongyang
2011-Aug-22 09:43 UTC
Re: [Xen-devel] [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:> On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote: >> Now blkback driver can handle the trim request from guest, we will >> forward the request to phy device if it really has trim support, or we''ll >> punch a hole on the image file. >> >> Signed-off-by: Li Dongyang <lidongyang@novell.com> >> --- >> drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ >> drivers/block/xen-blkback/common.h | 4 +- >> drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ >> 3 files changed, 135 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c >> index 2330a9a..5acc37a 100644 >> --- a/drivers/block/xen-blkback/blkback.c >> +++ b/drivers/block/xen-blkback/blkback.c >> @@ -39,6 +39,9 @@ >> #include <linux/list.h> >> #include <linux/delay.h> >> #include <linux/freezer.h> >> +#include <linux/loop.h> >> +#include <linux/falloc.h> >> +#include <linux/fs.h> >> >> #include <xen/events.h> >> #include <xen/page.h> >> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) >> >> static void print_stats(struct xen_blkif *blkif) >> { >> - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", >> + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" >> + " | tr %4d\n", >> current->comm, blkif->st_oo_req, >> - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); >> + blkif->st_rd_req, blkif->st_wr_req, >> + blkif->st_f_req, blkif->st_tr_req); >> blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); >> blkif->st_rd_req = 0; >> blkif->st_wr_req = 0; >> blkif->st_oo_req = 0; >> + blkif->st_tr_req = 0; >> } >> >> int xen_blkif_schedule(void *arg) >> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> blkif->st_f_req++; >> operation = WRITE_FLUSH; >> break; >> + case BLKIF_OP_TRIM: >> + blkif->st_tr_req++; >> + operation = REQ_DISCARD; >> + break; >> case BLKIF_OP_WRITE_BARRIER: >> default: >> operation = 0; /* make gcc happy */ >> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> >> /* Check that the number of segments is sane. */ >> nseg = req->nr_segments; >> - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || >> + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || >> unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { >> pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", >> nseg); >> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> * the hypercall to unmap the grants - that is all done in >> * xen_blkbk_unmap. >> */ >> - if (xen_blkbk_map(req, pending_req, seg)) >> + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) >> goto fail_flush; >> >> - /* This corresponding xen_blkif_put is done in __end_block_io_op */ >> + /* >> + * This corresponding xen_blkif_put is done in __end_block_io_op, or >> + * below if we are handling a BLKIF_OP_TRIM. >> + */ >> xen_blkif_get(blkif); >> >> for (i = 0; i < nseg; i++) { >> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> preq.sector_number += seg[i].nsec; >> } >> >> - /* This will be hit if the operation was a flush. */ >> + /* This will be hit if the operation was a flush or trim. */ >> if (!bio) { >> - BUG_ON(operation != WRITE_FLUSH); >> + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); >> >> - bio = bio_alloc(GFP_KERNEL, 0); >> - if (unlikely(bio == NULL)) >> - goto fail_put_bio; >> + if (operation == WRITE_FLUSH) { >> + bio = bio_alloc(GFP_KERNEL, 0); >> + if (unlikely(bio == NULL)) >> + goto fail_put_bio; >> >> - biolist[nbio++] = bio; >> - bio->bi_bdev = preq.bdev; >> - bio->bi_private = pending_req; >> - bio->bi_end_io = end_block_io_op; >> + biolist[nbio++] = bio; >> + bio->bi_bdev = preq.bdev; >> + bio->bi_private = pending_req; >> + bio->bi_end_io = end_block_io_op; >> + } else if (operation == REQ_DISCARD) { >> + int err = 0; >> + int status = BLKIF_RSP_OKAY; >> + struct block_device *bdev = blkif->vbd.bdev; >> + >> + preq.nr_sects = req->u.trim.nr_sectors; >> + if (blkif->vbd.type & VDISK_PHY_BACKEND) >> + /* just forward the trim request */ >> + err = blkdev_issue_discard(bdev, >> + preq.sector_number, >> + preq.nr_sects, >> + GFP_KERNEL, 0); >> + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { >> + /* punch a hole in the backing file */ >> + struct loop_device *lo >> + bdev->bd_disk->private_data; >> + struct file *file = lo->lo_backing_file; >> + >> + if (file->f_op->fallocate) >> + err = file->f_op->fallocate(file, >> + FALLOC_FL_KEEP_SIZE | >> + FALLOC_FL_PUNCH_HOLE, >> + preq.sector_number << 9, >> + preq.nr_sects << 9); >> + else >> + err = -EOPNOTSUPP; >> + } else >> + status = BLKIF_RSP_EOPNOTSUPP; >> + >> + if (err == -EOPNOTSUPP) { >> + DPRINTK("blkback: discard op failed, " >> + "not supported\n"); > > Use pr_debug like the rest of the file does.gonna fix> >> + status = BLKIF_RSP_EOPNOTSUPP; >> + } else if (err) >> + status = BLKIF_RSP_ERROR; >> + >> + if (status == BLKIF_RSP_OKAY) >> + blkif->st_tr_sect += preq.nr_sects; >> + make_response(blkif, req->id, req->operation, status); >> + xen_blkif_put(blkif); >> + free_req(pending_req); >> + return 0; >> + } > > All of this should really be moved to its own function.not quite clear about this, do you mean we should make sth like dispatch_trim_block_io only for OP_TRIM? I added the trim handling stuff to dispatch_rw_block_io because it also handles flush stuff.> >> } >> >> /* >> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h >> index 9e40b28..1fef727 100644 >> --- a/drivers/block/xen-blkback/common.h >> +++ b/drivers/block/xen-blkback/common.h >> @@ -159,8 +159,10 @@ struct xen_blkif { >> int st_wr_req; >> int st_oo_req; >> int st_f_req; >> + int st_tr_req; >> int st_rd_sect; >> int st_wr_sect; >> + int st_tr_sect; >> >> wait_queue_head_t waiting_to_free; >> >> @@ -182,7 +184,7 @@ struct xen_blkif { >> >> struct phys_req { >> unsigned short dev; >> - unsigned short nr_sects; >> + blkif_sector_t nr_sects; >> struct block_device *bdev; >> blkif_sector_t sector_number; >> }; >> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c >> index 3f129b4..05ea8e0 100644 >> --- a/drivers/block/xen-blkback/xenbus.c >> +++ b/drivers/block/xen-blkback/xenbus.c >> @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); >> VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); >> VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); >> VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); >> +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); >> VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); >> VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); >> +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); >> >> static struct attribute *xen_vbdstat_attrs[] = { >> &dev_attr_oo_req.attr, >> &dev_attr_rd_req.attr, >> &dev_attr_wr_req.attr, >> &dev_attr_f_req.attr, >> + &dev_attr_tr_req.attr, >> &dev_attr_rd_sect.attr, >> &dev_attr_wr_sect.attr, >> + &dev_attr_tr_sect.attr, >> NULL >> }; >> >> @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, >> return err; >> } >> >> +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) >> +{ >> + struct xenbus_device *dev = be->dev; >> + struct xen_vbd *vbd = &be->blkif->vbd; >> + char *type; >> + int err; >> + int state = 0; >> + >> + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); >> + if (!IS_ERR(type)) { >> + if (strcmp(type, "file") == 0) >> + state = 1; >> + vbd->type |= VDISK_FILE_BACKEND; >> + if (strcmp(type, "phy") == 0) { > > Use ''strncmp'' please.gonna fix> >> + struct block_device *bdev = be->blkif->vbd.bdev; >> + struct request_queue *q = bdev_get_queue(bdev); >> + if (blk_queue_discard(q)) { >> + err = xenbus_printf(xbt, dev->nodename, >> + "discard_granularity", "%u", > > Hm, most of the items written to the Xenbus use ''-'', not ''_''. > Any particular reason for using ''_''?they are taken from struct queue_limits so I used the underscore, no problem to change them to ''-''> >> + q->limits.discard_granularity); >> + if (err) { >> + xenbus_dev_fatal(dev, err, >> + "writing discard_granularity"); >> + goto kfree; >> + } >> + err = xenbus_printf(xbt, dev->nodename, >> + "discard_alignment", "%u", > > Ditto here. >> + q->limits.discard_alignment); >> + if (err) { >> + xenbus_dev_fatal(dev, err, >> + "writing discard_alignment"); >> + goto kfree; >> + } >> + state = 1; >> + vbd->type |= VDISK_PHY_BACKEND; >> + } >> + } >> + } else { >> + err = PTR_ERR(type); >> + xenbus_dev_fatal(dev, err, "reading type"); >> + goto out; >> + } >> + >> + err = xenbus_printf(xbt, dev->nodename, "feature-trim", >> + "%d", state); >> + if (err) >> + xenbus_dev_fatal(dev, err, "writing feature-trim"); >> +kfree: >> + kfree(type); >> +out: >> + return err; >> +} >> + >> /* >> * Entry point to this code when a new device is created. Allocate the basic >> * structures, and watch the store waiting for the hotplug scripts to tell us >> @@ -650,6 +707,10 @@ again: >> if (err) >> goto abort; >> >> + err = xen_blkbk_trim(xbt, be); >> + if (err) >> + goto abort; >> + >> err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", >> (unsigned long long)vbd_sz(&be->blkif->vbd)); >> if (err) { >> -- >> 1.7.6 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Aug-22 13:27 UTC
Re: [Xen-devel] [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
On Mon, Aug 22, 2011 at 05:43:28PM +0800, Li Dongyang wrote:> On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com> wrote: > > On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote: > >> Now blkback driver can handle the trim request from guest, we will > >> forward the request to phy device if it really has trim support, or we''ll > >> punch a hole on the image file. > >> > >> Signed-off-by: Li Dongyang <lidongyang@novell.com> > >> --- > >> drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ > >> drivers/block/xen-blkback/common.h | 4 +- > >> drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ > >> 3 files changed, 135 insertions(+), 15 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > >> index 2330a9a..5acc37a 100644 > >> --- a/drivers/block/xen-blkback/blkback.c > >> +++ b/drivers/block/xen-blkback/blkback.c > >> @@ -39,6 +39,9 @@ > >> #include <linux/list.h> > >> #include <linux/delay.h> > >> #include <linux/freezer.h> > >> +#include <linux/loop.h> > >> +#include <linux/falloc.h> > >> +#include <linux/fs.h> > >> > >> #include <xen/events.h> > >> #include <xen/page.h> > >> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) > >> > >> static void print_stats(struct xen_blkif *blkif) > >> { > >> - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", > >> + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" > >> + " | tr %4d\n", > >> current->comm, blkif->st_oo_req, > >> - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); > >> + blkif->st_rd_req, blkif->st_wr_req, > >> + blkif->st_f_req, blkif->st_tr_req); > >> blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); > >> blkif->st_rd_req = 0; > >> blkif->st_wr_req = 0; > >> blkif->st_oo_req = 0; > >> + blkif->st_tr_req = 0; > >> } > >> > >> int xen_blkif_schedule(void *arg) > >> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> blkif->st_f_req++; > >> operation = WRITE_FLUSH; > >> break; > >> + case BLKIF_OP_TRIM: > >> + blkif->st_tr_req++; > >> + operation = REQ_DISCARD; > >> + break; > >> case BLKIF_OP_WRITE_BARRIER: > >> default: > >> operation = 0; /* make gcc happy */ > >> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> > >> /* Check that the number of segments is sane. */ > >> nseg = req->nr_segments; > >> - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || > >> + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || > >> unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { > >> pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", > >> nseg); > >> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> * the hypercall to unmap the grants - that is all done in > >> * xen_blkbk_unmap. > >> */ > >> - if (xen_blkbk_map(req, pending_req, seg)) > >> + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) > >> goto fail_flush; > >> > >> - /* This corresponding xen_blkif_put is done in __end_block_io_op */ > >> + /* > >> + * This corresponding xen_blkif_put is done in __end_block_io_op, or > >> + * below if we are handling a BLKIF_OP_TRIM. > >> + */ > >> xen_blkif_get(blkif); > >> > >> for (i = 0; i < nseg; i++) { > >> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> preq.sector_number += seg[i].nsec; > >> } > >> > >> - /* This will be hit if the operation was a flush. */ > >> + /* This will be hit if the operation was a flush or trim. */ > >> if (!bio) { > >> - BUG_ON(operation != WRITE_FLUSH); > >> + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); > >> > >> - bio = bio_alloc(GFP_KERNEL, 0); > >> - if (unlikely(bio == NULL)) > >> - goto fail_put_bio; > >> + if (operation == WRITE_FLUSH) { > >> + bio = bio_alloc(GFP_KERNEL, 0); > >> + if (unlikely(bio == NULL)) > >> + goto fail_put_bio; > >> > >> - biolist[nbio++] = bio; > >> - bio->bi_bdev = preq.bdev; > >> - bio->bi_private = pending_req; > >> - bio->bi_end_io = end_block_io_op; > >> + biolist[nbio++] = bio; > >> + bio->bi_bdev = preq.bdev; > >> + bio->bi_private = pending_req; > >> + bio->bi_end_io = end_block_io_op; > >> + } else if (operation == REQ_DISCARD) { > >> + int err = 0; > >> + int status = BLKIF_RSP_OKAY; > >> + struct block_device *bdev = blkif->vbd.bdev; > >> + > >> + preq.nr_sects = req->u.trim.nr_sectors; > >> + if (blkif->vbd.type & VDISK_PHY_BACKEND) > >> + /* just forward the trim request */ > >> + err = blkdev_issue_discard(bdev, > >> + preq.sector_number, > >> + preq.nr_sects, > >> + GFP_KERNEL, 0); > >> + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { > >> + /* punch a hole in the backing file */ > >> + struct loop_device *lo > >> + bdev->bd_disk->private_data; > >> + struct file *file = lo->lo_backing_file; > >> + > >> + if (file->f_op->fallocate) > >> + err = file->f_op->fallocate(file, > >> + FALLOC_FL_KEEP_SIZE | > >> + FALLOC_FL_PUNCH_HOLE, > >> + preq.sector_number << 9, > >> + preq.nr_sects << 9); > >> + else > >> + err = -EOPNOTSUPP; > >> + } else > >> + status = BLKIF_RSP_EOPNOTSUPP; > >> + > >> + if (err == -EOPNOTSUPP) { > >> + DPRINTK("blkback: discard op failed, " > >> + "not supported\n"); > > > > Use pr_debug like the rest of the file does. > gonna fix > > > >> + status = BLKIF_RSP_EOPNOTSUPP; > >> + } else if (err) > >> + status = BLKIF_RSP_ERROR; > >> + > >> + if (status == BLKIF_RSP_OKAY) > >> + blkif->st_tr_sect += preq.nr_sects; > >> + make_response(blkif, req->id, req->operation, status); > >> + xen_blkif_put(blkif); > >> + free_req(pending_req); > >> + return 0; > >> + } > > > > All of this should really be moved to its own function. > not quite clear about this, do you mean we should make sth like > dispatch_trim_block_io only for OP_TRIM? > I added the trim handling stuff to dispatch_rw_block_io because it > also handles flush stuff.That function is getting quite busy - it has a lot of code. My thought was that you could seperate it out. Like if (!bio) { if (operation == WRITE_FLUSH) { bio = bio_alloc(..) .. snip (the same as your patch has it. } else if (operation == REQ_DISCARD) { xen_blk_trim(blkif, preq); return 0; } And do all of the operation in xen_blk_trim. Including the make_response .. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Aug-22 17:44 UTC
Re: [Xen-devel] [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
On 08/22/2011 02:36 AM, Li Dongyang wrote:> On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote: >> On 08/18/2011 02:34 AM, Li Dongyang wrote: >>> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling >>> us the type of the backend, used in blkback to determine what to do when we >>> see a trim request. >>> Part of the patch is just taken from Owen Smith, Thanks >>> >>> Signed-off-by: Owen Smith <owen.smith@citrix.com> >>> Signed-off-by: Li Dongyang <lidongyang@novell.com> >>> --- >>> include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ >>> 1 files changed, 21 insertions(+), 0 deletions(-) >>> >>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h >>> index 3d5d6db..b92cf23 100644 >>> --- a/include/xen/interface/io/blkif.h >>> +++ b/include/xen/interface/io/blkif.h >>> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; >>> * "feature-flush-cache" node! >>> */ >>> #define BLKIF_OP_FLUSH_DISKCACHE 3 >>> + >>> +/* >>> + * Recognised only if "feature-trim" is present in backend xenbus info. >>> + * The "feature-trim" node contains a boolean indicating whether barrier >>> + * requests are likely to succeed or fail. Either way, a trim request >> Barrier requests? > hm, I wonder the same, seems it''s a copy & paste mistake, > the BLKIF_OP_TRIM part is taken from Owen''s patch back in Jan 2011: > http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html >>> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by >>> + * the underlying block-device hardware. The boolean simply indicates whether >>> + * or not it is worthwhile for the frontend to attempt trim requests. >>> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* >>> + * create the "feature-trim" node! >> Is all this necessary? What happens if guests just send OP_TRIM >> requests, and if the host doesn''t understand them then it will fails >> them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint >> to the backend that certain blocks are no longer needed? > that won''t happen: we only mark the queue in the guest has TRIM if > blkback tells blkfront > via xenstore. if we don''t init the queue with TRIM in guest, if guest > send OP_TRIM, > it gonna fail with ENONOTSUPP in the guest''s block layer, see > blkdev_issue_discard. > and yes, trim is just a hint, the basic idea is forward the hint to > phy dev if it has trim, or > punch a hole to reduce disk usage if the backend is a file. > and this comment is taken from Owen, I think he could give sth here.Right. So if this is just a hint, and there''s no correctness implications for the frontend if the backend doesn''t support trim, then the frontend should just start trying to use trim and then suppress them if they come back as failed. I guess if the frontend needs other information from the backend (about trim chunk size?) then this is moot, but this kind of feature negotiation via xenbus seems pretty flaky and it would be nice to avoid repeating the mistake. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel