Hello. For our own needs I have written a simple blkback disk I/O limit
patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
shaper because of our storage architecture.

Our storage nodes export disks via SCST over an InfiniBand network. On the
Xen nodes we attach these disks via SRP. Each Xen node connects to two
storage nodes at the same time, and multipath provides failover.

Each disk contains LVM (not CLVM); for each virtual machine we create a PV
disk, and via a device-mapper raid1 we build the disk used by the domU. In
this case, if one storage node fails, the VM keeps working with the single
remaining raid1 leg.

All of this works great, but in this setup we cannot use cgroups or
dm-ioband. Some time ago the CFQ disk scheduler stopped working on top of
BIO devices and provides control only at the bottom layer. (In our case we
can apply CFQ only to the SRP disk, and so shape I/O only for all clients
on a Xen node at once.) dm-ioband works unreliably when a domU generates
massive I/O: our tests show that if a domU uses ext4 and drives 20000 IOPS,
dom0 sometimes crashes or the disk gets corrupted. With dm-ioband, if one
storage node goes down, we sometimes lose data from the disk. And dm-ioband
cannot change the IOPS limit on the fly.

This patch tries to solve our own problems. Could someone from the Xen team
look at it and say how the code looks? What do I need to change or rewrite?
Maybe someday this can be used in the mainline Linux Xen tree... (I hope).
This patch is only for phy devices. For blktap devices I have spoken with
Thanos Makatos (the author of blktap3), and maybe in the future this
functionality can be added to blktap3.

Thank You.

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
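For readers not familiar with this kind of stack, a minimal sketch of how one such per-domU mirror might be assembled in dom0 follows. The volume group names, LV names, sizes and device-mapper table parameters are illustrative assumptions, not details taken from the setup described above:

# Sketch only: one LV per VM on each storage node's (multipathed) volume
# group, mirrored in dom0 with the dm "mirror" target and handed to the
# guest as a phy: disk.
lvcreate -L 20G -n vm101 vg_storage_a
lvcreate -L 20G -n vm101 vg_storage_b

SECTORS=$(blockdev --getsz /dev/vg_storage_a/vm101)
dmsetup create vm101-disk --table \
  "0 ${SECTORS} mirror core 1 1024 2 /dev/vg_storage_a/vm101 0 /dev/vg_storage_b/vm101 0"

# The domU configuration then refers to the mirror, e.g.:
#   disk = [ 'phy:/dev/mapper/vm101-disk,xvda,w' ]

If one storage node disappears, the mirror keeps running on the surviving leg, which is the failover behaviour described above.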
Sorry, I forgot to send the patch:
https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
The patch is for kernel 3.6.9, but if needed I can rebase it to the
current Linus git tree.

2013/1/31 Vasiliy Tolstov <v.tolstov@selfip.ru>:
> Hello. For our own needs I have written a simple blkback disk I/O limit
> patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
> shaper because of our storage architecture.
> [...]

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Thu, 2013-01-31 at 05:14 +0000, Vasiliy Tolstov wrote:
> Sorry, I forgot to send the patch:
> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
> The patch is for kernel 3.6.9, but if needed I can rebase it to the
> current Linus git tree.

Can you inline your patch in your email so that developers can comment
on it?


Wei.
Sorry,

diff -NruabBEp xen_blkback_limit.orig/blkback.c xen_blkback_limit.new//blkback.c
--- xen_blkback_limit.orig/blkback.c	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//blkback.c	2013-01-28 08:11:30.000000000 +0400
@@ -211,10 +211,18 @@ static void print_stats(blkif_t *blkif)
 	blkif->st_pk_req = 0;
 }
 
+static void refill_iops(blkif_t *blkif)
+{
+	blkif->reqtime = jiffies + msecs_to_jiffies(1000);
+	blkif->reqcount = 0;
+}
+
 int blkif_schedule(void *arg)
 {
 	blkif_t *blkif = arg;
 	struct vbd *vbd = &blkif->vbd;
+	int ret = 0;
+	struct timeval cur_time;
 
 	blkif_get(blkif);
 
@@ -237,10 +245,22 @@ int blkif_schedule(void *arg)
 		blkif->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		if (do_block_io_op(blkif))
+		ret = do_block_io_op(blkif);
+		if (ret)
 			blkif->waiting_reqs = 1;
 		unplug_queue(blkif);
 
+		if (blkif->reqrate) {
+			if (2 == ret && (blkif->reqtime > jiffies)) {
+				jiffies_to_timeval(jiffies, &cur_time);
+
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule_timeout(blkif->reqtime - jiffies);
+			}
+			if (time_after(jiffies, blkif->reqtime))
+				refill_iops(blkif);
+		}
+
 		if (log_stats && time_after(jiffies, blkif->st_print))
 			print_stats(blkif);
 	}
@@ -394,10 +414,19 @@ static int _do_block_io_op(blkif_t *blki
 	rp = blk_rings->common.sring->req_prod;
 	rmb(); /* Ensure we see queued requests up to 'rp'. */
 
+	if (blkif->reqrate && (blkif->reqcount >= blkif->reqrate)) {
+		return (rc != rp) ? 2 : 0;
+	}
+
 	while (rc != rp) {
 		if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rc))
 			break;
 
+		if (blkif->reqrate) {
+			if (blkif->reqcount >= blkif->reqrate)
+				return 2;
+		}
+
 		if (kthread_should_stop())
 			return 1;
 
@@ -434,8 +463,8 @@ static int _do_block_io_op(blkif_t *blki
 
 			/* Apply all sanity checks to /private copy/ of request. */
 			barrier();
-
 			dispatch_rw_block_io(blkif, &req, pending_req);
+			blkif->reqcount++;
 			break;
 		case BLKIF_OP_DISCARD:
 			blk_rings->common.req_cons = rc;
@@ -452,7 +481,7 @@ static int _do_block_io_op(blkif_t *blki
 			break;
 		default:
 			/* A good sign something is wrong: sleep for a while to
-			 * avoid excessive CPU consumption by a bad guest. */
+			 * avoid excessive CPU consumption by a bad guest.*/
 			msleep(1);
 			blk_rings->common.req_cons = rc;
 			barrier();
@@ -501,6 +530,7 @@ static void dispatch_rw_block_io(blkif_t
 	uint32_t flags;
 	int ret, i;
 	int operation;
+	struct timeval cur_time;
 
 	switch (req->operation) {
 	case BLKIF_OP_READ:
@@ -658,6 +688,7 @@ static void dispatch_rw_block_io(blkif_t
 	else
 		blkif->st_wr_sect += preq.nr_sects;
 
+	jiffies_to_timeval(jiffies, &cur_time);
 	return;
 
  fail_flush:
diff -NruabBEp xen_blkback_limit.orig/common.h xen_blkback_limit.new//common.h
--- xen_blkback_limit.orig/common.h	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//common.h	2013-01-28 08:09:35.000000000 +0400
@@ -82,6 +82,11 @@ typedef struct blkif_st {
 	unsigned int        waiting_reqs;
 	struct request_queue *plug;
 
+	/* qos information */
+	unsigned long       reqtime;
+	int                 reqcount;
+	int                 reqrate;
+
 	/* statistics */
 	unsigned long       st_print;
 	int                 st_rd_req;
@@ -106,6 +111,8 @@ struct backend_info
 	unsigned major;
 	unsigned minor;
 	char *mode;
+	/* qos information */
+	struct xenbus_watch reqrate_watch;
 };
 
 blkif_t *blkif_alloc(domid_t domid);
diff -NruabBEp xen_blkback_limit.orig/xenbus.c xen_blkback_limit.new//xenbus.c
--- xen_blkback_limit.orig/xenbus.c	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//xenbus.c	2013-01-28 08:22:26.000000000 +0400
@@ -120,6 +120,79 @@ static void update_blkif_status(blkif_t
 	}								\
 	static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
 
+static ssize_t
+show_reqrate(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqrate);
+
+	put_device(_dev);
+
+	return ret;
+}
+
+static ssize_t
+store_reqrate(struct device *_dev, struct device_attribute *attr,
+	      const char *buf, size_t size)
+{
+	int value;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!get_device(_dev))
+		return -ENODEV;
+
+	if (sscanf(buf, "%d", &value) != 1)
+		return -EINVAL;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		be->blkif->reqrate = value;
+
+	put_device(_dev);
+
+	return size;
+}
+static DEVICE_ATTR(reqrate, S_IRUGO | S_IWUSR, show_reqrate,
+		   store_reqrate);
+
+static ssize_t
+show_reqcount(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqcount);
+
+	put_device(_dev);
+
+	return ret;
+}
+static DEVICE_ATTR(reqcount, S_IRUGO | S_IWUSR, show_reqcount, NULL);
+
 VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
@@ -146,6 +219,17 @@ static const struct attribute_group vbds
 	.attrs = vbdstat_attrs,
 };
 
+static struct attribute *vbdreq_attrs[] = {
+	&dev_attr_reqrate.attr,
+	&dev_attr_reqcount.attr,
+	NULL
+};
+
+static const struct attribute_group vbdreq_group = {
+	.name = "qos",
+	.attrs = vbdreq_attrs,
+};
+
 VBD_SHOW(physical_device, "%x:%x\n", be->major, be->minor);
 VBD_SHOW(mode, "%s\n", be->mode);
 
@@ -165,8 +249,13 @@ int xenvbd_sysfs_addif(struct xenbus_dev
 	if (error)
 		goto fail3;
 
+	error = sysfs_create_group(&dev->dev.kobj, &vbdreq_group);
+	if (error)
+		goto fail4;
+
 	return 0;
 
+fail4:	sysfs_remove_group(&dev->dev.kobj, &vbdreq_group);
 fail3:	sysfs_remove_group(&dev->dev.kobj, &vbdstat_group);
 fail2:	device_remove_file(&dev->dev, &dev_attr_mode);
 fail1:	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -175,6 +264,7 @@ fail1: device_remove_file(&dev->dev, &de
 
 void xenvbd_sysfs_delif(struct xenbus_device *dev)
 {
+	sysfs_remove_group(&dev->dev.kobj, &vbdreq_group);
 	sysfs_remove_group(&dev->dev.kobj, &vbdstat_group);
 	device_remove_file(&dev->dev, &dev_attr_mode);
 	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -201,6 +291,12 @@ static int blkback_remove(struct xenbus_
 		be->cdrom_watch.node = NULL;
 	}
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
 	if (be->blkif) {
 		blkif_disconnect(be->blkif);
 		vbd_free(&be->blkif->vbd);
@@ -338,6 +434,7 @@ static void backend_changed(struct xenbu
 	struct xenbus_device *dev = be->dev;
 	int cdrom = 0;
 	char *device_type;
+	char name[TASK_COMM_LEN];
 
 	DPRINTK("");
 
@@ -376,6 +473,21 @@ static void backend_changed(struct xenbu
 		kfree(device_type);
 	}
 
+	/* gather information about QoS policy for this device. */
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "reqrate", "%d", &be->blkif->reqrate,
+			    NULL);
+	if (err)
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+
+	be->blkif->reqtime = jiffies;
+
 	if (be->major == 0 && be->minor == 0) {
 		/* Front end dir is a number, which is used as the handle. */
 
@@ -482,6 +594,30 @@ static void frontend_changed(struct xenb
 
 /* ** Connection ** */
 
+static void reqrate_changed(struct xenbus_watch *watch,
+			    const char **vec, unsigned int len)
+{
+	struct backend_info *be = container_of(watch, struct backend_info,
+					       reqrate_watch);
+	int err;
+	char name[TASK_COMM_LEN];
+
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, be->dev->otherend,
+			    "reqrate", "%d",
+			    &be->blkif->reqrate, NULL);
+	if (err) {
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+	} else {
+		if (be->blkif->reqrate <= 0)
+			be->blkif->reqrate = 0;
+	}
+}
 
 /**
  * Write the physical details regarding the block device to the store, and
@@ -542,6 +678,21 @@ again:
 		xenbus_dev_fatal(dev, err, "%s: switching to Connected state",
 				 dev->nodename);
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
+	err = xenbus_watch_path2(dev, dev->otherend, "reqrate",
+				 &be->reqrate_watch,
+				 reqrate_changed);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "%s: watching reqrate",
+				 dev->nodename);
+		goto abort;
+	}
+
 	return;
 abort:
 	xenbus_transaction_end(xbt, 1);

2013/1/31 Wei Liu <wei.liu2@citrix.com>:
> On Thu, 2013-01-31 at 05:14 +0000, Vasiliy Tolstov wrote:
>> Sorry, I forgot to send the patch:
>> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
>> The patch is for kernel 3.6.9, but if needed I can rebase it to the
>> current Linus git tree.
>
> Can you inline your patch in your email so that developers can comment
> on it?
>
>
> Wei.
>

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
2013/1/31 Vasiliy Tolstov <v.tolstov@selfip.ru>:
> Sorry, I forgot to send the patch:
> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
> The patch is for kernel 3.6.9, but if needed I can rebase it to the
> current Linus git tree.

The patch is based on the work of andrew.xu
( http://xen.1045712.n5.nabble.com/VM-disk-I-O-limit-patch-td4509813.html ).

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Fri, Feb 01, 2013 at 10:53:46AM +0400, Vasiliy Tolstov wrote:
> Sorry,

Ugh, you didn't inline it - you just copied and pasted it.

Also you are missing an SoB and a description of what this patch does
and why it is better than the existing device-mapper I/O limiting work.

> [...]
Vasiliy Tolstov
2013-Feb-05 13:14 UTC
[PATCH 1/1] drivers/block/xen-blkback: Limit blkback i/o
From: Vasiliy Tolstov <vase@clodo.ru>

This patch provides the ability to limit I/O for each domU block device.
With it, the dom0 administrator can specify a maximum number of IOPS for
each block device. Changes apply dynamically and the domU does not need to
be shut down (as it does with dm-ioband). Another benefit is that dom0 does
not have to use the CFQ scheduler.

After applying this patch we can control a domU's disk speed by writing the
desired IOPS maximum for the specific block device:

via sysfs:
echo 1500 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate

via xenstore:
xenstore write /local/domain/1/device/vbd/51712/reqrate 1500

The current Xen I/O limiting solutions have the following disadvantages:

1) dm-ioband
   It needs another dm layer created on top of the block device. It lacks
   the ability to change the weight on the fly (the layer must be
   recreated). It is not in the mainline kernel yet, so the patches have to
   be backported/forward-ported to a specific kernel version. Under our
   heavy load the dm-ioband layer sometimes crashes dom0. If we use
   dm-ioband on an srp->lvm->raid1 setup and the SRP target disconnects,
   dm-ioband may corrupt data and the domU filesystem ends up with many
   errors.

2) cgroups
   A very good thing, but in our setup we cannot use it. cgroups needs the
   CFQ scheduler, and CFQ does not apply to bio devices; see the
   device-mapper list: http://goo.gl/YHiyI
   Our setup contains two storage nodes that export disks via SRP. On each
   storage node we have LVM (not CLVM). Each domU has its disk on LVM.
   Before starting a domain on a Xen node we construct a raid1 from the two
   LVM VGs. In this case the CFQ scheduler can be applied only to the SRP
   disk (/dev/sd*), which would only let us limit all domUs on that Xen
   node at the same time.

Signed-off-by: Vasiliy Tolstov <vase@clodo.ru>
---
 drivers/block/xen-blkback/blkback.c |   35 +++++++-
 drivers/block/xen-blkback/common.h  |    5 ++
 drivers/block/xen-blkback/xenbus.c  |  152 +++++++++++++++++++++++++++++++++++
 3 files changed, 191 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 74374fb..0672ab0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -387,10 +387,18 @@ static void print_stats(struct xen_blkif *blkif)
 	blkif->st_ds_req = 0;
 }
 
+static void refill_iops(struct xen_blkif *blkif)
+{
+	blkif->reqtime = jiffies + msecs_to_jiffies(1000);
+	blkif->reqcount = 0;
+}
+
 int xen_blkif_schedule(void *arg)
 {
 	struct xen_blkif *blkif = arg;
 	struct xen_vbd *vbd = &blkif->vbd;
+	int ret = 0;
+	struct timeval cur_time;
 
 	xen_blkif_get(blkif);
 
@@ -411,9 +419,20 @@ int xen_blkif_schedule(void *arg)
 		blkif->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		if (do_block_io_op(blkif))
+		ret = do_block_io_op(blkif);
+		if (ret)
 			blkif->waiting_reqs = 1;
 
+		if (blkif->reqrate) {
+			if (2 == ret && (blkif->reqtime > jiffies)) {
+				jiffies_to_timeval(jiffies, &cur_time);
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule_timeout(blkif->reqtime - jiffies);
+			}
+			if (time_after(jiffies, blkif->reqtime))
+				refill_iops(blkif);
+		}
+
 		if (log_stats && time_after(jiffies, blkif->st_print))
 			print_stats(blkif);
 	}
@@ -760,6 +779,10 @@ __do_block_io_op(struct xen_blkif *blkif)
 	rp = blk_rings->common.sring->req_prod;
 	rmb(); /* Ensure we see queued requests up to 'rp'. */
 
+	if (blkif->reqrate && (blkif->reqcount >= blkif->reqrate)) {
+		return (rc != rp) ? 2 : 0;
+	}
+
 	while (rc != rp) {
 
 		if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rc))
@@ -770,6 +793,13 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
+		if (blkif->reqrate) {
+			if (blkif->reqcount >= blkif->reqrate) {
+				more_to_do = 2;
+				break;
+			}
+		}
+
 		pending_req = alloc_req();
 		if (NULL == pending_req) {
 			blkif->st_oo_req++;
@@ -792,6 +822,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 		}
 		blk_rings->common.req_cons = ++rc; /* before make_response() */
 
+		blkif->reqcount++;
 		/* Apply all sanity checks to /private copy/ of request. */
 		barrier();
 		if (unlikely(req.operation == BLKIF_OP_DISCARD)) {
@@ -842,6 +873,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	struct blk_plug plug;
 	bool drain = false;
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	struct timeval cur_time;
 
 	switch (req->operation) {
 	case BLKIF_OP_READ:
@@ -992,6 +1024,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	else if (operation & WRITE)
 		blkif->st_wr_sect += preq.nr_sects;
 
+	jiffies_to_timeval(jiffies, &cur_time);
 	return 0;
 
  fail_flush:
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 6072390..0552ce3 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -206,6 +206,11 @@ struct xen_blkif {
 	struct rb_root		persistent_gnts;
 	unsigned int		persistent_gnt_c;
 
+	/* qos information */
+	unsigned long		reqtime;
+	int			reqcount;
+	int			reqrate;
+
 	/* statistics */
 	unsigned long		st_print;
 	int			st_rd_req;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6398072..f8afe76 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -25,6 +25,7 @@ struct backend_info {
 	struct xenbus_device	*dev;
 	struct xen_blkif	*blkif;
 	struct xenbus_watch	backend_watch;
+	struct xenbus_watch	reqrate_watch;
 	unsigned		major;
 	unsigned		minor;
 	char			*mode;
@@ -230,6 +231,79 @@ int __init xen_blkif_interface_init(void)
 	}								\
 	static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
 
+static ssize_t
+show_reqrate(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqrate);
+
+	put_device(_dev);
+
+	return ret;
+}
+
+static ssize_t
+store_reqrate(struct device *_dev, struct device_attribute *attr,
+	      const char *buf, size_t size)
+{
+	int value;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!get_device(_dev))
+		return -ENODEV;
+
+	if (sscanf(buf, "%d", &value) != 1)
+		return -EINVAL;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		be->blkif->reqrate = value;
+
+	put_device(_dev);
+
+	return size;
+}
+static DEVICE_ATTR(reqrate, S_IRUGO | S_IWUSR, show_reqrate,
+		   store_reqrate);
+
+static ssize_t
+show_reqcount(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqcount);
+
+	put_device(_dev);
+
+	return ret;
+}
+static DEVICE_ATTR(reqcount, S_IRUGO | S_IWUSR, show_reqcount, NULL);
+
 VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
@@ -254,6 +328,17 @@ static struct attribute_group xen_vbdstat_group = {
 	.attrs = xen_vbdstat_attrs,
 };
 
+static struct attribute *xen_vbdreq_attrs[] = {
+	&dev_attr_reqrate.attr,
+	&dev_attr_reqcount.attr,
+	NULL
+};
+
+static const struct attribute_group xen_vbdreq_group = {
+	.name = "qos",
+	.attrs = xen_vbdreq_attrs,
+};
+
 VBD_SHOW(physical_device, "%x:%x\n", be->major, be->minor);
 VBD_SHOW(mode, "%s\n", be->mode);
 
@@ -273,8 +358,13 @@ static int xenvbd_sysfs_addif(struct xenbus_device *dev)
 	if (error)
 		goto fail3;
 
+	error = sysfs_create_group(&dev->dev.kobj, &xen_vbdreq_group);
+	if (error)
+		goto fail4;
+
 	return 0;
 
+fail4:	sysfs_remove_group(&dev->dev.kobj, &xen_vbdreq_group);
 fail3:	sysfs_remove_group(&dev->dev.kobj, &xen_vbdstat_group);
 fail2:	device_remove_file(&dev->dev, &dev_attr_mode);
 fail1:	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -283,6 +373,7 @@ fail1: device_remove_file(&dev->dev, &dev_attr_physical_device);
 
 static void xenvbd_sysfs_delif(struct xenbus_device *dev)
 {
+	sysfs_remove_group(&dev->dev.kobj, &xen_vbdreq_group);
 	sysfs_remove_group(&dev->dev.kobj, &xen_vbdstat_group);
 	device_remove_file(&dev->dev, &dev_attr_mode);
 	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -360,6 +451,12 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 		be->backend_watch.node = NULL;
 	}
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
 	if (be->blkif) {
 		xen_blkif_disconnect(be->blkif);
 		xen_vbd_free(&be->blkif->vbd);
@@ -503,6 +600,7 @@ static void backend_changed(struct xenbus_watch *watch,
 	struct xenbus_device *dev = be->dev;
 	int cdrom = 0;
 	char *device_type;
+	char name[TASK_COMM_LEN];
 
 	DPRINTK("");
 
@@ -542,6 +640,21 @@ static void backend_changed(struct xenbus_watch *watch,
 		kfree(device_type);
 	}
 
+	/* gather information about QoS policy for this device. */
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "reqrate", "%d", &be->blkif->reqrate,
+			    NULL);
+	if (err)
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+
+	be->blkif->reqtime = jiffies;
+
 	if (be->major == 0 && be->minor == 0) {
 		/* Front end dir is a number, which is used as the handle. */
 
@@ -645,6 +758,30 @@ static void frontend_changed(struct xenbus_device *dev,
 
 /* ** Connection ** */
 
+static void reqrate_changed(struct xenbus_watch *watch,
+			    const char **vec, unsigned int len)
+{
+	struct backend_info *be = container_of(watch, struct backend_info,
+					       reqrate_watch);
+	int err;
+	char name[TASK_COMM_LEN];
+
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, be->dev->otherend,
+			    "reqrate", "%d",
+			    &be->blkif->reqrate, NULL);
+	if (err) {
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+	} else {
+		if (be->blkif->reqrate <= 0)
+			be->blkif->reqrate = 0;
+	}
+}
 
 /*
  * Write the physical details regarding the block device to the store, and
@@ -717,6 +854,21 @@ again:
 		xenbus_dev_fatal(dev, err, "%s: switching to Connected state",
 				 dev->nodename);
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
+	err = xenbus_watch_path2(dev, dev->otherend, "reqrate",
+				 &be->reqrate_watch,
+				 reqrate_changed);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "%s: watching reqrate",
+				 dev->nodename);
+		goto abort;
+	}
+
 	return;
 abort:
 	xenbus_transaction_end(xbt, 1);
-- 
1.7.9.5
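For illustration, the interfaces added above could be exercised from dom0 roughly as follows. The domain ID (1) and virtual device number (51712, i.e. xvda) follow the example in the commit message and are otherwise arbitrary; xenstore-write is the usual command-line client for the xenstore path:

# Cap the vbd at 1500 requests per second via sysfs ...
echo 1500 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate

# ... or via the frontend's xenstore node, which reqrate_changed() picks up
# on the fly through reqrate_watch:
xenstore-write /local/domain/1/device/vbd/51712/reqrate 1500

# Read back the configured limit and the requests consumed in the current
# one-second window (reqcount is reset by refill_iops()):
cat /sys/devices/xen-backend/vbd-1-51712/qos/reqrate
cat /sys/devices/xen-backend/vbd-1-51712/qos/reqcount

# Writing 0 disables throttling again, since the do_block_io_op() path only
# limits when reqrate is non-zero:
echo 0 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate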
Thanks for the help, and sorry for the delay. I don't fully understand how
to send a patch via git send-email into a specific mail thread; maybe I
used the wrong Message-Id (I took it from the last message, not the first).

2013/2/1 Konrad Rzeszutek Wilk <konrad@kernel.org>:
> On Fri, Feb 01, 2013 at 10:53:46AM +0400, Vasiliy Tolstov wrote:
>> Sorry,
>
> Ugh, you didn't inline it - you just copied and pasted it.
>
> Also you are missing an SoB and a description of what this patch does
> and why it is better than the existing device-mapper I/O limiting work.
>
> [...]
-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
--On 31 January 2013 09:12:28 +0400 Vasiliy Tolstov <v.tolstov@selfip.ru>
wrote:

> Hello. For our own needs I have written a simple blkback disk I/O limit
> patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
> shaper because of our storage architecture.

Another approach (when using the qemu-upstream DM) would presumably
be to use the block_io_throttle QMP command that QEMU provides.

-- 
Alex Bligh
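For reference, the relevant QMP command in upstream QEMU is block_set_io_throttle. A rough sketch of driving it against a domain's QMP socket is shown below; the socket path and the "device" id are assumptions that will differ per setup, and this path only throttles disks served by the QEMU device model, not blkback phy devices:

# Sketch only: negotiate QMP capabilities, then cap one drive at 1500 IOPS.
socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-1 <<'EOF'
{ "execute": "qmp_capabilities" }
{ "execute": "block_set_io_throttle",
  "arguments": { "device": "xvda",
                 "bps": 0, "bps_rd": 0, "bps_wr": 0,
                 "iops": 1500, "iops_rd": 0, "iops_wr": 0 } }
EOF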
2013/2/5 Alex Bligh <alex@alex.org.uk>:
> Another approach (when using the qemu-upstream DM) would presumably
> be to use the block_io_throttle QMP command that QEMU provides.

Hmm. Does this play well with the phy devices used by Xen?

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
--On 5 February 2013 20:36:05 +0400 Vasiliy Tolstov <v.tolstov@selfip.ru>
wrote:

> 2013/2/5 Alex Bligh <alex@alex.org.uk>:
>> Another approach (when using the qemu-upstream DM) would presumably
>> be to use the block_io_throttle QMP command that QEMU provides.
>
> Hmm. Does this play well with the phy devices used by Xen?

Pass.

-- 
Alex Bligh