Hello. For our own needs I have written a simple blkback disk I/O limit
patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
shaper because of our storage architecture.

Our storage nodes export disks via SCST over an InfiniBand network. On the
Xen nodes we attach these disks via SRP. Each Xen node connects to two
storage nodes at the same time, and multipath provides failover.

Each disk contains LVM (not CLVM); for each virtual machine we create a PV
disk, and via a device-mapper raid1 we build the disk used by the domU. In
this case, if one storage node fails, the VM keeps working with the single
remaining raid1 leg.

All of this works great, but in this setup we cannot use cgroups or
dm-ioband. Some time ago the CFQ disk scheduler stopped working on top of
BIO devices and provides control only at the bottom layer. (In our case we
can apply CFQ only to the SRP disk, and so shape I/O only for all clients
on a Xen node at once.) dm-ioband works unreliably when a domU generates
massive I/O: our tests show that if a domU uses ext4 and drives 20000 IOPS,
dom0 sometimes crashes or the disk gets corrupted. With dm-ioband, if one
storage node goes down, we sometimes lose data from the disk. And dm-ioband
cannot change the IOPS limit on the fly.

This patch tries to solve our own problems. Could someone from the Xen team
look at it and say how the code looks? What do I need to change or rewrite?
Maybe someday this can be used in the mainline Linux Xen tree... (I hope).
This patch is only for phy devices. For blktap devices I have spoken with
Thanos Makatos (the author of blktap3), and maybe in the future this
functionality can be added to blktap3.

Thank You.

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
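For readers not familiar with this kind of stack, a minimal sketch of how one such per-domU mirror might be assembled in dom0 follows. The volume group names, LV names, sizes and device-mapper table parameters are illustrative assumptions, not details taken from the setup described above:

# Sketch only: one LV per VM on each storage node's (multipathed) volume
# group, mirrored in dom0 with the dm "mirror" target and handed to the
# guest as a phy: disk.
lvcreate -L 20G -n vm101 vg_storage_a
lvcreate -L 20G -n vm101 vg_storage_b

SECTORS=$(blockdev --getsz /dev/vg_storage_a/vm101)
dmsetup create vm101-disk --table \
  "0 ${SECTORS} mirror core 1 1024 2 /dev/vg_storage_a/vm101 0 /dev/vg_storage_b/vm101 0"

# The domU configuration then refers to the mirror, e.g.:
#   disk = [ 'phy:/dev/mapper/vm101-disk,xvda,w' ]

If one storage node disappears, the mirror keeps running on the surviving leg, which is the failover behaviour described above.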
Sorry, I forgot to send the patch:
https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
The patch is for kernel 3.6.9, but if needed I can rebase it to the
current Linus git tree.

2013/1/31 Vasiliy Tolstov <v.tolstov@selfip.ru>:
> Hello. For our own needs I have written a simple blkback disk I/O limit
> patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
> shaper because of our storage architecture.
> [...]

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Thu, 2013-01-31 at 05:14 +0000, Vasiliy Tolstov wrote:
> Sorry, I forgot to send the patch:
> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
> The patch is for kernel 3.6.9, but if needed I can rebase it to the
> current Linus git tree.

Can you inline your patch in your email so that developers can comment
on it?


Wei.
Sorry,

diff -NruabBEp xen_blkback_limit.orig/blkback.c xen_blkback_limit.new//blkback.c
--- xen_blkback_limit.orig/blkback.c	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//blkback.c	2013-01-28 08:11:30.000000000 +0400
@@ -211,10 +211,18 @@ static void print_stats(blkif_t *blkif)
 	blkif->st_pk_req = 0;
 }
 
+static void refill_iops(blkif_t *blkif)
+{
+	blkif->reqtime = jiffies + msecs_to_jiffies(1000);
+	blkif->reqcount = 0;
+}
+
 int blkif_schedule(void *arg)
 {
 	blkif_t *blkif = arg;
 	struct vbd *vbd = &blkif->vbd;
+	int ret = 0;
+	struct timeval cur_time;
 
 	blkif_get(blkif);
 
@@ -237,10 +245,22 @@ int blkif_schedule(void *arg)
 		blkif->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		if (do_block_io_op(blkif))
+		ret = do_block_io_op(blkif);
+		if (ret)
 			blkif->waiting_reqs = 1;
 		unplug_queue(blkif);
 
+		if (blkif->reqrate) {
+			if (2 == ret && (blkif->reqtime > jiffies)) {
+				jiffies_to_timeval(jiffies, &cur_time);
+
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule_timeout(blkif->reqtime - jiffies);
+			}
+			if (time_after(jiffies, blkif->reqtime))
+				refill_iops(blkif);
+		}
+
 		if (log_stats && time_after(jiffies, blkif->st_print))
 			print_stats(blkif);
 	}
@@ -394,10 +414,19 @@ static int _do_block_io_op(blkif_t *blki
 	rp = blk_rings->common.sring->req_prod;
 	rmb(); /* Ensure we see queued requests up to 'rp'. */
 
+	if (blkif->reqrate && (blkif->reqcount >= blkif->reqrate)) {
+		return (rc != rp) ? 2 : 0;
+	}
+
 	while (rc != rp) {
 		if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rc))
 			break;
 
+		if (blkif->reqrate) {
+			if (blkif->reqcount >= blkif->reqrate)
+				return 2;
+		}
+
 		if (kthread_should_stop())
 			return 1;
 
@@ -434,8 +463,8 @@ static int _do_block_io_op(blkif_t *blki
 
 			/* Apply all sanity checks to /private copy/ of request. */
 			barrier();
-
 			dispatch_rw_block_io(blkif, &req, pending_req);
+			blkif->reqcount++;
 			break;
 		case BLKIF_OP_DISCARD:
 			blk_rings->common.req_cons = rc;
@@ -452,7 +481,7 @@ static int _do_block_io_op(blkif_t *blki
 			break;
 		default:
 			/* A good sign something is wrong: sleep for a while to
-			 * avoid excessive CPU consumption by a bad guest. */
+			 * avoid excessive CPU consumption by a bad guest.*/
 			msleep(1);
 			blk_rings->common.req_cons = rc;
 			barrier();
@@ -501,6 +530,7 @@ static void dispatch_rw_block_io(blkif_t
 	uint32_t flags;
 	int ret, i;
 	int operation;
+	struct timeval cur_time;
 
 	switch (req->operation) {
 	case BLKIF_OP_READ:
@@ -658,6 +688,7 @@ static void dispatch_rw_block_io(blkif_t
 	else
 		blkif->st_wr_sect += preq.nr_sects;
 
+	jiffies_to_timeval(jiffies, &cur_time);
 	return;
 
  fail_flush:
diff -NruabBEp xen_blkback_limit.orig/common.h xen_blkback_limit.new//common.h
--- xen_blkback_limit.orig/common.h	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//common.h	2013-01-28 08:09:35.000000000 +0400
@@ -82,6 +82,11 @@ typedef struct blkif_st {
 	unsigned int        waiting_reqs;
 	struct request_queue *plug;
 
+	/* qos information */
+	unsigned long       reqtime;
+	int                 reqcount;
+	int                 reqrate;
+
 	/* statistics */
 	unsigned long       st_print;
 	int                 st_rd_req;
@@ -106,6 +111,8 @@ struct backend_info
 	unsigned major;
 	unsigned minor;
 	char *mode;
+	/* qos information */
+	struct xenbus_watch reqrate_watch;
 };
 
 blkif_t *blkif_alloc(domid_t domid);
diff -NruabBEp xen_blkback_limit.orig/xenbus.c xen_blkback_limit.new//xenbus.c
--- xen_blkback_limit.orig/xenbus.c	2012-12-04 13:03:58.000000000 +0400
+++ xen_blkback_limit.new//xenbus.c	2013-01-28 08:22:26.000000000 +0400
@@ -120,6 +120,79 @@ static void update_blkif_status(blkif_t
 	}								\
 	static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
 
+static ssize_t
+show_reqrate(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqrate);
+
+	put_device(_dev);
+
+	return ret;
+}
+
+static ssize_t
+store_reqrate(struct device *_dev, struct device_attribute *attr,
+	      const char *buf, size_t size)
+{
+	int value;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!get_device(_dev))
+		return -ENODEV;
+
+	if (sscanf(buf, "%d", &value) != 1)
+		return -EINVAL;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		be->blkif->reqrate = value;
+
+	put_device(_dev);
+
+	return size;
+}
+static DEVICE_ATTR(reqrate, S_IRUGO | S_IWUSR, show_reqrate,
+		   store_reqrate);
+
+static ssize_t
+show_reqcount(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqcount);
+
+	put_device(_dev);
+
+	return ret;
+}
+static DEVICE_ATTR(reqcount, S_IRUGO | S_IWUSR, show_reqcount, NULL);
+
 VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
@@ -146,6 +219,17 @@ static const struct attribute_group vbds
 	.attrs = vbdstat_attrs,
 };
 
+static struct attribute *vbdreq_attrs[] = {
+	&dev_attr_reqrate.attr,
+	&dev_attr_reqcount.attr,
+	NULL
+};
+
+static const struct attribute_group vbdreq_group = {
+	.name = "qos",
+	.attrs = vbdreq_attrs,
+};
+
 VBD_SHOW(physical_device, "%x:%x\n", be->major, be->minor);
 VBD_SHOW(mode, "%s\n", be->mode);
 
@@ -165,8 +249,13 @@ int xenvbd_sysfs_addif(struct xenbus_dev
 	if (error)
 		goto fail3;
 
+	error = sysfs_create_group(&dev->dev.kobj, &vbdreq_group);
+	if (error)
+		goto fail4;
+
 	return 0;
 
+fail4:	sysfs_remove_group(&dev->dev.kobj, &vbdreq_group);
 fail3:	sysfs_remove_group(&dev->dev.kobj, &vbdstat_group);
 fail2:	device_remove_file(&dev->dev, &dev_attr_mode);
 fail1:	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -175,6 +264,7 @@ fail1: device_remove_file(&dev->dev, &de
 
 void xenvbd_sysfs_delif(struct xenbus_device *dev)
 {
+	sysfs_remove_group(&dev->dev.kobj, &vbdreq_group);
 	sysfs_remove_group(&dev->dev.kobj, &vbdstat_group);
 	device_remove_file(&dev->dev, &dev_attr_mode);
 	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -201,6 +291,12 @@ static int blkback_remove(struct xenbus_
 		be->cdrom_watch.node = NULL;
 	}
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
 	if (be->blkif) {
 		blkif_disconnect(be->blkif);
 		vbd_free(&be->blkif->vbd);
@@ -338,6 +434,7 @@ static void backend_changed(struct xenbu
 	struct xenbus_device *dev = be->dev;
 	int cdrom = 0;
 	char *device_type;
+	char name[TASK_COMM_LEN];
 
 	DPRINTK("");
 
@@ -376,6 +473,21 @@ static void backend_changed(struct xenbu
 		kfree(device_type);
 	}
 
+	/* gather information about QoS policy for this device. */
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "reqrate", "%d", &be->blkif->reqrate,
+			    NULL);
+	if (err)
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+
+	be->blkif->reqtime = jiffies;
+
 	if (be->major == 0 && be->minor == 0) {
 		/* Front end dir is a number, which is used as the handle. */
 
@@ -482,6 +594,30 @@ static void frontend_changed(struct xenb
 
 /* ** Connection ** */
 
+static void reqrate_changed(struct xenbus_watch *watch,
+			    const char **vec, unsigned int len)
+{
+	struct backend_info *be = container_of(watch, struct backend_info,
+					       reqrate_watch);
+	int err;
+	char name[TASK_COMM_LEN];
+
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, be->dev->otherend,
+			    "reqrate", "%d",
+			    &be->blkif->reqrate, NULL);
+	if (err) {
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+	} else {
+		if (be->blkif->reqrate <= 0)
+			be->blkif->reqrate = 0;
+	}
+}
 
 /**
  * Write the physical details regarding the block device to the store, and
@@ -542,6 +678,21 @@ again:
 		xenbus_dev_fatal(dev, err, "%s: switching to Connected state",
 				 dev->nodename);
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
+	err = xenbus_watch_path2(dev, dev->otherend, "reqrate",
+				 &be->reqrate_watch,
+				 reqrate_changed);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "%s: watching reqrate",
+				 dev->nodename);
+		goto abort;
+	}
+
 	return;
 abort:
 	xenbus_transaction_end(xbt, 1);

2013/1/31 Wei Liu <wei.liu2@citrix.com>:
> On Thu, 2013-01-31 at 05:14 +0000, Vasiliy Tolstov wrote:
>> Sorry, I forgot to send the patch:
>> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
>> The patch is for kernel 3.6.9, but if needed I can rebase it to the
>> current Linus git tree.
>
> Can you inline your patch in your email so that developers can comment
> on it?
>
>
> Wei.
>

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
2013/1/31 Vasiliy Tolstov <v.tolstov@selfip.ru>:
> Sorry, I forgot to send the patch:
> https://bitbucket.org/go2clouds/patches/raw/master/xen_blkback_limit/3.6.9-1.patch
> The patch is for kernel 3.6.9, but if needed I can rebase it to the
> current Linus git tree.

The patch is based on the work of andrew.xu
( http://xen.1045712.n5.nabble.com/VM-disk-I-O-limit-patch-td4509813.html ).

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
On Fri, Feb 01, 2013 at 10:53:46AM +0400, Vasiliy Tolstov wrote:
> Sorry,

Ugh, you didn't inline it - you just copied and pasted it.

Also you are missing an SoB and a description of what this patch does
and why it is better than the existing device-mapper I/O limiting work.

> [...]
Vasiliy Tolstov
2013-Feb-05 13:14 UTC
[PATCH 1/1] drivers/block/xen-blkback: Limit blkback i/o
From: Vasiliy Tolstov <vase@clodo.ru>

This patch provides the ability to limit I/O for each domU block device.
With it, the dom0 administrator can specify a maximum number of IOPS for
each block device. Changes apply dynamically and the domU does not need to
be shut down (as it does with dm-ioband). Another benefit is that dom0 does
not have to use the CFQ scheduler.

After applying this patch we can control a domU's disk speed by writing the
desired IOPS maximum for the specific block device:

via sysfs:
echo 1500 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate

via xenstore:
xenstore write /local/domain/1/device/vbd/51712/reqrate 1500

The current Xen I/O limiting solutions have the following disadvantages:

1) dm-ioband
   It needs another dm layer created on top of the block device. It lacks
   the ability to change the weight on the fly (the layer must be
   recreated). It is not in the mainline kernel yet, so the patches have to
   be backported/forward-ported to a specific kernel version. Under our
   heavy load the dm-ioband layer sometimes crashes dom0. If we use
   dm-ioband on an srp->lvm->raid1 setup and the SRP target disconnects,
   dm-ioband may corrupt data and the domU filesystem ends up with many
   errors.

2) cgroups
   A very good thing, but in our setup we cannot use it. cgroups needs the
   CFQ scheduler, and CFQ does not apply to bio devices; see the
   device-mapper list: http://goo.gl/YHiyI
   Our setup contains two storage nodes that export disks via SRP. On each
   storage node we have LVM (not CLVM). Each domU has its disk on LVM.
   Before starting a domain on a Xen node we construct a raid1 from the two
   LVM VGs. In this case the CFQ scheduler can be applied only to the SRP
   disk (/dev/sd*), which would only let us limit all domUs on that Xen
   node at the same time.

Signed-off-by: Vasiliy Tolstov <vase@clodo.ru>
---
 drivers/block/xen-blkback/blkback.c |   35 +++++++-
 drivers/block/xen-blkback/common.h  |    5 ++
 drivers/block/xen-blkback/xenbus.c  |  152 +++++++++++++++++++++++++++++++++++
 3 files changed, 191 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 74374fb..0672ab0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -387,10 +387,18 @@ static void print_stats(struct xen_blkif *blkif)
 	blkif->st_ds_req = 0;
 }
 
+static void refill_iops(struct xen_blkif *blkif)
+{
+	blkif->reqtime = jiffies + msecs_to_jiffies(1000);
+	blkif->reqcount = 0;
+}
+
 int xen_blkif_schedule(void *arg)
 {
 	struct xen_blkif *blkif = arg;
 	struct xen_vbd *vbd = &blkif->vbd;
+	int ret = 0;
+	struct timeval cur_time;
 
 	xen_blkif_get(blkif);
 
@@ -411,9 +419,20 @@ int xen_blkif_schedule(void *arg)
 		blkif->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		if (do_block_io_op(blkif))
+		ret = do_block_io_op(blkif);
+		if (ret)
 			blkif->waiting_reqs = 1;
 
+		if (blkif->reqrate) {
+			if (2 == ret && (blkif->reqtime > jiffies)) {
+				jiffies_to_timeval(jiffies, &cur_time);
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule_timeout(blkif->reqtime - jiffies);
+			}
+			if (time_after(jiffies, blkif->reqtime))
+				refill_iops(blkif);
+		}
+
 		if (log_stats && time_after(jiffies, blkif->st_print))
 			print_stats(blkif);
 	}
@@ -760,6 +779,10 @@ __do_block_io_op(struct xen_blkif *blkif)
 	rp = blk_rings->common.sring->req_prod;
 	rmb(); /* Ensure we see queued requests up to 'rp'. */
 
+	if (blkif->reqrate && (blkif->reqcount >= blkif->reqrate)) {
+		return (rc != rp) ? 2 : 0;
+	}
+
 	while (rc != rp) {
 
 		if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rc))
@@ -770,6 +793,13 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
+		if (blkif->reqrate) {
+			if (blkif->reqcount >= blkif->reqrate) {
+				more_to_do = 2;
+				break;
+			}
+		}
+
 		pending_req = alloc_req();
 		if (NULL == pending_req) {
 			blkif->st_oo_req++;
@@ -792,6 +822,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 		}
 		blk_rings->common.req_cons = ++rc; /* before make_response() */
 
+		blkif->reqcount++;
 		/* Apply all sanity checks to /private copy/ of request. */
 		barrier();
 		if (unlikely(req.operation == BLKIF_OP_DISCARD)) {
@@ -842,6 +873,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	struct blk_plug plug;
 	bool drain = false;
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	struct timeval cur_time;
 
 	switch (req->operation) {
 	case BLKIF_OP_READ:
@@ -992,6 +1024,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	else if (operation & WRITE)
 		blkif->st_wr_sect += preq.nr_sects;
 
+	jiffies_to_timeval(jiffies, &cur_time);
 	return 0;
 
  fail_flush:
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 6072390..0552ce3 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -206,6 +206,11 @@ struct xen_blkif {
 	struct rb_root		persistent_gnts;
 	unsigned int		persistent_gnt_c;
 
+	/* qos information */
+	unsigned long		reqtime;
+	int			reqcount;
+	int			reqrate;
+
 	/* statistics */
 	unsigned long		st_print;
 	int			st_rd_req;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6398072..f8afe76 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -25,6 +25,7 @@ struct backend_info {
 	struct xenbus_device	*dev;
 	struct xen_blkif	*blkif;
 	struct xenbus_watch	backend_watch;
+	struct xenbus_watch	reqrate_watch;
 	unsigned		major;
 	unsigned		minor;
 	char			*mode;
@@ -230,6 +231,79 @@ int __init xen_blkif_interface_init(void)
 	}								\
 	static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
 
+static ssize_t
+show_reqrate(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqrate);
+
+	put_device(_dev);
+
+	return ret;
+}
+
+static ssize_t
+store_reqrate(struct device *_dev, struct device_attribute *attr,
+	      const char *buf, size_t size)
+{
+	int value;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!get_device(_dev))
+		return -ENODEV;
+
+	if (sscanf(buf, "%d", &value) != 1)
+		return -EINVAL;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		be->blkif->reqrate = value;
+
+	put_device(_dev);
+
+	return size;
+}
+static DEVICE_ATTR(reqrate, S_IRUGO | S_IWUSR, show_reqrate,
+		   store_reqrate);
+
+static ssize_t
+show_reqcount(struct device *_dev, struct device_attribute *attr, char *buf)
+{
+	ssize_t ret = -ENODEV;
+	struct xenbus_device *dev;
+	struct backend_info *be;
+
+	if (!get_device(_dev))
+		return ret;
+
+	dev = to_xenbus_device(_dev);
+	be = dev_get_drvdata(&dev->dev);
+
+	if (be != NULL)
+		ret = sprintf(buf, "%d\n", be->blkif->reqcount);
+
+	put_device(_dev);
+
+	return ret;
+}
+static DEVICE_ATTR(reqcount, S_IRUGO | S_IWUSR, show_reqcount, NULL);
+
 VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
@@ -254,6 +328,17 @@ static struct attribute_group xen_vbdstat_group = {
 	.attrs = xen_vbdstat_attrs,
 };
 
+static struct attribute *xen_vbdreq_attrs[] = {
+	&dev_attr_reqrate.attr,
+	&dev_attr_reqcount.attr,
+	NULL
+};
+
+static const struct attribute_group xen_vbdreq_group = {
+	.name = "qos",
+	.attrs = xen_vbdreq_attrs,
+};
+
 VBD_SHOW(physical_device, "%x:%x\n", be->major, be->minor);
 VBD_SHOW(mode, "%s\n", be->mode);
 
@@ -273,8 +358,13 @@ static int xenvbd_sysfs_addif(struct xenbus_device *dev)
 	if (error)
 		goto fail3;
 
+	error = sysfs_create_group(&dev->dev.kobj, &xen_vbdreq_group);
+	if (error)
+		goto fail4;
+
 	return 0;
 
+fail4:	sysfs_remove_group(&dev->dev.kobj, &xen_vbdreq_group);
 fail3:	sysfs_remove_group(&dev->dev.kobj, &xen_vbdstat_group);
 fail2:	device_remove_file(&dev->dev, &dev_attr_mode);
 fail1:	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -283,6 +373,7 @@ fail1: device_remove_file(&dev->dev, &dev_attr_physical_device);
 
 static void xenvbd_sysfs_delif(struct xenbus_device *dev)
 {
+	sysfs_remove_group(&dev->dev.kobj, &xen_vbdreq_group);
 	sysfs_remove_group(&dev->dev.kobj, &xen_vbdstat_group);
 	device_remove_file(&dev->dev, &dev_attr_mode);
 	device_remove_file(&dev->dev, &dev_attr_physical_device);
@@ -360,6 +451,12 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 		be->backend_watch.node = NULL;
 	}
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
 	if (be->blkif) {
 		xen_blkif_disconnect(be->blkif);
 		xen_vbd_free(&be->blkif->vbd);
@@ -503,6 +600,7 @@ static void backend_changed(struct xenbus_watch *watch,
 	struct xenbus_device *dev = be->dev;
 	int cdrom = 0;
 	char *device_type;
+	char name[TASK_COMM_LEN];
 
 	DPRINTK("");
 
@@ -542,6 +640,21 @@ static void backend_changed(struct xenbus_watch *watch,
 		kfree(device_type);
 	}
 
+	/* gather information about QoS policy for this device. */
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "reqrate", "%d", &be->blkif->reqrate,
+			    NULL);
+	if (err)
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+
+	be->blkif->reqtime = jiffies;
+
 	if (be->major == 0 && be->minor == 0) {
 		/* Front end dir is a number, which is used as the handle. */
 
@@ -645,6 +758,30 @@ static void frontend_changed(struct xenbus_device *dev,
 
 /* ** Connection ** */
 
+static void reqrate_changed(struct xenbus_watch *watch,
+			    const char **vec, unsigned int len)
+{
+	struct backend_info *be = container_of(watch, struct backend_info,
+					       reqrate_watch);
+	int err;
+	char name[TASK_COMM_LEN];
+
+	err = blkback_name(be->blkif, name);
+	if (err) {
+		xenbus_dev_error(be->dev, err, "get blkback dev name");
+		return;
+	}
+
+	err = xenbus_gather(XBT_NIL, be->dev->otherend,
+			    "reqrate", "%d",
+			    &be->blkif->reqrate, NULL);
+	if (err) {
+		DPRINTK("%s xenbus_gather(reqrate) error", name);
+	} else {
+		if (be->blkif->reqrate <= 0)
+			be->blkif->reqrate = 0;
+	}
+}
 
 /*
  * Write the physical details regarding the block device to the store, and
@@ -717,6 +854,21 @@ again:
 		xenbus_dev_fatal(dev, err, "%s: switching to Connected state",
 				 dev->nodename);
 
+	if (be->reqrate_watch.node) {
+		unregister_xenbus_watch(&be->reqrate_watch);
+		kfree(be->reqrate_watch.node);
+		be->reqrate_watch.node = NULL;
+	}
+
+	err = xenbus_watch_path2(dev, dev->otherend, "reqrate",
+				 &be->reqrate_watch,
+				 reqrate_changed);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "%s: watching reqrate",
+				 dev->nodename);
+		goto abort;
+	}
+
 	return;
 abort:
 	xenbus_transaction_end(xbt, 1);
-- 
1.7.9.5
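For illustration, the interfaces added above could be exercised from dom0 roughly as follows. The domain ID (1) and virtual device number (51712, i.e. xvda) follow the example in the commit message and are otherwise arbitrary; xenstore-write is the usual command-line client for the xenstore path:

# Cap the vbd at 1500 requests per second via sysfs ...
echo 1500 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate

# ... or via the frontend's xenstore node, which reqrate_changed() picks up
# on the fly through reqrate_watch:
xenstore-write /local/domain/1/device/vbd/51712/reqrate 1500

# Read back the configured limit and the requests consumed in the current
# one-second window (reqcount is reset by refill_iops()):
cat /sys/devices/xen-backend/vbd-1-51712/qos/reqrate
cat /sys/devices/xen-backend/vbd-1-51712/qos/reqcount

# Writing 0 disables throttling again, since the do_block_io_op() path only
# limits when reqrate is non-zero:
echo 0 > /sys/devices/xen-backend/vbd-1-51712/qos/reqrate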
Thanks for the help, and sorry for the delay. I don't fully understand how
to send a patch via git send-email into a specific mail thread; maybe I
used the wrong Message-Id (I took it from the last message, not the first).

2013/2/1 Konrad Rzeszutek Wilk <konrad@kernel.org>:
> On Fri, Feb 01, 2013 at 10:53:46AM +0400, Vasiliy Tolstov wrote:
>> Sorry,
>
> Ugh, you didn't inline it - you just copied and pasted it.
>
> Also you are missing an SoB and a description of what this patch does
> and why it is better than the existing device-mapper I/O limiting work.
>
> [...]
-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
--On 31 January 2013 09:12:28 +0400 Vasiliy Tolstov <v.tolstov@selfip.ru>
wrote:

> Hello. For our own needs I have written a simple blkback disk I/O limit
> patch that can limit disk I/O based on IOPS. I need a Xen-based IOPS
> shaper because of our storage architecture.

Another approach (when using the qemu-upstream DM) would presumably
be to use the block_io_throttle QMP command that QEMU provides.

-- 
Alex Bligh
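For reference, the relevant QMP command in upstream QEMU is block_set_io_throttle. A rough sketch of driving it against a domain's QMP socket is shown below; the socket path and the "device" id are assumptions that will differ per setup, and this path only throttles disks served by the QEMU device model, not blkback phy devices:

# Sketch only: negotiate QMP capabilities, then cap one drive at 1500 IOPS.
socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-1 <<'EOF'
{ "execute": "qmp_capabilities" }
{ "execute": "block_set_io_throttle",
  "arguments": { "device": "xvda",
                 "bps": 0, "bps_rd": 0, "bps_wr": 0,
                 "iops": 1500, "iops_rd": 0, "iops_wr": 0 } }
EOF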
2013/2/5 Alex Bligh <alex@alex.org.uk>:
> Another approach (when using the qemu-upstream DM) would presumably
> be to use the block_io_throttle QMP command that QEMU provides.

Hmm. Does this play well with the phy devices used by Xen?

-- 
Vasiliy Tolstov,
Clodo.ru
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru
--On 5 February 2013 20:36:05 +0400 Vasiliy Tolstov <v.tolstov@selfip.ru>
wrote:

> 2013/2/5 Alex Bligh <alex@alex.org.uk>:
>> Another approach (when using the qemu-upstream DM) would presumably
>> be to use the block_io_throttle QMP command that QEMU provides.
>
> Hmm. Does this play well with the phy devices used by Xen?

Pass.

-- 
Alex Bligh