Hi everyone,

This is a new release of dm-ioband and bio-cgroup. With this release,
the overhead of bio-cgroup is significantly reduced and the accuracy
of block I/O tracking is much improved. These patches are for
2.6.28-rc2-mm1. Enjoy it!

dm-ioband
=========
Dm-ioband is an I/O bandwidth controller implemented as a device-mapper
driver, which gives specified bandwidth to each job running on the same
block device. A job is a group of processes or a virtual machine such
as KVM or Xen.
I/O throughput with dm-ioband is excellent not only on SATA storage but
also on SSDs, and is as good as that without dm-ioband.

Changes from the previous release:
- Fixed a bug where create_workqueue() was called while holding a
  spinlock when creating a new ioband group.
- A new tunable parameter "carryover" is added, which specifies how many
  tokens an ioband group can keep for future use when the group isn't
  very active.

TODO:
- Other policies to schedule BIOs.
- Policies which fit SSDs, e.g.:
  - Guarantee response time.
  - Guarantee throughput.
- Policies which fit high-end storage or hardware RAID storage:
  - Some LUNs may share the same bandwidth.
- Support WRITE_BARRIER when the device-mapper layer supports it.
- Implement the algorithm of dm-ioband in the block I/O layer
  experimentally.

bio-cgroup
==========
Bio-cgroup is a BIO tracking mechanism, which is implemented on the
cgroup memory subsystem. With this mechanism, it is possible to
determine which cgroup each bio belongs to, even when the bio is one of
the delayed-write requests issued from a kernel thread such as pdflush.

Changes from the previous release:
- This release is a new implementation.
- It is based on the new design of the cgroup memory controller
  framework, which pre-allocates all cgroup-page data structures to
  reduce the overhead.
- The overhead of tracing block I/O requests is much smaller than in the
  previous release. This is done by making every page carry the id of
  its corresponding bio-cgroup instead of a pointer to it, so most of
  the spinlocks and atomic operations are gone.
- This implementation uses only 4 bytes per page for I/O tracking, while
  the previous version used 12 bytes on a 32-bit machine and 24 bytes
  on a 64-bit machine.
- The accuracy of I/O tracking is improved so that it can trace I/O
  requests even when the processes which issued them are moved into
  another bio-cgroup.
- Support tracking of bounce buffers. They have the same bio-cgroup
  owners as the original I/O requests.

TODO:
- Support tracking of I/O requests generated inside the Linux kernel,
  such as those of RAID0 and RAID5.

A list of patches
=================
The following is a list of the patches:

  [PATCH 0/8] I/O bandwidth controller and BIO tracking
  [PATCH 1/8] dm-ioband: Introduction
  [PATCH 2/8] dm-ioband: Source code and patch
  [PATCH 3/8] dm-ioband: Document
  [PATCH 4/8] bio-cgroup: Introduction
  [PATCH 5/8] bio-cgroup: The new page_cgroup framework
  [PATCH 6/8] bio-cgroup: The body of bio-cgroup
  [PATCH 7/8] bio-cgroup: Page tracking hooks
  [PATCH 8/8] bio-cgroup: Add a cgroup support to dm-ioband

Please see the following site for more information:
  Linux Block I/O Bandwidth Control Project
  http://people.valinux.co.jp/~ryov/bwctl/

Thanks,
Ryo Tsuruta
What's dm-ioband all about?
===========================
dm-ioband is an I/O bandwidth controller implemented as a device-mapper
driver, which gives specified bandwidth to each job running on the same
block device. A job is a group of processes with the same pid, pgrp or
uid, or a virtual machine such as KVM or Xen. A job can also be a
cgroup by applying the bio-cgroup patch.

Setup and Installation
======================
Build a kernel with these options enabled:

  CONFIG_MD
  CONFIG_BLK_DEV_DM
  CONFIG_DM_IOBAND

If compiled as a module, use modprobe to load dm-ioband.

  # make modules
  # make modules_install
  # depmod -a
  # modprobe dm-ioband

The "dmsetup targets" command shows all available device-mapper targets.
"ioband" and its version number are displayed when dm-ioband has been
loaded.

  # dmsetup targets | grep ioband
  ioband         v1.9.0

Getting started
===============
The following is a brief description of how to control the I/O bandwidth
of disks. In this description, we'll take one disk with two partitions
as an example target.

Create and map ioband devices
-----------------------------
Create two ioband devices "ioband1" and "ioband2". "ioband1" is mapped
to "/dev/sda1" and has a weight of 40. "ioband2" is mapped to "/dev/sda2"
and has a weight of 10. "ioband1" can use 80% --- 40/(40+10)*100 ---
of the bandwidth of the physical disk "/dev/sda" while "ioband2" can
use 20%.

  # echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1 1 0 0 none" \
    "weight 0 :40" | dmsetup create ioband1
  # echo "0 $(blockdev --getsize /dev/sda2) ioband /dev/sda2 1 0 0 none" \
    "weight 0 :10" | dmsetup create ioband2

If the commands are successful then the device files
"/dev/mapper/ioband1" and "/dev/mapper/ioband2" will have been created.

Additional bandwidth control
----------------------------
In this example two extra ioband groups are created on "ioband1". The
first group consists of all the processes with user-id 1000 and the
second group consists of all the processes with user-id 2000. Their
weights are 30 and 20 respectively.

  # dmsetup message ioband1 0 type user
  # dmsetup message ioband1 0 attach 1000
  # dmsetup message ioband1 0 attach 2000
  # dmsetup message ioband1 0 weight 1000:30
  # dmsetup message ioband1 0 weight 2000:20

Now the processes in the user-id 1000 group can use 30% ---
30/(30+20+40+10)*100 --- of the bandwidth of the physical disk.

  Table 1. Weight assignments

  ioband device    ioband group     ioband weight
  ioband1          user id 1000     30
  ioband1          user id 2000     20
  ioband1          default group    40
  ioband2          default group    10

Remove the ioband devices
-------------------------
Remove the ioband devices when they are no longer needed.

  # dmsetup remove ioband1
  # dmsetup remove ioband2
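As a convenience, the device creation commands above can be wrapped in a
small shell helper. The sketch below is not part of dm-ioband; the
function name and argument order are made up for illustration, and only
the table string itself ("<start> <size> ioband <device> 1 0 0 none
weight 0 :<weight>") is taken from the examples in this document.

  #!/bin/sh
  # create_ioband: create an ioband device on top of a block device with
  # a given default-group weight, using the same table format as the
  # examples above. (Illustrative helper, not part of dm-ioband.)
  create_ioband()
  {
          dev="$1"; name="$2"; weight="$3"
          size=$(blockdev --getsize "$dev") || return 1
          echo "0 $size ioband $dev 1 0 0 none weight 0 :$weight" \
                  | dmsetup create "$name"
  }

  # Recreate the configuration from the example:
  create_ioband /dev/sda1 ioband1 40
  create_ioband /dev/sda2 ioband2 10

Since dmsetup returns a non-zero exit status on failure, the helper can
be used from scripts that need to stop when a device cannot be created,
and the devices can be removed again with "dmsetup remove" as shown
above.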
KAMEZAWA Hiroyuki
2008-Nov-13 06:30 UTC
[PATCH 0/8] I/O bandwidth controller and BIO tracking
On Thu, 13 Nov 2008 12:10:19 +0900 (JST) Ryo Tsuruta <ryov at valinux.co.jp> wrote:

> Hi everyone,
>
> This is a new release of dm-ioband and bio-cgroup. With this release,
> the overhead of bio-cgroup is significantly reduced and the accuracy
> of block I/O tracking is much improved. These patches are for
> 2.6.28-rc2-mm1.
>
Ryo Tsuruta wrote:
> Create two ioband devices "ioband1" and "ioband2". "ioband1" is mapped
> to "/dev/sda1" and has a weight of 40. "ioband2" is mapped to "/dev/sda2"
> and has a weight of 10. "ioband1" can use 80% --- 40/(40+10)*100 ---
> of the bandwidth of the physical disk "/dev/sda" while "ioband2" can
> use 20%.

Just to clarify, when you say ioband1 can use 80% of the bandwidth, you
mean that is how much it will get if both ioband devices are loaded,
right? If there is no activity on ioband2, then ioband1 will get the
full disk bandwidth, right?
Hirokazu Takahashi
2008-Nov-13 23:15 UTC
[dm-devel] Re: [PATCH 0/8] I/O bandwidth controller and BIO tracking
Hi, Balbir,

> Hirokazu Takahashi wrote:
> > Hi, Kamezawa-san,
> >
> > This patch makes the page_cgroup framework be able to be used even if
> > the compile option of the cgroup memory controller is off.
> > So bio-cgroup can use this framework without the memory controller.
> >
> > Signed-off-by: Hirokazu Takahashi <taka at valinux.co.jp>
> >
> > diff -dupr linux-2.6.28-rc2.bc0/include/linux/memcontrol.h linux-2.6.28-rc2/include/linux/memcontrol.h
> > --- linux-2.6.28-rc2.bc0/include/linux/memcontrol.h	2008-11-10 18:31:34.000000000 +0900
> > +++ linux-2.6.28-rc2/include/linux/memcontrol.h	2008-11-11 13:51:42.000000000 +0900
> > @@ -27,6 +27,9 @@ struct mm_struct;
> >
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> >
> > +extern void __init_mem_page_cgroup(struct page_cgroup *pc);
> > +#define mem_cgroup_disabled()	mem_cgroup_subsys.disabled
> > +
> >  extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
> >  				gfp_t gfp_mask);
> >  /* for swap handling */
> > @@ -81,6 +84,15 @@ extern long mem_cgroup_calc_reclaim(stru
> >  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> >  struct mem_cgroup;
> >
> > +static inline void __init_mem_page_cgroup(struct page_cgroup *pc)
> > +{
> > +}
> > +
> > +static inline int mem_cgroup_disabled(void)
> > +{
> > +	return 1;
> > +}
> > +
>
> With CONFIG_CGROUP_MEM_RES_CTLR not defined, page_cgroup init routines
> will just return, is that what bio page_cgroup needs?
>
> --
> Balbir

One of the other patches includes the following code, which calls
__init_bio_page_cgroup() to initialize the bio-cgroup part of each
page_cgroup.

+++ linux-2.6.28-rc2/mm/page_cgroup.c	2008-11-12 11:20:33.000000000 +0900
@@ -9,6 +9,7 @@
 #include <linux/vmalloc.h>
 #include <linux/cgroup.h>
 #include <linux/memcontrol.h>
+#include <linux/biotrack.h>

 static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
@@ -16,6 +17,7 @@ __init_page_cgroup(struct page_cgroup *p
 	pc->flags = 0;
 	pc->page = pfn_to_page(pfn);
 	__init_mem_page_cgroup(pc);
+	__init_bio_page_cgroup(pc);
 }

 static unsigned long total_usage;
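As an aside, whether the cgroup memory controller was built into a
running kernel can be checked against its config. A trivial sketch; the
config file location varies by distribution, and /proc/config.gz only
exists when CONFIG_IKCONFIG_PROC is enabled:

  # Check whether the cgroup memory controller was compiled in.
  grep CONFIG_CGROUP_MEM_RES_CTLR /boot/config-$(uname -r)
  # Or, if the kernel exposes its own config:
  zgrep CONFIG_CGROUP_MEM_RES_CTLR /proc/config.gz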
Hi,

> Ryo Tsuruta wrote:
> > Create two ioband devices "ioband1" and "ioband2". "ioband1" is mapped
> > to "/dev/sda1" and has a weight of 40. "ioband2" is mapped to "/dev/sda2"
> > and has a weight of 10. "ioband1" can use 80% --- 40/(40+10)*100 ---
> > of the bandwidth of the physical disk "/dev/sda" while "ioband2" can
> > use 20%.
>
> Just to clarify, when you say ioband1 can use 80% of the bandwidth, you
> mean that is how much it will get if both ioband devices are loaded,
> right? If there is no activity on ioband2, then ioband1 will get the
> full disk bandwidth, right?

Absolutely, you are right!

Thank you,
Hirokazu Takahashi.
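A rough way to observe this work-conserving behaviour on a test machine
is to compare the throughput of ioband1 alone with its throughput while
ioband2 is also busy. The sketch below only uses the ioband devices from
the setup description and standard tools; the transfer sizes are
arbitrary, and dd's reported rate is just a quick indicator, not the
methodology behind the project's published benchmarks.

  #!/bin/sh
  # Read through the ioband devices with O_DIRECT so the page cache
  # doesn't hide the disk bandwidth.

  # 1) ioband1 alone: should get close to the full disk bandwidth.
  dd if=/dev/mapper/ioband1 of=/dev/null bs=1M count=1024 iflag=direct

  # 2) ioband1 and ioband2 loaded at the same time: the throughput
  #    should split roughly 80%/20%, following the 40:10 weights.
  dd if=/dev/mapper/ioband2 of=/dev/null bs=1M count=1024 iflag=direct &
  dd if=/dev/mapper/ioband1 of=/dev/null bs=1M count=1024 iflag=direct
  wait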
Hi Phillip,

> > Ryo Tsuruta wrote:
> > > Create two ioband devices "ioband1" and "ioband2". "ioband1" is mapped
> > > to "/dev/sda1" and has a weight of 40. "ioband2" is mapped to "/dev/sda2"
> > > and has a weight of 10. "ioband1" can use 80% --- 40/(40+10)*100 ---
> > > of the bandwidth of the physical disk "/dev/sda" while "ioband2" can
> > > use 20%.
> >
> > Just to clarify, when you say ioband1 can use 80% of the bandwidth, you
> > mean that is how much it will get if both ioband devices are loaded,
> > right? If there is no activity on ioband2, then ioband1 will get the
> > full disk bandwidth, right?
>
> Absolutely, you are right!

Here is a benchmark result of sharing bandwidth between three ioband
devices. When there is no activity on ioband2, the total bandwidth of
the disk is shared only between ioband1 and ioband3 according to their
weights.

http://people.valinux.co.jp/~ryov/dm-ioband/benchmark/partition1.html

--
Ryo Tsuruta <ryov at valinux.co.jp>