vgoyal at redhat.com
2008-Nov-06 15:30 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
Hi,

If you are not already tired of so many io controller implementations, here
is another one.

This is a very early, very crude implementation, posted to get early feedback
on whether this approach makes any sense.

This controller is a proportional weight IO controller primarily
based on/inspired by dm-ioband. One of the things I personally found a little
odd about dm-ioband was the need for a dm-ioband device for every device we
want to control. I thought that we could probably make this control per request
queue and get rid of the device mapper driver. This should make the
configuration aspect easy.

I have picked up quite a lot of code from dm-ioband, especially for the
biocgroup implementation.

I have done only very basic testing: running 2-3 dd commands in different
cgroups on x86_64. I wanted to throw out the code early to get some feedback.

More details about the design and the how-to are in the documentation patch.

Your comments are welcome.

Thanks
Vivek
--
An embedded and charset-unspecified text was scrubbed... Name: bio-group-documentation.patch Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20081106/eeed9e93/attachment.txt
vgoyal at redhat.com
2008-Nov-06 15:30 UTC
[patch 2/4] io controller: biocgroup implementation
An embedded and charset-unspecified text was scrubbed... Name: bio-cgroup-implementation Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20081106/a8691617/attachment.txt
vgoyal at redhat.com
2008-Nov-06 15:30 UTC
[patch 3/4] io controller: Core IO controller implementation logic
An embedded and charset-unspecified text was scrubbed... Name: bio-group-core-implementation.patch Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20081106/6a881d79/attachment.txt
vgoyal at redhat.com
2008-Nov-06 15:30 UTC
[patch 4/4] io controller: Put IO controller to use in device mapper and standard make_request() function
An embedded and charset-unspecified text was scrubbed... Name: bio-cgroup-tweak-make-request-functions.patch Url: http://lists.linux-foundation.org/pipermail/virtualization/attachments/20081106/19672863/attachment.txt
Peter Zijlstra
2008-Nov-06 15:49 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Thu, 2008-11-06 at 10:30 -0500, vgoyal at redhat.com wrote:
> Hi,
>
> If you are not already tired of so many io controller implementations, here
> is another one.
>
> This is a very early very crude implementation to get early feedback to see
> if this approach makes any sense or not.
>
> This controller is a proportional weight IO controller primarily
> based on/inspired by dm-ioband. One of the things I personally found little
> odd about dm-ioband was need of a dm-ioband device for every device we want
> to control. I thought that probably we can make this control per request
> queue and get rid of device mapper driver. This should make configuration
> aspect easy.
>
> I have picked up quite some amount of code from dm-ioband especially for
> biocgroup implementation.
>
> I have done very basic testing and that is running 2-3 dd commands in different
> cgroups on x86_64. Wanted to throw out the code early to get some feedback.
>
> More details about the design and how to are in documentation patch.
>
> Your comments are welcome.

Please include

  QUILT_REFRESH_ARGS="--diffstat --strip-trailing-whitespace"

in your environment or .quiltrc.

I would expect all those bio* files to be placed in block/, not mm/.

Does this still require that I use dm, or does it also work on regular block
devices? Patch 4/4 isn't quite clear on this.
Peter Zijlstra
2008-Nov-06 17:11 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote:
> Peter Zijlstra wrote:
>
> > The only real issue I can see is with linear volumes, but those are
> > stupid anyway - none of the gains but all the risks.
>
> Linear volumes may well be the most common ones.
>
> People start out with the filesystems at a certain size,
> increasing onto a second (new) disk later, when more space
> is required.

Are they aware of how risky linear volumes are? I would discourage
anyone from using them.
Dave Chinner
2008-Nov-07 00:41 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Thu, Nov 06, 2008 at 06:11:27PM +0100, Peter Zijlstra wrote:
> On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote:
> > Peter Zijlstra wrote:
> >
> > > The only real issue I can see is with linear volumes, but those are
> > > stupid anyway - none of the gains but all the risks.
> >
> > Linear volumes may well be the most common ones.
> >
> > People start out with the filesystems at a certain size,
> > increasing onto a second (new) disk later, when more space
> > is required.
>
> Are they aware of how risky linear volumes are? I would discourage
> anyone from using them.

In what way are they risky?

Cheers,

Dave.
--
Dave Chinner
david at fromorbit.com
Gui Jianfeng
2008-Nov-07 02:36 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
vgoyal at redhat.com wrote:
> Hi,
>
> If you are not already tired of so many io controller implementations, here
> is another one.
>
> This is a very early very crude implementation to get early feedback to see
> if this approach makes any sense or not.
>
> This controller is a proportional weight IO controller primarily
> based on/inspired by dm-ioband. One of the things I personally found little
> odd about dm-ioband was need of a dm-ioband device for every device we want
> to control. I thought that probably we can make this control per request
> queue and get rid of device mapper driver. This should make configuration
> aspect easy.
>
> I have picked up quite some amount of code from dm-ioband especially for
> biocgroup implementation.
>
> I have done very basic testing and that is running 2-3 dd commands in different
> cgroups on x86_64. Wanted to throw out the code early to get some feedback.
>
> More details about the design and how to are in documentation patch.
>
> Your comments are welcome.

Which kernel version is this patch set based on?

>
> Thanks
> Vivek
>

--
Regards
Gui Jianfeng
KAMEZAWA Hiroyuki
2008-Nov-07 02:50 UTC
[patch 2/4] io controller: biocgroup implementation
On Thu, 06 Nov 2008 10:30:24 -0500
vgoyal at redhat.com wrote:

>
> o biocgroup functionality.
> o Implemented new controller "bio"
> o Most of it picked from dm-ioband biocgroup implementation patches.
>

The page_cgroup implementation has changed, and most of this patch needs
rework. Please see the latest one. (I think most of the new characteristics
will be useful for you.)

One comment from me is
==
> +struct page_cgroup {
> +	struct list_head lru;		/* per cgroup LRU list */
> +	struct page *page;
> +	struct mem_cgroup *mem_cgroup;
> +	int flags;
> +#ifdef CONFIG_CGROUP_BIO
> +	struct list_head blist;		/* for bio_cgroup page list */
> +	struct bio_cgroup *bio_cgroup;
> +#endif
> +};
==
This blist is too bad. Please keep this object small...

Maybe the dm-ioband people will post their own new one. Just making use of
it is one idea.

Thanks,
-Kame
Peter Zijlstra
2008-Nov-07 10:31 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Fri, 2008-11-07 at 11:41 +1100, Dave Chinner wrote:
> On Thu, Nov 06, 2008 at 06:11:27PM +0100, Peter Zijlstra wrote:
> > On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote:
> > > Peter Zijlstra wrote:
> > >
> > > > The only real issue I can see is with linear volumes, but those are
> > > > stupid anyway - none of the gains but all the risks.
> > >
> > > Linear volumes may well be the most common ones.
> > >
> > > People start out with the filesystems at a certain size,
> > > increasing onto a second (new) disk later, when more space
> > > is required.
> >
> > Are they aware of how risky linear volumes are? I would discourage
> > anyone from using them.
>
> In what way are they risky?

You lose all your data when one disk dies, so your MTBF decreases with
the number of disks in your linear span. And you get none of the benefits
of having multiple disks, like extra speed from striping, or redundancy
from raid.

Therefore I say that linear volumes are the absolute worst choice.
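The MTBF argument above can be put into numbers. The following is only a rough sketch, assuming that disks fail independently, that every disk has the same failure probability over the period considered, and that losing any one disk loses the whole linear span (the premise stated in the message). The function names and the 5% and 1,000,000-hour figures are illustrative, not taken from the thread.

```python
def span_loss_probability(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n_disks fails, i.e. the span is lost.

    Assumes independent failures with the same probability p_disk per disk.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks


def span_mtbf(disk_mtbf_hours: float, n_disks: int) -> float:
    """MTBF of the whole span, assuming a constant, independent failure rate.

    With n disks the failure rates add up, so the MTBF is divided by n.
    """
    return disk_mtbf_hours / n_disks


if __name__ == "__main__":
    # Hypothetical figures: 5% failure probability per disk over the period,
    # 1,000,000 hours MTBF per disk.
    for n in (1, 2, 4):
        print(n, span_loss_probability(0.05, n), span_mtbf(1_000_000, n))
```

Under these assumptions the loss probability grows with every disk added to the span, while no throughput or redundancy is gained in exchange.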
Vivek Goyal
2008-Nov-07 13:38 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Fri, Nov 07, 2008 at 10:36:50AM +0800, Gui Jianfeng wrote:
> vgoyal at redhat.com wrote:
> > Hi,
> >
> > If you are not already tired of so many io controller implementations, here
> > is another one.
> >
> > This is a very early very crude implementation to get early feedback to see
> > if this approach makes any sense or not.
> >
> > This controller is a proportional weight IO controller primarily
> > based on/inspired by dm-ioband. One of the things I personally found little
> > odd about dm-ioband was need of a dm-ioband device for every device we want
> > to control. I thought that probably we can make this control per request
> > queue and get rid of device mapper driver. This should make configuration
> > aspect easy.
> >
> > I have picked up quite some amount of code from dm-ioband especially for
> > biocgroup implementation.
> >
> > I have done very basic testing and that is running 2-3 dd commands in different
> > cgroups on x86_64. Wanted to throw out the code early to get some feedback.
> >
> > More details about the design and how to are in documentation patch.
> >
> > Your comments are welcome.
>
> Which kernel version is this patch set based on?
>

2.6.27

Thanks
Vivek
On Fri, Nov 07, 2008 at 11:50:30AM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 06 Nov 2008 10:30:24 -0500
> vgoyal at redhat.com wrote:
>
> >
> > o biocgroup functionality.
> > o Implemented new controller "bio"
> > o Most of it picked from dm-ioband biocgroup implementation patches.
> >
> page_cgroup implementation is changed and most of this patch needs rework.
> please see the latest one. (I think most of new characteristics are useful
> for you.)
>

Sure, I will have a look.

> One comment from me is
> ==
> > +struct page_cgroup {
> > +	struct list_head lru;		/* per cgroup LRU list */
> > +	struct page *page;
> > +	struct mem_cgroup *mem_cgroup;
> > +	int flags;
> > +#ifdef CONFIG_CGROUP_BIO
> > +	struct list_head blist;		/* for bio_cgroup page list */
> > +	struct bio_cgroup *bio_cgroup;
> > +#endif
> > +};
> ==
> this blist is too bad. please keep this object small...
>

This is just another connecting element so that a page_cgroup can be on
another list as well. It is useful for making sure that IO on all the pages
of a bio cgroup has completed before that bio cgroup is deleted.

> Maybe dm-ioband people will post his own new one. just making use of it is an idea.

Sure, I will have a look when the dm-ioband people post a new version of the
patch and see how they have optimized it further.

Thanks
Vivek
Gui Jianfeng
2008-Nov-11 08:50 UTC
[patch 3/4] io controller: Core IO controller implementation logic
vgoyal at redhat.com wrote:

Hi Vivek,

I think bio_group_controller() needs to be exported with EXPORT_SYMBOL().

--
Regards
Gui Jianfeng
Ryo Tsuruta
2008-Nov-13 09:05 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
Hi,

From: vgoyal at redhat.com
Subject: [patch 0/4] [RFC] Another proportional weight IO controller
Date: Thu, 06 Nov 2008 10:30:22 -0500

> Hi,
>
> If you are not already tired of so many io controller implementations, here
> is another one.
>
> This is a very early very crude implementation to get early feedback to see
> if this approach makes any sense or not.
>
> This controller is a proportional weight IO controller primarily
> based on/inspired by dm-ioband. One of the things I personally found little
> odd about dm-ioband was need of a dm-ioband device for every device we want
> to control. I thought that probably we can make this control per request
> queue and get rid of device mapper driver. This should make configuration
> aspect easy.
>
> I have picked up quite some amount of code from dm-ioband especially for
> biocgroup implementation.
>
> I have done very basic testing and that is running 2-3 dd commands in different
> cgroups on x86_64. Wanted to throw out the code early to get some feedback.
>
> More details about the design and how to are in documentation patch.
>
> Your comments are welcome.

Do you have any benchmark results?
I'm especially interested in the following:
- A comparison of disk performance with and without the I/O controller patch.
- Uneven I/O loads: processes that belong to a cgroup with a smaller
  weight than another cgroup generate a heavier I/O load, as in the
  following.

  echo 1024 > /cgroup/bio/test1/bio.shares
  echo 8192 > /cgroup/bio/test2/bio.shares

  echo $$ > /cgroup/bio/test1/tasks
  dd if=/somefile1-1 of=/dev/null &
  dd if=/somefile1-2 of=/dev/null &
  ...
  dd if=/somefile1-100 of=/dev/null &
  echo $$ > /cgroup/bio/test2/tasks
  dd if=/somefile2-1 of=/dev/null &
  dd if=/somefile2-2 of=/dev/null &
  ...
  dd if=/somefile2-10 of=/dev/null &

Thanks,
Ryo Tsuruta
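For reference, the split that a proportional weight controller would be expected to aim for in this test can be worked out from the two bio.shares values. This is a minimal sketch, assuming that the shares are purely relative weights and that both cgroups keep the disk continuously busy. It states the target only, not a measured result, and the function name is made up for illustration.

```python
def expected_shares(weights):
    """Map each cgroup name to its ideal fraction of the disk.

    Assumes purely relative weights and continuously backlogged groups;
    the number of dd processes inside a group does not enter the formula.
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # The weights from the test above: test1 runs 100 readers, test2 runs 10.
    shares = expected_shares({"test1": 1024, "test2": 8192})
    for name, share in sorted(shares.items()):
        print(f"{name}: {share:.1%}")
```

With these weights test1 should get one ninth of the disk and test2 eight ninths, even though test1 generates ten times as many readers. That is what makes this load a useful check of the controller.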
Nauman Rafique
2008-Nov-26 19:41 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
On Wed, Nov 26, 2008 at 6:06 AM, Paolo Valente <paolo.valente at unimore.it> wrote:
> Fabio and I are a little bit worried about the fact that the problem
> of working in the time domain instead of the service domain is not
> being properly dealt with. Probably we did not express ourselves very
> clearly, so we will try to put in more practical terms. Using B-WF2Q+
> in the time domain instead of using CFQ (Round-Robin) means introducing
> higher complexity than CFQ to get almost the same service properties
> of CFQ. With regard to fairness (long term) B-WF2Q+ in the time domain

Are we talking about a case where all the contenders have equal
weights and are continuously backlogged? That seems to be the only
case when B-WF2Q+ would behave like Round-Robin. Am I missing
something here?

I can see that the only direct advantage of using WF2Q+ scheduling is
reduced jitter or latency in certain cases. But under heavy loads,
that might reduce the request latencies seen by RT threads from a few
seconds to a few msec.

> has exactly the same (un)fairness problems of CFQ. As far as bandwidth
> differentiation is concerned, it can be obtained with CFQ by just
> increasing the time slice (e.g., double weight => double slice). This
> has no impact on long term guarantees and certainly does not decrease
> the throughput.
>
> With regard to short term guarantees (request completion time), one of
> the properties of the reference ideal system of WF2Q+ is that, assuming
> for simplicity that all the queues have the same weight, as the ideal
> system serves each queue at the same speed, shorter budgets are completed
> in shorter time intervals than longer budgets. B-WF2Q+ guarantees
> O(1) deviation from this ideal service. Hence, the tight delay/jitter
> measured in our experiments with BFQ is a consequence of the simple (and
> probably still improvable) budget assignment mechanism of (the overall)
> BFQ. In contrast, if all the budgets are equal, as it happens if we use
> time slices, the resulting scheduler is exactly a Round-Robin, again
> as in CFQ (see [1]).

Can the budget assignment mechanism of BFQ be converted to a time slice
assignment mechanism? What I am trying to say here is that we can have
variable time slices, just like we have variable budgets.

>
> Finally, with regard to completion time delay differentiation through
> weight differentiation, this is probably the only case in which B-WF2Q+
> would perform better than CFQ, because, in case of CFQ, reducing the
> time slices may reduce the throughput, whereas increasing the time slice
> would increase the worst-case delay/jitter.
>
> In the end, BFQ succeeds in guaranteeing fairness (or in general the
> desired bandwidth distribution) because it works in the service domain
> (and this is probably the only way to achieve this goal), not because
> it uses WF2Q+ instead of Round-Robin. Similarly, it provides tight
> delay/jitter only because B-WF2Q+ is used in combination with a simple
> budget assignment (differentiation) mechanism (again in the service
> domain).
>
> [1] http://feanor.sssup.it/~fabio/linux/bfq/results.php
>
> --
> -----------------------------------------------------------
> | Paolo Valente              |                            |
> | Algogroup                  |                            |
> | Dip. Ing. Informazione     | tel:   +39 059 2056318     |
> | Via Vignolese 905/b        | fax:   +39 059 2056199     |
> | 41100 Modena               |                            |
> |     home:  http://algo.ing.unimo.it/people/paolo/       |
> -----------------------------------------------------------
>
>
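The property of the ideal reference system mentioned in the quoted text (with equal weights, shorter budgets complete earlier than longer ones) can be illustrated with a small fluid-model calculation. This is only a sketch of the idealised system, in which all backlogged queues are served simultaneously in proportion to their weights. It is not B-WF2Q+ or BFQ code, and the function name, the unit capacity and the example budgets are assumptions made for illustration.

```python
def ideal_completion_times(budgets, weights=None, capacity=1.0):
    """Completion time of each budget in an ideal fluid (weighted-sharing) system.

    All queues are backlogged at time 0 and each active queue i is served at
    rate capacity * weights[i] / (sum of the weights of the active queues).
    """
    n = len(budgets)
    weights = list(weights) if weights is not None else [1.0] * n
    remaining = [float(b) for b in budgets]
    finish = [0.0] * n
    active = {i for i in range(n) if remaining[i] > 0}
    now = 0.0
    while active:
        total_w = sum(weights[i] for i in active)
        # Time until the next queue drains at the current service rates.
        dt = min(remaining[i] * total_w / (weights[i] * capacity) for i in active)
        now += dt
        for i in list(active):
            remaining[i] -= dt * capacity * weights[i] / total_w
            if remaining[i] <= 1e-12:
                finish[i] = now
                active.remove(i)
    return finish


if __name__ == "__main__":
    # Three equally weighted queues with budgets of 4, 8 and 16 units.
    print(ideal_completion_times([4, 8, 16]))
```

In this model the smallest budget always finishes first, which is the reference behaviour that the quoted text says B-WF2Q+ tracks with O(1) deviation.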
Fabio Checconi
2008-Nov-26 22:21 UTC
[patch 0/4] [RFC] Another proportional weight IO controller
> From: Nauman Rafique <nauman at google.com>
> Date: Wed, Nov 26, 2008 11:41:46AM -0800
>
> On Wed, Nov 26, 2008 at 6:06 AM, Paolo Valente <paolo.valente at unimore.it> wrote:
> > Fabio and I are a little bit worried about the fact that the problem
> > of working in the time domain instead of the service domain is not
> > being properly dealt with. Probably we did not express ourselves very
> > clearly, so we will try to put in more practical terms. Using B-WF2Q+
> > in the time domain instead of using CFQ (Round-Robin) means introducing
> > higher complexity than CFQ to get almost the same service properties
> > of CFQ. With regard to fairness (long term) B-WF2Q+ in the time domain
>
> Are we talking about a case where all the contenders have equal
> weights and are continuously backlogged? That seems to be the only
> case when B-WF2Q+ would behave like Round-Robin. Am I missing
> something here?
>

It is the case with equal weights, but it is really a common one.

> I can see that the only direct advantage of using WF2Q+ scheduling is
> reduced jitter or latency in certain cases. But under heavy loads,
> that might result in request latencies seen by RT threads to be
> reduced from a few seconds to a few msec.
>
> > has exactly the same (un)fairness problems of CFQ. As far as bandwidth
> > differentiation is concerned, it can be obtained with CFQ by just
> > increasing the time slice (e.g., double weight => double slice). This
> > has no impact on long term guarantees and certainly does not decrease
> > the throughput.
> >
> > With regard to short term guarantees (request completion time), one of
> > the properties of the reference ideal system of WF2Q+ is that, assuming
> > for simplicity that all the queues have the same weight, as the ideal
> > system serves each queue at the same speed, shorter budgets are completed
> > in shorter time intervals than longer budgets. B-WF2Q+ guarantees
> > O(1) deviation from this ideal service. Hence, the tight delay/jitter
> > measured in our experiments with BFQ is a consequence of the simple (and
> > probably still improvable) budget assignment mechanism of (the overall)
> > BFQ. In contrast, if all the budgets are equal, as it happens if we use
> > time slices, the resulting scheduler is exactly a Round-Robin, again
> > as in CFQ (see [1]).
>
> Can the budget assignment mechanism of BFQ be converted to time slice
> assignment mechanism? What I am trying to say here is that we can have
> variable time slices, just like we have variable budgets.
>

Yes, it could be converted, and it would do in the time domain the
same differentiation it does now in the service domain. What we would
lose in the process is the fairness in the service domain. The service
properties/guarantees of the resulting scheduler would _not_ be the same
as the BFQ ones. Both long term and short term guarantees would be affected
by the unfairness given by the different service rates experienced by the
scheduled entities.

> >
> > Finally, with regard to completion time delay differentiation through
> > weight differentiation, this is probably the only case in which B-WF2Q+
> > would perform better than CFQ, because, in case of CFQ, reducing the
> > time slices may reduce the throughput, whereas increasing the time slice
> > would increase the worst-case delay/jitter.
> >
> > In the end, BFQ succeeds in guaranteeing fairness (or in general the
> > desired bandwidth distribution) because it works in the service domain
> > (and this is probably the only way to achieve this goal), not because
> > it uses WF2Q+ instead of Round-Robin. Similarly, it provides tight
> > delay/jitter only because B-WF2Q+ is used in combination with a simple
> > budget assignment (differentiation) mechanism (again in the service
> > domain).
> >
> > [1] http://feanor.sssup.it/~fabio/linux/bfq/results.php
> >
> > --
> > -----------------------------------------------------------
> > | Paolo Valente              |                            |
> > | Algogroup                  |                            |
> > | Dip. Ing. Informazione     | tel:   +39 059 2056318     |
> > | Via Vignolese 905/b        | fax:   +39 059 2056199     |
> > | 41100 Modena               |                            |
> > |     home:  http://algo.ing.unimo.it/people/paolo/       |
> > -----------------------------------------------------------
> >
> >
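The difference between fairness in the time domain and in the service domain, which the reply above turns on, can be shown with two equally weighted queues that see different service rates. The sketch below is a deliberately simplified model: each queue's throughput is taken as a fixed number, and the 100 MB/s and 10 MB/s rates, the slice length and the budget size are hypothetical values, not measurements from the thread or from BFQ.

```python
def time_domain_round(rates_mb_s, slice_s):
    """Equal time slices: return the MB served to each queue in one round."""
    return [rate * slice_s for rate in rates_mb_s]


def service_domain_round(rates_mb_s, budget_mb):
    """Equal budgets: return the seconds of disk time each queue uses in one round."""
    return [budget_mb / rate for rate in rates_mb_s]


if __name__ == "__main__":
    # Hypothetical rates: a sequential reader and a seeky reader.
    rates = [100.0, 10.0]

    served = time_domain_round(rates, slice_s=0.5)
    print("equal slices  -> MB served per round:", served)

    used = service_domain_round(rates, budget_mb=5.0)
    print("equal budgets -> seconds used per round:", used)
```

With equal slices both queues get the same disk time, but the slower queue receives a tenth of the service. With equal budgets both receive the same service, and the slower queue simply occupies the disk for longer. This is the service-domain fairness that the reply says would be lost by converting budgets into time slices.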