Hello,

These are the notes about the block improvements discussed at the Hackathon; some of them, if not all, have already been incorporated into: https://docs.google.com/document/d/1Vh5T8Z3Tx3sUEhVB0DnNDKBNiqB_ZA8Z5YVqAsCIjuI/edit

Here is a list of future work items, more measurable and limited:

A) Separate request and response rings. This has several benefits: we will be able to reduce the size of the response struct, since it no longer has to be the same size as the request, and we could increase the number of in-flight requests, since we are no longer limited by the size of the request ring. We still need to make sure that all in-flight requests can be written to the response ring once they are finished, or added to a queue that writes them to the response ring when there's a free slot.

B) Clean up the size differences between the 32/64bit structs, and while there reduce the size of a request so it is aligned to a cache line (64 bytes), as sketched below.

C) Investigate the interrupt rate between blkfront/blkback and, if needed, add support for polling; switching between polling and events could be done automatically by blkfront/blkback when a high interrupt rate is detected.

D) Multipage ring support.
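For item B, a rough sketch of a request that packs into a single 64-byte cache line on both 32-bit and 64-bit guests could look like the following (field names, sizes and the segment count are made up for illustration; this is not the current struct blkif_request):

    #include <stdint.h>
    typedef uint32_t grant_ref_t;    /* as in the Xen public headers */

    /* Illustrative sketch only -- not the actual blkif request layout. */
    struct blkif_request_aligned {
        uint8_t     operation;       /* BLKIF_OP_* */
        uint8_t     nr_segments;
        uint16_t    _pad;
        uint32_t    handle;
        uint64_t    id;              /* echoed back in the response */
        uint64_t    sector_number;
        grant_ref_t gref[10];        /* 10 * 4 bytes */
    } __attribute__((__packed__));   /* 1+1+2+4+8+8+40 = 64 bytes */

The point is only that a fixed 64-byte request removes the 32/64bit padding differences and keeps one request per cache line.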
Hello,

While working on further block improvements I've found an issue with persistent grants in blkfront.

Persistent grants basically allocate grants and then they are never released, so both blkfront and blkback keep using the same memory pages for all the transactions.

This is not a problem in blkback, because we can dynamically choose how many grants we want to map. On the other hand, blkfront cannot remove the access to those grants at any point, because blkfront doesn't know if blkback has these grants mapped persistently or not.

So if for example we start expanding the number of segments in indirect requests, to a value like 512 segments per request, blkfront will probably try to persistently map 512*32+512 = 16896 grants per device; that's many more grants than the current default, which is 32*256 = 8192 (if using grant tables v2). This can cause serious problems to other interfaces inside the DomU, since blkfront basically starts hoarding all possible grants, leaving other interfaces completely locked.

I've been thinking about different ways to solve this, but so far I haven't been able to find a nice solution:

1. Limit the number of persistent grants a blkfront instance can use; let's say that only the first X used grants will be persistently mapped by both blkfront and blkback, and if more grants are needed the previous map/unmap will be used.

2. Switch to grant copy in blkback, and get rid of persistent grants (I have not benchmarked this solution, but I'm quite sure it will involve a performance regression, especially when scaling to a high number of domains).

3. Increase the size of the grant_table or the size of a single grant (from 4k to 2M) (this is from Stefano Stabellini).

4. Introduce a new request type that we can use to request blkback to unmap certain grefs so we can free them in blkfront.

So far none of them looks like a suitable solution.
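Spelling out the arithmetic above (the numbers are exactly the ones in the example; this is only an illustration of the mismatch):

    /* Illustration only: figures from the example above. */
    unsigned int per_device_grants = 512 * 32 + 512;  /* = 16896 grants for one disk        */
    unsigned int table_capacity    = 32 * 256;        /* = 8192 v2 entries in 32 grant frames */
    /* A single such disk alone asks for roughly twice the whole
     * domain's grant table. */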
On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:> Hello, > > While working on further block improvements I''ve found an issue with > persistent grants in blkfront. > > Persistent grants basically allocate grants and then they are never > released, so both blkfront and blkback keep using the same memory pages > for all the transactions. > > This is not a problem in blkback, because we can dynamically choose how > many grants we want to map. On the other hand, blkfront cannot remove > the access to those grants at any point, because blkfront doesn''t know > if blkback has this grants mapped persistently or not. > > So if for example we start expanding the number of segments in indirect > requests, to a value like 512 segments per requests, blkfront will > probably try to persistently map 512*32+512 = 16896 grants per device, > that''s much more grants that the current default, which is 32*256 = 8192 > (if using grant tables v2). This can cause serious problems to other > interfaces inside the DomU, since blkfront basically starts hoarding all > possible grants, leaving other interfaces completely locked.Yikes.> I''ve been thinking about different ways to solve this, but so far I > haven''t been able to found a nice solution: > > 1. Limit the number of persistent grants a blkfront instance can use, > let''s say that only the first X used grants will be persistently mapped > by both blkfront and blkback, and if more grants are needed the previous > map/unmap will be used.I''m not thrilled with this option. It would likely introduce some significant performance variability, wouldn''t it?> 2. Switch to grant copy in blkback, and get rid of persistent grants (I > have not benchmarked this solution, but I''m quite sure it will involve a > performance regression, specially when scaling to a high number of domains).Why do you think so?> 3. Increase the size of the grant_table or the size of a single grant > (from 4k to 2M) (this is from Stefano Stabellini).Seems like a bit of a bigger hammer approach.> 4. Introduce a new request type that we can use to request blkback to > unmap certain grefs so we can free them in blkfront.Sounds complex.> So far none of them looks like a suitable solution.I agree. Of these, I think #2 is worth a little more attention. --msw
On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:> Hello, > > While working on further block improvements I''ve found an issue with > persistent grants in blkfront. > > Persistent grants basically allocate grants and then they are never > released, so both blkfront and blkback keep using the same memory pages > for all the transactions. > > This is not a problem in blkback, because we can dynamically choose how > many grants we want to map. On the other hand, blkfront cannot remove > the access to those grants at any point, because blkfront doesn''t know > if blkback has this grants mapped persistently or not. > > So if for example we start expanding the number of segments in indirect > requests, to a value like 512 segments per requests, blkfront will > probably try to persistently map 512*32+512 = 16896 grants per device, > that''s much more grants that the current default, which is 32*256 = 8192 > (if using grant tables v2). This can cause serious problems to other > interfaces inside the DomU, since blkfront basically starts hoarding all > possible grants, leaving other interfaces completely locked. > > I''ve been thinking about different ways to solve this, but so far I > haven''t been able to found a nice solution: > > 1. Limit the number of persistent grants a blkfront instance can use, > let''s say that only the first X used grants will be persistently mapped > by both blkfront and blkback, and if more grants are needed the previous > map/unmap will be used. > > 2. Switch to grant copy in blkback, and get rid of persistent grants (I > have not benchmarked this solution, but I''m quite sure it will involve a > performance regression, specially when scaling to a high number of domains). > > 3. Increase the size of the grant_table or the size of a single grant > (from 4k to 2M) (this is from Stefano Stabellini). > > 4. Introduce a new request type that we can use to request blkback to > unmap certain grefs so we can free them in blkfront.5). Lift the limit of grant pages a domain can have. 6). Have an outstanding of grant pools that are mapped to a guest and recycle them? That way both netfront and blkfront could use them as needed?> > So far none of them looks like a suitable solution. >
On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > > Hello, > > > > While working on further block improvements I''ve found an issue with > > persistent grants in blkfront. > > > > Persistent grants basically allocate grants and then they are never > > released, so both blkfront and blkback keep using the same memory pages > > for all the transactions. > > > > This is not a problem in blkback, because we can dynamically choose how > > many grants we want to map. On the other hand, blkfront cannot remove > > the access to those grants at any point, because blkfront doesn''t know > > if blkback has this grants mapped persistently or not. > > > > So if for example we start expanding the number of segments in indirect > > requests, to a value like 512 segments per requests, blkfront will > > probably try to persistently map 512*32+512 = 16896 grants per device, > > that''s much more grants that the current default, which is 32*256 = 8192 > > (if using grant tables v2). This can cause serious problems to other > > interfaces inside the DomU, since blkfront basically starts hoarding all > > possible grants, leaving other interfaces completely locked. > > > > I''ve been thinking about different ways to solve this, but so far I > > haven''t been able to found a nice solution: > > > > 1. Limit the number of persistent grants a blkfront instance can use, > > let''s say that only the first X used grants will be persistently mapped > > by both blkfront and blkback, and if more grants are needed the previous > > map/unmap will be used. > > > > 2. Switch to grant copy in blkback, and get rid of persistent grants (I > > have not benchmarked this solution, but I''m quite sure it will involve a > > performance regression, specially when scaling to a high number of domains). > >Any chance that the speed of copying is fast enough for block devices?> > 3. Increase the size of the grant_table or the size of a single grant > > (from 4k to 2M) (this is from Stefano Stabellini). > > > > 4. Introduce a new request type that we can use to request blkback to > > unmap certain grefs so we can free them in blkfront. > > > 5). Lift the limit of grant pages a domain can have.If I''m not mistaken, this is basically the same as "increase the size of the grant_table" in #3.> > 6). Have an outstanding of grant pools that are mapped to a guest and > recycle them? That way both netfront and blkfront could use them as needed? >Is there an easy way to instrument the network stack to use those pages only? Wei.> > > > So far none of them looks like a suitable solution. > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
On 21/06/13 20:07, Matt Wilson wrote:> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >> Hello, >> >> While working on further block improvements I''ve found an issue with >> persistent grants in blkfront. >> >> Persistent grants basically allocate grants and then they are never >> released, so both blkfront and blkback keep using the same memory pages >> for all the transactions. >> >> This is not a problem in blkback, because we can dynamically choose how >> many grants we want to map. On the other hand, blkfront cannot remove >> the access to those grants at any point, because blkfront doesn''t know >> if blkback has this grants mapped persistently or not. >> >> So if for example we start expanding the number of segments in indirect >> requests, to a value like 512 segments per requests, blkfront will >> probably try to persistently map 512*32+512 = 16896 grants per device, >> that''s much more grants that the current default, which is 32*256 = 8192 >> (if using grant tables v2). This can cause serious problems to other >> interfaces inside the DomU, since blkfront basically starts hoarding all >> possible grants, leaving other interfaces completely locked. > > Yikes. > >> I''ve been thinking about different ways to solve this, but so far I >> haven''t been able to found a nice solution: >> >> 1. Limit the number of persistent grants a blkfront instance can use, >> let''s say that only the first X used grants will be persistently mapped >> by both blkfront and blkback, and if more grants are needed the previous >> map/unmap will be used. > > I''m not thrilled with this option. It would likely introduce some > significant performance variability, wouldn''t it?Probably, and also it will be hard to distribute the number of available grant across the different interfaces in a performance sensible way, specially given the fact that once a grant is assigned to a interface it cannot be returned back to the pool of grants. So if we had two interfaces with very different usage (one very busy and another one almost idle), and equally distribute the grants amongst them, one will have a lot of unused grants while the other will suffer from starvation.> >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I >> have not benchmarked this solution, but I''m quite sure it will involve a >> performance regression, specially when scaling to a high number of domains). > > Why do you think so?First because grant_copy is done by the hypervisor, while when using persistent grants the copy is done by the guest. Also, grant_copy takes the grant lock, so when scaling to a large number of domains there''s going to be contention around this lock. Persistent grants don''t need any shared lock, and thus scale better.> >> 3. Increase the size of the grant_table or the size of a single grant >> (from 4k to 2M) (this is from Stefano Stabellini). > > Seems like a bit of a bigger hammer approach. > >> 4. Introduce a new request type that we can use to request blkback to >> unmap certain grefs so we can free them in blkfront. > > Sounds complex. > >> So far none of them looks like a suitable solution. > > I agree. Of these, I think #2 is worth a little more attention. > > --msw >
On 21/06/13 22:16, Konrad Rzeszutek Wilk wrote:> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >> Hello, >> >> While working on further block improvements I''ve found an issue with >> persistent grants in blkfront. >> >> Persistent grants basically allocate grants and then they are never >> released, so both blkfront and blkback keep using the same memory pages >> for all the transactions. >> >> This is not a problem in blkback, because we can dynamically choose how >> many grants we want to map. On the other hand, blkfront cannot remove >> the access to those grants at any point, because blkfront doesn''t know >> if blkback has this grants mapped persistently or not. >> >> So if for example we start expanding the number of segments in indirect >> requests, to a value like 512 segments per requests, blkfront will >> probably try to persistently map 512*32+512 = 16896 grants per device, >> that''s much more grants that the current default, which is 32*256 = 8192 >> (if using grant tables v2). This can cause serious problems to other >> interfaces inside the DomU, since blkfront basically starts hoarding all >> possible grants, leaving other interfaces completely locked. >> >> I''ve been thinking about different ways to solve this, but so far I >> haven''t been able to found a nice solution: >> >> 1. Limit the number of persistent grants a blkfront instance can use, >> let''s say that only the first X used grants will be persistently mapped >> by both blkfront and blkback, and if more grants are needed the previous >> map/unmap will be used. >> >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I >> have not benchmarked this solution, but I''m quite sure it will involve a >> performance regression, specially when scaling to a high number of domains). >> >> 3. Increase the size of the grant_table or the size of a single grant >> (from 4k to 2M) (this is from Stefano Stabellini). >> >> 4. Introduce a new request type that we can use to request blkback to >> unmap certain grefs so we can free them in blkfront. > > > 5). Lift the limit of grant pages a domain can have. > > 6). Have an outstanding of grant pools that are mapped to a guest and > recycle them? That way both netfront and blkfront could use them as needed?If all the backends run in the same guest that could be a viable option, but if we have backends running in different domains we will end up with several different pools for each backend domain, and thus the scenario is going to be quite similar to what we have now (a pool can hoard all available grants and leave the others starving).
On Sat, 22 Jun 2013, Wei Liu wrote:> On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote: > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > > > Hello, > > > > > > While working on further block improvements I''ve found an issue with > > > persistent grants in blkfront. > > > > > > Persistent grants basically allocate grants and then they are never > > > released, so both blkfront and blkback keep using the same memory pages > > > for all the transactions. > > > > > > This is not a problem in blkback, because we can dynamically choose how > > > many grants we want to map. On the other hand, blkfront cannot remove > > > the access to those grants at any point, because blkfront doesn''t know > > > if blkback has this grants mapped persistently or not. > > > > > > So if for example we start expanding the number of segments in indirect > > > requests, to a value like 512 segments per requests, blkfront will > > > probably try to persistently map 512*32+512 = 16896 grants per device, > > > that''s much more grants that the current default, which is 32*256 = 8192 > > > (if using grant tables v2). This can cause serious problems to other > > > interfaces inside the DomU, since blkfront basically starts hoarding all > > > possible grants, leaving other interfaces completely locked. > > > > > > I''ve been thinking about different ways to solve this, but so far I > > > haven''t been able to found a nice solution: > > > > > > 1. Limit the number of persistent grants a blkfront instance can use, > > > let''s say that only the first X used grants will be persistently mapped > > > by both blkfront and blkback, and if more grants are needed the previous > > > map/unmap will be used. > > > > > > 2. Switch to grant copy in blkback, and get rid of persistent grants (I > > > have not benchmarked this solution, but I''m quite sure it will involve a > > > performance regression, specially when scaling to a high number of domains). > > > > > Any chance that the speed of copying is fast enough for block devices? > > > > 3. Increase the size of the grant_table or the size of a single grant > > > (from 4k to 2M) (this is from Stefano Stabellini). > > > > > > 4. Introduce a new request type that we can use to request blkback to > > > unmap certain grefs so we can free them in blkfront. > > > > > > 5). Lift the limit of grant pages a domain can have. > > If I''m not mistaken, this is basically the same as "increase the size of > the grant_table" in #3.Yes, that was one of the things I was suggesting, but it needs investigating: I wouldn''t want that increasing the number of grant frames would reach a different scalability limit of the data structure. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote:
> On 21/06/13 20:07, Matt Wilson wrote:
> > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote:

[...]

> >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I
> >> have not benchmarked this solution, but I'm quite sure it will involve a
> >> performance regression, especially when scaling to a high number of domains).
> >
> > Why do you think so?
>
> First because grant_copy is done by the hypervisor, while when using
> persistent grants the copy is done by the guest. Also, grant_copy takes
> the grant lock, so when scaling to a large number of domains there's
> going to be contention around this lock. Persistent grants don't need
> any shared lock, and thus scale better.

It'd benefit xen-netback to make the locking in the copy path more fine grained. That would help multi-vif domUs today, and multi-queue vifs later on.

Thoughts?

--msw
On Mon, Jun 24, 2013 at 11:09:19PM -0700, Matt Wilson wrote:> On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote: > > On 21/06/13 20:07, Matt Wilson wrote: > > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > > [...] > > > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I > > >> have not benchmarked this solution, but I''m quite sure it will involve a > > >> performance regression, specially when scaling to a high number of domains). > > > > > > Why do you think so? > > > > First because grant_copy is done by the hypervisor, while when using > > persistent grants the copy is done by the guest. Also, grant_copy takes > > the grant lock, so when scaling to a large number of domains there''s > > going to be contention around this lock. Persistent grants don''t need > > any shared lock, and thus scale better. > > It''d benefit xen-netback to make the locking in the copy path more > fine grained. That would help multi-vif domUs today, and multi-queue > vifs later on. >I''m not sure I follow. I presume you mean using persistent grant in xen-netback to help scale better? Wei.> Thoughts? > > --msw > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
On Tue, Jun 25, 2013 at 02:01:30PM +0100, Wei Liu wrote:> On Mon, Jun 24, 2013 at 11:09:19PM -0700, Matt Wilson wrote: > > On Sat, Jun 22, 2013 at 09:11:20AM +0200, Roger Pau Monné wrote: > > > On 21/06/13 20:07, Matt Wilson wrote: > > > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > > > > [...] > > > > > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I > > > >> have not benchmarked this solution, but I''m quite sure it will involve a > > > >> performance regression, specially when scaling to a high number of domains). > > > > > > > > Why do you think so? > > > > > > First because grant_copy is done by the hypervisor, while when using > > > persistent grants the copy is done by the guest. Also, grant_copy takes > > > the grant lock, so when scaling to a large number of domains there''s > > > going to be contention around this lock. Persistent grants don''t need > > > any shared lock, and thus scale better. > > > > It''d benefit xen-netback to make the locking in the copy path more > > fine grained. That would help multi-vif domUs today, and multi-queue > > vifs later on. > > > > I''m not sure I follow. I presume you mean using persistent grant in > xen-netback to help scale better?No, I mean further scaling improvements in the GNTTABOP_copy path would benefit xen-netback performance when a single guest has multiple vifs, and will be needed for good multi-queue performance. Given we might need to do some work there, would it make sense to change blkback to use GNTTABOP_copy to avoid the problem he''s identified with persistent grants. --msw
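For what it's worth, a minimal sketch of what a grant-copy based completion in blkback could look like is below. This is not actual xen-blkback code; the helper name copy_response_to_guest() and the gfn conversion helper are assumptions, the point is only the GNTTABOP_copy / gnttab_batch_copy usage being discussed:

    #include <xen/grant_table.h>
    #include <xen/interface/grant_table.h>

    /* Illustrative only: copy 'len' bytes of completed I/O back into the
     * frontend's granted page instead of keeping it persistently mapped. */
    static int copy_response_to_guest(domid_t domid, grant_ref_t gref,
                                      void *data, unsigned int len)
    {
        struct gnttab_copy op = {
            .flags         = GNTCOPY_dest_gref,   /* destination is a grant ref */
            .source.domid  = DOMID_SELF,
            .source.u.gmfn = virt_to_gfn(data),   /* helper name approximate */
            .source.offset = offset_in_page(data),
            .dest.domid    = domid,
            .dest.u.ref    = gref,
            .dest.offset   = 0,
            .len           = len,
        };

        gnttab_batch_copy(&op, 1);                /* issues GNTTABOP_copy */
        return op.status == GNTST_okay ? 0 : -EIO;
    }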
On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:> On 21/06/13 20:07, Matt Wilson wrote: > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > >> Hello, > >> > >> While working on further block improvements I've found an issue with > >> persistent grants in blkfront. > >> > >> Persistent grants basically allocate grants and then they are never > >> released, so both blkfront and blkback keep using the same memory pages > >> for all the transactions. > >> > >> This is not a problem in blkback, because we can dynamically choose how > >> many grants we want to map. On the other hand, blkfront cannot remove > >> the access to those grants at any point, because blkfront doesn't know > >> if blkback has this grants mapped persistently or not. > >> > >> So if for example we start expanding the number of segments in indirect > >> requests, to a value like 512 segments per requests, blkfront will > >> probably try to persistently map 512*32+512 = 16896 grants per device, > >> that's much more grants that the current default, which is 32*256 = 8192 > >> (if using grant tables v2). This can cause serious problems to other > >> interfaces inside the DomU, since blkfront basically starts hoarding all > >> possible grants, leaving other interfaces completely locked. > > > > Yikes. > > > >> I've been thinking about different ways to solve this, but so far I > >> haven't been able to found a nice solution: > >> > >> 1. Limit the number of persistent grants a blkfront instance can use, > >> let's say that only the first X used grants will be persistently mapped > >> by both blkfront and blkback, and if more grants are needed the previous > >> map/unmap will be used. > > > > I'm not thrilled with this option. It would likely introduce some > > significant performance variability, wouldn't it? > > Probably, and also it will be hard to distribute the number of available > grant across the different interfaces in a performance sensible way, > specially given the fact that once a grant is assigned to a interface it > cannot be returned back to the pool of grants. > > So if we had two interfaces with very different usage (one very busy and > another one almost idle), and equally distribute the grants amongst > them, one will have a lot of unused grants while the other will suffer > from starvation.I do think we need to implement some sort of reclaim scheme, which probably does mean a specific request (per your #4). We simply can't have a device which once upon a time had high throughput but is no mostly ideal continue to tie up all those grants. If you make the reuse of grants use an MRU scheme and reclaim the currently unused tail fairly infrequently and in large batches then the perf overhead should be minimal, I think. I also don't think I would discount the idea of using ephemeral grants to cover bursts so easily either, in fact it might fall out quite naturally from an MRU scheme? In that scheme bursting up is pretty cheap since grant map is relative inexpensive, and recovering from the burst shouldn't be too expensive if you batch it. If it turns out to be not a burst but a sustained level of I/O then the MRU scheme would mean you wouldn't be recovering them. I also think there probably needs to be some tunable per device limit on the maximum persistent grants, perhaps minimum and maximum pool sizes ties in with an MRU scheme? If nothing else it gives the admin the ability to prioritise devices. Ian. 
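To make the MRU/reclaim idea concrete, here is a purely hypothetical sketch; none of these structures exist in blkfront today, and the "unmap gref" request it would send to blkback is the new request type from option 4:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/jiffies.h>
    #include <xen/grant_table.h>

    /* Hypothetical: persistent grants on an LRU list (most recently used
     * at the head); a periodic worker reclaims idle ones from the tail in
     * batches, so the common case never falls back to map/unmap. */
    struct persistent_gnt {
        grant_ref_t      gref;
        struct page     *page;
        unsigned long    last_used;         /* jiffies of last use */
        struct list_head node;
    };

    #define RECLAIM_BATCH     64            /* reclaim in large-ish batches   */
    #define RECLAIM_IDLE_AGE  (10 * HZ)     /* "a while": tunable idle period */

    static void reclaim_idle_grants(struct list_head *lru, spinlock_t *lock)
    {
        struct persistent_gnt *gnt, *tmp;
        unsigned int n = 0;

        spin_lock(lock);
        list_for_each_entry_safe_reverse(gnt, tmp, lru, node) {
            if (n == RECLAIM_BATCH ||
                time_before(jiffies, gnt->last_used + RECLAIM_IDLE_AGE))
                break;                      /* rest of the list is still hot */
            list_del(&gnt->node);
            /* Here blkfront would queue an "unmap this gref" request to
             * blkback (option 4 above) and only then end foreign access. */
            n++;
        }
        spin_unlock(lock);
    }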
On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
> First because grant_copy is done by the hypervisor, while when using
> persistent grants the copy is done by the guest.

This is true and a reasonable concern.

> Also, grant_copy takes the grant lock, so when scaling to a large
> number of domains there's going to be contention around this lock.

Does grant copy really take the lock for the duration of the copy, preventing any other grant ops from the source and/or target domain?

If true then that sounds like an area which is ripe for optimisation!

However I am hopeful that you are mistaken... __acquire_grant_for_copy() takes the grant lock while it pins the entry into the active grant entry list and not for the actual duration of the copy (and likewise __release_grant_for_copy()). I hope Jan can confirm this!

Ian.
>>> On 25.06.13 at 17:57, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>> Also, grant_copy takes the grant lock, so when scaling to a large
>> number of domains there's going to be contention around this lock.
>
> Does grant copy really take the lock for the duration of the copy,
> preventing any other grant ops from the source and/or target domain?
>
> If true then that sounds like an area which is ripe for optimisation!
>
> However I am hopeful that you are mistaken... __acquire_grant_for_copy()
> takes the grant lock while it pins the entry into the active grant entry
> list and not for the actual duration of the copy (and likewise
> __release_grant_for_copy()). I hope Jan can confirm this!

Yes, that's how I recall it works since the removal of the per-domain lock uses from those paths.

Jan
On 25/06/13 17:57, Ian Campbell wrote:
> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote:
>> First because grant_copy is done by the hypervisor, while when using
>> persistent grants the copy is done by the guest.
>
> This is true and a reasonable concern.
>
>> Also, grant_copy takes the grant lock, so when scaling to a large
>> number of domains there's going to be contention around this lock.
>
> Does grant copy really take the lock for the duration of the copy,
> preventing any other grant ops from the source and/or target domain?
>
> If true then that sounds like an area which is ripe for optimisation!
>
> However I am hopeful that you are mistaken... __acquire_grant_for_copy()
> takes the grant lock while it pins the entry into the active grant entry
> list and not for the actual duration of the copy (and likewise
> __release_grant_for_copy()). I hope Jan can confirm this!

Sorry, I probably wasn't detailed enough here. I didn't mean that it takes the lock for the duration of the whole copy, but it is used at several places during the grant copy operation, so it might introduce contention when the number of domains is high (although I have not measured it).
On Tue, 25 Jun 2013, Ian Campbell wrote:> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: > > On 21/06/13 20:07, Matt Wilson wrote: > > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > > >> Hello, > > >> > > >> While working on further block improvements I''ve found an issue with > > >> persistent grants in blkfront. > > >> > > >> Persistent grants basically allocate grants and then they are never > > >> released, so both blkfront and blkback keep using the same memory pages > > >> for all the transactions. > > >> > > >> This is not a problem in blkback, because we can dynamically choose how > > >> many grants we want to map. On the other hand, blkfront cannot remove > > >> the access to those grants at any point, because blkfront doesn''t know > > >> if blkback has this grants mapped persistently or not. > > >> > > >> So if for example we start expanding the number of segments in indirect > > >> requests, to a value like 512 segments per requests, blkfront will > > >> probably try to persistently map 512*32+512 = 16896 grants per device, > > >> that''s much more grants that the current default, which is 32*256 = 8192 > > >> (if using grant tables v2). This can cause serious problems to other > > >> interfaces inside the DomU, since blkfront basically starts hoarding all > > >> possible grants, leaving other interfaces completely locked. > > > > > > Yikes. > > > > > >> I''ve been thinking about different ways to solve this, but so far I > > >> haven''t been able to found a nice solution: > > >> > > >> 1. Limit the number of persistent grants a blkfront instance can use, > > >> let''s say that only the first X used grants will be persistently mapped > > >> by both blkfront and blkback, and if more grants are needed the previous > > >> map/unmap will be used. > > > > > > I''m not thrilled with this option. It would likely introduce some > > > significant performance variability, wouldn''t it? > > > > Probably, and also it will be hard to distribute the number of available > > grant across the different interfaces in a performance sensible way, > > specially given the fact that once a grant is assigned to a interface it > > cannot be returned back to the pool of grants. > > > > So if we had two interfaces with very different usage (one very busy and > > another one almost idle), and equally distribute the grants amongst > > them, one will have a lot of unused grants while the other will suffer > > from starvation. > > I do think we need to implement some sort of reclaim scheme, which > probably does mean a specific request (per your #4). We simply can''t > have a device which once upon a time had high throughput but is no > mostly ideal continue to tie up all those grants. > > If you make the reuse of grants use an MRU scheme and reclaim the > currently unused tail fairly infrequently and in large batches then the > perf overhead should be minimal, I think. > > I also don''t think I would discount the idea of using ephemeral grants > to cover bursts so easily either, in fact it might fall out quite > naturally from an MRU scheme? In that scheme bursting up is pretty cheap > since grant map is relative inexpensive, and recovering from the burst > shouldn''t be too expensive if you batch it. If it turns out to be not a > burst but a sustained level of I/O then the MRU scheme would mean you > wouldn''t be recovering them. 
>
> I also think there probably needs to be some tunable per device limit on
> the maximum persistent grants, perhaps minimum and maximum pool sizes
> tied in with an MRU scheme? If nothing else it gives the admin the
> ability to prioritise devices.

If we introduce a reclaim call we have to be careful not to fall back to a map/unmap scheme like we had before.

The way I see it, either these additional grants are useful or not. In the first case we could just limit the maximum amount of persistent grants and be done with it. If they are not useful (they have been allocated for one very large request and not used much after that), could we find a way to identify unusually large requests and avoid using persistent grants for those?
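A trivial illustration of that last idea (the threshold value and the helper are made up; nothing like this exists in blkfront today):

    #include <linux/types.h>

    /* Hypothetical: skip persistent grants for unusually large requests. */
    #define PERSISTENT_SEG_THRESHOLD 32   /* e.g. the old non-indirect maximum */

    static bool use_persistent_grants(unsigned int nr_segments)
    {
        return nr_segments <= PERSISTENT_SEG_THRESHOLD;
    }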
On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:> On Tue, 25 Jun 2013, Ian Campbell wrote: >> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: >> > On 21/06/13 20:07, Matt Wilson wrote: >> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >> > >> Hello, >> > >> >> > >> While working on further block improvements I''ve found an issue with >> > >> persistent grants in blkfront. >> > >> >> > >> Persistent grants basically allocate grants and then they are never >> > >> released, so both blkfront and blkback keep using the same memory pages >> > >> for all the transactions. >> > >> >> > >> This is not a problem in blkback, because we can dynamically choose how >> > >> many grants we want to map. On the other hand, blkfront cannot remove >> > >> the access to those grants at any point, because blkfront doesn''t know >> > >> if blkback has this grants mapped persistently or not. >> > >> >> > >> So if for example we start expanding the number of segments in indirect >> > >> requests, to a value like 512 segments per requests, blkfront will >> > >> probably try to persistently map 512*32+512 = 16896 grants per device, >> > >> that''s much more grants that the current default, which is 32*256 = 8192 >> > >> (if using grant tables v2). This can cause serious problems to other >> > >> interfaces inside the DomU, since blkfront basically starts hoarding all >> > >> possible grants, leaving other interfaces completely locked. >> > > >> > > Yikes. >> > > >> > >> I''ve been thinking about different ways to solve this, but so far I >> > >> haven''t been able to found a nice solution: >> > >> >> > >> 1. Limit the number of persistent grants a blkfront instance can use, >> > >> let''s say that only the first X used grants will be persistently mapped >> > >> by both blkfront and blkback, and if more grants are needed the previous >> > >> map/unmap will be used. >> > > >> > > I''m not thrilled with this option. It would likely introduce some >> > > significant performance variability, wouldn''t it? >> > >> > Probably, and also it will be hard to distribute the number of available >> > grant across the different interfaces in a performance sensible way, >> > specially given the fact that once a grant is assigned to a interface it >> > cannot be returned back to the pool of grants. >> > >> > So if we had two interfaces with very different usage (one very busy and >> > another one almost idle), and equally distribute the grants amongst >> > them, one will have a lot of unused grants while the other will suffer >> > from starvation. >> >> I do think we need to implement some sort of reclaim scheme, which >> probably does mean a specific request (per your #4). We simply can''t >> have a device which once upon a time had high throughput but is no >> mostly ideal continue to tie up all those grants. >> >> If you make the reuse of grants use an MRU scheme and reclaim the >> currently unused tail fairly infrequently and in large batches then the >> perf overhead should be minimal, I think. >> >> I also don''t think I would discount the idea of using ephemeral grants >> to cover bursts so easily either, in fact it might fall out quite >> naturally from an MRU scheme? In that scheme bursting up is pretty cheap >> since grant map is relative inexpensive, and recovering from the burst >> shouldn''t be too expensive if you batch it. 
If it turns out to be not a >> burst but a sustained level of I/O then the MRU scheme would mean you >> wouldn''t be recovering them. >> >> I also think there probably needs to be some tunable per device limit on >> the maximum persistent grants, perhaps minimum and maximum pool sizes >> ties in with an MRU scheme? If nothing else it gives the admin the >> ability to prioritise devices. > > If we introduce a reclaim call we have to be careful not to fall back > to a map/unmap scheme like we had before. > > The way I see it either these additional grants are useful or not. > In the first case we could just limit the maximum amount of persistent > grants and be done with it. > If they are not useful (they have been allocated for one very large > request and not used much after that), could we find a way to identify > unusually large requests and avoid using persistent grants for those?Isn''t it possible that these grants are useful for some periods of time, but not for others? You wouldn''t say, "Caching the disk data in main memory is either useful or not; if it is not useful (if it was allocated for one very large request and not used much after that), we should find a way to identify unusually large requests and avoid caching it." If you''re playing a movie, sure; but in most cases, the cache was useful for a time, then stopped being useful. Treating the persistent grants the same way makes sense to me. -George
On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote:> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: > > On Tue, 25 Jun 2013, Ian Campbell wrote: > >> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: > >> > On 21/06/13 20:07, Matt Wilson wrote: > >> > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > >> > >> Hello, > >> > >> > >> > >> While working on further block improvements I've found an issue with > >> > >> persistent grants in blkfront. > >> > >> > >> > >> Persistent grants basically allocate grants and then they are never > >> > >> released, so both blkfront and blkback keep using the same memory pages > >> > >> for all the transactions. > >> > >> > >> > >> This is not a problem in blkback, because we can dynamically choose how > >> > >> many grants we want to map. On the other hand, blkfront cannot remove > >> > >> the access to those grants at any point, because blkfront doesn't know > >> > >> if blkback has this grants mapped persistently or not. > >> > >> > >> > >> So if for example we start expanding the number of segments in indirect > >> > >> requests, to a value like 512 segments per requests, blkfront will > >> > >> probably try to persistently map 512*32+512 = 16896 grants per device, > >> > >> that's much more grants that the current default, which is 32*256 = 8192 > >> > >> (if using grant tables v2). This can cause serious problems to other > >> > >> interfaces inside the DomU, since blkfront basically starts hoarding all > >> > >> possible grants, leaving other interfaces completely locked. > >> > > > >> > > Yikes. > >> > > > >> > >> I've been thinking about different ways to solve this, but so far I > >> > >> haven't been able to found a nice solution: > >> > >> > >> > >> 1. Limit the number of persistent grants a blkfront instance can use, > >> > >> let's say that only the first X used grants will be persistently mapped > >> > >> by both blkfront and blkback, and if more grants are needed the previous > >> > >> map/unmap will be used. > >> > > > >> > > I'm not thrilled with this option. It would likely introduce some > >> > > significant performance variability, wouldn't it? > >> > > >> > Probably, and also it will be hard to distribute the number of available > >> > grant across the different interfaces in a performance sensible way, > >> > specially given the fact that once a grant is assigned to a interface it > >> > cannot be returned back to the pool of grants. > >> > > >> > So if we had two interfaces with very different usage (one very busy and > >> > another one almost idle), and equally distribute the grants amongst > >> > them, one will have a lot of unused grants while the other will suffer > >> > from starvation. > >> > >> I do think we need to implement some sort of reclaim scheme, which > >> probably does mean a specific request (per your #4). We simply can't > >> have a device which once upon a time had high throughput but is no > >> mostly ideal continue to tie up all those grants. > >> > >> If you make the reuse of grants use an MRU scheme and reclaim the > >> currently unused tail fairly infrequently and in large batches then the > >> perf overhead should be minimal, I think. > >> > >> I also don't think I would discount the idea of using ephemeral grants > >> to cover bursts so easily either, in fact it might fall out quite > >> naturally from an MRU scheme? 
In that scheme bursting up is pretty cheap > >> since grant map is relative inexpensive, and recovering from the burst > >> shouldn't be too expensive if you batch it. If it turns out to be not a > >> burst but a sustained level of I/O then the MRU scheme would mean you > >> wouldn't be recovering them. > >> > >> I also think there probably needs to be some tunable per device limit on > >> the maximum persistent grants, perhaps minimum and maximum pool sizes > >> ties in with an MRU scheme? If nothing else it gives the admin the > >> ability to prioritise devices. > > > > If we introduce a reclaim call we have to be careful not to fall back > > to a map/unmap scheme like we had before. > > > > The way I see it either these additional grants are useful or not. > > In the first case we could just limit the maximum amount of persistent > > grants and be done with it. > > If they are not useful (they have been allocated for one very large > > request and not used much after that), could we find a way to identify > > unusually large requests and avoid using persistent grants for those? > > Isn't it possible that these grants are useful for some periods of > time, but not for others? You wouldn't say, "Caching the disk data in > main memory is either useful or not; if it is not useful (if it was > allocated for one very large request and not used much after that), we > should find a way to identify unusually large requests and avoid > caching it." If you're playing a movie, sure; but in most cases, the > cache was useful for a time, then stopped being useful. Treating the > persistent grants the same way makes sense to me.Right, this is what I was trying to suggest with the MRU scheme. If you are using lots of grants and you keep on reusing them then they remain persistent and don't get reclaimed. If you are not reusing them for a while then they get reclaimed. If you make "for a while" big enough then you should find you aren't unintentionally falling back to a map/unmap scheme. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 26/06/13 12:37, Ian Campbell wrote:> On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote: >> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini >> <stefano.stabellini@eu.citrix.com> wrote: >>> On Tue, 25 Jun 2013, Ian Campbell wrote: >>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: >>>>> On 21/06/13 20:07, Matt Wilson wrote: >>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >>>>>>> Hello, >>>>>>> >>>>>>> While working on further block improvements I've found an issue with >>>>>>> persistent grants in blkfront. >>>>>>> >>>>>>> Persistent grants basically allocate grants and then they are never >>>>>>> released, so both blkfront and blkback keep using the same memory pages >>>>>>> for all the transactions. >>>>>>> >>>>>>> This is not a problem in blkback, because we can dynamically choose how >>>>>>> many grants we want to map. On the other hand, blkfront cannot remove >>>>>>> the access to those grants at any point, because blkfront doesn't know >>>>>>> if blkback has this grants mapped persistently or not. >>>>>>> >>>>>>> So if for example we start expanding the number of segments in indirect >>>>>>> requests, to a value like 512 segments per requests, blkfront will >>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device, >>>>>>> that's much more grants that the current default, which is 32*256 = 8192 >>>>>>> (if using grant tables v2). This can cause serious problems to other >>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all >>>>>>> possible grants, leaving other interfaces completely locked. >>>>>> Yikes. >>>>>> >>>>>>> I've been thinking about different ways to solve this, but so far I >>>>>>> haven't been able to found a nice solution: >>>>>>> >>>>>>> 1. Limit the number of persistent grants a blkfront instance can use, >>>>>>> let's say that only the first X used grants will be persistently mapped >>>>>>> by both blkfront and blkback, and if more grants are needed the previous >>>>>>> map/unmap will be used. >>>>>> I'm not thrilled with this option. It would likely introduce some >>>>>> significant performance variability, wouldn't it? >>>>> Probably, and also it will be hard to distribute the number of available >>>>> grant across the different interfaces in a performance sensible way, >>>>> specially given the fact that once a grant is assigned to a interface it >>>>> cannot be returned back to the pool of grants. >>>>> >>>>> So if we had two interfaces with very different usage (one very busy and >>>>> another one almost idle), and equally distribute the grants amongst >>>>> them, one will have a lot of unused grants while the other will suffer >>>>> from starvation. >>>> I do think we need to implement some sort of reclaim scheme, which >>>> probably does mean a specific request (per your #4). We simply can't >>>> have a device which once upon a time had high throughput but is no >>>> mostly ideal continue to tie up all those grants. >>>> >>>> If you make the reuse of grants use an MRU scheme and reclaim the >>>> currently unused tail fairly infrequently and in large batches then the >>>> perf overhead should be minimal, I think. >>>> >>>> I also don't think I would discount the idea of using ephemeral grants >>>> to cover bursts so easily either, in fact it might fall out quite >>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap >>>> since grant map is relative inexpensive, and recovering from the burst >>>> shouldn't be too expensive if you batch it. 
If it turns out to be not a >>>> burst but a sustained level of I/O then the MRU scheme would mean you >>>> wouldn't be recovering them. >>>> >>>> I also think there probably needs to be some tunable per device limit on >>>> the maximum persistent grants, perhaps minimum and maximum pool sizes >>>> ties in with an MRU scheme? If nothing else it gives the admin the >>>> ability to prioritise devices. >>> If we introduce a reclaim call we have to be careful not to fall back >>> to a map/unmap scheme like we had before. >>> >>> The way I see it either these additional grants are useful or not. >>> In the first case we could just limit the maximum amount of persistent >>> grants and be done with it. >>> If they are not useful (they have been allocated for one very large >>> request and not used much after that), could we find a way to identify >>> unusually large requests and avoid using persistent grants for those? >> Isn't it possible that these grants are useful for some periods of >> time, but not for others? You wouldn't say, "Caching the disk data in >> main memory is either useful or not; if it is not useful (if it was >> allocated for one very large request and not used much after that), we >> should find a way to identify unusually large requests and avoid >> caching it." If you're playing a movie, sure; but in most cases, the >> cache was useful for a time, then stopped being useful. Treating the >> persistent grants the same way makes sense to me. > Right, this is what I was trying to suggest with the MRU scheme. If you > are using lots of grants and you keep on reusing them then they remain > persistent and don't get reclaimed. If you are not reusing them for a > while then they get reclaimed. If you make "for a while" big enough then > you should find you aren't unintentionally falling back to a map/unmap > scheme.And I was trying to say that I agreed with you. :-) BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep the most recently used"; is there a practical difference between that and "LRU" ("Discard the Least Recently Used")? Presumably we could implement the clock algorithm pretty reasonably... -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote:> On 26/06/13 12:37, Ian Campbell wrote: > > On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote: > >> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini > >> <stefano.stabellini@eu.citrix.com> wrote: > >>> On Tue, 25 Jun 2013, Ian Campbell wrote: > >>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: > >>>>> On 21/06/13 20:07, Matt Wilson wrote: > >>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > >>>>>>> Hello, > >>>>>>> > >>>>>>> While working on further block improvements I've found an issue with > >>>>>>> persistent grants in blkfront. > >>>>>>> > >>>>>>> Persistent grants basically allocate grants and then they are never > >>>>>>> released, so both blkfront and blkback keep using the same memory pages > >>>>>>> for all the transactions. > >>>>>>> > >>>>>>> This is not a problem in blkback, because we can dynamically choose how > >>>>>>> many grants we want to map. On the other hand, blkfront cannot remove > >>>>>>> the access to those grants at any point, because blkfront doesn't know > >>>>>>> if blkback has this grants mapped persistently or not. > >>>>>>> > >>>>>>> So if for example we start expanding the number of segments in indirect > >>>>>>> requests, to a value like 512 segments per requests, blkfront will > >>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device, > >>>>>>> that's much more grants that the current default, which is 32*256 = 8192 > >>>>>>> (if using grant tables v2). This can cause serious problems to other > >>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all > >>>>>>> possible grants, leaving other interfaces completely locked. > >>>>>> Yikes. > >>>>>> > >>>>>>> I've been thinking about different ways to solve this, but so far I > >>>>>>> haven't been able to found a nice solution: > >>>>>>> > >>>>>>> 1. Limit the number of persistent grants a blkfront instance can use, > >>>>>>> let's say that only the first X used grants will be persistently mapped > >>>>>>> by both blkfront and blkback, and if more grants are needed the previous > >>>>>>> map/unmap will be used. > >>>>>> I'm not thrilled with this option. It would likely introduce some > >>>>>> significant performance variability, wouldn't it? > >>>>> Probably, and also it will be hard to distribute the number of available > >>>>> grant across the different interfaces in a performance sensible way, > >>>>> specially given the fact that once a grant is assigned to a interface it > >>>>> cannot be returned back to the pool of grants. > >>>>> > >>>>> So if we had two interfaces with very different usage (one very busy and > >>>>> another one almost idle), and equally distribute the grants amongst > >>>>> them, one will have a lot of unused grants while the other will suffer > >>>>> from starvation. > >>>> I do think we need to implement some sort of reclaim scheme, which > >>>> probably does mean a specific request (per your #4). We simply can't > >>>> have a device which once upon a time had high throughput but is no > >>>> mostly ideal continue to tie up all those grants. > >>>> > >>>> If you make the reuse of grants use an MRU scheme and reclaim the > >>>> currently unused tail fairly infrequently and in large batches then the > >>>> perf overhead should be minimal, I think. 
> >>>> > >>>> I also don't think I would discount the idea of using ephemeral grants > >>>> to cover bursts so easily either, in fact it might fall out quite > >>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap > >>>> since grant map is relative inexpensive, and recovering from the burst > >>>> shouldn't be too expensive if you batch it. If it turns out to be not a > >>>> burst but a sustained level of I/O then the MRU scheme would mean you > >>>> wouldn't be recovering them. > >>>> > >>>> I also think there probably needs to be some tunable per device limit on > >>>> the maximum persistent grants, perhaps minimum and maximum pool sizes > >>>> ties in with an MRU scheme? If nothing else it gives the admin the > >>>> ability to prioritise devices. > >>> If we introduce a reclaim call we have to be careful not to fall back > >>> to a map/unmap scheme like we had before. > >>> > >>> The way I see it either these additional grants are useful or not. > >>> In the first case we could just limit the maximum amount of persistent > >>> grants and be done with it. > >>> If they are not useful (they have been allocated for one very large > >>> request and not used much after that), could we find a way to identify > >>> unusually large requests and avoid using persistent grants for those? > >> Isn't it possible that these grants are useful for some periods of > >> time, but not for others? You wouldn't say, "Caching the disk data in > >> main memory is either useful or not; if it is not useful (if it was > >> allocated for one very large request and not used much after that), we > >> should find a way to identify unusually large requests and avoid > >> caching it." If you're playing a movie, sure; but in most cases, the > >> cache was useful for a time, then stopped being useful. Treating the > >> persistent grants the same way makes sense to me. > > Right, this is what I was trying to suggest with the MRU scheme. If you > > are using lots of grants and you keep on reusing them then they remain > > persistent and don't get reclaimed. If you are not reusing them for a > > while then they get reclaimed. If you make "for a while" big enough then > > you should find you aren't unintentionally falling back to a map/unmap > > scheme. > > And I was trying to say that I agreed with you. :-)Excellent ;-)> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep > the most recently used"; is there a practical difference between that > and "LRU" ("Discard the Least Recently Used")?I started off with LRU and then got my self confused and changed it everywhere. Yes I mean keep Most Recently Used == discard Least Recently Used.> Presumably we could implement the clock algorithm pretty reasonably...That's the sort of approach I was imagining... Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
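A minimal sketch of what a clock (second-chance) pass over the persistent grants could look like, purely illustrative and assuming each grant carries a referenced flag that the I/O path sets on use (a variant of the earlier hypothetical struct):

    #include <linux/list.h>
    #include <linux/types.h>
    #include <xen/grant_table.h>

    /* Hypothetical clock sweep: grants referenced since the last pass get
     * a second chance (flag cleared); untouched ones become candidates. */
    struct persistent_gnt {
        grant_ref_t      gref;
        bool             referenced;      /* set by the I/O path on each use */
        struct list_head node;
    };

    static unsigned int clock_sweep(struct list_head *grants,
                                    struct list_head *reclaim,
                                    unsigned int max)
    {
        struct persistent_gnt *gnt, *tmp;
        unsigned int found = 0;

        list_for_each_entry_safe(gnt, tmp, grants, node) {
            if (found == max)
                break;
            if (gnt->referenced) {
                gnt->referenced = false;  /* second chance */
                continue;
            }
            list_move_tail(&gnt->node, reclaim);
            found++;
        }
        return found;                     /* grants queued for reclaim */
    }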
On 21/06/13 20:07, Matt Wilson wrote:> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >> Hello, >> >> While working on further block improvements I''ve found an issue with >> persistent grants in blkfront. >> >> Persistent grants basically allocate grants and then they are never >> released, so both blkfront and blkback keep using the same memory pages >> for all the transactions. >> >> This is not a problem in blkback, because we can dynamically choose how >> many grants we want to map. On the other hand, blkfront cannot remove >> the access to those grants at any point, because blkfront doesn''t know >> if blkback has this grants mapped persistently or not. >> >> So if for example we start expanding the number of segments in indirect >> requests, to a value like 512 segments per requests, blkfront will >> probably try to persistently map 512*32+512 = 16896 grants per device, >> that''s much more grants that the current default, which is 32*256 = 8192 >> (if using grant tables v2). This can cause serious problems to other >> interfaces inside the DomU, since blkfront basically starts hoarding all >> possible grants, leaving other interfaces completely locked. > > Yikes. > >> I''ve been thinking about different ways to solve this, but so far I >> haven''t been able to found a nice solution: >> >> 1. Limit the number of persistent grants a blkfront instance can use, >> let''s say that only the first X used grants will be persistently mapped >> by both blkfront and blkback, and if more grants are needed the previous >> map/unmap will be used. > > I''m not thrilled with this option. It would likely introduce some > significant performance variability, wouldn''t it? > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I >> have not benchmarked this solution, but I''m quite sure it will involve a >> performance regression, specially when scaling to a high number of domains). > > Why do you think so?I''ve hacked a prototype blkback using grant_copy instead of persistent grants, and removed the persistent grants support in blkfront and indeed the performance of grant_copy is lower than persistent grants, and it seems to scale much worse. I''ve run several fio read/write benchmarks, using 512 segments per request on a ramdisk, and the output is the following: http://xenbits.xen.org/people/royger/grant_copy/ Roger.
On 27/06/13 16:21, Ian Campbell wrote:
> On Thu, 2013-06-27 at 14:58 +0100, George Dunlap wrote: >> On 26/06/13 12:37, Ian Campbell wrote: >>> On Wed, 2013-06-26 at 10:37 +0100, George Dunlap wrote: >>>> On Tue, Jun 25, 2013 at 7:04 PM, Stefano Stabellini >>>> <stefano.stabellini@eu.citrix.com> wrote: >>>>> On Tue, 25 Jun 2013, Ian Campbell wrote: >>>>>> On Sat, 2013-06-22 at 09:11 +0200, Roger Pau Monné wrote: >>>>>>> On 21/06/13 20:07, Matt Wilson wrote: >>>>>>>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> While working on further block improvements I've found an issue with >>>>>>>>> persistent grants in blkfront. >>>>>>>>> >>>>>>>>> Persistent grants basically allocate grants and then they are never >>>>>>>>> released, so both blkfront and blkback keep using the same memory pages >>>>>>>>> for all the transactions. >>>>>>>>> >>>>>>>>> This is not a problem in blkback, because we can dynamically choose how >>>>>>>>> many grants we want to map. On the other hand, blkfront cannot remove >>>>>>>>> the access to those grants at any point, because blkfront doesn't know >>>>>>>>> if blkback has this grants mapped persistently or not. >>>>>>>>> >>>>>>>>> So if for example we start expanding the number of segments in indirect >>>>>>>>> requests, to a value like 512 segments per requests, blkfront will >>>>>>>>> probably try to persistently map 512*32+512 = 16896 grants per device, >>>>>>>>> that's much more grants that the current default, which is 32*256 = 8192 >>>>>>>>> (if using grant tables v2). This can cause serious problems to other >>>>>>>>> interfaces inside the DomU, since blkfront basically starts hoarding all >>>>>>>>> possible grants, leaving other interfaces completely locked. >>>>>>>> Yikes. >>>>>>>> >>>>>>>>> I've been thinking about different ways to solve this, but so far I >>>>>>>>> haven't been able to found a nice solution: >>>>>>>>> >>>>>>>>> 1. Limit the number of persistent grants a blkfront instance can use, >>>>>>>>> let's say that only the first X used grants will be persistently mapped >>>>>>>>> by both blkfront and blkback, and if more grants are needed the previous >>>>>>>>> map/unmap will be used. >>>>>>>> I'm not thrilled with this option. It would likely introduce some >>>>>>>> significant performance variability, wouldn't it? >>>>>>> Probably, and also it will be hard to distribute the number of available >>>>>>> grant across the different interfaces in a performance sensible way, >>>>>>> specially given the fact that once a grant is assigned to a interface it >>>>>>> cannot be returned back to the pool of grants. >>>>>>> >>>>>>> So if we had two interfaces with very different usage (one very busy and >>>>>>> another one almost idle), and equally distribute the grants amongst >>>>>>> them, one will have a lot of unused grants while the other will suffer >>>>>>> from starvation. >>>>>> I do think we need to implement some sort of reclaim scheme, which >>>>>> probably does mean a specific request (per your #4). We simply can't >>>>>> have a device which once upon a time had high throughput but is now >>>>>> mostly idle continue to tie up all those grants. >>>>>> >>>>>> If you make the reuse of grants use an MRU scheme and reclaim the >>>>>> currently unused tail fairly infrequently and in large batches then the >>>>>> perf overhead should be minimal, I think.
>>>>>> >>>>>> I also don't think I would discount the idea of using ephemeral grants >>>>>> to cover bursts so easily either, in fact it might fall out quite >>>>>> naturally from an MRU scheme? In that scheme bursting up is pretty cheap >>>>>> since grant map is relative inexpensive, and recovering from the burst >>>>>> shouldn't be too expensive if you batch it. If it turns out to be not a >>>>>> burst but a sustained level of I/O then the MRU scheme would mean you >>>>>> wouldn't be recovering them. >>>>>> >>>>>> I also think there probably needs to be some tunable per device limit on >>>>>> the maximum persistent grants, perhaps minimum and maximum pool sizes >>>>>> ties in with an MRU scheme? If nothing else it gives the admin the >>>>>> ability to prioritise devices. >>>>> If we introduce a reclaim call we have to be careful not to fall back >>>>> to a map/unmap scheme like we had before. >>>>> >>>>> The way I see it either these additional grants are useful or not. >>>>> In the first case we could just limit the maximum amount of persistent >>>>> grants and be done with it. >>>>> If they are not useful (they have been allocated for one very large >>>>> request and not used much after that), could we find a way to identify >>>>> unusually large requests and avoid using persistent grants for those? >>>> Isn't it possible that these grants are useful for some periods of >>>> time, but not for others? You wouldn't say, "Caching the disk data in >>>> main memory is either useful or not; if it is not useful (if it was >>>> allocated for one very large request and not used much after that), we >>>> should find a way to identify unusually large requests and avoid >>>> caching it." If you're playing a movie, sure; but in most cases, the >>>> cache was useful for a time, then stopped being useful. Treating the >>>> persistent grants the same way makes sense to me. >>> Right, this is what I was trying to suggest with the MRU scheme. If you >>> are using lots of grants and you keep on reusing them then they remain >>> persistent and don't get reclaimed. If you are not reusing them for a >>> while then they get reclaimed. If you make "for a while" big enough then >>> you should find you aren't unintentionally falling back to a map/unmap >>> scheme. >> >> And I was trying to say that I agreed with you. :-) > > Excellent ;-)

I also agree that this is the best solution; I will start looking at implementing it.

>> BTW, I presume "MRU" stands for "Most Recently Used", and means "Keep >> the most recently used"; is there a practical difference between that >> and "LRU" ("Discard the Least Recently Used")? > > I started off with LRU and then got myself confused and changed it > everywhere. Yes I mean keep Most Recently Used == discard Least Recently > Used.

This will help if the disk is only doing intermittent bursts of data, but if the disk is under high I/O for a long time we might end up in the same situation (all grants hoarded by a single disk). We should make sure that there's always a buffer of unused grants so other disks or NIC interfaces can continue to work as expected.
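As a strawman for that kind of bound, something like the following would keep a reserve of free grant references however busy a single disk gets. The numbers and the even split across devices are arbitrary illustration, not a proposed policy.

/*
 * Sketch only: cap the persistently-granted pages each device may hold
 * so a domain-wide reserve is always left free for other frontends
 * (other disks, netfront, ...).
 */
#include <linux/kernel.h>

#define GNT_TOTAL	8192	/* e.g. 32 frames x 256 entries (grant table v2) */
#define GNT_RESERVE	1024	/* always leave this many grants unused          */

static unsigned int max_persistent_grants(unsigned int nr_devices)
{
	unsigned int usable = GNT_TOTAL - GNT_RESERVE;

	/*
	 * Split the non-reserved grants evenly across active frontends;
	 * an LRU/clock reclaim pass keeps each device under its share
	 * even after a sustained period of heavy I/O.
	 */
	return usable / max(nr_devices, 1u);
}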
On Thu, 27 Jun 2013, Roger Pau Monné wrote:
> On 21/06/13 20:07, Matt Wilson wrote: > > On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: > >> Hello, > >> > >> While working on further block improvements I've found an issue with > >> persistent grants in blkfront. > >> > >> Persistent grants basically allocate grants and then they are never > >> released, so both blkfront and blkback keep using the same memory pages > >> for all the transactions. > >> > >> This is not a problem in blkback, because we can dynamically choose how > >> many grants we want to map. On the other hand, blkfront cannot remove > >> the access to those grants at any point, because blkfront doesn't know > >> if blkback has this grants mapped persistently or not. > >> > >> So if for example we start expanding the number of segments in indirect > >> requests, to a value like 512 segments per requests, blkfront will > >> probably try to persistently map 512*32+512 = 16896 grants per device, > >> that's much more grants that the current default, which is 32*256 = 8192 > >> (if using grant tables v2). This can cause serious problems to other > >> interfaces inside the DomU, since blkfront basically starts hoarding all > >> possible grants, leaving other interfaces completely locked. > > > > Yikes. > > > >> I've been thinking about different ways to solve this, but so far I > >> haven't been able to found a nice solution: > >> > >> 1. Limit the number of persistent grants a blkfront instance can use, > >> let's say that only the first X used grants will be persistently mapped > >> by both blkfront and blkback, and if more grants are needed the previous > >> map/unmap will be used. > > > > I'm not thrilled with this option. It would likely introduce some > > significant performance variability, wouldn't it? > > > >> 2. Switch to grant copy in blkback, and get rid of persistent grants (I > >> have not benchmarked this solution, but I'm quite sure it will involve a > >> performance regression, specially when scaling to a high number of domains). > > > > Why do you think so? > > I've hacked a prototype blkback using grant_copy instead of persistent > grants, and removed the persistent grants support in blkfront. Indeed, > the performance of grant_copy is lower than persistent grants, and it > seems to scale much worse. I've run several fio read/write benchmarks, > using 512 segments per request on a ramdisk, and the output is the > following: > > http://xenbits.xen.org/people/royger/grant_copy/

Very impressive. We should consider doing the same experiment with netfront/netback at some point.
On 24/06/13 13:06, Stefano Stabellini wrote:
> On Sat, 22 Jun 2013, Wei Liu wrote: >> On Fri, Jun 21, 2013 at 04:16:25PM -0400, Konrad Rzeszutek Wilk wrote: >>> On Fri, Jun 21, 2013 at 07:10:59PM +0200, Roger Pau Monné wrote: >>>> Hello, >>>> >>>> While working on further block improvements I've found an issue with >>>> persistent grants in blkfront. >>>> >>>> Persistent grants basically allocate grants and then they are never >>>> released, so both blkfront and blkback keep using the same memory pages >>>> for all the transactions. >>>> >>>> This is not a problem in blkback, because we can dynamically choose how >>>> many grants we want to map. On the other hand, blkfront cannot remove >>>> the access to those grants at any point, because blkfront doesn't know >>>> if blkback has this grants mapped persistently or not. >>>> >>>> So if for example we start expanding the number of segments in indirect >>>> requests, to a value like 512 segments per requests, blkfront will >>>> probably try to persistently map 512*32+512 = 16896 grants per device, >>>> that's much more grants that the current default, which is 32*256 = 8192 >>>> (if using grant tables v2). This can cause serious problems to other >>>> interfaces inside the DomU, since blkfront basically starts hoarding all >>>> possible grants, leaving other interfaces completely locked. >>>> >>>> I've been thinking about different ways to solve this, but so far I >>>> haven't been able to found a nice solution: >>>> >>>> 1. Limit the number of persistent grants a blkfront instance can use, >>>> let's say that only the first X used grants will be persistently mapped >>>> by both blkfront and blkback, and if more grants are needed the previous >>>> map/unmap will be used. >>>> >>>> 2. Switch to grant copy in blkback, and get rid of persistent grants (I >>>> have not benchmarked this solution, but I'm quite sure it will involve a >>>> performance regression, specially when scaling to a high number of domains). >>>> >> >> Any chance that the speed of copying is fast enough for block devices? >> >>>> 3. Increase the size of the grant_table or the size of a single grant >>>> (from 4k to 2M) (this is from Stefano Stabellini). >>>> >>>> 4. Introduce a new request type that we can use to request blkback to >>>> unmap certain grefs so we can free them in blkfront. >>> >>> >>> 5). Lift the limit of grant pages a domain can have. >> >> If I'm not mistaken, this is basically the same as "increase the size of >> the grant_table" in #3. > > Yes, that was one of the things I was suggesting, but it needs > investigating: I wouldn't want that increasing the number of grant > frames would reach a different scalability limit of the data structure.

I don't think there's any implicit scalability limit in the data structure itself; it's just an array, and grants are ordered as array[gref]. I've discussed with Stefano the usage of domain pages to increase the size of the grant table, so instead of using xenheap pages we could use domain pages and thus remove the limitation (since we would be consuming domain memory). I have a very hacky prototype that uses domain pages instead of xenheap pages for expanding the grant table, but I think that before implementing this it would be more suitable to implement #4. Even if we are using domain pages to increase the grant table, we still need a way to allow blkfront to remove persistent grants, or we will end up with a lot of unused pages in blkfront after I/O bursts.
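A strawman of what such an operation (#4) might look like on the blkif ring follows. The op code, layout and limits are invented for this sketch; nothing like it exists in the blkif protocol as of this thread.

/*
 * Hypothetical "unmap persistent grants" request from blkfront to
 * blkback.  Values and layout chosen arbitrarily for illustration.
 */
#include <stdint.h>

typedef uint32_t grant_ref_t;

#define BLKIF_OP_UNMAP_PGRANT	7	/* op code value assumed, not allocated     */
#define BLKIF_MAX_UNMAP_GREFS	16	/* kept small so the request still fits in
					   a fixed-size ring slot                   */

struct blkif_request_unmap {
	uint8_t		operation;	/* BLKIF_OP_UNMAP_PGRANT                    */
	uint8_t		nr_grefs;	/* valid entries in gref[]                  */
	uint16_t	_pad1;
	uint32_t	_pad2;		/* explicit padding keeps the 32-bit and
					   64-bit layouts identical                 */
	uint64_t	id;		/* echoed back in the response              */
	grant_ref_t	gref[BLKIF_MAX_UNMAP_GREFS];
};

/*
 * Intended flow: blkfront's reclaim pass queues one of these for the
 * grants it wants to retire, and only ends foreign access on a gref
 * once blkback's response confirms the backend mapping is gone.
 */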