A while back I added the domain config flag "superpages" to support Linux hugepages in PV domains. When the flag is set, the PV domain is populated entirely with superpages. If not enough superpage-sized chunks can be found, domain creation fails.

Some time after my patch was accepted, the code I added to domain restore was removed because it broke page allocation batching. I put reimplementing it on my TODO list, and then it got lost, for which I apologize.

Now that I have gotten back to reimplementing PV superpage support in restore, I find that other code was recently added to restore that, while triggered by the superpages flag, only allocates superpages opportunistically and falls back to small pages if that fails. This breaks the original semantics of the flag and could cause any OS that depends on those semantics to fail catastrophically.

I have a patch that implements the original semantics of the superpages flag while preserving the batch allocation behavior. I can remove the competing code and submit mine, but I have a question: what value is there in implementing opportunistic allocation of superpages for a PV (or an HVM) domain in restore? It clearly can't be based on the superpages flag. Opportunistic superpage allocation is already the default behavior for HVM domain creation. Should it also be the default on HVM restore? What about PV domains? Is there any real benefit?

Thanks,
Dave McCracken
Oracle Corp.
On 25/06/12 15:38, Dave McCracken wrote:
> I have a patch that implements the original semantics of the superpage flag
> while preserving the batch allocation behavior. I can remove the competing
> code and submit mine, but I have a question. What value is there in
> implementing opportunistic allocation of superpages for a PV (or an HVM)
> domain in restore? It clearly can't be based on the superpages flag.
> Opportunistic superpage allocation is already the default behavior for HVM
> domain creation. Should it also be a default on HVM restore? What about for
> PV domains? Is there any real benefit?

Well, the value of having superpages for HVM guests is pretty obvious. When using hardware-assisted paging (HAP), the number of memory reads on a TLB miss is roughly guest_levels * p2m_level, so on a 64-bit guest the one extra p2m level needed without superpages could cause up to 4 extra memory reads for every TLB miss. The reason to do it opportunistically instead of all-or-nothing is that there's no reason not to: every little helps. :-)

My question is, what is the value of enforcing all-or-nothing for PV guests?
Is it the case that PV guests have to be entirely in one mode or the other?

I'm not particularly fussed about having a way to disable opportunistic superpage allocation for HVM guests; I'd be fine with just turning it on all the time. I only really used the flag because I saw it was being passed but wasn't being used; I didn't realize it was meant to have "use superpages or abort" semantics. My only non-negotiable is that we have *a way* to get opportunistic superpages for HVM guests.

-George
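George's arithmetic can be made explicit with a small C sketch of his simplified model, in which a TLB miss under HAP costs about guest_levels * p2m_levels memory reads. It assumes a 4-level 64-bit x86 guest and a 4-level p2m whose walk stops one level early (3 reads) when the guest is mapped with 2MB entries; the function and macro names are hypothetical. A full two-dimensional walk also reads the guest's own page-table entries and translates the final data address, so exact counts are higher; the model only shows where the saving comes from.

```c
#include <assert.h>

#define GUEST_LEVELS_X86_64 4 /* 64-bit x86 guest page tables */
#define P2M_LEVELS_4K       4 /* p2m walk ending in a 4KB entry */
#define P2M_LEVELS_2M       3 /* p2m walk ending in a 2MB superpage entry */

/* Simplified model from the thread: every guest page-table level needs a
 * p2m walk, so reads per TLB miss ~= guest_levels * p2m_levels. */
static unsigned int hap_reads_per_miss(unsigned int guest_levels,
                                       unsigned int p2m_levels)
{
    assert(guest_levels > 0 && p2m_levels > 0);
    return guest_levels * p2m_levels;
}

/* Reads saved per TLB miss when the p2m maps the guest with 2MB entries. */
static unsigned int superpage_savings(unsigned int guest_levels)
{
    return hap_reads_per_miss(guest_levels, P2M_LEVELS_4K) -
           hap_reads_per_miss(guest_levels, P2M_LEVELS_2M);
}
```

For a 64-bit guest this gives 16 reads with 4KB p2m entries versus 12 with 2MB entries, i.e. the "up to 4 extra memory reads" per miss mentioned above.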
>>> On 25.06.12 at 17:08, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> My question is, what is the value of enforcing all-or-nothing for PV
> guests? Is it the case that PV guests have to be entirely in either one
> mode or the other?

Since, as I understand it, a PV guest's balloon driver must be aware of this, I do think the two modes are strictly separate.

> I'm not particularly fussed about having a way to disable the
> opportunistic superpage allocation for HVM guests, and just turning that
> on all the time. I only really used the flag because I saw it was being
> passed but wasn't being used; I didn't realize it was meant to have the
> "use superpages or abort" semantics. My only non-negotiable is that we
> have *a way* to get opportunistic superpages for HVM guests.

Couldn't we have the setting act as an override for the HVM allocation behavior (defaulting to enabled there), and keep the originally intended meaning for PV (disabled by default)?

Jan
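Jan's proposal amounts to resolving the same setting differently per guest type. A minimal C sketch, assuming a hypothetical tri-state setting (unset, off, on); `enum sp_setting`, `enum sp_mode` and `resolve_sp_mode()` are invented names, not existing toolstack code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for the "superpages" setting. */
enum sp_setting { SP_UNSET, SP_OFF, SP_ON };

enum sp_mode {
    SP_MODE_NONE,          /* 4KB pages only */
    SP_MODE_OPPORTUNISTIC, /* try 2MB chunks, fall back to 4KB */
    SP_MODE_STRICT         /* 2MB chunks only; fail build/restore otherwise */
};

/* HVM: the setting only overrides a default of "enabled" (opportunistic).
 * PV: the setting keeps its original all-or-nothing meaning, off by default. */
static enum sp_mode resolve_sp_mode(bool is_pv, enum sp_setting setting)
{
    assert(setting == SP_UNSET || setting == SP_OFF || setting == SP_ON);
    if (is_pv)
        return setting == SP_ON ? SP_MODE_STRICT : SP_MODE_NONE;
    return setting == SP_OFF ? SP_MODE_NONE : SP_MODE_OPPORTUNISTIC;
}
```

With this scheme, leaving the setting unset gives HVM guests opportunistic superpages and PV guests none; only an explicit "on" asks for the strict all-or-nothing PV behavior.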
On Monday, June 25, 2012, Jan Beulich wrote:
> > My question is, what is the value of enforcing all-or-nothing for PV
> > guests? Is it the case that PV guests have to be entirely in either one
> > mode or the other?
>
> Since I understand a PV guest's balloon driver must play with
> this, I indeed think this is a strictly separated set.

I specifically need to be able to guarantee superpage-backed memory in PV guests so that the guest can map it as superpages (hugepages in Linux). I'm trying to come up with some benefit of opportunistic superpages in PV guests, but nothing comes to mind.

> Couldn't we have the setting be an override for the HVM
> allocation behavior (defaulting to enabled there), and have
> the originally intended meaning for PV (disabled by default)?

I like this idea. It would be simple enough.

Is there any reason to allow disabling HVM superpage allocation? HVM domain creation always allocates as many superpages as it can, then falls back to small pages for the rest. Wouldn't it be reasonable to make restore always do this too?

Dave McCracken
Oracle Corp.
On Mon, Jun 25, 2012 at 5:07 PM, Dave McCracken <dcm@mccr.org> wrote:
> Is there any reason to allow disabling HVM superpage allocation? HVM domain
> creation always allocates as many superpages as it can, then falls back to
> small pages for the rest. Wouldn't it be reasonable to make restore always do
> this too?

At this point, probably not. Every toolstack I know of (xend, xl, xapi) always sets it to '1' for HVM guests.

Is there any reason not to just have the "superpages" flag mean "try to use superpages", and then have the allocation routine fail, rather than fall back to small pages, if superpages && pv == true? That seems like it might be the simplest option.

Alternatively, we could rename the argument to "pv_superpages" or something similar, and ignore it for HVM guests (always trying to allocate superpages if available). But I'm not sure that really buys us anything. The only tricky part would be helping people in the future avoid the mistake I made in interpreting what the "superpages" flag means.

-George
>>> On 25.06.12 at 18:07, Dave McCracken <dcm@mccr.org> wrote:
> Is there any reason to allow disabling HVM superpage allocation?

Debugging certain code paths? Or discriminating against certain (unimportant) VMs?

> HVM domain
> creation always allocates as many superpages as it can, then falls back to
> small pages for the rest. Wouldn't it be reasonable to make restore always
> do this too?

Absolutely, IMO. Not having done so from the beginning was likely just an oversight (but that would need confirmation from someone more familiar with that code than me).

Jan
On 25.06.2012 16:38, Dave McCracken wrote:
> Opportunistic superpage allocation is already the default behavior for HVM
> domain creation. Should it also be a default on HVM restore? What about for
> PV domains? Is there any real benefit?

There is a real benefit. We are seeing severe performance penalties after migrating an HVM domain: performance drops by 10% or more! Our OS (BS2000) tries to use superpages wherever possible. Before live migration, I can see that all of the domain's memory is allocated in chunks of at least 2MB; after migration, not a single superpage is left. With EPT, this not only makes each TLB miss more expensive, but also causes many more TLB misses, since no 2MB TLB entries are possible at all!
Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PDG ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
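Juergen's second point is about TLB reach: the amount of memory the TLB can map at once scales with the page size of each entry, and with EPT a cached translation is in general no larger than the smaller of the guest's page size and the EPT page size, so a guest using 2MB pages on 4KB-backed memory only gets 4KB entries. A minimal C sketch, assuming a hypothetical TLB with 64 entries (real CPUs have different, often separate, entry counts per page size); `tlb_reach()` is an invented helper:

```c
#include <assert.h>
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (1024 * KIB)

/* Memory covered by a TLB whose entries all map pages of page_size bytes. */
static uint64_t tlb_reach(unsigned int entries, uint64_t page_size)
{
    assert(page_size > 0);
    return (uint64_t)entries * page_size;
}
```

With 64 entries, 4KB pages cover 256KB while 2MB pages cover 128MB, a factor of 512, which is why losing every superpage on migration can raise the TLB miss rate considerably.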