George, Ian, Ian, Tim --
After the very extended debate about Oracle's proposed
"memory capacity claim" hypercall, it seemed prudent to
first implement and demonstrate a working model.
A hypervisor patch (and a two-line libxc patch) has
now been completed and reviewed and Jan Beulich has found
it acceptable, so it is now time to return to the
discussion of whether the feature is needed.  The
patch description provides the detailed technical argument:
http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html 
I believe, in the earlier debate, you were the most
vociferous opponents of this proposed hypercall,
primarily -- if I may attempt to summarize -- because
you believe that your paradigm of how a toolstack
should manage memory ("omnisciently") renders the proposed
feature unnecessary.  I believe I adequately described
the differences between your paradigm and Oracle's, and
also demonstrated Oracle's customer need for the feature,
but the debate fell silent without your acknowledgement.
While it has always been my understanding that the
hypervisor is intended to be toolstack-independent,
Jan would like your Ack before committing the hypervisor
changes.
So if you still object, please state your objections.
Otherwise, please ack, so Jan can commit it and Oracle
can move forward with toolstack development built
on the hypercall.
Thanks,
Dan
>>> On 26.11.12 at 20:07, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> While it has always been my understanding that the
> hypervisor is intended to be toolstack-independent,
> Jan would like your Ack before committing the hypervisor
> changes.
>
> So if you still object, please state your objections.
> Otherwise, please ack, so Jan can commit it and Oracle
> can move forward with toolstack development built
> on the hypercall.

Sorry, there must have been some misunderstanding here: First
of all, without a maintainer's ack (Keir's in this case) I can't commit
anything to code that I'm not explicitly listed for as maintainer.

Second, while I said the code itself looks acceptable, I also pointed
out that in the shape it is right now it is dead code, as there's no
user for it. So all we would get would be the risk of new bugs (and
the one I just pointed out worries me in so far as how much testing
this code really has seen).

Third, deferral (or denial) of the patch going in is certainly not a
blocking factor for tools side development at Oracle. In the worst
case, you'd have to maintain the patch in your own tree(s); I do
realize that you want to avoid that (as I would, but there are
examples of patches that we carry in our trees that didn't get
accepted into the community one - luckily they're of smaller size).

Jan
On 27/11/2012 08:48, "Jan Beulich" <JBeulich@suse.com> wrote:
> Sorry, there must have been some misunderstanding here: First
> of all, without a maintainer's ack (Keir's in this case) I can't commit
> anything to code that I'm not explicitly listed for as maintainer.
>
> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).
>
> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle. In the worst
> case, you'd have to maintain the patch in your own tree(s); I do
> realize that you want to avoid that (as I would, but there are
> examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

It's actually not that large a patch, and perhaps we could take a
domain_adjust_tot_pages() hook which would reduce the size of any private
patch, while not inflicting a maintenance or bug burden on mainline.

 -- Keir
On Tue, 2012-11-27 at 08:48 +0000, Jan Beulich wrote:
> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).

I agree with this.

> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle. In the worst
> case, you'd have to maintain the patch in your own tree(s); I do
> realize that you want to avoid that (as I would, but there are
> examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

I think Oracle carrying this patch in their tree is probably the best
approach for now.

I'm also a little surprised that this patch is being so aggressively
pushed upstream when the toolstack work which would use it is seemingly
not fully formed yet.

Anyway, it seems to me that this argument is starting from the wrong
end: it starts from the hypercall and tries to justify it based on
requirements imposed by a toolstack which is presented as something of
a fixed black box from the xen-devel point of view, which is not
something I find particularly convincing.

So a possible alternative to Oracle carrying this patch long term is
that someone who understands Oracle's toolstack's requirements and
constraints takes over from Dan (who I think has said several times that
he is not familiar with all the details of the Oracle toolstack) as
advocate for finding a solution to the underlying issue here and can
engage xen-devel in a discussion about the design decisions involved
from the toolstack downwards. Either this leads to the proposed solution
in the hypervisor (or something similar) or it results in a completely
different solution which everyone is happy with (or I suppose it might
still end up with Oracle carrying this patch long term).

I also feel I should point out that contrary to the claims in
http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html and
elsewhere the acceptance or otherwise of this patch has nothing to do
with Citrix. Although some of the folks involved in the discussion are
employed by Citrix they are all members of the "platform team" which
operates independently, is concerned with the state of Xen.org provided
Xen bits and is not tied to any product team (Citrix or otherwise). So
this has nothing whatsoever to do with Citrix's plans to use this
mechanism (and such conspiracy theories IMHO add nothing to the
discussion). AFAIK no one who is involved with any of Citrix's products
has said anything at all in any thread on the matter one way or the
other.

Ian
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Subject: Re: Please ack XENMEM_claim_pages hypercall?

Hi Jan --

> >>> On 26.11.12 at 20:07, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> > While it has always been my understanding that the
> > hypervisor is intended to be toolstack-independent,
> > Jan would like your Ack before committing the hypervisor
> > changes.
> >
> > So if you still object, please state your objections.
> > Otherwise, please ack, so Jan can commit it and Oracle
> > can move forward with toolstack development built
> > on the hypercall.
>
> Sorry, there must have been some misunderstanding here: First
> of all, without a maintainer's ack (Keir's in this case) I can't commit
> anything to code that I'm not explicitly listed for as maintainer.

Oops, sorry. Since you contribute so widely to the hypervisor, the
maintainership division between you and Keir is not particularly clear.

> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).

The proposed hypercall is dead code to the _xl_ toolstack. As I said,
I personally can write an xm/xend patch that uses it (which will
undoubtedly launch another firestorm, so I was trying to avoid that).

But most importantly, many of the code changes _will_ be tested
without any toolstack changes at all, as the existing toolstack
(with no claim hypercalls) exercises the changes. Perhaps this is
the most important testing of all and, as I understand it, is a
primary purpose of xen-unstable.

I do agree though that it is difficult to get adequate testing
without a toolstack user. I, too, would like to see that fixed. :-(

> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle. In the worst
> case, you'd have to maintain the patch in your own tree(s); I do
> realize that you want to avoid that (as I would, but there are
> examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

I guess I've failed to make a very important point because I
thought it was obvious... Because of the addition of the hypercall
subops, this is an ABI change. I think you would agree that
maintaining an ABI change out-of-tree is much more difficult
than maintaining non-ABI changes out-of-tree.

If the hypercall subops are reserved, whether the remainder of the
patch is accepted now or not, that might be a reasonable compromise
to Oracle.

Thanks,
Dan
> From: Keir Fraser [mailto:keir@xen.org]
> Subject: Re: [Xen-devel] Please ack XENMEM_claim_pages hypercall?
>
> On 27/11/2012 08:48, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> > Sorry, there must have been some misunderstanding here: First
> > of all, without a maintainer's ack (Keir's in this case) I can't commit
> > anything to code that I'm not explicitly listed for as maintainer.
> >
> > Second, while I said the code itself looks acceptable, I also pointed
> > out that in the shape it is right now it is dead code, as there's no
> > user for it. So all we would get would be the risk of new bugs (and
> > the one I just pointed out worries me in so far as how much testing
> > this code really has seen).
> >
> > Third, deferral (or denial) of the patch going in is certainly not a
> > blocking factor for tools side development at Oracle. In the worst
> > case, you'd have to maintain the patch in your own tree(s); I do
> > realize that you want to avoid that (as I would, but there are
> > examples of patches that we carry in our trees that didn't get
> > accepted into the community one - luckily they're of smaller size).
>
> It's actually not that large a patch, and perhaps we could take a
> domain_adjust_tot_pages() hook which would reduce the size of any private
> patch, while not inflicting a maintenance or bug burden on mainline.

Thanks for your kind words. That would be very helpful.
Also reserving the subops would be much more valuable.

Thanks,
Dan
At 10:31 +0000 on 27 Nov (1354012308), Ian Campbell wrote:
> On Tue, 2012-11-27 at 08:48 +0000, Jan Beulich wrote:
> > Second, while I said the code itself looks acceptable, I also pointed
> > out that in the shape it is right now it is dead code, as there's no
> > user for it. So all we would get would be the risk of new bugs (and
> > the one I just pointed out worries me in so far as how much testing
> > this code really has seen).
>
> I agree with this.

Me too (and with Jan and Ian's other points).

> > Third, deferral (or denial) of the patch going in is certainly not a
> > blocking factor for tools side development at Oracle. In the worst
> > case, you'd have to maintain the patch in your own tree(s); I do
> > realize that you want to avoid that (as I would, but there are
> > examples of patches that we carry in our trees that didn't get
> > accepted into the community one - luckily they're of smaller size).
>
> I think Oracle carrying this patch in their tree is probably the best
> approach for now.

Agreed. We can always take a patch reserving hypercall numbers for
now, so there's no risk that an Oracle private patch will clash with an
upstream allocation later on.

Tim.
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Tuesday, November 27, 2012 3:32 AM
> To: Jan Beulich
> Cc: Dan Magenheimer; George Dunlap; Ian Jackson; xen-devel@lists.xen.org; Konrad Wilk; Zhigang Wang;
> Keir (Xen.org); Tim (Xen.org)
> Subject: Re: Please ack XENMEM_claim_pages hypercall?

Hi Ian --

> I'm also a little surprised that this patch is being so aggressively
> pushed upstream when the toolstack work which would use it is seemingly
> not fully formed yet.

The deficiency in the Oracle toolstack has been known for years
and this solution has been discussed and approved at the VP level.

> Anyway, it seems to me that this argument is starting from the wrong
> end: it starts from the hypercall and tries to justify it based on
> requirements imposed by a toolstack which is presented as something of
> a fixed black box from the xen-devel point of view, which is not
> something I find particularly convincing.

I fully recognize that the existence of the black box is annoying.
But the argument began not with a hypercall but with a real customer
problem. The early discussion, in which others suggested "change
your toolstack", uncovered a rather dramatic paradigm difference
that re-emphasized the need for the hypercall. See below.

> So a possible alternative to Oracle carrying this patch long term is
> that someone who understands Oracle's toolstack's requirements and
> constraints takes over from Dan (who I think has said several times that
> he is not familiar with all the details of the Oracle toolstack) as
> advocate for finding a solution to the underlying issue here and can
> engage xen-devel in a discussion about the design decisions involved
> from the toolstack downwards. Either this leads to the proposed solution
> in the hypervisor (or something similar) or it results in a completely
> different solution which everyone is happy with (or I suppose it might
> still end up with Oracle carrying this patch long term).

From a single-system-xl-toolstack-centric perspective ("paradigm"),
I can see your point. This is not an xl-toolstack-centric problem.

> I also feel I should point out that contrary to the claims in
> http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html and
> elsewhere the acceptance or otherwise of this patch has nothing to do
> with Citrix. Although some of the folks involved in the discussion are
> employed by Citrix they are all members of the "platform team" which
> operates independently, is concerned with the state of Xen.org provided
> Xen bits and is not tied to any product team (Citrix or otherwise). So
> this has nothing whatsoever to do with Citrix's plans to use this
> mechanism (and such conspiracy theories IMHO add nothing to the
> discussion). AFAIK no one who is involved with any of Citrix's products
> has said anything at all in any thread on the matter one way or the
> other.

Conspiracy theory? Some old guy from your neck of the woods said
something like: "Methinks, the maintainer doth protest too much." ;-)

I apologize if my words have implied that maintainers employed by
Citrix are intentionally making decisions that favor Citrix. I _am_
implying, however, that the single-system-inference/policy-driven
memory-load-balancer paradigm _has_ influenced the open source
hypervisor _and_ heavily influenced the objections to this proposal.
AFAIK, Citrix's Dynamic Memory Controller (DMC) in XenServer is
the only shipping example (in the Xen universe) of that, so I call
it the "Citrix paradigm". Xen decisions made with this paradigm
in mind heavily favor a single-system model, which leads to solutions
to problems which are not as applicable to a data-center paradigm
such as Oracle's. I _do_ consider this a serious issue and suggest
that the maintainers consider it deeply because the hypervisor
must serve both paradigms.

Now, with that in mind, I still haven't heard any objections
other than insufficient testing (fair, but also very true of
most patches accepted into xen-unstable months before a release);
and "your toolstack should use the 'Citrix paradigm'" (and
I believe I have adequately explained why we cannot). Are
there other valid objections?

Thanks,
Dan
On 27/11/12 14:38, Dan Magenheimer wrote:
> Now, with that in mind, I still haven't heard any objections
> other than insufficient testing (fair, but also very true of
> most patches accepted into xen-unstable months before a release);
> and "your toolstack should use the 'Citrix paradigm'" (and
> I believe I have adequately explained why we cannot). Are
> there other valid objections?

Yes -- the main one being, you have not convinced anyone it's necessary
yet.

Maybe you should ask someone else to take up this cause -- we seem to
have major problems communicating with you.

 -George
On 27/11/2012 14:08, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> It's actually not that large a patch, and perhaps we could take a
>> domain_adjust_tot_pages() hook which would reduce the size of any private
>> patch, while not inflicting a maintenance or bug burden on mainline.
>
> Thanks for your kind words. That would be very helpful.
> Also reserving the subops would be much more valuable.

Yes yes, reserving subop(s) is certainly doable. We've done it for other
vendors in the past, and happy to again, within reason.

 -- Keir
Dan Magenheimer writes ("RE: Please ack XENMEM_claim_pages hypercall?"):
> From a single-system-xl-toolstack-centric perspective ("paradigm"),
> I can see your point.
I don't think this is the case.  What you are doing is putting this
node-specific claim functionality in the hypervisor.  I still think it
should be done outside the hypervisor.  This does not mean that there
has to be a single omniscient piece of software for an entire
cluster.
It just means that there has to be a single omniscient piece of
software _for a particular host_.  In your proposal that piece of
software is the hypervisor - specifically, the part of the hypervisor
that does the bookkeeping of these claims.  However I am still
unconvinced that this can't be implemented outside the hypervisor.
In general it is a key design principle of a system like Xen that the
hypervisor should provide only those facilities which are strictly
necessary.  Any functionality which can be reasonably provided outside
the hypervisor should be excluded from it.  It is this principle which
you are running up against.
It may be (and it seems likely from what you've said in private email)
that some of your existing guests make some assumptions about the
semantics of the Xen ballooning memory interface, which might be
violated by such a design.  Specifically it appears that your design
has guests deliberately balloon down further than requested by the
toolstack, in a kind of attempt to negotiate with other users of
memory on the system by actually releasing and claiming Xen memory.
But I don't think it's reasonable to demand that the shared codebase
reflect such undocumented prior assumptions.  Particularly when this
design seems poor.  I find it poor because (a) using actual memory
allocation and release provides only a very impoverished signalling
mechanism and (b) it imposes on every part of the system the possibility
of unexpected memory allocation failure.  (Your claim hypercall is an
attempt to mitigate (b) for some but not all cases.)
It seems to me that the correct approach is to design and implement a
new interface which allows a guest (whether that be its kernel or an
agent) to conduct a richer negotiation with out-of-hypervisor
toolstack software.  But that toolstack software (which might take any
particular shape - certainly we don't want to make any assumptions
about its policies and nature) needs to be sufficiently aware of the
claims and arbitrate between them, in a way that arranges that guests
which obey the rules never see an "out of memory" from the hypervisor.
Of course if it really is desired to have each guest make its own
decisions and simply for them to somehow agree to divvy up the
available resources, then even so a new hypervisor mechanism is not
needed.  All that is needed is a way for those guests to synchronise
their accesses and updates to shared records of the available and
in-use memory.
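To make the idea of "shared records of the available and in-use memory" concrete, here is a minimal sketch of such a per-host record kept outside the hypervisor. Everything in it is hypothetical: the class name `HostMemoryLedger`, its methods, and the page counts are invented for illustration and do not correspond to any Xen, xl, or Oracle toolstack interface. It only shows the bookkeeping: updates are serialised by a lock, and a request is refused rather than allowed to over-commit the host.

```python
import threading


class HostMemoryLedger:
    """Hypothetical per-host record of committed memory (not a Xen API)."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.committed = {}  # guest name -> pages currently committed to it
        self._lock = threading.Lock()  # serialises accesses and updates

    def free_pages(self):
        """Pages not yet committed to any guest."""
        with self._lock:
            return self.total_pages - sum(self.committed.values())

    def request(self, guest, pages):
        """Commit `pages` more to `guest`; refuse rather than over-commit."""
        with self._lock:
            in_use = sum(self.committed.values())
            if pages < 0 or in_use + pages > self.total_pages:
                return False
            self.committed[guest] = self.committed.get(guest, 0) + pages
            return True

    def release(self, guest, pages):
        """Return up to `pages` from `guest` to the free pool."""
        with self._lock:
            held = self.committed.get(guest, 0)
            self.committed[guest] = held - min(pages, held)


ledger = HostMemoryLedger(total_pages=1000)
print(ledger.request("guest-a", 600))  # fits within the host total
print(ledger.request("guest-b", 500))  # would over-commit, so it is refused
ledger.release("guest-a", 200)
print(ledger.request("guest-b", 500))  # fits once guest-a has released pages
```

In a real deployment the record would have to live somewhere all participants can reach (a daemon, a shared store) and would have to be kept in step with what the hypervisor has actually allocated; the sketch deliberately leaves both of those questions open.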
> AFAIK, Citrix's Dynamic Memory Controller (DMC) in XenServer is
> the only shipping example (in the Xen universe) of that,
I have never worked on the XenServer codebase and I have no clear idea
what this "Dynamic Memory Controller" is.  No-one here has explained
it to me and I have no particular desire to know about it.  Its
design, and its requirements or lack of them, have not influenced my
opinion on your proposals.
As an example to demonstrate that the reaction you are seeing is
nothing to do with whether the originators of the proposal are inside
or outside Citrix, please refer to our cool reaction to the v4v
proposals which also involved new hypervisor functionality.  There
too, we require the case to be made: we need to be able to see that an
out-of-hypervisor approach is not sufficient.
>    Xen decisions made with this paradigm
> in mind heavily favor a single-system model,
Nothing in my end of this conversation is predicated on any particular
deployment paradigm.  It is clear that in both your proposal and my
counter-proposal[1] there is a single place in each host where memory
allocation decisions are made and in particular where the memory needs
of competing guests etc. are arbitrated.
In your proposal this place is in the hypervisor and the negotiation
between the competing resource users is "grab the memory if you want
to".  Naturally an in-hypervisor arbitration facility has to be very
simple and a sophisticated policy is difficult to apply.
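As a rough illustration of how simple such in-hypervisor bookkeeping would be, the following toy model tracks a free-page pool plus outstanding claims. It is an assumption-laden sketch, not the patch: the names `PagePool`, `claim` and `allocate` are invented here, and the rule that an allocation may not eat into pages claimed by another domain is one plausible reading of "claim", not a statement of the semantics in the patch description linked earlier in the thread.

```python
class PagePool:
    """Toy model of a host free-page pool with outstanding claims (not Xen code)."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.claims = {}  # domain id -> pages claimed but not yet allocated

    def unclaimed(self):
        """Free pages that no domain has staked a claim on."""
        return self.free_pages - sum(self.claims.values())

    def claim(self, domid, pages):
        """Stake (or replace) a claim atomically: all of `pages` or nothing."""
        if pages > self.unclaimed() + self.claims.get(domid, 0):
            return False
        self.claims[domid] = pages
        return True

    def allocate(self, domid, pages):
        """Allocate pages, drawing first on the domain's own claim."""
        own = self.claims.get(domid, 0)
        if pages > self.unclaimed() + own:
            return False  # would eat into memory claimed by another domain
        self.free_pages -= pages
        self.claims[domid] = max(0, own - pages)
        return True


pool = PagePool(free_pages=1000)
print(pool.claim(1, 800))     # domain 1 reserves most of the host up front
print(pool.allocate(2, 300))  # refused: only 200 pages remain unclaimed
print(pool.allocate(1, 800))  # domain 1's allocation succeeds against its claim
print(pool.unclaimed())
```

The policy here is purely first come, first served, which is the sense in which an in-hypervisor facility is simple: anything richer (priorities, negotiation, fairness) would have to be decided elsewhere.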
In my counter-proposal this negotiation occurs between the guest and
an out-of-hypervisor per-host arbitrator of some kind.
I think you are going to say that in your system the guests decide for
themselves how much memory to claim based on their views of how much
is free, and whether their allocations fail.  However, there is no
particular reason why the information about how much memory is free,
and how much has been committed for each purpose, could not be
collected somewhere outside the hypervisor.
[1] I don't have a detailed counter-proposal design of course, but
that's mostly because the information and reasoning you have provided
about your objectives and constraints is rather vague.
I agree with George that you should consider allowing someone else to
have a go at explaining things to us.  If this new hypercall is indeed
needed then all that is required is a clear and logical explanation of
why this is so.  I'm sorry to say that your efforts in this direction
so far have not been sufficient, and I feel that our attempts to
elicit explanations from you have not been as successful as needed.
I would love to help Oracle out by solving this problem which is
evidently causing a lot of trouble.  But it's difficult, and in
particular we do seem to be having serious trouble communicating with
you.
Sorry,
Ian.
> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com]
> Subject: RE: Please ack XENMEM_claim_pages hypercall?
>
> Dan Magenheimer writes ("RE: Please ack XENMEM_claim_pages hypercall?"):
> > From a single-system-xl-toolstack-centric perspective ("paradigm"),
> > I can see your point.
>
> I don't think this is the case. What you are doing is putting this
> node-specific claim functionality in the hypervisor. I still think it
> should be done outside the hypervisor. This does not mean that there
> has to be a single omniscient piece of software for an entire
> cluster.
>
> <remainder deleted>

Hi Ian --

Thanks for taking the time to write a detailed response.

I realized that I had promised you a complete summary of the
problem and alternate solutions; but then I got involved in
demonstrating and refining a prototype and failed to deliver.
I had thought the patch summary would serve that purpose, but
now see that it is not sufficiently comprehensive.

So in an attempt to fulfill my promise and provide all of the
necessary information you may require, I am in the process of
writing up a complete summary and will send it on a new thread,
hopefully later today (US time).

Dan