George, Ian, Ian, Tim --

After the very extended debate about Oracle's proposed "memory capacity claim" hypercall, it seemed prudent to first implement and demonstrate a working model. A hypervisor patch (and a two-line libxc patch) has now been completed and reviewed, and Jan Beulich has found it acceptable, so it is now time to return to the discussion of whether the feature is needed. The patch description provides the detailed technical argument:

http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html

I believe, in the earlier debate, you were the most vociferous opponents of this proposed hypercall, primarily -- if I may attempt to summarize -- because you believe that your paradigm of how a toolstack should manage memory ("omnisciently") renders the proposed feature unnecessary. I believe I adequately described the differences between your paradigm and Oracle's, and also demonstrated Oracle's customer need for the feature, but the debate fell silent without your acknowledgement.

While it has always been my understanding that the hypervisor is intended to be toolstack-independent, Jan would like your Ack before committing the hypervisor changes.

So if you still object, please state your objections. Otherwise, please ack, so Jan can commit it and Oracle can move forward with toolstack development built on the hypercall.

Thanks,
Dan
>>> On 26.11.12 at 20:07, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> While it has always been my understanding that the
> hypervisor is intended to be toolstack-independent,
> Jan would like your Ack before committing the hypervisor
> changes.
>
> So if you still object, please state your objections.
> Otherwise, please ack, so Jan can commit it and Oracle
> can move forward with toolstack development built
> on the hypercall.

Sorry, there must have been some misunderstanding here: First of all, without a maintainer's ack (Keir's in this case) I can't commit anything to code that I'm not explicitly listed for as maintainer.

Second, while I said the code itself looks acceptable, I also pointed out that in the shape it is right now it is dead code, as there's no user for it. So all we would get would be the risk of new bugs (and the one I just pointed out worries me in so far as how much testing this code really has seen).

Third, deferral (or denial) of the patch going in is certainly not a blocking factor for tools side development at Oracle. In the worst case, you'd have to maintain the patch in your own tree(s); I do realize that you want to avoid that (as I would, but there are examples of patches that we carry in our trees that didn't get accepted into the community one - luckily they're of smaller size).

Jan
On 27/11/2012 08:48, "Jan Beulich" <JBeulich@suse.com> wrote:
> Sorry, there must have been some misunderstanding here: First
> of all, without a maintainer's ack (Keir's in this case) I can't commit
> anything to code that I'm not explicitly listed for as maintainer.
>
> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).
>
> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle. In the worst
> case, you'd have to maintain the patch in your own tree(s); I do
> realize that you want to avoid that (as I would, but there are
> examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

It's actually not that large a patch, and perhaps we could take a domain_adjust_tot_pages() hook which would reduce the size of any private patch, while not inflicting a maintenance or bug burden on mainline.

 -- Keir
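[Editorial sketch, not part of Keir's message.] One way to read the hook idea is that mainline routes every change to a domain's page count through a single function, so that an out-of-tree patch only has to attach to that one place. The toy Python model below illustrates that reading under stated assumptions: the Domain class, the _adjust_hooks list, the outstanding_claims table and claim_accounting() are all hypothetical names invented for this illustration, and none of this is Xen's actual C code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Domain:
    """Toy stand-in for a guest domain (not Xen's struct domain)."""
    name: str
    tot_pages: int = 0


# Hypothetical hook list; in this model mainline would ship it empty.
_adjust_hooks: List[Callable[[Domain, int], None]] = []


def domain_adjust_tot_pages(d: Domain, pages: int) -> int:
    """Single choke point for every change to a domain's page count."""
    d.tot_pages += pages
    for hook in _adjust_hooks:
        hook(d, pages)
    return d.tot_pages


# --- what a private, out-of-tree patch might add (assumption) ---
outstanding_claims: Dict[str, int] = {}


def claim_accounting(d: Domain, pages: int) -> None:
    """Pages actually allocated draw down the domain's outstanding claim."""
    if pages > 0 and d.name in outstanding_claims:
        outstanding_claims[d.name] = max(0, outstanding_claims[d.name] - pages)


_adjust_hooks.append(claim_accounting)
```

In this model the mainline function never needs to know what a claim is; the private code only registers one callback, which is the sense in which the private patch would shrink.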
On Tue, 2012-11-27 at 08:48 +0000, Jan Beulich wrote:
> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).

I agree with this.

> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle. In the worst
> case, you'd have to maintain the patch in your own tree(s); I do
> realize that you want to avoid that (as I would, but there are
> examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

I think Oracle carrying this patch in their tree is probably the best approach for now.

I'm also a little surprised that this patch is being so aggressively pushed upstream when the toolstack work which would use it is seemingly not fully formed yet.

Anyway, it seems to me that this argument is starting from the wrong end: it starts from the hypercall and tries to justify it based on requirements imposed by a toolstack which is presented as something of a fixed black box from the xen-devel point of view, which is not something I find particularly convincing.

So a possible alternative to Oracle carrying this patch long term is that someone who understands Oracle's toolstack's requirements and constraints takes over from Dan (who I think has said several times that he is not familiar with all the details of the Oracle toolstack) as advocate for finding a solution to the underlying issue here and can engage xen-devel in a discussion about the design decisions involved from the toolstack downwards.
Either this leads to the proposed solution in the hypervisor (or something similar) or it results in a completely different solution which everyone is happy with (or I suppose it might still end up with Oracle carrying this patch long term).

I also feel I should point out that contrary to the claims in http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html and elsewhere the acceptance or otherwise of this patch has nothing to do with Citrix. Although some of the folks involved in the discussion are employed by Citrix they are all members of the "platform team" which operates independently, is concerned with the state of Xen.org provided Xen bits and is not tied to any product team (Citrix or otherwise). So this has nothing whatsoever to do with Citrix's plans to use this mechanism (and such conspiracy theories IMHO add nothing to the discussion). AFAIK no one who is involved with any of Citrix's products has said anything at all in any thread on the matter one way or the other.

Ian
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Subject: Re: Please ack XENMEM_claim_pages hypercall?

Hi Jan --

> >>> On 26.11.12 at 20:07, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> > While it has always been my understanding that the
> > hypervisor is intended to be toolstack-independent,
> > Jan would like your Ack before committing the hypervisor
> > changes.
> >
> > So if you still object, please state your objections.
> > Otherwise, please ack, so Jan can commit it and Oracle
> > can move forward with toolstack development built
> > on the hypercall.
>
> Sorry, there must have been some misunderstanding here: First
> of all, without a maintainer's ack (Keir's in this case) I can't commit
> anything to code that I'm not explicitly listed for as maintainer.

Oops, sorry. Since you contribute so widely to the hypervisor, the maintainership division between you and Keir is not particularly clear.

> Second, while I said the code itself looks acceptable, I also pointed
> out that in the shape it is right now it is dead code, as there's no
> user for it. So all we would get would be the risk of new bugs (and
> the one I just pointed out worries me in so far as how much testing
> this code really has seen).

The proposed hypercall is dead code to the _xl_ toolstack. As I said, I personally can write an xm/xend patch that uses it (which will undoubtedly launch another firestorm, so I was trying to avoid that).

But most importantly, many of the code changes _will_ be tested without any toolstack changes at all, as the existing toolstack (with no claim hypercalls) exercises the changes. Perhaps this is the most important testing of all and, as I understand it, is a primary purpose of xen-unstable.

I do agree though that it is difficult to get adequate testing without a toolstack user. I, too, would like to see that fixed. :-(

> Third, deferral (or denial) of the patch going in is certainly not a
> blocking factor for tools side development at Oracle.
> In the worst case, you'd have to maintain the patch in your own
> tree(s); I do realize that you want to avoid that (as I would, but
> there are examples of patches that we carry in our trees that didn't get
> accepted into the community one - luckily they're of smaller size).

I guess I've failed to make a very important point because I thought it was obvious... Because of the addition of the hypercall subops, this is an ABI change. I think you would agree that maintaining an ABI change out-of-tree is much more difficult than maintaining non-ABI changes out-of-tree.

If the hypercall subops are reserved, whether the remainder of the patch is accepted now or not, that might be a reasonable compromise for Oracle.

Thanks,
Dan
> From: Keir Fraser [mailto:keir@xen.org]
> Subject: Re: [Xen-devel] Please ack XENMEM_claim_pages hypercall?
>
> On 27/11/2012 08:48, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> > Sorry, there must have been some misunderstanding here: First
> > of all, without a maintainer's ack (Keir's in this case) I can't commit
> > anything to code that I'm not explicitly listed for as maintainer.
> >
> > Second, while I said the code itself looks acceptable, I also pointed
> > out that in the shape it is right now it is dead code, as there's no
> > user for it. So all we would get would be the risk of new bugs (and
> > the one I just pointed out worries me in so far as how much testing
> > this code really has seen).
> >
> > Third, deferral (or denial) of the patch going in is certainly not a
> > blocking factor for tools side development at Oracle. In the worst
> > case, you'd have to maintain the patch in your own tree(s); I do
> > realize that you want to avoid that (as I would, but there are
> > examples of patches that we carry in our trees that didn't get
> > accepted into the community one - luckily they're of smaller size).
>
> It's actually not that large a patch, and perhaps we could take a
> domain_adjust_tot_pages() hook which would reduce the size of any private
> patch, while not inflicting a maintenance or bug burden on mainline.

Thanks for your kind words. That would be very helpful. Also reserving the subops would be much more valuable.

Thanks,
Dan
At 10:31 +0000 on 27 Nov (1354012308), Ian Campbell wrote:
> On Tue, 2012-11-27 at 08:48 +0000, Jan Beulich wrote:
> > Second, while I said the code itself looks acceptable, I also pointed
> > out that in the shape it is right now it is dead code, as there's no
> > user for it. So all we would get would be the risk of new bugs (and
> > the one I just pointed out worries me in so far as how much testing
> > this code really has seen).
>
> I agree with this.

Me too (and with Jan and Ian's other points).

> > Third, deferral (or denial) of the patch going in is certainly not a
> > blocking factor for tools side development at Oracle. In the worst
> > case, you'd have to maintain the patch in your own tree(s); I do
> > realize that you want to avoid that (as I would, but there are
> > examples of patches that we carry in our trees that didn't get
> > accepted into the community one - luckily they're of smaller size).
>
> I think Oracle carrying this patch in their tree is probably the best
> approach for now.

Agreed. We can always take a patch reserving hypercall numbers for now, so there's no risk that an Oracle private patch will clash with an upstream allocation later on.

Tim.
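[Editorial sketch, not part of Tim's message.] The clash that a reservation avoids can be shown with a toy number registry. Everything below is a hypothetical illustration: SubopTable, its methods, the subop names and the numbers used in the usage note are invented for this example and are not taken from Xen's headers.

```python
from typing import Dict, Optional


class SubopTable:
    """Toy registry mapping sub-operation numbers to names."""

    def __init__(self) -> None:
        self._assigned: Dict[int, str] = {}   # number -> subop name
        self._reserved: Dict[int, str] = {}   # number -> who holds the reservation

    def _taken(self, number: int) -> bool:
        return number in self._assigned or number in self._reserved

    def reserve(self, number: int, owner: str) -> None:
        """Set a number aside for a vendor without adding any handler."""
        if self._taken(number):
            raise ValueError(f"subop {number} is already in use")
        self._reserved[number] = owner

    def allocate(self, name: str, number: Optional[int] = None) -> int:
        """Assign `name` to `number`, or to the lowest free number."""
        if number is None:
            number = 0
            while self._taken(number):
                number += 1
        elif self._taken(number):
            raise ValueError(f"subop {number} is already in use")
        self._assigned[number] = name
        return number
```

With numbers 0 to 2 assigned and 3 reserved for a vendor's private patch, the next upstream allocation in this model lands on 4, so the vendor's out-of-tree code never has to be renumbered.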
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Tuesday, November 27, 2012 3:32 AM
> To: Jan Beulich
> Cc: Dan Magenheimer; George Dunlap; Ian Jackson; xen-devel@lists.xen.org; Konrad Wilk; Zhigang Wang;
> Keir (Xen.org); Tim (Xen.org)
> Subject: Re: Please ack XENMEM_claim_pages hypercall?

Hi Ian --

> I'm also a little surprised that this patch is being so aggressively
> pushed upstream when the toolstack work which would use it is seemingly
> not fully formed yet.

The deficiency in the Oracle toolstack has been known for years and this solution has been discussed and approved at the VP level.

> Anyway, it seems to me that this argument is starting
> from the wrong end: it starts from the hypercall and tries to justify it
> based on requirements imposed by a toolstack which is presented as
> something of a fixed black box from the xen-devel point of view, which
> is not something I find particularly convincing.

I fully recognize that the existence of the black box is annoying. But the argument began not with a hypercall but with a real customer problem. The early discussion, in which others suggested "change your toolstack", uncovered a rather dramatic paradigm difference that re-emphasized the need for the hypercall. See below.

> So a possible alternative to Oracle carrying this patch long term is
> that someone who understands Oracle's toolstack's requirements and
> constraints takes over from Dan (who I think has said several times that
> he is not familiar with all the details of the Oracle toolstack) as
> advocate for finding a solution to the underlying issue here and can
> engage xen-devel in a discussion about the design decisions involved
> from the toolstack downwards.
> Either this leads to the proposed solution
> in the hypervisor (or something similar) or it results in a completely
> different solution which everyone is happy with (or I suppose it might
> still end up with Oracle carrying this patch long term).

From a single-system-xl-toolstack-centric perspective ("paradigm"), I can see your point. This is not an xl-toolstack-centric problem.

> I also feel I should point out that contrary to the claims in
> http://lists.xen.org/archives/html/xen-devel/2012-11/msg01427.html and
> elsewhere the acceptance or otherwise of this patch has nothing to do
> with Citrix. Although some of the folks involved in the discussion are
> employed by Citrix they are all members of the "platform team" which
> operates independently, is concerned with the state of Xen.org provided
> Xen bits and is not tied to any product team (Citrix or otherwise). So
> this has nothing whatsoever to do with Citrix's plans to use this
> mechanism (and such conspiracy theories IMHO add nothing to the
> discussion). AFAIK no one who is involved with any of Citrix's products
> has said anything at all in any thread on the matter one way or the
> other.

Conspiracy theory? Some old guy from your neck of the woods said something like: "Methinks, the maintainer doth protest too much." ;-)

I apologize if my words have implied that maintainers employed by Citrix are intentionally making decisions that favor Citrix.

I _am_ implying, however, that the single-system-inference/policy-driven memory-load-balancer paradigm _has_ influenced the open source hypervisor _and_ heavily influenced the objections to this proposal. AFAIK, Citrix's Dynamic Memory Controller (DMC) in XenServer is the only shipping example (in the Xen universe) of that, so I call it the "Citrix paradigm". Xen decisions made with this paradigm in mind heavily favor a single-system model, which leads to solutions that are not as applicable to a data-center paradigm such as Oracle's.
I _do_ consider this a serious issue and suggest that the maintainers consider it deeply because the hypervisor must serve both paradigms.

Now, with that in mind, I still haven't heard any objections other than insufficient testing (fair, but also very true of most patches accepted into xen-unstable months before a release); and "your toolstack should use the 'Citrix paradigm'" (and I believe I have adequately explained why we cannot). Are there other valid objections?

Thanks,
Dan
On 27/11/12 14:38, Dan Magenheimer wrote:
> Now, with that in mind, I still haven't heard any objections
> other than insufficient testing (fair, but also very true of
> most patches accepted into xen-unstable months before a release);
> and "your toolstack should use the 'Citrix paradigm'" (and
> I believe I have adequately explained why we cannot). Are
> there other valid objections?

Yes -- the main one being, you have not convinced anyone it's necessary yet.

Maybe you should ask someone else to take up this cause -- we seem to have major problems communicating with you.

 -George
On 27/11/2012 14:08, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> It's actually not that large a patch, and perhaps we could take a
>> domain_adjust_tot_pages() hook which would reduce the size of any private
>> patch, while not inflicting a maintenance or bug burden on mainline.
>
> Thanks for your kind words. That would be very helpful.
> Also reserving the subops would be much more valuable.

Yes yes, reserving subop(s) is certainly doable. We've done it for other vendors in the past, and happy to again, within reason.

 -- Keir
Dan Magenheimer writes ("RE: Please ack XENMEM_claim_pages hypercall?"):
> From a single-system-xl-toolstack-centric perspective ("paradigm"),
> I can see your point.

I don't think this is the case. What you are doing is putting this node-specific claim functionality in the hypervisor. I still think it should be done outside the hypervisor. This does not mean that there has to be a single omniscient piece of software for an entire cluster.

It just means that there has to be a single omniscient piece of software _for a particular host_. In your proposal that piece of software is the hypervisor - specifically, the part of the hypervisor that does the bookkeeping of these claims.

However I am still unconvinced that this can't be implemented outside the hypervisor.

In general it is a key design principle of a system like Xen that the hypervisor should provide only those facilities which are strictly necessary. Any functionality which can be reasonably provided outside the hypervisor should be excluded from it. It is this principle which you are running up against.

It may be (and it seems likely from what you've said in private email) that some of your existing guests make some assumptions about the semantics of the Xen ballooning memory interface, which might be violated by such a design. Specifically it appears that your design has guests deliberately balloon down further than requested by the toolstack, in a kind of attempt to negotiate with other users of memory on the system by actually releasing and claiming Xen memory.

But I don't think it's reasonable to demand that the shared codebase reflect such undocumented prior assumptions. Particularly when this design seems poor. I find it poor because (a) using actual memory allocation and release provides only a very impoverished signalling mechanism (b) it imposes on every part of the system the possibility of unexpected memory allocation failure.
(Your claim hypercall is an attempt to mitigate (b) for some but not all cases.)

It seems to me that the correct approach is to design and implement a new interface which allows a guest (whether that be its kernel or an agent) to conduct a richer negotiation with out-of-hypervisor toolstack software.

But that toolstack software (which might take any particular shape - certainly we don't want to make any assumptions about its policies and nature) needs to be sufficiently aware of the claims and arbitrate between them, in a way that arranges that guests which obey the rules never see an "out of memory" from the hypervisor.

Of course if it really is desired to have each guest make its own decisions and simply for them to somehow agree to divvy up the available resources, then even so a new hypervisor mechanism is not needed. All that is needed is a way for those guests to synchronise their accesses and updates to shared records of the available and in-use memory.

> AFAIK, Citrix's Dynamic Memory Controller (DMC) in XenServer is
> the only shipping example (in the Xen universe) of that,

I have never worked on the XenServer codebase and I have no clear idea what this "Dynamic Memory Controller" is. No-one here has explained it to me and I have no particular desire to know about it. Its design, and its requirements or lack of them, have not influenced my opinion on your proposals.

As an example to demonstrate that the reaction you are seeing is nothing to do with whether the originators of the proposal are inside or outside Citrix, please refer to our cool reaction to the v4v proposals which also involved new hypervisor functionality. There too, we require the case to be made: we need to be able to see that an out-of-hypervisor approach is not sufficient.

> Xen decisions made with this paradigm
> in mind heavily favor a single-system model,

Nothing in my end of this conversation is predicated on any particular deployment paradigm.
It is clear that in both your proposal and my counter-proposal[1] there is a single place in each host where memory allocation decisions are made and in particular where the memory needs of competing guests etc. are arbitrated.

In your proposal this place is in the hypervisor and the negotiation between the competing resource users is "grab the memory if you want to". Naturally an in-hypervisor arbitration facility has to be very simple and a sophisticated policy is difficult to apply.

In my counter-proposal this negotiation occurs between the guest and an out-of-hypervisor per-host arbitrator of some kind.

I think you are going to say that in your system the guests decide for themselves how much memory to claim based on their views of how much is free, and whether their allocations fail. However, there is no particular reason why the information about how much memory is free, and how much has been committed for each purpose, could not be collected somewhere outside the hypervisor.

[1] I don't have a detailed counter-proposal design of course, but that's mostly because the information and reasoning you have provided about your objectives and constraints is rather vague.

I agree with George that you should consider allowing someone else to have a go at explaining things to us. If this new hypercall is indeed needed then all that is required is a clear and logical explanation of why this is so. I'm sorry to say that your efforts in this direction so far have not been sufficient, and I feel that our attempts to elicit explanations from you have not been as successful as needed.

I would love to help Oracle out by solving this problem which is evidently causing a lot of trouble. But it's difficult, and in particular we do seem to be having serious trouble communicating with you.

Sorry,
Ian.
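[Editorial sketch, not part of Ian's message.] The message above says explicitly that no detailed counter-proposal design exists, so the following is only one possible reading of "a per-host record of free and committed memory kept outside the hypervisor". The class name HostMemoryArbiter, its methods and the page counts are all assumptions made for illustration. A claim sets capacity aside without allocating it, and later allocations by the same guest draw the claim down first.

```python
import threading
from typing import Dict


class HostMemoryArbiter:
    """Toy per-host record of free, claimed and allocated pages."""

    def __init__(self, total_pages: int) -> None:
        self._lock = threading.Lock()
        self._free = total_pages               # neither allocated nor claimed
        self._claimed: Dict[str, int] = {}     # promised but not yet allocated
        self._allocated: Dict[str, int] = {}   # actually in use

    def free_pages(self) -> int:
        with self._lock:
            return self._free

    def claim(self, guest: str, pages: int) -> bool:
        """Set capacity aside for `guest`; fails if it is not available."""
        with self._lock:
            if pages > self._free:
                return False
            self._free -= pages
            self._claimed[guest] = self._claimed.get(guest, 0) + pages
            return True

    def allocate(self, guest: str, pages: int) -> bool:
        """Allocate pages, consuming the guest's own claim before free memory."""
        with self._lock:
            from_claim = min(pages, self._claimed.get(guest, 0))
            extra = pages - from_claim
            if extra > self._free:
                return False
            self._claimed[guest] = self._claimed.get(guest, 0) - from_claim
            self._free -= extra
            self._allocated[guest] = self._allocated.get(guest, 0) + pages
            return True

    def release(self, guest: str, pages: int) -> None:
        """Return allocated pages to the free pool."""
        with self._lock:
            pages = min(pages, self._allocated.get(guest, 0))
            self._allocated[guest] = self._allocated.get(guest, 0) - pages
            self._free += pages
```

In this model a guest that claims first and allocates later cannot be starved by another guest's allocations in between, which is the failure mode labelled (b) earlier in the message. Whether such a record can be kept consistent outside the hypervisor is exactly the point under dispute in the thread, and the sketch does not settle it.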
> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com]
> Subject: RE: Please ack XENMEM_claim_pages hypercall?
>
> Dan Magenheimer writes ("RE: Please ack XENMEM_claim_pages hypercall?"):
> > From a single-system-xl-toolstack-centric perspective ("paradigm"),
> > I can see your point.
>
> I don't think this is the case. What you are doing is putting this
> node-specific claim functionality in the hypervisor. I still think it
> should be done outside the hypervisor. This does not mean that there
> has to be a single omniscient piece of software for an entire
> cluster.
>
> <remainder deleted>

Hi Ian --

Thanks for taking the time to write a detailed response.

I realized that I had promised you a complete summary of the problem and alternate solutions; but then I got involved in demonstrating and refining a prototype and failed to deliver. I had thought the patch summary would serve that purpose, but now see that it is not sufficiently comprehensive.

So in an attempt to fulfill my promise and provide all of the necessary information you may require, I am in the process of writing up a complete summary and will send it on a new thread, hopefully later today (US time).

Dan