Rusty Russell
2013-May-06 04:22 UTC
Re: lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
Paul Bolle <pebolle-IWqWACnzNjzz+pZb47iToQ@public.gmane.org> writes:> 0) Since v3.9 launching a guest kernel with the lguest tool triggers an > "unhandled trap 13" error if CONFIG_MICROCODE_INTEL_EARLY is set (and > one runs on an Intel CPU). That option was introduced in v3.9.Yech... me too: lguest: unhandled trap 13 at 0x537b8d (0x0)> 1) For instance, on qemu I ran into this error: > lguest: unhandled trap 13 at 0x97087d (0x0) > > 2) Disassembling that address (with the page offset added) in vmlinux > (in gdb) showed: > > Dump of assembler code for function collect_cpu_info_early: > 0xc09707e4 <+0>: push %ebp > [...] > 0xc097087d <+153>: wrmsr > [...] > 0xc09708a5 <+193>: ret > End of assembler dump. > > 3) The internet tells me wrmsr will trigger a general protection fault > if the CPU is not running in ring 0. And a guest kernel seems to do that > since v3.9: > > arch/x86/kernel/head_32.S:call load_ucode_bsp > arch/x86/kernel/microcode_core_early.c:load_ucode_bsp() > arch/x86/kernel/microcode_intel_early.c:_load_ucode_intel_bsp() > collect_cpu_info_early()Yes, there''s also the known issue that CONFIG_OLPC breaks lguest boot. HPA, how about we extend the KEEP_SEGMENTS flag to mean "don''t touch privileged registers" in general? That''s what it used to do when it was introduced, and AFAIK only lguest uses it. Xen folk CC''d. %ebx seems to be free, how about something like this? Subject: x86: treat KEEP_SEGMENTS flag as advice not to try any privileged operations The i386 boot path used to only do one privileged thing before entering the per-platform boot-path: load segments. Thus KEEP_SEGMENTS was born, used by lguest (and could be used by others like Xen). CONFIG_OLPC and CONFIG_MICROCODE_EARLY were added since then, so with those options, lguest guest crash. I''ve been telling people to turn them off, but that''s a bit naff. Expand KEEP_SEGMENTS to mean "don''t try to do ring0 stuff" and it does the right thing. Signed-off-by: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 73afd11..8b50829 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -89,8 +89,9 @@ ENTRY(startup_32) movl pa(stack_start),%ecx /* test KEEP_SEGMENTS flag to see if the bootloader is asking - us to not reload segments */ - testb $(1<<6), BP_loadflags(%esi) + us to not reload segments: keep in %ebx. */ + movb BP_loadflags(%esi),%ebx + andb $(1<<6),%ebx jnz 2f /* @@ -138,6 +139,9 @@ ENTRY(startup_32) movsl 1: + /* KEEP_SEGMENTS implies we don''t touch priv regs at all. */ + andb %ebx,%ebx + jnz 2f #ifdef CONFIG_OLPC /* save OFW''s pgdir table for later use when calling into OFW */ movl %cr3, %eax @@ -149,6 +153,7 @@ ENTRY(startup_32) call load_ucode_bsp #endif +2: /* * Initialize page tables. This creates a PDE and a set of page * tables, which are located immediately beyond __brk_base. The variable
H. Peter Anvin
2013-May-06 14:51 UTC
Re: lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On 05/05/2013 09:22 PM, Rusty Russell wrote:> > HPA, how about we extend the KEEP_SEGMENTS flag to mean "don''t touch > privileged registers" in general? That''s what it used to do when it > was introduced, and AFAIK only lguest uses it. Xen folk CC''d. >KEEP_SEGMENTS was introduced at the request of Xen, not lguest. I''m a bit concerned about extending it as I don''t know what else might end up being affected. On that subject, I would genuinely like to have a discussion of the value vs. pain of continuing to support lguest. It has not been a *huge* problem, especially not compared to Xen, but core paravirtualization has turned out to be a maintenance nightmare and it has had an enormous negative impact on development work. That being said, lguest really hasn''t been a huge problem, but partly it is a "level of support" thing... -hpa
Rusty Russell
2013-May-07 05:18 UTC
Re: lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> writes:> On 05/05/2013 09:22 PM, Rusty Russell wrote: >> >> HPA, how about we extend the KEEP_SEGMENTS flag to mean "don''t touch >> privileged registers" in general? That''s what it used to do when it >> was introduced, and AFAIK only lguest uses it. Xen folk CC''d. >> > > KEEP_SEGMENTS was introduced at the request of Xen, not lguest. I''m a > bit concerned about extending it as I don''t know what else might end up > being affected.Not sure if they ever used it, or if they still have their own entry point. All this stuff (OLPC and early microcode) probably belongs after we jump to the platform handler: the microcode calls to native_cpuid() is a clue that we''re doing something *badly* wrong. It''s easy enough for lguest to decode and skip the unsupported instructions, though I''ve preferred not to do that.> On that subject, I would genuinely like to have a discussion of the > value vs. pain of continuing to support lguest. It has not been a > *huge* problem, especially not compared to Xen, but core > paravirtualization has turned out to be a maintenance nightmare and it > has had an enormous negative impact on development work. > > That being said, lguest really hasn''t been a huge problem, but partly it > is a "level of support" thing...Yeah, I always figured when paravirt_ops goes, lguest goes. By that stage, every reasonable piece of hardware should have VT support, so if lguest really wants to continue it can switch to that. Which makes it look a lot like bhyve, I guess. What''s *really* weird is that there''s been a burst of lguest activity in the last couple of months. Someone''s even reviving lguest64. So maybe it''s not dead yet. Cheers, Rusty.
Konrad Rzeszutek Wilk
2013-May-08 17:20 UTC
Re: [Lguest] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On Tue, May 07, 2013 at 02:48:15PM +0930, Rusty Russell wrote:> "H. Peter Anvin" <hpa@zytor.com> writes: > > On 05/05/2013 09:22 PM, Rusty Russell wrote: > >> > >> HPA, how about we extend the KEEP_SEGMENTS flag to mean "don''t touch > >> privileged registers" in general? That''s what it used to do when it > >> was introduced, and AFAIK only lguest uses it. Xen folk CC''d. > >> > > > > KEEP_SEGMENTS was introduced at the request of Xen, not lguest. I''m a > > bit concerned about extending it as I don''t know what else might end up > > being affected. > > Not sure if they ever used it, or if they still have their own entry > point.I don''t ever recall it being used. But git commit bd53147db8bdf5dd49025c198ff18ac23f560e0e says Both x86 and x86_64 support the same boot protocol so we need to implement the KEEP_SEGMENTS on x86_64 as well. It isn''t just paravirt bootloaders that could use this functionality. And a quick Google search tells me: http://lkml.indiana.edu/hypermail/linux/kernel/1102.0/01422.html I think the only potential user is whatever Eric Biederman thought off?> > All this stuff (OLPC and early microcode) probably belongs after we jump > to the platform handler: the microcode calls to native_cpuid() is a clue > that we''re doing something *badly* wrong. > > It''s easy enough for lguest to decode and skip the unsupported > instructions, though I''ve preferred not to do that. > > > On that subject, I would genuinely like to have a discussion of the > > value vs. pain of continuing to support lguest. It has not been a > > *huge* problem, especially not compared to Xen, but core > > paravirtualization has turned out to be a maintenance nightmare and it > > has had an enormous negative impact on development work.<scratches his head> I think this kind of discussion is best done face-to-face b/c people can become emotional and then this discussion turns in the craper. If I am reading you right the #1 issue is that you don''t know whether a certain paravirt instruction has any side-effects and as such you don''t feel that you can treat it like an equivalent instruction that is defined in the Intel SDM? And that means that any development work you have in the pipeline is affected because you don''t have the documentation on hand and are unsure whether you will break something?> > > > That being said, lguest really hasn''t been a huge problem, but partly it > > is a "level of support" thing... > > Yeah, I always figured when paravirt_ops goes, lguest goes. By that > stage, every reasonable piece of hardware should have VT support, so if > lguest really wants to continue it can switch to that. Which makes it > look a lot like bhyve, I guess. > > What''s *really* weird is that there''s been a burst of lguest activity in > the last couple of months. Someone''s even reviving lguest64. So maybe > it''s not dead yet. > > Cheers, > Rusty. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
H. Peter Anvin
2013-May-08 18:07 UTC
Re: [Xen-devel] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On 05/08/2013 10:20 AM, Konrad Rzeszutek Wilk wrote:> > If I am reading you right the #1 issue is that you don''t know whether > a certain paravirt instruction has any side-effects and as such you don''t > feel that you can treat it like an equivalent instruction that is defined > in the Intel SDM? > > And that means that any development work you have in the pipeline is > affected because you don''t have the documentation on hand and are unsure > whether you will break something? >That is, indeed, the #1 issue (and you and I have discussed it at length, obviously.) There are a few other issues: 2. some of the paravirt_ops are plain wrong. Most of the really big problems are in the MMU-related ones, but as an easily-explained example that I ran into the other day: read_cr4_safe() assumes that there is no useful distinction between "cr4 is zero" and "cr4 doesn''t exist". Unfortunately, this is an invalid assumption. It would be a five-minute fix in the normal case, but since it is paravirtualized, fixing it involves grokking the semantics of each PV layer, including any of the hypercalls that may be involved. In this particular example I think the answer is actually reasonable simple, because I don''t think any of the hypervisors support running on pre-cr4 hardware (basically 486 at this point.) 3. "Let''s add another hook" becomes a far too easy solution to new problems. 4. Performance and maintainability impact of having to support multiple code flows with different semantics. The semantics of the Xen MMU, in particular, is actually quite different from the x86 MMU. 5. Performance and maintainability impact of a maze of twisty little functions, all different. For example, in the case of some of the MSR functions, we actually end up telling the compiler to combine and break up the two 32-bit halves into a 64-bit register multiple times, because the functions don''t actually match up. I still don''t understand why we don''t just patch out the rdmsr/wrmsr instructions, using those registers. -hpa
Rusty Russell
2013-May-09 01:43 UTC
Re: [Xen-devel] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> writes:> On Tue, May 07, 2013 at 02:48:15PM +0930, Rusty Russell wrote: >> "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> writes: >> > On 05/05/2013 09:22 PM, Rusty Russell wrote: >> >> >> >> HPA, how about we extend the KEEP_SEGMENTS flag to mean "don''t touch >> >> privileged registers" in general? That''s what it used to do when it >> >> was introduced, and AFAIK only lguest uses it. Xen folk CC''d. >> >> >> > >> > KEEP_SEGMENTS was introduced at the request of Xen, not lguest. I''m a >> > bit concerned about extending it as I don''t know what else might end up >> > being affected. >> >> Not sure if they ever used it, or if they still have their own entry >> point. > > I don''t ever recall it being used. But git commit bd53147db8bdf5dd49025c198ff18ac23f560e0e > says > > Both x86 and x86_64 support the same boot protocol so we need > to implement the KEEP_SEGMENTS on x86_64 as well. It isn''t > just paravirt bootloaders that could use this functionality. > > And a quick Google search tells me: > http://lkml.indiana.edu/hypermail/linux/kernel/1102.0/01422.html > > I think the only potential user is whatever Eric Biederman thought off?OK, that confirms my understanding it''s lguest-only. To return to the immediate proposal: I''d like to expand it KEEP_SEGMENTS to say "don''t do any privileged ops before the platform entry point" which is what it used to do (before we added those config options), and still does (if they''re disabled). This fixes lguest for those configs, and if there *are* any other users, this just means they don''t support OLPC machines (since 2.6.37) or early microcode (3.9). OLPC + KEEP_SEGMENTS is vanishingly unlikely (the OLPC bootloader doesn''t use it), and early microcode never worked with paravirt anyway (those native_cpuid() calls). Failing that, second best to remove KEEP_SEGMENTS, and have lguest ''emulate'' (ie. skip!) those instructions. We already do this for in/out instructions, see drivers/lguest/x86/core.c emulate_insn(). More code, but it''s lguest code. Cheers, Rusty.
H. Peter Anvin
2013-May-09 02:50 UTC
Re: lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On 05/06/2013 10:18 PM, Rusty Russell wrote:> > All this stuff (OLPC and early microcode) probably belongs after we jump > to the platform handler: the microcode calls to native_cpuid() is a clue > that we''re doing something *badly* wrong. > > It''s easy enough for lguest to decode and skip the unsupported > instructions, though I''ve preferred not to do that. >Depends on the platform, but yes, it applies to native, CE4100, and Moorestown/SFI platforms, but not Xen or lguest, so that pretty much makes sense. -hpa
H. Peter Anvin
2013-May-09 03:24 UTC
Re: [Xen-devel] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On 05/08/2013 06:43 PM, Rusty Russell wrote:> > OK, that confirms my understanding it''s lguest-only. > > To return to the immediate proposal: I''d like to expand it KEEP_SEGMENTS > to say "don''t do any privileged ops before the platform entry point" > which is what it used to do (before we added those config options), and > still does (if they''re disabled). > > This fixes lguest for those configs, and if there *are* any other users, > this just means they don''t support OLPC machines (since 2.6.37) or early > microcode (3.9). OLPC + KEEP_SEGMENTS is vanishingly unlikely (the OLPC > bootloader doesn''t use it), and early microcode never worked with > paravirt anyway (those native_cpuid() calls). > > Failing that, second best to remove KEEP_SEGMENTS, and have lguest > ''emulate'' (ie. skip!) those instructions. We already do this for in/out > instructions, see drivers/lguest/x86/core.c emulate_insn(). More code, > but it''s lguest code. >Or we can skip the microcode stuff for platform == 2. -hpa
Paul Bolle
2013-May-09 08:24 UTC
Re: [Xen-devel] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On Wed, 2013-05-08 at 20:24 -0700, H. Peter Anvin wrote:> On 05/08/2013 06:43 PM, Rusty Russell wrote: > > > > OK, that confirms my understanding it''s lguest-only. > > > > To return to the immediate proposal: I''d like to expand it KEEP_SEGMENTS > > to say "don''t do any privileged ops before the platform entry point" > > which is what it used to do (before we added those config options), and > > still does (if they''re disabled). > > > > This fixes lguest for those configs, and if there *are* any other users, > > this just means they don''t support OLPC machines (since 2.6.37) or early > > microcode (3.9). OLPC + KEEP_SEGMENTS is vanishingly unlikely (the OLPC > > bootloader doesn''t use it), and early microcode never worked with > > paravirt anyway (those native_cpuid() calls). > > > > Failing that, second best to remove KEEP_SEGMENTS, and have lguest > > ''emulate'' (ie. skip!) those instructions. We already do this for in/out > > instructions, see drivers/lguest/x86/core.c emulate_insn(). More code, > > but it''s lguest code. > > > > Or we can skip the microcode stuff for platform == 2.And could the OLPC specific code movl %cr3, %eax movl %eax, pa(olpc_ofw_pgd) in head_32.S perhaps be moved to (just before) the enable_paging label? I don''t think anything cares about %cr3 before that, so it won''t interfere with OLPC''s (creative?) use of that register. And olpc_ofw_pgd seems to be only used much later on, as the call chain looks something like: ENTRY(initial_code) i386_start_kernel() start_kernel() setup_arch() setup_olpc_ofw_pgd() early_ioremap(olpc_ofw_pgd, [...]); That would move those two OLPC specific lines to a point were pv_mmu_ops.read_cr3 points to lguest_read_cr3() so lguest can handle them, wouldn''t it? Paul Bolle
Konrad Rzeszutek Wilk
2013-May-10 14:51 UTC
Is: paravirt docs and diet. Was:Re: [Lguest] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On Wed, May 8, 2013 at 8:47 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:> On Wed, May 08, 2013 at 11:07:33AM -0700, H. Peter Anvin wrote: >> On 05/08/2013 10:20 AM, Konrad Rzeszutek Wilk wrote: >> > >> > If I am reading you right the #1 issue is that you don''t know whether >> > a certain paravirt instruction has any side-effects and as such you don''t >> > feel that you can treat it like an equivalent instruction that is defined >> > in the Intel SDM? >> > >> > And that means that any development work you have in the pipeline is >> > affected because you don''t have the documentation on hand and are unsure >> > whether you will break something? >> > >> >> That is, indeed, the #1 issue (and you and I have discussed it at >> length, obviously.) >First of thank you for enumerating the issues and clarifying them. My understanding was (until this email conversation) was that the lack of the hypercall docs was hindering you. However you are not interested in that - you are after the documentation for the paravirt interface. Specifically how it differs from what baremetal would do. That changes my priorities! In that case lguest provides an excellent source of documentation on how the paravirt interfaces should work. It is almost like reading a book. Perhaps moving them to paravirt.h will help. There are of course some differences between the platforms that use it - but that would help identify them.>> There are a few other issues: >> >> 2. some of the paravirt_ops are plain wrong. Most of the really big >> problems are in the MMU-related ones, but as an easily-explained example >> that I ran into the other day: >> >> read_cr4_safe() assumes that there is no useful distinction between "cr4 >> is zero" and "cr4 doesn''t exist". Unfortunately, this is an invalid >> assumption. It would be a five-minute fix in the normal case, but since >> it is paravirtualized, fixing it involves grokking the semantics of each >> PV layer, including any of the hypercalls that may be involved. >> >> In this particular example I think the answer is actually reasonable >> simple, because I don''t think any of the hypervisors support running on >> pre-cr4 hardware (basically 486 at this point.) >From my understanding the paravirt interface was a shim for three platforms (VMware, Xen and lguest). And the API was choosen to be broad to encompass everything. That was then and nowadays Xen and lguest are the top users. And there are things that don''t make sense anymore (like the example above). Your specific example is interesting as a quick 10 second lookup confirmed that the implementation for different paravirt platforms all point to native_read_cr4(). There are no PV hypercalls involved. Axe! We had a public discussion on this and I think determined that the paravit_ops can benefit from a diet. Certainly the [read|write]_msr_safe that was introduced by Borislav Petkov per your suggestion was a good candidate. So was the store_gdt. I think the read_tsc and this one (read_cr4) can be slashed too. Not sure about the idt ones but I hadn''t done a careful examination of that. I hope it is clear that there is no resistance from me in this diet-program. The issue I face to help you are the same you do: priority - those merge windows, darn bugs, regressions, etc. And until this conversation I was under the impression the documentation for hypercalls was #1 - but that is more of a #3 (#2 being the diet program, #1 being the paravirt documentation).>> >> 3. "Let''s add another hook" becomes a far too easy solution to new problems. >I would hope that with your help in brainstorming these problems one can figure out that right way of doing it. Perhaps instead of posting the patches first, one can solicit you first for help to design it properly from the start?> >> >> 4. Performance and maintainability impact of having to support multiple >> code flows with different semantics. The semantics of the Xen MMU, in >> particular, is actually quite different from the x86 MMU. >You along with Thomas had designed a nice way of not only making this code nicely but also cleaning up the code. And I hope we all learned more about the Xen RO/pinning/pagetables protection mechanism. The semantics are understood now. In terms of development however, I recall that you as a excellent maintainer was able to delegate the work to an Oracle employee and everybody pitched in to help review the code and test. We did not test a forced branch which had some extra patches and that bit us all in the merge window. The after effect is quite nice code and design.> > >> >> 5. Performance and maintainability impact of a maze of twisty little >> functions, all different. For example, in the case of some of the MSR >> functions, we actually end up telling the compiler to combine and break >> up the two 32-bit halves into a 64-bit register multiple times, because >> the functions don''t actually match up. I still don''t understand why we >> don''t just patch out the rdmsr/wrmsr instructions, using those registers.My understanding is that there are two variants of paravirt patching: inline patching (for a selective bunch of them), and patch in a call. The inline patching is the more efficient one and the functions that were choosen for this were the performance sensitive. I think at the time this was designed the rdmsr/wrmsr instructions were not terribly fast. I think that some of these paravirt interfaces could be optimized now. One idea that I had been toying around and asked the ksplice team to prototype was the usage of the asm goto mechanism. It only did it for the pte_modify call but I never got to run the performance numbers on all the different architectures. In summary I think (and please correct me if I mis-understood you): - The paravirt interfaces documentation should be #1 priority, - Fixes, diet and optimization for paravirt infrastructure is needed. And since we don''t want to hinder your development and I would need to reschedule work-items I have to ask - what are the development work that is hindering this? So I have an idea of the timelines (if this needs to be an NDA discussion, pls email me privately your manager''s email so I can start the discussion on that). Lastly, a bit seperate topic, but close, I thought that the paravirt interface was under x86 maintainers ownership, but the MAINTAINERS file says otherwise! Jeremy, Rusty, are you OK with this. Rusty, would you have the time to review the patches or would it be easier if I was the sole person ? From f622669a3b6e05d2962e3886ef8b46add18b1e6c Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Fri, 10 May 2013 10:41:22 -0400 Subject: [PATCH] MAINTAINERS: Become the maintainer of the paravirt interface. Somehow I thought the x86 team owned it. Time to step up and maintain this and put the paravirt interface on a diet! CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Chris Wright <chrisw@sous-sol.org> CC: Alok Kataria <akataria@vmware.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: virtualization@lists.linux-foundation.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- MAINTAINERS | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3d7782b..7e24105 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6044,10 +6044,8 @@ F: drivers/char/ppdev.c F: include/uapi/linux/ppdev.h PARAVIRT_OPS INTERFACE -M: Jeremy Fitzhardinge <jeremy@goop.org> -M: Chris Wright <chrisw@sous-sol.org> -M: Alok Kataria <akataria@vmware.com> M: Rusty Russell <rusty@rustcorp.com.au> +M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> L: virtualization@lists.linux-foundation.org S: Supported F: Documentation/ia64/paravirt_ops.txt -- 1.8.1.4
H. Peter Anvin
2013-May-10 17:19 UTC
Re: Is: paravirt docs and diet. Was:Re: [Xen-devel] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On 05/10/2013 07:51 AM, Konrad Rzeszutek Wilk wrote:> > First of thank you for enumerating the issues and clarifying them. >Thank you for the writeup, and for all the work you personally have done in making things *much* smoother than they have been in the past. I don''t want to make it sound like I''m just bellyaching here. In the end, pvops is effectively driverizing the CPU (and a bunch of other things, some of which should be separately driverized... that work is happening too, of course.) Driverizing the CPU is really, really hard, although it can be done well; it just needs to be documented down to the level of detail the CPU itself is. The current CPU documentation is 3044 pages long ;)> My understanding was (until this email conversation) was that the > lack of the hypercall docs was hindering you. However you are not > interested in that - you are after the documentation for the paravirt > interface. Specifically how it differs from what baremetal would do. > That changes my priorities! > > In that case lguest provides an excellent source of documentation on > how the paravirt interfaces should work. It is almost like reading a book. > > Perhaps moving them to paravirt.h will help. There are of course > some differences between the platforms that use it - but that would help > identify them. >Sort of, but not really. Unless each pvop is documented to the same extent at the native operations they implement, there is no way to deduce the necessary conditions from an example, no matter how well-written. That means that the pvop users themselves have to be comprehensible, too. This is particularly important when we want to change the pvops themselves, as that is a kernel-internal interface as opposed to the hypervisor interface which is an external ABI. [... lots of good stuff snipped ...]> Your specific example is interesting as a quick 10 second lookup confirmed > that the implementation for different paravirt platforms all point to > native_read_cr4(). There are no PV hypercalls involved. Axe!Lguest uses these pvops to avoid touching cr4 at all: /* cr4 is used to enable and disable PGE, but we don''t care. */ static unsigned long lguest_read_cr4(void) { return 0; } static void lguest_write_cr4(unsigned long val) { } Lguest doesn''t even seem to have an implementation for read_cr4_safe; I don''t know what that means will happen.> I hope it is clear that there is no resistance from me in this diet-program. > The issue I face to help you are the same you do: priority - those merge windows, > darn bugs, regressions, etc.Of course. Ultimately, we don''t want to break things. I would like to see the Xen community more committed to a roadmap toward killing PV in favor of HVM or what you call hybrid.> And until this conversation I was under the impression the documentation > for hypercalls was #1 - but that is more of a #3 (#2 being the > diet program, #1 being the paravirt documentation).That probably is a fair statement.>>> 3. "Let''s add another hook" becomes a far too easy solution to new problems. > > I would hope that with your help in brainstorming these problems one can figure > out that right way of doing it. Perhaps instead of posting the patches > first, one can solicit you first for help to design it properly from the start?Absolutely, and that is happening more and more. Perhaps more in general, "show me the code" in all honour, but "show me the design" first.>>> >>> 4. Performance and maintainability impact of having to support multiple >>> code flows with different semantics. The semantics of the Xen MMU, in >>> particular, is actually quite different from the x86 MMU. > > You along with Thomas had designed a nice way of not only making this code > nicely but also cleaning up the code. And I hope we all learned more > about the Xen RO/pinning/pagetables protection mechanism. The semantics > are understood now. > > In terms of development however, I recall that you as a excellent > maintainer was able to delegate the work to an Oracle employee and > everybody pitched in to help review the code and test. We did not test > a forced branch which had some extra patches and that bit us all > in the merge window. > > The after effect is quite nice code and design.Yes, in the end it ended up being a really nice piece of teamwork. Kudos to everyone involved.>>> >>> 5. Performance and maintainability impact of a maze of twisty little >>> functions, all different. For example, in the case of some of the MSR >>> functions, we actually end up telling the compiler to combine and break >>> up the two 32-bit halves into a 64-bit register multiple times, because >>> the functions don''t actually match up. I still don''t understand why we >>> don''t just patch out the rdmsr/wrmsr instructions, using those registers. > > My understanding is that there are two variants of paravirt patching: > inline patching (for a selective bunch of them), and patch in a call. > > The inline patching is the more efficient one and the functions that > were choosen for this were the performance sensitive. I think at the > time this was designed the rdmsr/wrmsr instructions were not terribly > fast.Yes. The vast majority of all MSRs are not performance critical, however, there is a small handful that are (and there are reasons to believe the number will increase.)> I think that some of these paravirt interfaces could be optimized now. > > One idea that I had been toying around and asked the ksplice team to > prototype was the usage of the asm goto mechanism. It only did it for > the pte_modify call but I never got to run the performance numbers > on all the different architectures. > > In summary I think (and please correct me if I mis-understood you): > - The paravirt interfaces documentation should be #1 priority, > - Fixes, diet and optimization for paravirt infrastructure is needed. > > And since we don''t want to hinder your development and I would need > to reschedule work-items I have to ask - what are the development work > that is hindering this? So I have an idea of the timelines (if this > needs to be an NDA discussion, pls email me privately your manager''s > email so I can start the discussion on that).Here is the funny bit... it actually hinders small pieces of work more than big pieces of work. For someone who is already doing a bunch of heavy lifting, it is much easier to amortize the cost of dealing with any PV funnies that come into play, but if you were hoping to do a 10-line patch to fix a problem you''re not going to prefix it with a patch series to fix PV issues raised by the patch if you can at all avoid it.> Lastly, a bit seperate topic, but close, I thought that the paravirt > interface was under x86 maintainers ownership, but the MAINTAINERS > file says otherwise! > > Jeremy, Rusty, are you OK with this. Rusty, would you have the time to > review the patches or would it be easier if I was the sole person ?Yes, I suspect that is one of the problems we have had. The paravirt interface dates back to a time when there was no "de jure" x86 maintainer (Andi Kleen was maintaining x86-64, and Linus was theoretically maintaining i386 as the "reference port" but didn''t really have the bandwidth) which didn''t really help the situation any. At the same time, I think we have realized over time just how hard this is coupled. This is a classic hook problem (and part of why hooks in general are frowned upon in the kernel)... they are highly unobtrusive when they are first created, but then they end up being fixed points for the code around them, and people naturally like to avoid messing with the fixed points... so the code around grows in unnatural ways. That being said, the one single biggest issue is definitely the MMU. That is the one area where PV and non-PV differs dramatically in semantics, and it is also one of the most performance-critical parts of the machine. -hpa
Matt Wilson
2013-May-10 17:53 UTC
Re: Is: paravirt docs and diet. Was:Re: [Lguest] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On Fri, May 10, 2013 at 10:19:16AM -0700, H. Peter Anvin wrote:> On 05/10/2013 07:51 AM, Konrad Rzeszutek Wilk wrote: > > I hope it is clear that there is no resistance from me in this diet-program. > > The issue I face to help you are the same you do: priority - those merge windows, > > darn bugs, regressions, etc. > > Of course. Ultimately, we don''t want to break things. I would like to > see the Xen community more committed to a roadmap toward killing PV in > favor of HVM or what you call hybrid.I think that there is broad agreement both that we want to move in that direction, and also that a migration will take time to realize. When you say "more committed," are you thinking of a declared schedule? Or perhaps more visible progress toward that goal? --msw
H. Peter Anvin
2013-May-10 18:07 UTC
Re: [Xen-devel] Is: paravirt docs and diet. Was:Re: lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
More of a time target... even if far out. Matt Wilson <msw-vV1OtcyAfmbQT0dZR+AlfA@public.gmane.org> wrote:>On Fri, May 10, 2013 at 10:19:16AM -0700, H. Peter Anvin wrote: >> On 05/10/2013 07:51 AM, Konrad Rzeszutek Wilk wrote: >> > I hope it is clear that there is no resistance from me in this >diet-program. >> > The issue I face to help you are the same you do: priority - those >merge windows, >> > darn bugs, regressions, etc. >> >> Of course. Ultimately, we don''t want to break things. I would like >to >> see the Xen community more committed to a roadmap toward killing PV >in >> favor of HVM or what you call hybrid. > >I think that there is broad agreement both that we want to move in >that direction, and also that a migration will take time to realize. > >When you say "more committed," are you thinking of a declared >schedule? Or perhaps more visible progress toward that goal? > >--msw-- Sent from my mobile phone. Please excuse brevity and lack of formatting.
Konrad Rzeszutek Wilk
2013-May-13 20:11 UTC
Re: Is: paravirt docs and diet. Was:Re: [Lguest] lguest: unhandled trap 13 and CONFIG_MICROCODE_INTEL_EARLY
On Fri, May 10, 2013 at 11:07:10AM -0700, H. Peter Anvin wrote:> More of a time target... even if far out.The PVH (which is what Mukesh CC-ed here) not yet in the hypervisor (even as an experiemental case). When that is in, and it works with the base test-cases (launching guests, migration, etc) that would start the clock. We chatted about this last year at the Xen Developer meeting to make sure we all are comfortable with this and also with the time target. Which if I recall correctly was +5 years from the start of the the base test-cases working and the PVH feature turned from experimental to working. Just to iterate, this would mean that a lot of paravirt MMU parts would not been needed anymore. Please note that ''killing PV'' is a bit of broad word. The PV (paravirtualization) of say drivers - so that we don''t use emulated drivers, or use faster IPIs, or what not - would not be eliminated. This is just about paravirt ops reduction.> > Matt Wilson <msw@amazon.com> wrote: > > >On Fri, May 10, 2013 at 10:19:16AM -0700, H. Peter Anvin wrote: > >> On 05/10/2013 07:51 AM, Konrad Rzeszutek Wilk wrote: > >> > I hope it is clear that there is no resistance from me in this > >diet-program. > >> > The issue I face to help you are the same you do: priority - those > >merge windows, > >> > darn bugs, regressions, etc. > >> > >> Of course. Ultimately, we don''t want to break things. I would like > >to > >> see the Xen community more committed to a roadmap toward killing PV > >in > >> favor of HVM or what you call hybrid. > > > >I think that there is broad agreement both that we want to move in > >that direction, and also that a migration will take time to realize. > > > >When you say "more committed," are you thinking of a declared > >schedule? Or perhaps more visible progress toward that goal? > > > >--msw > > -- > Sent from my mobile phone. Please excuse brevity and lack of formatting. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >