I just decided to create a new thread... Here''s the code I wanted to propose. thanks, Mukesh /* * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall used * to initialise a secondary vcpu prior to boot. Called from * arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. * * In case of linux: * The call comes from cpu_initialize_context(). (boot vcpu 0 context is * set by the tools via do_domctl -> vcpu_initialise). * * PVH 32bitfixme: this function needs to be modified for 32bit guest. */ int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) { int rc; if ( v->vcpu_id == 0 ) return 0; if ( !(ctxtp->flags & VGCF_in_kernel) ) return -EINVAL; if ( ctxtp->ldt_base || ctxtp->ldt_ents || (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || ctxtp->user_regs.es || ctxtp->user_regs.ds ) return -EINVAL; if ( ctxtp->user_regs.cs == 0 ) return -EINVAL; vmx_vmcs_enter(v); __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); /* IA-32e: ss/es/ds are ignored, we load cs only. */ __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); if ( (rc = hvm_load_segment_selector(x86_seg_cs, ctxtp->user_regs.cs)) ) return rc; __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs); if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) { vmx_vmcs_exit(v); return rc; } vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); vmx_vmcs_exit(v); return 0; }
Hi, At 18:45 -0700 on 12 Aug (1376333113), Mukesh Rathor wrote:> I just decided to create a new thread... Here''s the code I wanted > to propose. > > /* > * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall used > * to initialise a secondary vcpu prior to boot. Called from > * arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. > * > * In case of linux: > * The call comes from cpu_initialize_context(). (boot vcpu 0 context is > * set by the tools via do_domctl -> vcpu_initialise). > * > * PVH 32bitfixme: this function needs to be modified for 32bit guest. > */ > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) > { > int rc; > > if ( v->vcpu_id == 0 ) > return 0; > > if ( !(ctxtp->flags & VGCF_in_kernel) ) > return -EINVAL;Is that a new restriction on PVH? Should there be a matching check of the CS RPL or is it safe not to?> if ( ctxtp->ldt_base || ctxtp->ldt_ents || > (ctxtp->user_regs.cs & 4)I think you could support LDTs by either taking the LDT range supplied (with a bit of sanity checking) or asking for a descriptor and using hvm_load_segment_selector(x86_seg_ldtr).> || ctxtp->user_regs.ss || > ctxtp->user_regs.es || ctxtp->user_regs.ds ) > return -EINVAL; > > if ( ctxtp->user_regs.cs == 0 ) > return -EINVAL; > > vmx_vmcs_enter(v); > __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); > __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); > > __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > /* IA-32e: ss/es/ds are ignored, we load cs only. */ > __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); > if ( (rc = hvm_load_segment_selector(x86_seg_cs, ctxtp->user_regs.cs)) ) > return rc;This looks good to me, thank you.> __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); > __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs);Or we could load these segments from their selectors too, and then either check that the supplied bases match or mandate that they be zero. I''m happy enough with this interface, though. Either way it would be nice to document which of GDT[fs].base and fs_base is actually loaded (and likewise for gs_base_*). Tim.> if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) > { > vmx_vmcs_exit(v); > return rc; > } > vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); > > vmx_vmcs_exit(v); > return 0; > }
>>> On 13.08.13 at 03:45, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) > { > int rc; > > if ( v->vcpu_id == 0 ) > return 0;Bogus/pointless.> if ( !(ctxtp->flags & VGCF_in_kernel) ) > return -EINVAL; > > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || > ctxtp->user_regs.es || ctxtp->user_regs.ds ) > return -EINVAL;How about FS/GS? If you don''t enforce these selectors to be zero too, then loading only base and selector values below isn''t sufficient (and again potentially inconsistent).> > if ( ctxtp->user_regs.cs == 0 ) > return -EINVAL;Perhaps also check RPL == 0?> vmx_vmcs_enter(v); > __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); > __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); > > __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > /* IA-32e: ss/es/ds are ignored, we load cs only. */ > __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); > if ( (rc = hvm_load_segment_selector(x86_seg_cs, ctxtp->user_regs.cs)) ) > return rc;You can''t use that function here without modification, as it assumes v == current. Jan> > __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); > __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs); > > if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) > { > vmx_vmcs_exit(v); > return rc; > } > vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); > > vmx_vmcs_exit(v); > return 0; > }
>>> On 13.08.13 at 11:16, Tim Deegan <tim@xen.org> wrote: >> __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); >> __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs); > > Or we could load these segments from their selectors too, and then > either check that the supplied bases match or mandate that they be zero.For FS and GS that would be wrong - you can''t load them through a selector when the base is beyond 4Gb. As said in my other reply, I think we should require the selectors to be zero, and load just the bases. Jan
On Tue, 13 Aug 2013 11:56:36 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:> >>> On 13.08.13 at 03:45, Mukesh Rathor <mukesh.rathor@oracle.com> > >>> wrote: > > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context > > *ctxtp) { > > int rc; > > > > if ( v->vcpu_id == 0 ) > > return 0; > > Bogus/pointless.No, we don''t have anything for the boot vcpu. It''s totally coming up on the flat address space. For non boot, the vcpu is coming up on the kernel GDT. Recall it''s a PV guest (coming up in an HVM container).> > if ( !(ctxtp->flags & VGCF_in_kernel) ) > > return -EINVAL; > > > > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > > (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || > > ctxtp->user_regs.es || ctxtp->user_regs.ds ) > > return -EINVAL; > > How about FS/GS? If you don''t enforce these selectors to be zero > too, then loading only base and selector values below isn''t > sufficient (and again potentially inconsistent). > > > > > if ( ctxtp->user_regs.cs == 0 ) > > return -EINVAL; > > Perhaps also check RPL == 0?OK. I still think we should just remove the check for VGCF_in_kernel, why do we care for PVH? For 64bit PV, we needed that, but a PVH guest can come up any RPL it chooses to, right? Below is updated version. thanks Mukesh /* * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall used * to initialise a secondary vcpu prior to boot. Called from * arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. * * In case of linux: * The call comes from cpu_initialize_context(). (boot vcpu 0 context is * set by the tools via do_domctl -> vcpu_initialise). * * PVH 32bitfixme: this function needs to be modified for 32bit guest. */ int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) { int rc = 0; if ( v->vcpu_id == 0 ) return 0; if ( !(ctxtp->flags & VGCF_in_kernel) ) return -EINVAL; if ( ctxtp->ldt_base || ctxtp->ldt_ents || (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || ctxtp->user_regs.es || ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ) return -EINVAL; if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 3) == 3 ) return -EINVAL; vmx_vmcs_enter(v); __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); /* IA-32e: ss/es/ds are ignored, we load cs only. */ __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); if ( (rc = hvm_load_segment_selector(v, x86_seg_cs, ctxtp->user_regs.cs)) ) goto out; if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) goto out; vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); out: vmx_vmcs_exit(v); return 0; }
>>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: > On Tue, 13 Aug 2013 11:56:36 +0100 > "Jan Beulich" <JBeulich@suse.com> wrote: > >> >>> On 13.08.13 at 03:45, Mukesh Rathor <mukesh.rathor@oracle.com> >> >>> wrote: >> > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context >> > *ctxtp) { >> > int rc; >> > >> > if ( v->vcpu_id == 0 ) >> > return 0; >> >> Bogus/pointless. > > No, we don''t have anything for the boot vcpu. It''s totally coming up > on the flat address space. For non boot, the vcpu is coming up on the > kernel GDT. Recall it''s a PV guest (coming up in an HVM container).No, that''s the wrong perspective. You either should never get here for vCPU 0, or you should refuse this for all already initialized vCPU-s.>> > if ( !(ctxtp->flags & VGCF_in_kernel) ) >> > return -EINVAL; >> > >> > if ( ctxtp->ldt_base || ctxtp->ldt_ents || >> > (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || >> > ctxtp->user_regs.es || ctxtp->user_regs.ds ) >> > return -EINVAL; >> >> How about FS/GS? If you don''t enforce these selectors to be zero >> too, then loading only base and selector values below isn''t >> sufficient (and again potentially inconsistent). >> >> > >> > if ( ctxtp->user_regs.cs == 0 ) >> > return -EINVAL; >> >> Perhaps also check RPL == 0? > > OK. I still think we should just remove the check for VGCF_in_kernel, > why do we care for PVH? For 64bit PV, we needed that, but a PVH guest > can come up any RPL it chooses to, right?Again, no - if you drop the checking of the flag, you need the function to behave correctly even if it is not set (and do consistency checks for _both_ cases). So to keep the code simple _and_ correct, checking the flag to be clear and doing only the kernel mode validation is likely the better route.> int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) > { > int rc = 0; > > if ( v->vcpu_id == 0 ) > return 0; > > if ( !(ctxtp->flags & VGCF_in_kernel) ) > return -EINVAL; > > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > (ctxtp->user_regs.cs & 4) || ctxtp->user_regs.ss || > ctxtp->user_regs.es || ctxtp->user_regs.ds || > ctxtp->user_regs.fs || ctxtp->user_regs.gs ) > > return -EINVAL;Stray blank line before the return statement.> > if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 3) == 3 ) > return -EINVAL;Perhaps better check for any non-zero RPL, as the RPL needs to match of the (enforced) CS hidden fields? Also, I''d very much prefer all the CS related checks to be together.> vmx_vmcs_enter(v); > __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); > __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); > > __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > /* IA-32e: ss/es/ds are ignored, we load cs only. */ > __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs);Pointlessly duplicating what the function call below will also do.> if ( (rc = hvm_load_segment_selector(v, x86_seg_cs, ctxtp->user_regs.cs)) ) > goto out;Jan
On Wed, Aug 14, 2013 at 10:12 AM, Jan Beulich <JBeulich@suse.com> wrote:>>>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: >> On Tue, 13 Aug 2013 11:56:36 +0100 >> "Jan Beulich" <JBeulich@suse.com> wrote: >> >>> >>> On 13.08.13 at 03:45, Mukesh Rathor <mukesh.rathor@oracle.com> >>> >>> wrote: >>> > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context >>> > *ctxtp) { >>> > int rc; >>> > >>> > if ( v->vcpu_id == 0 ) >>> > return 0; >>> >>> Bogus/pointless. >> >> No, we don''t have anything for the boot vcpu. It''s totally coming up >> on the flat address space. For non boot, the vcpu is coming up on the >> kernel GDT. Recall it''s a PV guest (coming up in an HVM container). > > No, that''s the wrong perspective. You either should never get here > for vCPU 0, or you should refuse this for all already initialized > vCPU-s.+1, FWIW. -George
On Wed, 14 Aug 2013 10:12:18 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:> >>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> > >>> wrote: > > On Tue, 13 Aug 2013 11:56:36 +0100 > > "Jan Beulich" <JBeulich@suse.com> wrote: > >...> >> > if ( v->vcpu_id == 0 ) > >> > return 0; > >> > >> Bogus/pointless. > > > > No, we don''t have anything for the boot vcpu. It''s totally coming up > > on the flat address space. For non boot, the vcpu is coming up on > > the kernel GDT. Recall it''s a PV guest (coming up in an HVM > > container). > > No, that''s the wrong perspective. You either should never get here > for vCPU 0, or you should refuse this for all already initialized > vCPU-s.Ok, i''ll move the check to caller. FWIW, the vcpu->is_initialised is inapplicable here because this is relevant for non-boot vcpus only.>> >> if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 3) == 3 ) >> return -EINVAL;>Perhaps better check for any non-zero RPL, as the RPL needs >to match of the (enforced) CS hidden fields?Well, we now load the descriptor provided by the guest in the GDT so that''s kinda un needed now.. thats why i thought we don''t need to enforce RPL or check VGCF_in_kernel. But, that would work too... thanks, mukesh Final(hopefully) version: /* * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall used * to initialise a secondary vcpu prior to boot. (boot vcpu 0 context is * set by the tools via do_domctl -> vcpu_initialise). Called from * arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. * * In case of linux: * The call comes from cpu_initialize_context(). * * PVH 32bitfixme: this function needs to be modified for 32bit guest. */ int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) { int rc; if ( !(ctxtp->flags & VGCF_in_kernel) ) return -EINVAL; if ( ctxtp->ldt_base || ctxtp->ldt_ents || ctxtp->user_regs.ss || ctxtp->user_regs.es || ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ) return -EINVAL; if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 7) ) return -EINVAL; vmx_vmcs_enter(v); __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); /* IA-32e: ss/es/ds are ignored. */ if ( (rc = hvm_load_segment_selector(v, x86_seg_cs, ctxtp->user_regs.cs)) ) goto out; __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) goto out; vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); out: vmx_vmcs_exit(v); return rc; }
On Wed, 14 Aug 2013 17:25:15 -0700 Mukesh Rathor <mukesh.rathor@oracle.com> wrote:> On Wed, 14 Aug 2013 10:12:18 +0100 > "Jan Beulich" <JBeulich@suse.com> wrote:.......> int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context > *ctxtp) { > int rc; > > if ( !(ctxtp->flags & VGCF_in_kernel) ) > return -EINVAL; > > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > ctxtp->user_regs.ss || ctxtp->user_regs.es || > ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs ) > return -EINVAL; > > if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 7) ) > return -EINVAL; > > vmx_vmcs_enter(v); > __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); > __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); > > /* IA-32e: ss/es/ds are ignored. */ > if ( (rc = hvm_load_segment_selector(v, x86_seg_cs, > ctxtp->user_regs.cs)) ) goto out; > > __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) > goto out; > vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); >Jan, Thinking about this more, I realized we are unnecessarily creating a vmcs intercept for gs_base_user. How about we not allow that either? thanks mukesh
>>> On 15.08.13 at 02:25, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: > On Wed, 14 Aug 2013 10:12:18 +0100 > "Jan Beulich" <JBeulich@suse.com> wrote: > >> >>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> >> >>> wrote: >> > On Tue, 13 Aug 2013 11:56:36 +0100 >> > "Jan Beulich" <JBeulich@suse.com> wrote: >> > > ... >> >> > if ( v->vcpu_id == 0 ) >> >> > return 0; >> >> >> >> Bogus/pointless. >> > >> > No, we don''t have anything for the boot vcpu. It''s totally coming up >> > on the flat address space. For non boot, the vcpu is coming up on >> > the kernel GDT. Recall it''s a PV guest (coming up in an HVM >> > container). >> >> No, that''s the wrong perspective. You either should never get here >> for vCPU 0, or you should refuse this for all already initialized >> vCPU-s. > > Ok, i''ll move the check to caller. FWIW, the vcpu->is_initialised is > inapplicable here because this is relevant for non-boot vcpus only.That won''t make the check any better. Can you please explain in detail why this check is necessary, and why the ->is_initialised one would be wrong?>>> if ( ctxtp->user_regs.cs == 0 || (ctxtp->user_regs.cs & 3) == 3 ) >>> return -EINVAL; > >>Perhaps better check for any non-zero RPL, as the RPL needs >>to match of the (enforced) CS hidden fields? > > Well, we now load the descriptor provided by the guest in the GDT > so that''s kinda un needed now.. thats why i thought we don''t need to > enforce RPL or check VGCF_in_kernel. But, that would work too...Of course you don''t need to duplicate anything that hvm_load_segment_selector() does; anything that''s not obvious would of course warrant a code comment to that effect.> Final(hopefully) version:Looks okay, but we''ll certainly need to see the changes to hvm_load_segment_selector() too. Jan
>>> On 15.08.13 at 03:58, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: >> __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); >> __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); >> >> if ( (rc = vmx_add_guest_msr(MSR_SHADOW_GS_BASE)) ) >> goto out; >> vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); >> > > Thinking about this more, I realized we are unnecessarily creating > a vmcs intercept for gs_base_user. How about we not allow that either?Why would requiring this base to be zero avoid the need to create an intercept? Also, remember that the user/kernel distinction here is sort of a mis-naming by the Xen public interface - the CPU architecture calls them GS base and shadow GS base, so other usage models by kernels are possible (albeit admittedly unlikely). Jan
On Thu, 15 Aug 2013 07:31:32 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:> >>> On 15.08.13 at 02:25, Mukesh Rathor <mukesh.rathor@oracle.com> > >>> wrote: > > On Wed, 14 Aug 2013 10:12:18 +0100 > > "Jan Beulich" <JBeulich@suse.com> wrote: > > > >> >>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> > >> >>> wrote: > >> > On Tue, 13 Aug 2013 11:56:36 +0100 > >> > "Jan Beulich" <JBeulich@suse.com> wrote: > >> > > > ... > >> >> > if ( v->vcpu_id == 0 ) > >> >> > return 0; > >> >> > >> >> Bogus/pointless. > >> > > >> > No, we don''t have anything for the boot vcpu. It''s totally > >> > coming up on the flat address space. For non boot, the vcpu is > >> > coming up on the kernel GDT. Recall it''s a PV guest (coming up > >> > in an HVM container). > >> > >> No, that''s the wrong perspective. You either should never get here > >> for vCPU 0, or you should refuse this for all already initialized > >> vCPU-s. > > > > Ok, i''ll move the check to caller. FWIW, the vcpu->is_initialised is > > inapplicable here because this is relevant for non-boot vcpus only. > > That won''t make the check any better. Can you please explain in > detail why this check is necessary, and why the ->is_initialised one > would be wrong?is_initialised is marked for *each* vcpu upon it being updated, so second pass around, things are skipped in arch_set_info_guest(). What we want here is something just for non-boot vcpu. The boot vcpu should fields mostly null in this function since its coming up flat with default descriptors we loaded. Hence, it will return -EINVAL as the function is. The boot vcpu sets up bunch of things for other vcpus, like gdt, per cpu data areas, etc.. and jump starts them. To jump start secondary vcpus, its sets few things in this function. For PV guest, all things need to be sent to the hypervisor, but for PVH the initialization can be done in the guest with very minor change, hence I had proposed in the past to make this function even more minimal. Actually, all the secondary vcpu needs is gs_base_kernel. Again, we are not really creating a new guest type, but booting a PV guest in HVM container. Hence, we look at it from the perspective of what a PV guest and accomodate PVH extension. If I had my way, I''d have this function as follows: int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) { vmx_vmcs_enter(v); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); vmx_vmcs_exit(v); return 0; } I realize there are other fields in vcpu_guest_context, but at the risk of repeating myself, they are all needed for PV, but not for PVH. Yes, they all will need to be used for pvh save/restore, but not here where we are just powering on a secondary vcpu. Perhaps, rename function to something like: vmx_pvh_set_boot_secondary_vcpu_fields()? thanks much, Mukesh
>>> On 16.08.13 at 04:26, Mukesh Rathor <mukesh.rathor@oracle.com> wrote: > On Thu, 15 Aug 2013 07:31:32 +0100 > "Jan Beulich" <JBeulich@suse.com> wrote: > >> >>> On 15.08.13 at 02:25, Mukesh Rathor <mukesh.rathor@oracle.com> >> >>> wrote: >> > On Wed, 14 Aug 2013 10:12:18 +0100 >> > "Jan Beulich" <JBeulich@suse.com> wrote: >> > >> >> >>> On 14.08.13 at 04:12, Mukesh Rathor <mukesh.rathor@oracle.com> >> >> >>> wrote: >> >> > On Tue, 13 Aug 2013 11:56:36 +0100 >> >> > "Jan Beulich" <JBeulich@suse.com> wrote: >> >> > >> > ... >> >> >> > if ( v->vcpu_id == 0 ) >> >> >> > return 0; >> >> >> >> >> >> Bogus/pointless. >> >> > >> >> > No, we don''t have anything for the boot vcpu. It''s totally >> >> > coming up on the flat address space. For non boot, the vcpu is >> >> > coming up on the kernel GDT. Recall it''s a PV guest (coming up >> >> > in an HVM container). >> >> >> >> No, that''s the wrong perspective. You either should never get here >> >> for vCPU 0, or you should refuse this for all already initialized >> >> vCPU-s. >> > >> > Ok, i''ll move the check to caller. FWIW, the vcpu->is_initialised is >> > inapplicable here because this is relevant for non-boot vcpus only. >> >> That won''t make the check any better. Can you please explain in >> detail why this check is necessary, and why the ->is_initialised one >> would be wrong? > > is_initialised is marked for *each* vcpu upon it being updated, so > second pass around, things are skipped in arch_set_info_guest().So what''s the point of the of != 0 check then? The intention is for this to be a one time thing, i.e. to not be used on already initialized CPUs.> What we want here is something just for non-boot vcpu. The boot vcpu > should fields mostly null in this function since its coming up flat > with default descriptors we loaded. Hence, it will return -EINVAL as > the function is.But you still didn''t explain how you would _incorrectly_ get here for the boot CPU _at all_.> The boot vcpu sets up bunch of things for other vcpus, > like gdt, per cpu data areas, etc.. and jump starts them. To jump start > secondary vcpus, its sets few things in this function.Nor did you explain why a second call, with is_initialized already set, would be valid.> If I had my way, I''d have this function as follows: > > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) > { > vmx_vmcs_enter(v); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > vmx_vmcs_exit(v); > > return 0; > } > > I realize there are other fields in vcpu_guest_context, but at the > risk of repeating myself, they are all needed for PV, but not for PVH. > Yes, they all will need to be used for pvh save/restore, but not here where > we are just powering on a secondary vcpu. Perhaps, rename function to > something like: vmx_pvh_set_boot_secondary_vcpu_fields()?I think you''re heading the completely wrong direction here. The function, just like its PV and HVM counterparts, should be generically handling both booting and restoring of a guest. And any fields documented as unused for PVH in _both_ of these scenarios should be validated to be zero (to allow eventual later use for whatever purpose). Jan
On Fri, 16 Aug 2013 08:28:12 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:> >>> On 16.08.13 at 04:26, Mukesh Rathor <mukesh.rathor@oracle.com> > >>> wrote: > > On Thu, 15 Aug 2013 07:31:32 +0100 > > "Jan Beulich" <JBeulich@suse.com> wrote: > > > >> >>> On 15.08.13 at 02:25, Mukesh Rathor <mukesh.rathor@oracle.com> > >> >>> wrote: > >> > On Wed, 14 Aug 2013 10:12:18 +0100...> > is_initialised is marked for *each* vcpu upon it being updated, so > > second pass around, things are skipped in arch_set_info_guest(). > > So what''s the point of the of != 0 check then? The intention is for > this to be a one time thing, i.e. to not be used on already > initialized CPUs. > > > What we want here is something just for non-boot vcpu. The boot vcpu > > should fields mostly null in this function since its coming up flat > > with default descriptors we loaded. Hence, it will return -EINVAL as > > the function is. > > But you still didn''t explain how you would _incorrectly_ get here for > the boot CPU _at all_.Like I said, arch_set_info_guest() is called for all vcpus, boot or non-boot. arch_set_info_guest() calls pvh_set_vcpu_info() for all vcpus, boot or nonboot, hence the check for vcpuid. Anyways, with latest change below, i think it doesn''t matter now anyways.> > The boot vcpu sets up bunch of things for other vcpus, > > like gdt, per cpu data areas, etc.. and jump starts them. To jump > > start secondary vcpus, its sets few things in this function. > > Nor did you explain why a second call, with is_initialized already > set, would be valid.pvh_set_vcpu_info() gets called once for each vcpu. is_initialized doesn''t affect it, neither we can use is_initialized to find whether a vcpu is boot vcpu or not.> > If I had my way, I''d have this function as follows: > > > > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context > > *ctxtp) { > > vmx_vmcs_enter(v); > > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > vmx_vmcs_exit(v); > > > > return 0; > > } > > > > I realize there are other fields in vcpu_guest_context, but at the > > risk of repeating myself, they are all needed for PV, but not for > > PVH. Yes, they all will need to be used for pvh save/restore, but > > not here where we are just powering on a secondary vcpu. Perhaps, > > rename function to something like: > > vmx_pvh_set_boot_secondary_vcpu_fields()? > > I think you''re heading the completely wrong direction here. The > function, just like its PV and HVM counterparts, should be > generically handling both booting and restoring of a guest.I don''t think so :). Like I said in another thread, PVH save/restore is intended to be done the HVM way. If you are not familiar with HVM way: Boot: do_vcpu_op->VCPUOP_initialise->arch_set_info_guest(). Save/Restore: do_domctl->XEN_DOMCTL_sethvmcontext -> hvm_load/save(). Same for PVH, thus, the function above is called for pvh boot path only, like it says in the prolog.> And any fields documented as unused for PVH in _both_ of these > scenarios should be validated to be zero (to allow eventual later > use for whatever purpose).Ok, I''ll have to change tools to make sure they are zeroed in the create path. I can do that. I''ll also take that to mean you are ok with the above short function with checks for zero... I''ll have it that way in the next version. Thank you very much :), I''m glad this is finally resolved. Mukesh vmx_pvh_vcpu_boot_set_info(...) (renamed from vmx_pvh_set_vcpu_info()) { check all other fields are zero vmx_vmcs_enter(v); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); vmx_vmcs_exit(v); }
On Fri, 16 Aug 2013 15:28:37 -0700 Mukesh Rathor <mukesh.rathor@oracle.com> wrote:> On Fri, 16 Aug 2013 08:28:12 +0100 > "Jan Beulich" <JBeulich@suse.com> wrote:.........> Ok, I''ll have to change tools to make sure they are zeroed in the > create path. I can do that. I''ll also take that to mean you are ok > with the above short function with checks for zero... I''ll have it > that way in the next version. Thank you very much :), I''m glad this > is finally resolved.Ok, changed the tools to clear fields, tested and everything works: /* * Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest. * * Boot vcpu call is from tools via: * do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest * * Secondary vcpu''s are brought up by the guest itself via: * do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest * (In case of linux, the call comes from cpu_initialize_context()). * * Note, PVH save/restore is expected to happen the HVM way, ie, * do_domctl -> XEN_DOMCTL_sethvmcontext -> hvm_load/save * and not get here. * * PVH 32bitfixme: this function needs to be modified for 32bit guest. */ int vmx_pvh_vcpu_boot_set_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) { if ( ctxtp->ldt_base || ctxtp->ldt_ents || ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es || ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs || ctxtp->gdt.pvh.addr || ctxtp->gdt.pvh.limit || ctxtp->fs_base || ctxtp->gs_base_user ) return -EINVAL; vmx_vmcs_enter(v); __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); vmx_vmcs_exit(v); return 0; }
At 18:37 -0700 on 16 Aug (1376678229), Mukesh Rathor wrote:> Ok, changed the tools to clear fields, tested and everything works: > > /* > * Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest. > * > * Boot vcpu call is from tools via: > * do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest > * > * Secondary vcpu''s are brought up by the guest itself via: > * do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest > * (In case of linux, the call comes from cpu_initialize_context()). > * > * Note, PVH save/restore is expected to happen the HVM way, ie, > * do_domctl -> XEN_DOMCTL_sethvmcontext -> hvm_load/save > * and not get here. > * > * PVH 32bitfixme: this function needs to be modified for 32bit guest.What''s the 32-bit interface going to be like? Presumably it will have to load segment state, so this is just postponing that work until then.> */ > int vmx_pvh_vcpu_boot_set_info(struct vcpu *v, > struct vcpu_guest_context *ctxtp) > { > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es || > ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs || > ctxtp->gdt.pvh.addr || ctxtp->gdt.pvh.limit || > ctxtp->fs_base || ctxtp->gs_base_user ) > return -EINVAL; > > vmx_vmcs_enter(v); > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > vmx_vmcs_exit(v);Why does this one field get loaded? Couldn''t it be zero too? Tim.
>>> Mukesh Rathor <mukesh.rathor@oracle.com> 08/17/13 3:37 AM >>> >/* >* Set vmcs fields during boot of a vcpu. Called from arch_set_info_guest. >* >* Boot vcpu call is from tools via: >* do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest >* >* Secondary vcpu''s are brought up by the guest itself via: >* do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest >* (In case of linux, the call comes from cpu_initialize_context()).So here you describe clearly that the function is to be called for each vCPU exactly once. No vCPU == 0 special casing needed (also not in the caller, as I understood you moved the check there - if not, all is fine).>int vmx_pvh_vcpu_boot_set_info(struct vcpu *v, >struct vcpu_guest_context *ctxtp) >{ >if ( ctxtp->ldt_base || ctxtp->ldt_ents || >ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es || >ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs || >ctxtp->gdt.pvh.addr || ctxtp->gdt.pvh.limit || >ctxtp->fs_base || ctxtp->gs_base_user ) >return -EINVAL;I assume there''s be a place where these restrictions are both documented and rationalized; otherwise this looks pretty arbitrary. Jan
On Sat, 17 Aug 2013 11:22:20 +0100 Tim Deegan <tim@xen.org> wrote:> At 18:37 -0700 on 16 Aug (1376678229), Mukesh Rathor wrote: > > Ok, changed the tools to clear fields, tested and everything works: > > > > /* > > * Set vmcs fields during boot of a vcpu. Called from > > arch_set_info_guest. * > > * Boot vcpu call is from tools via: > > * do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest > > * > > * Secondary vcpu''s are brought up by the guest itself via: > > * do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest > > * (In case of linux, the call comes from > > cpu_initialize_context()). * > > * Note, PVH save/restore is expected to happen the HVM way, ie, > > * do_domctl -> XEN_DOMCTL_sethvmcontext -> hvm_load/save > > * and not get here. > > * > > * PVH 32bitfixme: this function needs to be modified for 32bit > > guest. > > What''s the 32-bit interface going to be like? Presumably it will have > to load segment state, so this is just postponing that work until > then.Right. I think by tweaking guest mostly and xen some, taking advantage of the hvm container, we might be able to get 32bit down to bare minimum.> > */ > > int vmx_pvh_vcpu_boot_set_info(struct vcpu *v, > > struct vcpu_guest_context *ctxtp) > > { > > if ( ctxtp->ldt_base || ctxtp->ldt_ents || > > ctxtp->user_regs.cs || ctxtp->user_regs.ss || > > ctxtp->user_regs.es || ctxtp->user_regs.ds || ctxtp->user_regs.fs > > || ctxtp->user_regs.gs || ctxtp->gdt.pvh.addr || > > ctxtp->gdt.pvh.limit || ctxtp->fs_base || ctxtp->gs_base_user ) > > return -EINVAL; > > > > vmx_vmcs_enter(v); > > __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); > > vmx_vmcs_exit(v); > > Why does this one field get loaded? Couldn''t it be zero too?With this both xen and guest can get rid of some code. Without it, the guest would need ugly hack, and vcpu bringup will be serialized. So, this seems to be the most optimal solution. thanks mukesh
On Mon, 19 Aug 2013 10:34:49 +0100 "Jan Beulich" <jbeulich@suse.com> wrote:> >>> Mukesh Rathor <mukesh.rathor@oracle.com> 08/17/13 3:37 AM >>> > >/* > >* Set vmcs fields during boot of a vcpu. Called from > >arch_set_info_guest. * > >* Boot vcpu call is from tools via: > >* do_domctl -> XEN_DOMCTL_setvcpucontext -> arch_set_info_guest > >* > >* Secondary vcpu''s are brought up by the guest itself via: > >* do_vcpu_op -> VCPUOP_initialise -> arch_set_info_guest > >* (In case of linux, the call comes from > >cpu_initialize_context()). > > So here you describe clearly that the function is to be called for > each vCPU exactly once. No vCPU == 0 special casing needed (also not > in the caller, as I understood you moved the check there - if not, > all is fine).No more check for vcpu==0 needed now.> >int vmx_pvh_vcpu_boot_set_info(struct vcpu *v, > >struct vcpu_guest_context *ctxtp) > >{ > >if ( ctxtp->ldt_base || ctxtp->ldt_ents || > >ctxtp->user_regs.cs || ctxtp->user_regs.ss || ctxtp->user_regs.es || > >ctxtp->user_regs.ds || ctxtp->user_regs.fs || ctxtp->user_regs.gs || > >ctxtp->gdt.pvh.addr || ctxtp->gdt.pvh.limit || > >ctxtp->fs_base || ctxtp->gs_base_user ) > >return -EINVAL; > > I assume there''s be a place where these restrictions are both > documented and rationalized; otherwise this looks pretty arbitrary.Yes, I''ll document the API. thanks mukesh