Nakajima, Jun
2005-Nov-17 06:46 UTC
[Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time (take 2)
Nakajima, Jun wrote:
>> And why would we need to take interrupts between loading esp0 and
>> LDT?
>>
>>         load_esp0(t, thread);
>>
>> +       local_irq_enable();
>> +
>>         load_LDT(&init_mm.context);
>
> I thought it's required to get IPI working (for load_LDT and the other
> on-going flush-TLB activities), but it looks bogus after sleeping on it.
> I'm pretty sure that it resolves the hang, and that it's hiding an
> underlying bug.

I've finally root caused it. It's much deeper than I expected... Here is
what's happening:

void arch_do_createdomain(struct vcpu *v)
{
    ...
    l1_pgentry_t gdt_l1e;
    ...
    d->arch.mm_perdomain_pt = alloc_xenheap_page();
    memset(d->arch.mm_perdomain_pt, 0, PAGE_SIZE);
    ...
    for ( vcpuid = 0; vcpuid < MAX_VIRT_CPUS; vcpuid++ )
        d->arch.mm_perdomain_pt[(vcpuid << PDPT_VCPU_SHIFT) +
                                FIRST_RESERVED_GDT_PAGE] = gdt_l1e;

The max value of (vcpuid << PDPT_VCPU_SHIFT) + FIRST_RESERVED_GDT_PAGE
is 1006 (< 1024), but the size of each entry is 8 bytes for PAE (and
x86_64), so alloc_xenheap_page() (i.e. a single page) was not
sufficient, and the loop was corrupting the next page, which contains
the vcpu_info areas holding evtchn_upcall_pending for the vcpus. That
affected vcpu 7 (and 23) on my machine. At load_LDT we check for
pending events in hypercall_preempt_check(); the flag was already set
for vcpu 7, but it is never cleared by hypercall4_create_continuation()
because nobody actually raised such an event... So it was looping
there.

int do_mmuext_op(
    struct mmuext_op *uops,
    ...
{
    ...
    for ( i = 0; i < count; i++ )
    {
        if ( hypercall_preempt_check() )
        {
            rc = hypercall4_create_continuation(
                __HYPERVISOR_mmuext_op, uops,
                (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
            break;
        }

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>

----

diff -r 9c7aeec94f8a xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c     Tue Nov 15 19:46:48 2005 +0100
+++ b/xen/arch/x86/domain.c     Wed Nov 16 23:23:44 2005 -0700
@@ -252,6 +252,8 @@
     struct domain *d = v->domain;
     l1_pgentry_t gdt_l1e;
     int vcpuid;
+    physaddr_t size;
+    int order;
 
     if ( is_idle_task(d) )
         return;
@@ -265,9 +267,11 @@
     SHARE_PFN_WITH_DOMAIN(virt_to_page(d->shared_info), d);
     set_pfn_from_mfn(virt_to_phys(d->shared_info) >> PAGE_SHIFT,
                      INVALID_M2P_ENTRY);
-
-    d->arch.mm_perdomain_pt = alloc_xenheap_page();
-    memset(d->arch.mm_perdomain_pt, 0, PAGE_SIZE);
+    size = ((((MAX_VIRT_CPUS - 1) << PDPT_VCPU_SHIFT) +
+             FIRST_RESERVED_GDT_PAGE) * sizeof (l1_pgentry_t));
+    order = get_order_from_bytes(size);
+    d->arch.mm_perdomain_pt = alloc_xenheap_pages(order);
+    memset(d->arch.mm_perdomain_pt, 0, PAGE_SIZE << order);
     set_pfn_from_mfn(virt_to_phys(d->arch.mm_perdomain_pt) >> PAGE_SHIFT,
                      INVALID_M2P_ENTRY);
     v->arch.perdomain_ptes = d->arch.mm_perdomain_pt;

Jun
---
Intel Open Source Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2005-Nov-17 11:59 UTC
Re: [Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time (take 2)
On 17 Nov 2005, at 06:46, Nakajima, Jun wrote:

> The max value of (vcpuid << PDPT_VCPU_SHIFT) + FIRST_RESERVED_GDT_PAGE
> is 1006 (< 1024), but the size of each entry is 8 bytes for PAE (and
> x86_64), so alloc_xenheap_page() (i.e. a single page) was not
> sufficient, and it's corrupting the next page which contains the areas
> for vcpu_info, which contains evtchn_upcall_pending for vcpus. That
> affected vcpu 7 (and 23) on my machine, and at load_LDT, we check the
> pending events at hypercall_preempt_check(), and it's already on for
> vcpu 7, but it's never cleared by hypercall4_create_continuation()
> because nobody set such events... So it was looping there.

Thanks Jun! I've fixed your patch a little (e.g., to deallocate the
correct number of pages) and checked it into our staging tree.
Hopefully I haven't broken it again. :-)

 -- Keir