Nakajima, Jun
2005-Nov-16 01:54 UTC
[Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time
This patch fixes a hang with PAE SMP dom0 on big SMP machines. As far as I tested, 8-way PAE SMP dom0 boots fine on >=8-way machines. The fix is not PAE-specific, and I made the equivalent changes to x86_64 xenlinux. Tested on both PAE and x86_64 dom0 xenlinux on >=8-way SMP machines with >6GB.

Jun
---
Intel Open Source Technology Center
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ryan Harper
2005-Nov-16 16:40 UTC
Re: [Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time
* Nakajima, Jun <jun.nakajima@intel.com> [2005-11-15 19:56]:
> This patch fixes a hang with PAE SMP dom0 on big SMP machines. As far as
> I tested, 8-way PAE SMP dom0 boots fine on >=8-way machines. The fix is
> not PAE-specific, and I made the equivalent changes to x86_64 xenlinux.
> Tested on both PAE and x86_64 dom0 xenlinux on >=8-way SMP machines with >6GB.

Jun, could you explain the patch a bit more?

Why wouldn't we want to initialize the per-CPU GDT area for CPUs other than CPU0?

  - cpu_gdt_init(&cpu_gdt_descr[cpu]);
  + if (!cpu)
  +         cpu_gdt_init(&cpu_gdt_descr[cpu]);

And why would we need to take interrupts between loading esp0 and the LDT?

    load_esp0(t, thread);
  + local_irq_enable();
  +
    load_LDT(&init_mm.context);

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Nakajima, Jun
2005-Nov-16 17:42 UTC
RE: [Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time
Ryan Harper wrote:
> Jun, could you explain the patch a bit more?
>
> Why wouldn't we want to initialize the per-CPU GDT area for CPUs other
> than CPU0?
>
>   - cpu_gdt_init(&cpu_gdt_descr[cpu]);
>   + if (!cpu)
>   +         cpu_gdt_init(&cpu_gdt_descr[cpu]);

CPU0 creates the (virtual) GDT for other CPUs in advance at VCPUOP_initialise, and the proper selector values are set up at the same time. So it's redundant. It also avoids spurious page faults caused by make_page_readonly() against the GDT page.

> And why would we need to take interrupts between loading esp0 and the LDT?
>
>     load_esp0(t, thread);
>   + local_irq_enable();
>   +
>     load_LDT(&init_mm.context);

I thought it was required to get IPIs working (for load_LDT and the other ongoing TLB-flush activities), but it looks bogus after sleeping on it. I'm pretty sure that it resolves the hang, but it's hiding an underlying bug.

Jun
---
Intel Open Source Technology Center
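To make the reasoning about the "if (!cpu)" guard concrete, here is a toy user-space model in C. It is a sketch under stated assumptions, not Xen or xenlinux code. The struct, the MODEL_* constant, model_cpu_gdt_init(), model_cpu_init() and model_boot() are all hypothetical names. The real fault mechanism is not modelled either: re-initialising a GDT page that the boot CPU has already prepared and made read-only (standing in for the VCPUOP_initialise path) is simply counted as one spurious fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_CPUS 8   /* hypothetical: an 8-way dom0 */

/* Hypothetical model of one per-CPU GDT page (not the real Xen/Linux structs). */
struct model_gdt_page {
    bool initialised; /* selectors have been filled in */
    bool readonly;    /* page has been write-protected for the hypervisor */
};

static struct model_gdt_page model_gdt[MODEL_NR_CPUS];
static int model_faults; /* redundant re-inits of an already read-only page */

/* Stand-in for cpu_gdt_init(): fill in the selectors, then write-protect. */
static void model_cpu_gdt_init(struct model_gdt_page *p)
{
    if (p->readonly) {      /* page is already live as a GDT: touching it */
        model_faults++;     /* again is counted as one spurious fault     */
        return;
    }
    p->initialised = true;  /* fill in the selector values */
    p->readonly = true;     /* stand-in for make_page_readonly() */
}

/* Per-CPU init path; 'patched' applies the "if (!cpu)" guard from the patch. */
static void model_cpu_init(int cpu, bool patched)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (!patched || cpu == 0)
        model_cpu_gdt_init(&model_gdt[cpu]);
}

/* Boot every CPU in the model and return the number of spurious faults. */
int model_boot(bool patched)
{
    memset(model_gdt, 0, sizeof(model_gdt));
    model_faults = 0;

    model_cpu_init(0, patched);   /* CPU0 sets up its own GDT */
    for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++) {
        /* CPU0 prepares the secondary CPU's GDT before bringing it up
         * (the role VCPUOP_initialise plays in the thread). */
        model_cpu_gdt_init(&model_gdt[cpu]);
        /* The secondary CPU then runs its own init path. */
        model_cpu_init(cpu, patched);
    }
    return model_faults;
}
```

Without the guard, model_boot(false) returns 7 in this 8-CPU model, because each secondary CPU re-touches a page CPU0 already set up and protected; with the guard, model_boot(true) returns 0.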
Ryan Harper
2005-Nov-16 17:50 UTC
Re: [Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time
* Nakajima, Jun <jun.nakajima@intel.com> [2005-11-16 11:42]:
> Ryan Harper wrote:
> > Why wouldn't we want to initialize the per-CPU GDT area for CPUs other
> > than CPU0?
> >
> >   - cpu_gdt_init(&cpu_gdt_descr[cpu]);
> >   + if (!cpu)
> >   +         cpu_gdt_init(&cpu_gdt_descr[cpu]);
>
> CPU0 creates the (virtual) GDT for other CPUs in advance at
> VCPUOP_initialise, and the proper selector values are set up at the
> same time. So it's redundant. It also avoids spurious page faults caused by
> make_page_readonly() against the GDT page.

OK. Thanks for the explanation. With this patch, we are still seeing the revised set_gdt() issue (comment 18 on bug 366) on our 32-way. We haven't seen the hang except when we run with 8 CPUs instead of the full 32.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Nakajima, Jun
2005-Nov-16 18:35 UTC
RE: [Xen-devel] [PATCH] Fixing PAE SMP dom0 hang at boot time
Ryan Harper wrote:
> OK. Thanks for the explanation. With this patch, we are still seeing
> the revised set_gdt() issue (comment 18 on bug 366) on our 32-way.
> We haven't seen the hang except when we run with 8 CPUs instead of
> the full 32.

Yes, I'm aware of that, and I updated the bugzilla. I believe it's not an SMP problem; it is probably caused by the 16GB of memory or by copy_from_user() (for PAE with large memory). Anyway, let's use the bugzilla.

Jun
---
Intel Open Source Technology Center