Hi everyone,

I was trying to build a new machine, but the system keeps rebooting. I used the latest unstable version from xen-unstable.hg.

I have tried with Fedora 16 (kernel 3.3.0-8) and Xubuntu 11.10 (3.0.0.17-generic).

The output to my serial console is attached.

Cheers,
Francisco
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On 05/04/12 18:37, Francisco Rocha wrote:
> Hi everyone,
>
> I was trying to build a new machine but the system keeps rebooting.
> I used the latest unstable version from xen-unstable.hg.
>
> I have tried with Fedora 16 (kernel 3.3.0-8) and Xubuntu 11.10 (3.0.0.17-generic).
>
> The output to my serial console is attached.
>
> Cheers,
> Francisco

What is your Linux command line? Does it include "console=hvc0"? Perhaps some early_printk settings are required.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
>>> On 05.04.12 at 19:37, Francisco Rocha <f.e.liberal-rocha@newcastle.ac.uk> wrote:
> I was trying to build a new machine but the system keeps rebooting.
> I used the latest unstable version from xen-unstable.hg.
>
> I have tried with Fedora 16 (kernel 3.3.0-8) and Xubuntu 11.10
> (3.0.0.17-generic).
>
> The output to my serial console is attached.

So, as someone else already said, this is a fault on an XSETBV instruction. In the kernel, this instruction immediately follows the setting of CR4.OSXSAVE, yet in Xen's emulation code the only way to get #UD here is for that (virtual) CR4 bit to be clear; all other failure paths result in #GP. The emulation code that handles setting this CR4 bit, however, would issue a warning if the kernel attempted to set a bit that the hypervisor doesn't allow, yet no such warning is present in the log you provided (and you're already running at the highest logging level).

In any case, a fundamental question is whether your CPU has XSAVE support in the first place, and whether the kernel and hypervisor disagree about that for some reason. For that purpose, could you post the /proc/cpuinfo contents from a native kernel?

Beyond that, it may be necessary to add some tracing to the hypervisor to monitor the Dom0 CR4 writes and perhaps how XSAVE support gets initialized in Xen. Would you be able to do so on your own and post the results?

Jan
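The #UD-versus-#GP reasoning above can be illustrated with a small C model. This is a hypothetical sketch, not Xen's emulate_privileged_op() code: the function name model_xsetbv(), the enum, and the host_states parameter are assumptions made for illustration. The checks follow the architectural XSETBV rules: a clear CR4.OSXSAVE gives #UD, while an invalid XCR index, a cleared x87 bit, AVX without SSE, or unsupported feature bits give #GP.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the XSETBV checks; not Xen's code. */

#define CR4_OSXSAVE (1u << 18)   /* CR4.OSXSAVE is bit 18 */

enum xsetbv_result { XSETBV_OK, XSETBV_UD, XSETBV_GP };

/* guest_cr4:   the guest's (virtual) CR4 value
 * ecx:         XCR index requested by the guest
 * new_xcr0:    value from EDX:EAX
 * host_states: xstate features the hypervisor supports (e.g. 0x7) */
enum xsetbv_result model_xsetbv(uint64_t guest_cr4, uint32_t ecx,
                                uint64_t new_xcr0, uint64_t host_states)
{
    /* The only #UD path: OSXSAVE is not enabled in the guest's CR4. */
    if (!(guest_cr4 & CR4_OSXSAVE))
        return XSETBV_UD;

    /* Every other invalid request yields #GP. */
    if (ecx != 0)                               /* only XCR0 is modelled */
        return XSETBV_GP;
    if (!(new_xcr0 & 1))                        /* x87 bit must stay set */
        return XSETBV_GP;
    if ((new_xcr0 & 4) && !(new_xcr0 & 2))      /* AVX requires SSE */
        return XSETBV_GP;
    if (new_xcr0 & ~host_states)                /* unsupported features */
        return XSETBV_GP;
    return XSETBV_OK;
}
```

In this model, only a guest that never set CR4.OSXSAVE can see #UD, which is why the fault points at the CR4 write rather than at the XCR0 value.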
>>> On 10.04.12 at 13:08, "Jan Beulich" <JBeulich@suse.com> wrote:
> In any case, a fundamental question is whether your CPU has
> XSAVE support in the first place, and whether kernel and
> hypervisor disagree about that for some reason. Could you
> for that purpose post /proc/cpuinfo contents from when running
> a native kernel?

I just realized that this question is answered by the log you provided:

(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7

so indeed the fastest approach (short of someone spotting something obviously wrong in the code) appears to be adding some tracing to the CR4 handling (pv_guest_cr4_fixup() and the XSETBV handling in emulate_privileged_op()), particularly because the register dump indicates that the relevant bit was not set in CR4 at the point where XSETBV faulted.

Jan
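For reference, the xstate_init line can be decoded by hand. Under the architectural XCR0 layout, states 0x7 means x87, SSE and AVX state, and 0x340 (832) bytes is what a standard-format XSAVE area needs for exactly those three components. The helper names below are hypothetical; this is a sketch for checking the arithmetic, not Xen code.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural XCR0 / xstate feature bits. */
#define XSTATE_FP   (1ull << 0)  /* x87 state */
#define XSTATE_SSE  (1ull << 1)  /* SSE (XMM) state */
#define XSTATE_YMM  (1ull << 2)  /* AVX (upper YMM halves) state */

/* Print the features named by an xstate mask such as the logged 0x7. */
void print_xstates(uint64_t states)
{
    printf("states 0x%llx:%s%s%s\n", (unsigned long long)states,
           (states & XSTATE_FP)  ? " x87" : "",
           (states & XSTATE_SSE) ? " SSE" : "",
           (states & XSTATE_YMM) ? " AVX" : "");
}

/* Size of a standard-format XSAVE area holding x87 + SSE + AVX:
 * 512-byte legacy region + 64-byte XSAVE header + 256-byte AVX component. */
uint32_t xsave_size_fp_sse_avx(void)
{
    const uint32_t legacy = 512, header = 64, avx = 256;
    return legacy + header + avx;   /* 832 == 0x340 */
}
```

Both values in the log are therefore consistent with a CPU whose XSAVE support covers AVX, as Xen detected it.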
________________________________________
From: Jan Beulich [JBeulich@suse.com]
Sent: 10 April 2012 12:20
To: Francisco Rocha
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] lastest xen unstable crash

>>>> On 10.04.12 at 13:08, "Jan Beulich" <JBeulich@suse.com> wrote:
>> In any case, a fundamental question is whether your CPU has
>> XSAVE support in the first place, and whether kernel and
>> hypervisor disagree about that for some reason. Could you
>> for that purpose post /proc/cpuinfo contents from when running
>> a native kernel?
>
> Just realized that this question is answered by the log you provided:
>
> (XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
>
> so indeed the fastest approach (short of someone seeing something
> obviously wrong with the code) appears to be to add some tracing to
> the CR4 handling (pv_guest_cr4_fixup() and the XSETBV handling in
> emulate_privileged_op()), particularly also because the register dump
> indicates that the relevant bit was not set in CR4 at the point where
> the XSETBV faulted.
>
> Jan

I have added some prints in the functions you mentioned. Is this what you need?

These are the new lines in the dmesg; the attached file contains the rest.
(XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
(XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
(XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
(XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
(XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
(XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
(XEN) traps.c:2243:d0 @XSETBV: new_xfeature: 0000000000000007
(XEN) traps.c:2246:d0 @XSETBV: (v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE): 0000000000000000

Here is the /proc/cpuinfo running on a native kernel:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
stepping : 7
microcode : 0x25
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5382.77
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

and /proc/cpuinfo with dom0 running with xsave=0:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
stepping : 7
microcode : 0x23
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb pln pts dts
bogomips : 5382.58
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Cheers,
Francisco
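The two flags lines differ in several places; the ones relevant to this thread are xsave and avx, which appear in the native listing and not in the Dom0 listing taken with xsave=0. When comparing such listings mechanically, a whole-word match avoids false hits such as xsave inside xsaveopt. The has_cpu_flag() helper below is a hypothetical sketch, not an existing library function.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole, space-separated word in the
 * value of a /proc/cpuinfo "flags" line. */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or after a space ... */
        bool starts = (p == flags) || (p[-1] == ' ');
        /* ... and end at the end of the string or before a space. */
        bool ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return true;
        p += 1;   /* keep searching past this partial match */
    }
    return false;
}
```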
>>> On 10.04.12 at 14:23, Francisco Rocha <f.e.liberal-rocha@newcastle.ac.uk> wrote:
> I have added some prints in the functions you mentioned. Is this what you
> need?

Yes.

> These are the new lines in the dmesg, the attached file contains the rest.
>
> (XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
> (XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
> (XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
> (XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
> (XEN) domain.c:691:d0 @pv_guest_cr4_fixup-start: id=0 hv_cr4: 00002660 -> guest_cr4:00002660
> (XEN) domain.c:707:d0 @pv_guest_cr4_fixup-end: id=0 hv_cr4: 00002660 guest_cr4: 00002660 return: 00002660
> (XEN) traps.c:2243:d0 @XSETBV: new_xfeature: 0000000000000007
> (XEN) traps.c:2246:d0 @XSETBV: (v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE): 0000000000000000

So as far as Xen is concerned, there's not even an attempt by the Dom0 kernel to set bit 18 (CR4.OSXSAVE). That's rather odd, given that the only instance of XSETBV should immediately follow the CR4 write. You may want to verify that this is the case in the kernel binary, and if so, you may also need to add tracing on the kernel side (e.g. in set_in_cr4()).

Jan
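Decoding the logged CR4 value makes this concrete: bit 18 (OSXSAVE, mask 0x40000) is clear in 00002660. The sketch below uses the architectural CR4 bit positions; the table is only a subset, and print_cr4() is a hypothetical helper, not part of Xen or Linux.

```c
#include <stdint.h>
#include <stdio.h>

/* Subset of architectural CR4 bit positions (Intel SDM, Vol. 3). */
static const struct { unsigned bit; const char *name; } cr4_bits[] = {
    { 0, "VME" }, { 1, "PVI" }, { 2, "TSD" }, { 3, "DE" }, { 4, "PSE" },
    { 5, "PAE" }, { 6, "MCE" }, { 7, "PGE" }, { 8, "PCE" },
    { 9, "OSFXSR" }, { 10, "OSXMMEXCPT" }, { 13, "VMXE" }, { 14, "SMXE" },
    { 16, "FSGSBASE" }, { 17, "PCIDE" }, { 18, "OSXSAVE" }, { 20, "SMEP" },
};

#define CR4_OSXSAVE (1ul << 18)

/* Print the names of the known CR4 bits that are set in `cr4`. */
void print_cr4(unsigned long cr4)
{
    printf("cr4 %08lx:", cr4);
    for (size_t i = 0; i < sizeof cr4_bits / sizeof cr4_bits[0]; i++)
        if (cr4 & (1ul << cr4_bits[i].bit))
            printf(" %s", cr4_bits[i].name);
    printf("\n");
}
```

For example, print_cr4(0x2660) prints `cr4 00002660: PAE MCE OSFXSR OSXMMEXCPT VMXE`, with no OSXSAVE, matching the zero result of the X86_CR4_OSXSAVE trace line.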