When trying to boot a Solaris Dom0 kernel on a Tecra S1 laptop (Pentium-M cpu;
no PAE available; and the hardware design seems to force using the old
legacy 8259 PICs), using a xen hypervisor compiled with PAE disabled,
the Solaris x86 kernel crashes somewhere inside mach_init() with a
BAD TRAP (#pf page fault) with pc=0 and address 0xf000e6f2.
The crash happens in usr/src/uts/i86pc/os/mp_machdep.c at line 617,
because pops->psm_softinit is NULL:
612 if (pops->psm_notify_error) {
613 psm_notify_error = mach_notify_error;
614 notify_error = pops->psm_notify_error;
615 }
616
617 (*pops->psm_softinit)();
The cause appears to be in mach_construct_info(), which does not
initialize mach_set[PSM_OWN_SYS_DEFAULT]. When we call
mach_get_platform(PSM_OWN_SYS_DEFAULT) at line 541, the mach_set[]
array contains:
Index 0: pointer to mach_ops
Index 1-3: NULL
(that is, mach_set[PSM_OWN_SYS_DEFAULT] was NULL)
522 static void
523 mach_construct_info()
524 {
525 struct psm_sw *swp;
526 int mach_cnt[PSM_OWN_OVERRIDE+1] = {0};
527 int conflict_owner = 0;
528
529 if (psmsw->psw_forw == psmsw)
530 panic("No valid PSM modules found");
531 mutex_enter(&psmsw_lock);
532 for (swp = psmsw->psw_forw; swp != psmsw; swp = swp->psw_forw) {
533 if (!(swp->psw_flag & PSM_MOD_IDENTIFY))
534 continue;
535 mach_set[swp->psw_infop->p_owner] = swp->psw_infop->p_ops;
536 mach_ver[swp->psw_infop->p_owner] = swp->psw_infop->p_version;
537 mach_cnt[swp->psw_infop->p_owner]++;
538 }
539 mutex_exit(&psmsw_lock);
540
541 mach_get_platform(PSM_OWN_SYS_DEFAULT);
Apparently the PSM_MOD_IDENTIFY bit was never set in swp->psw_flag
for the xen platform module, so the code at lines 535 - 537 was skipped
(swp->psw_flag for the xen platform module had a value of 1
== PSM_MOD_INSTALL).
When mach_get_platform(PSM_OWN_SYS_DEFAULT) is called, random data is
copied from address 0 into mach_ops (on the Tecra S1,
mach_ops.psm_softinit remains set to NULL).
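To make this failure mode concrete, here is a minimal user-space model in C. The types, the PSM_MOD_IDENTIFY value and get_platform_checked() are hypothetical simplifications, not the kernel's definitions. The only point is that a module whose probe failed leaves mach_set[PSM_OWN_SYS_DEFAULT] NULL, and that a lookup which checks for this could report an error instead of going on to copy through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel's PSM definitions (assumed values). */
#define PSM_MOD_INSTALL     0x1  /* value observed in the xpv module's psw_flag */
#define PSM_MOD_IDENTIFY    0x2  /* assumed: set only after a successful probe */
#define PSM_OWN_SYS_DEFAULT 1
#define PSM_OWN_EXCLUSIVE   2
#define PSM_OWN_OVERRIDE    3

struct psm_ops { void (*psm_softinit)(void); };

struct psm_mod {            /* hypothetical flattening of struct psm_sw */
    int flag;               /* PSM_MOD_* bits */
    int owner;              /* PSM_OWN_* slot claimed by the module */
    struct psm_ops *ops;
};

static struct psm_ops mach_ops;
/* Index 0 points at mach_ops; the owner slots start out NULL. */
static struct psm_ops *mach_set[PSM_OWN_OVERRIDE + 1] = { &mach_ops, NULL, NULL, NULL };

/* Mirrors the quoted loop: only identified modules fill their slot. */
static void construct_info(const struct psm_mod *mods, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!(mods[i].flag & PSM_MOD_IDENTIFY))
            continue;       /* failed probe: the slot stays NULL */
        mach_set[mods[i].owner] = mods[i].ops;
    }
}

/* Hypothetical checked version of mach_get_platform(): report an empty
 * slot instead of copying through a NULL pointer. Returns 0 on success. */
static int get_platform_checked(int owner)
{
    assert(owner > 0 && owner <= PSM_OWN_OVERRIDE);
    if (mach_set[owner] == NULL) {
        fprintf(stderr, "no PSM module identified for owner %d\n", owner);
        return -1;
    }
    mach_ops = *mach_set[owner];
    return 0;
}
```

With a single module flagged only PSM_MOD_INSTALL, construct_info() leaves the default slot empty and get_platform_checked(PSM_OWN_SYS_DEFAULT) returns -1 with a message, rather than letting boot continue to the (*pops->psm_softinit)() call at line 617.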
Problem #1: the code shouldn't crash like this; I'd expect some kind
of error message explaining why the xen platform module failed.
==> Apparently the code assumes that a "PSM_OWN_SYS_DEFAULT" psm module
never fails the probe, that is, that the "PSM_OWN_SYS_DEFAULT" psm module
is always usable. This isn't the case for xpv_psm; it fails psm_probe()
on uppc machines.
==> Shouldn't we have an xpv_uppc_psm (PSM_OWN_SYS_DEFAULT) and an
xpv_pcplusmp_psm (PSM_OWN_EXCLUSIVE) module, where xpv_uppc_psm is
always available and xpv_pcplusmp_psm only on machines with an APIC?
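The proposed split could look roughly like the following sketch. It assumes a simplified selection rule (among the modules whose probe succeeds, the one with the higher owner value wins), and xpv_uppc_probe(), xpv_pcplusmp_probe(), the apic_present flag and the PSM_SUCCESS/PSM_FAILURE values are all illustrative stand-ins, not existing Solaris code. Because the default module's probe cannot fail, selection always produces a result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PSM_SUCCESS 0       /* assumed values, for illustration only */
#define PSM_FAILURE (-1)
#define PSM_OWN_SYS_DEFAULT 1
#define PSM_OWN_EXCLUSIVE   2

/* Hypothetical stand-in for what an APIC probe would detect. */
static bool apic_present;

/* Hypothetical xpv_uppc_psm probe: the legacy 8259 PICs are always usable. */
static int xpv_uppc_probe(void) { return PSM_SUCCESS; }

/* Hypothetical xpv_pcplusmp_psm probe: requires a usable APIC. */
static int xpv_pcplusmp_probe(void) { return apic_present ? PSM_SUCCESS : PSM_FAILURE; }

struct candidate { const char *name; int owner; int (*probe)(void); };

static const struct candidate candidates[] = {
    { "xpv_uppc_psm",     PSM_OWN_SYS_DEFAULT, xpv_uppc_probe },
    { "xpv_pcplusmp_psm", PSM_OWN_EXCLUSIVE,   xpv_pcplusmp_probe },
};

/* Pick the highest-owner module whose probe succeeds (assumed rule). */
static const char *select_psm(void)
{
    const struct candidate *best = NULL;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (candidates[i].probe() != PSM_SUCCESS)
            continue;
        if (best == NULL || candidates[i].owner > best->owner)
            best = &candidates[i];
    }
    assert(best != NULL);   /* the default module never fails its probe */
    return best->name;
}
```

On a uppc-only machine such as the Tecra S1, select_psm() would fall back to the default module; on an APIC machine the exclusive module would take over.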
Problem #2: why did the xen platform module fail to probe?
usr/src/uts/i86pc/os/mp_implfuncs.c: line 409
397 void
398 psm_install(void)
399 {
400 struct psm_sw *swp, *cswp;
401 struct psm_ops *opsp;
402 char machstring[15];
403 int err;
404
405 mutex_enter(&psmsw_lock);
406 for (swp = psmsw->psw_forw; swp != psmsw; ) {
407 opsp = swp->psw_infop->p_ops;
408 if (opsp->psm_probe) {
409 if ((*opsp->psm_probe)() == PSM_SUCCESS) {
410 swp->psw_flag |= PSM_MOD_IDENTIFY;
411 swp = swp->psw_forw;
412 continue;
413 }
414 }
The root cause is that xpv_psm`xen_psm_probe() tries to probe the APIC
using apic_probe_common(), but the Tecra S1 doesn't have the APIC
enabled, so apic_probe_common() returns -1.
(When booting standard Solaris x86, the uppc psm module is used.)
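The probe step can be modeled the same way. In this hypothetical sketch, xen_psm_probe_stub() and apic_probe_common_stub() stand in for the real functions (the stub returns -1, as apic_probe_common() does on the Tecra S1), and the PSM_SUCCESS and PSM_MOD_IDENTIFY values are assumed. install() copies only the flag-setting part of the quoted psm_install() loop; what psm_install() does afterwards with a module that fails its probe is not shown in the excerpt and is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define PSM_SUCCESS       0      /* assumed value */
#define PSM_FAILURE       (-1)   /* assumed value */
#define PSM_MOD_INSTALL   0x1    /* value observed for the xpv module */
#define PSM_MOD_IDENTIFY  0x2    /* assumed value */

struct psm_mod {                 /* hypothetical flattening of struct psm_sw */
    const char *name;
    int flag;
    int (*probe)(void);
};

/* Stand-in for apic_probe_common() on a machine with the APIC disabled. */
static int apic_probe_common_stub(void) { return -1; }

/* Stand-in for xen_psm_probe(): succeeds only if the APIC probe does. */
static int xen_psm_probe_stub(void)
{
    return apic_probe_common_stub() == PSM_SUCCESS ? PSM_SUCCESS : PSM_FAILURE;
}

/* Mirrors the quoted loop: PSM_MOD_IDENTIFY is set only on a successful
 * probe. Returns the number of modules identified. */
static size_t install(struct psm_mod *mods, size_t n)
{
    size_t identified = 0;
    for (size_t i = 0; i < n; i++) {
        assert(mods[i].flag & PSM_MOD_INSTALL);  /* listed modules are installed */
        if (mods[i].probe != NULL && mods[i].probe() == PSM_SUCCESS) {
            mods[i].flag |= PSM_MOD_IDENTIFY;
            identified++;
        }
    }
    return identified;
}
```

Run against a single xpv-style module, install() identifies nothing and the flag stays at PSM_MOD_INSTALL (1), matching the value observed for the xen platform module.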
================
The hypervisor seems to have partial(?) / full(?) support for such uppc
systems. Shouldn't the Solaris i86xpv platform code support such a
system, too?
This message posted from opensolaris.org
Jürgen Keil wrote:
> When trying to boot a Solaris Dom0 kernel on a Tecra S1 laptop (Pentium-M cpu;
> no PAE available; and the hardware design seems to force using the old
> legacy 8259 PICs), using a xen hypervisor compiled with PAE disabled,
> the Solaris x86 kernel crashes somewhere inside mach_init() with a
> BAD TRAP (#pf page fault) with pc=0 and address 0xf000e6f2.

Unfortunately, we don't support uppc. Solaris dom0 will only run on
machines which use pcplusmp. It was a choice we made to spend the time
on other features, since this wasn't a big % of H/W systems which
customers would use going forward.

> The hypervisor seems to have partial(?) / full(?) support for such uppc
> systems. Shouldn't the Solaris i86xpv platform code support such a
> system, too?

Xen does support uppc based systems. I run a Linux dom0 on my athlonXP
system to test Solaris domUs.

It would be great if someone in the community would like to add this
functionality. Unfortunately, it is very low on our list of todo's right
now, so it's likely this would be the only way we could get this support
in a timely manner.

Oh, and thanks for all the really great feedback! :-)

MRJ

--
Mark Johnson <mark.johnson@sun.com>
Sun Microsystems, Inc.
(781) 442-0869
On Mon, Aug 06, 2007 at 09:34:09AM -0400, Mark Johnson wrote:
> > When trying to boot a Solaris Dom0 kernel on a Tecra S1 laptop (Pentium-M cpu;
> > no PAE available; and the hardware design seems to force using the old
> > legacy 8259 PICs), using a xen hypervisor compiled with PAE disabled,
> > the Solaris x86 kernel crashes somewhere inside mach_init() with a
> > BAD TRAP (#pf page fault) with pc=0 and address 0xf000e6f2.
>
> Unfortunately, we don't support uppc. Solaris dom0
> will only run on machines which use pcplusmp.

I thought we had a much better way of dying, though. I think it's worth
filing a bug to get a clear message out rather than some bizarre crash.

john