Jürgen Keil
2008-Jun-18  18:11 UTC
snv_81 domU: long delay during boot when dom0 has been up for a long time
This one happened today:
Gentoo linux dom0, xen 3.2.0 hypervisor (32-bit), Intel Quad-core cpu, 8GB
memory.
This linux box / dom0  is up for 40 days.
OpenSolaris SXCE snv_81 PV domU (32-bit)
Problem: when starting the snv_81 domU, we''re stuck for more than 60
seconds
early during bootstrap:
v3.2.0 chgset ''unavailable''
SunOS Release 5.11 Version snv_81 32-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
WARNING: High-overhead kmem debugging features enabled (kmem_flags = 0xf). 
Performance degradation and large memory overhead possible. See the Solaris
Tunable Parameters Reference Manual.
features:
31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
mem = 524288K (0x20000000)
root nexus = i86xpv
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
<<<<< stops here, no progress for more than 60 seconds
After some long delay, booting continues.  Well, actually, the box had
set snooping=1 in /etc/system, which resulted in a deadman panic
(because the default snooping_interval is 50 seconds).
Without snooping, the box continues booting after the long delay.
The delay can be seen in /var/adm/message (note the 19:55:33 -> 19:56:46
jump):
Jun 18 19:55:33 solaris-domu genunix: [ID 540533 kern.notice] ^MSunOS Release
5.11 Version snv_81 32-bit
Jun 18 19:55:33 solaris-domu genunix: [ID 943907 kern.notice] Copyright
1983-2007 Sun Microsystems, Inc.  All rights reserved.
Jun 18 19:55:33 solaris-domu Use is subject to license terms.
Jun 18 19:55:33 solaris-domu genunix: [ID 289235 kern.warning] WARNING:
High-overhead kmem debugging features enabled (kmem_flags = 0xf).  Performance
degradation and large memory overhead possible. See the Solaris Tunable
Parameters Reference Manual.
Jun 18 19:55:33 solaris-domu unix: [ID 126719 kern.notice] features:
31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc>
Jun 18 19:55:33 solaris-domu unix: [ID 168242 kern.notice] mem = 524288K
(0x20000000)
Jun 18 19:55:33 solaris-domu unix: [ID 508459 kern.warning] WARNING: Cannot
loadpsm pcplusmp
Jun 18 19:55:33 solaris-domu rootnex: [ID 466748 kern.notice] root nexus =
i86xpv
Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] pseudo0 at root
Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] pseudo0 is /pseudo
Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] scsi_vhci0 at root
Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] scsi_vhci0 is
/scsi_vhci
Jun 18 19:55:33 solaris-domu genunix: [ID 765264 kern.info] NOTICE: Time of Day
clock is Read-Only; set of Date/Time will not persist across reboot.
Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot
load platform pm driver acpippm
Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot
load platform pm driver ppm
Jun 18 19:56:46 solaris-domu rootnex: [ID 349649 kern.notice] xpvd0 at root
Jun 18 19:56:46 solaris-domu xpvd: [ID 395608 kern.notice] xdf@0, xdf0
Problem appears to be in cyclic_timer:
   571  void
   572  cyclic_timer(void)
   573  {
   574          tm_req_t *req;
   575          timer_tw_t *tw;
   576          hrtime_t curr_tick, curr;
   577
   578          /* If the system is suspended, just return */
   579          if (timer_suspended)
   580                  return;
   581
   582          /* Get the current time */
   583          timer_hrtime = ddi_timer->tick_time = curr = gethrtime();
gethrtime returned 3468552305121315; dividing by nanoseconds and the number
of seconds in a day:
3468552305121315/10^9/(24*60*60)
40.145
(dom0 is up for 40.145 days)
   584          curr_tick = tw_tick(ddi_timer->tick_time);
tw_tick returns 2774841844.
And we have ddi_timer->tick == 0
[0]> ddi_timer::print [0].tick
[0].tick = 0
Now the following while loop seems to loop 2774841844 times; somehow
that just seems to waste quite a lot of time, and neither
timeout_dispatch() nor transfer_req() get called.
   586  restart:
   587          /*
   588           * Check the timer cogs to see if there are timeout requests
   589           * who reach the expiration time. Call timeout_invoke() to
execute
   590           * the requests, then.
   591           */
   592          while (curr_tick >= ddi_timer->tick) {
   ...
   654                  ddi_timer->tick++;
   655          }
I suspect that on metal this is no problem, because the returned
value from gethrtime() is small.
 
 
This message posted from opensolaris.org