Jürgen Keil
2008-Jun-18 18:11 UTC
snv_81 domU: long delay during boot when dom0 has been up for a long time
This one happened today: Gentoo linux dom0, xen 3.2.0 hypervisor (32-bit), Intel Quad-core cpu, 8GB memory. This linux box / dom0 is up for 40 days. OpenSolaris SXCE snv_81 PV domU (32-bit) Problem: when starting the snv_81 domU, we''re stuck for more than 60 seconds early during bootstrap: v3.2.0 chgset ''unavailable'' SunOS Release 5.11 Version snv_81 32-bit Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. WARNING: High-overhead kmem debugging features enabled (kmem_flags = 0xf). Performance degradation and large memory overhead possible. See the Solaris Tunable Parameters Reference Manual. features: 31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc> mem = 524288K (0x20000000) root nexus = i86xpv pseudo0 at root pseudo0 is /pseudo scsi_vhci0 at root scsi_vhci0 is /scsi_vhci <<<<< stops here, no progress for more than 60 seconds After some long delay, booting continues. Well, actually, the box had set snooping=1 in /etc/system, which resulted in a deadman panic (because the default snooping_interval is 50 seconds). Without snooping, the box continues booting after the long delay. The delay can be seen in /var/adm/message (note the 19:55:33 -> 19:56:46 jump): Jun 18 19:55:33 solaris-domu genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_81 32-bit Jun 18 19:55:33 solaris-domu genunix: [ID 943907 kern.notice] Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Jun 18 19:55:33 solaris-domu Use is subject to license terms. Jun 18 19:55:33 solaris-domu genunix: [ID 289235 kern.warning] WARNING: High-overhead kmem debugging features enabled (kmem_flags = 0xf). Performance degradation and large memory overhead possible. See the Solaris Tunable Parameters Reference Manual. Jun 18 19:55:33 solaris-domu unix: [ID 126719 kern.notice] features: 31e66c6<ssse3,cpuid,cmp,cx16,sse3,nx,sse2,sse,cx8,pae,mmx,cmov,msr,tsc> Jun 18 19:55:33 solaris-domu unix: [ID 168242 kern.notice] mem = 524288K (0x20000000) Jun 18 19:55:33 solaris-domu unix: [ID 508459 kern.warning] WARNING: Cannot loadpsm pcplusmp Jun 18 19:55:33 solaris-domu rootnex: [ID 466748 kern.notice] root nexus = i86xpv Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] pseudo0 at root Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] pseudo0 is /pseudo Jun 18 19:55:33 solaris-domu rootnex: [ID 349649 kern.notice] scsi_vhci0 at root Jun 18 19:55:33 solaris-domu genunix: [ID 936769 kern.notice] scsi_vhci0 is /scsi_vhci Jun 18 19:55:33 solaris-domu genunix: [ID 765264 kern.info] NOTICE: Time of Day clock is Read-Only; set of Date/Time will not persist across reboot. Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot load platform pm driver acpippm Jun 18 19:56:46 solaris-domu genunix: [ID 636498 kern.warning] WARNING: cannot load platform pm driver ppm Jun 18 19:56:46 solaris-domu rootnex: [ID 349649 kern.notice] xpvd0 at root Jun 18 19:56:46 solaris-domu xpvd: [ID 395608 kern.notice] xdf@0, xdf0 Problem appears to be in cyclic_timer: 571 void 572 cyclic_timer(void) 573 { 574 tm_req_t *req; 575 timer_tw_t *tw; 576 hrtime_t curr_tick, curr; 577 578 /* If the system is suspended, just return */ 579 if (timer_suspended) 580 return; 581 582 /* Get the current time */ 583 timer_hrtime = ddi_timer->tick_time = curr = gethrtime(); gethrtime returned 3468552305121315; dividing by nanoseconds and the number of seconds in a day: 3468552305121315/10^9/(24*60*60) 40.145 (dom0 is up for 40.145 days) 584 curr_tick = tw_tick(ddi_timer->tick_time); tw_tick returns 2774841844. And we have ddi_timer->tick == 0 [0]> ddi_timer::print [0].tick [0].tick = 0 Now the following while loop seems to loop 2774841844 times; somehow that just seems to waste quite a lot of time, and neither timeout_dispatch() nor transfer_req() get called. 586 restart: 587 /* 588 * Check the timer cogs to see if there are timeout requests 589 * who reach the expiration time. Call timeout_invoke() to execute 590 * the requests, then. 591 */ 592 while (curr_tick >= ddi_timer->tick) { ... 654 ddi_timer->tick++; 655 } I suspect that on metal this is no problem, because the returned value from gethrtime() is small. This message posted from opensolaris.org