All the discussions of "what's the best distribution to run Xen on" seem to focus on Linux.

I've been considering running Xen on OpenSolaris, to take advantage of ZFS.

Anybody have any experiences, comments, things to watch out for, etc.?

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra
On Wed, Feb 04, 2009 at 09:48:18AM -0500, Miles Fidelman wrote:
> I've been considering running Xen on OpenSolaris, to take advantage of ZFS.
>
> Anybody have any experiences, comments, things to watch out for, etc.?

One important thing is to use zvols for your backing storage, not file-on-ZFS.

regards
john
On Wed, Feb 4, 2009 at 9:48 PM, Miles Fidelman
<mfidelman@meetinghouse.net> wrote:
> All the discussions of "what's the best distribution to run Xen on" seem
> to focus on Linux.

Probably because OpenSolaris is still a "newbie" in the Xen world, and because it can't run out of the box on many HP servers (the ones that use Smart Array). Add to that the fact that you're practically stuck with Sun's version of Xen (3.1.4, is it?), with little-to-no chance of adopting Xen 3.3 anytime soon, and the "seem to focus on Linux" part makes sense.

Then again, some people (like me) might consider that all the OpenSolaris goodies (zvol, DTrace, etc.) make it worth it (at least for certain requirements) :D

>
> I've been considering running Xen on OpenSolaris, to take advantage of ZFS.
>
> Anybody have any experiences, comments, things to watch out for, etc.?
>

Let's see. Here's what's most important for me:
- Watch out for command and config location differences. I had a hard time getting domUs to autostart (hint: OpenSolaris doesn't use /etc/xen/auto) and setting their VNC consoles to bind to all IPs by default (there is no /etc/xen/xend-config.sxp).
- Watch out for bridges and VLANs. OpenSolaris 2008.11 uses a different mechanism compared to previous versions.
- You might want to try setting refreservation=none on zvols. This should give the same space-saving result as using a sparse file for domU backing storage.
- James Harper's widely-tested GPLPV driver for Windows domUs doesn't work on an OpenSolaris dom0 (yet). Sun has its own PV driver, but I'm not sure how well it performs.

Regards,

Fajar
Fajar A. Nugraha wrote:
> On Wed, Feb 4, 2009 at 9:48 PM, Miles Fidelman
> <mfidelman@meetinghouse.net> wrote:
>> All the discussions of "what's the best distribution to run Xen on" seem
>> to focus on Linux.
>
> Probably because opensolaris is still a "newbie" in Xen world. And the
> fact that it can't run out-of-the-box on many HP servers (the ones
> that use smart array). Add to that the fact that you're practically
> stuck with Sun's version of Xen (3.1.4, is it?), with little-to-no
> chance of adopting Xen-3.3 anytime soon, surely the "seem to focus on
> Linux" part makes sense.

3.3.2-pre source bits work and are available today. We are actively working on getting them tested and putback to OpenSolaris.

> Then again, some people (like me) might consider all the opensolaris
> goodies (zvol, dtrace, etc.) makes it worth it (at least for certain
> requirements) :D
>
>> I've been considering running Xen on OpenSolaris, to take advantage of ZFS.
>>
>> Anybody have any experiences, comments, things to watch out for, etc.?
>>
>
> Lets see. Here's what's most important for me:
> - watch out for command and config location differences. I got a hard
> time getting domUs autostart (hint: opensolaris don't use
> /etc/xen/auto) and setting their vnc console to bind to all ip by
> default (no /etc/xen/xend-config.sxp).
> - watch out for bridges and vlans. Opensolaris 2008.11 uses different
> mechanism compared to previous versions
> - you might want to try setting refreservation=none on zvols. This
> should give the same space-saving result as using sparse-file for
> domU's backing storage.

We did things differently where we thought we could do things better. In the examples above, I believe we have a better solution.

MRJ
On Wed, Feb 04, 2009 at 10:09:57PM +0700, Fajar A. Nugraha wrote:
> - James Harper's widely-tested GPLPV driver for windows domU don't
> work on opensolaris dom0 (yet).

I'd be interested in fixing this. Can you describe the problems?
On 2/4/09 10:09 AM, Fajar A. Nugraha wrote:
> On Wed, Feb 4, 2009 at 9:48 PM, Miles Fidelman
> <mfidelman@meetinghouse.net> wrote:
>> All the discussions of "what's the best distribution to run Xen on" seem
>> to focus on Linux.
>
> Probably because opensolaris is still a "newbie" in Xen world. And the
> fact that it can't run out-of-the-box on many HP servers (the ones
> that use smart array). Add to that the fact that you're practically
> stuck with Sun's version of Xen (3.1.4, is it?), with little-to-no
> chance of adopting Xen-3.3 anytime soon, surely the "seem to focus on
> Linux" part makes sense.
>
> Then again, some people (like me) might consider all the opensolaris
> goodies (zvol, dtrace, etc.) makes it worth it (at least for certain
> requirements) :D

Thank you to Miles for asking, and to everyone for the replies. I have been thrashing over that decision too. In our case, we're running Sun hardware [2200M2] and thought OpenSolaris would have a better relationship with the RAID controller, the multiple NICs and the LOM, all of which are essential. We're not sure how well they are supported in current Linux distributions.
On Wed, Feb 4, 2009 at 10:40 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> little-to-no chance of adopting Xen-3.3 anytime soon
>
> 3.3.2-pre source bits work and are available today.
> We are actively working on getting them tested and
> putback to OpenSolaris.
>

Good to hear that! So there's hope it'll be in 2009.05, right?

> we did things different where we thought we could do
> things better. In the examples above, I believe we
> have a better solution.

I agree with you on some of them. Still, I wish we had reached the point where moving domUs between Linux and OpenSolaris dom0s is a simple matter of copying data and config files.
On Wed, Feb 4, 2009 at 10:41 PM, David Edmondson <dme@sun.com> wrote:
> On Wed, Feb 04, 2009 at 10:09:57PM +0700, Fajar A. Nugraha wrote:
>> - James Harper's widely-tested GPLPV driver for windows domU don't
>> work on opensolaris dom0 (yet).
>
> I'd be interested in fixing this. Can you describe the problems?
>

In the past there was this:

http://www.nabble.com/Another-GPLPV-pre-release-0.9.11-pre20-td20336594.html

With the latest driver, 0.9.12-pre13 from http://www.meadowcourt.org/downloads/, installation completed successfully and the PV driver works correctly, but the network interface doesn't work.

I'm testing it on OpenSolaris snv_98, since all versions after that (including snv_106) refuse to boot the Xen kernel on an HP box with Smart Array and a QLogic card (the normal kernel works fine with disable-qlc=true). I filed a bug at http://defect.opensolaris.org/bz/show_bug.cgi?id=5905 but there have been no takers so far (I guess it might be a hardware-specific issue).

Let me know if I can help with some testing on this (somewhat) old OpenSolaris version.

Regards,

Fajar
Fajar A. Nugraha wrote:
> On Wed, Feb 4, 2009 at 10:41 PM, David Edmondson <dme@sun.com> wrote:
>> On Wed, Feb 04, 2009 at 10:09:57PM +0700, Fajar A. Nugraha wrote:
>>> - James Harper's widely-tested GPLPV driver for windows domU don't
>>> work on opensolaris dom0 (yet).
>> I'd be interested in fixing this. Can you describe the problems?
>>
>
> In the past there's this
>
> http://www.nabble.com/Another-GPLPV-pre-release-0.9.11-pre20-td20336594.html
>
> With the latest driver 0.9.12-pre13 from
> http://www.meadowcourt.org/downloads/ installation completed
> successfully, PV driver works correctly, but the network
> interface doesn't work.
>
> I'm testing it on opensolaris snv_98 since all versions after that
> (including snv_106) refused to boot xen kernel on HP box with smart
> array and qlogic card (normal kernel works fine with
> disable-qlc=true). Filed a bug on
> http://defect.opensolaris.org/bz/show_bug.cgi?id=5905 but no takers so
> far (guess it might be HW-specific issue).

You have a driver for the HP Smart Array (since it boots on bare metal)? If so, where is it installed?

MRJ
Fajar A. Nugraha wrote:
> On Wed, Feb 4, 2009 at 10:40 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>>> little-to-no chance of adopting Xen-3.3 anytime soon
>> 3.3.2-pre source bits work and are available today.
>> We are actively working on getting them tested and
>> putback to OpenSolaris.
>>
>
> Good to hear that! So there's hope It'd be on 2009.05, right?

Unlikely. The freeze for that is in a few weeks, and we are unlikely to finish a full regression test and the ARC work by then. I would expect to have it put back to Nevada by the time the next version of OpenSolaris is released.

>> we did things different where we thought we could do
>> things better. In the examples above, I believe we
>> have a better solution.
>
> I agree with you on some of them. Still, I wished we've reached the
> point where moving domUs between Linux <-> Opensolaris dom0 is a
> simple matter of copying data and config files.

You can do that today. What problems do you have moving a guest between a Linux dom0 and a Solaris dom0?

Migration "should" work too, but we don't test it.

MRJ
On Thu, Feb 5, 2009 at 7:11 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
> Fajar A. Nugraha wrote:
>> I'm testing it on opensolaris snv_98 since all versions after that
>> (including snv_106) refused to boot xen kernel on HP box with smart
>> array and qlogic card (normal kernel works fine with
>> disable-qlc=true). Filed a bug on
>> http://defect.opensolaris.org/bz/show_bug.cgi?id=5905 but no takers so
>> far (guess it might be HW-specific issue).
>
> You have a driver for the HP smart array (since it
> boots in metal)?

Yes.

> If so, where is it installed to?

I added the CPQary3 driver to x86.microroot from the install CD, and rebuilt the install CD afterwards. OpenSolaris is installed on the internal disk.

On a side note, I just acquired another, similar HP system but WITHOUT the QLogic card. And guess what, the Xen kernel works perfectly!

Somehow the QLogic driver is causing problems on my system. On the non-Xen kernel, using disable-qlc=true allows it to boot (I'm not using any external drives on this server). On the Xen kernel this workaround is not enough.

Here's the lspci output from Linux for the problematic QLogic card:

10:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
10:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)

Regards,

Fajar
On Thu, Feb 5, 2009 at 7:24 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> I agree with you on some of them. Still, I wished we've reached the
>> point where moving domUs between Linux <-> Opensolaris dom0 is a
>> simple matter of copying data and config files.
>
> You can do that today.. What problems do you have
> moving a guest between a linux dom0 and a solaris
> dom0?

Most of them can be worked around, for example:
- The clock issue (Linux keeps the hardware clock in UTC by default, while Solaris uses local time). Can be fixed.
- I can't use vncunused=0 (then again, I have problems using this on Xen 3.3.1/Linux too).
- Different bootloader location: /usr/bin/pygrub vs /usr/lib/xen/bin/pygrub (again, a simple workaround).
- serial='pty' does not work for HVM guests on OpenSolaris.

> Migration "should" work too, but we don't test it.

That would be hard, since migration requires:
- the same network setup (how can we set up the same bridges on Linux and OpenSolaris?)
- the same disk setup (using /dev/sda or /dev/disk/by-id obviously won't work on Solaris, even when it's an iSCSI-imported disk)
- the same Xen version (is this required?)

Regards,

Fajar
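The bootloader-path workaround mentioned above can be scripted when carrying an xm-style domU config from one dom0 to the other. Below is a minimal sketch in Python. From the order the paths are listed, it assumes /usr/bin/pygrub is the Linux location and /usr/lib/xen/bin/pygrub the OpenSolaris one. `retarget_bootloader` is a hypothetical helper and is not part of either Xen distribution.

```python
import re

# Assumed locations, taken from the two paths listed in the thread; check your own dom0s.
LINUX_PYGRUB = "/usr/bin/pygrub"
SOLARIS_PYGRUB = "/usr/lib/xen/bin/pygrub"

# Matches a config line such as: bootloader = "/usr/bin/pygrub" (single or double quotes).
_BOOTLOADER_RE = re.compile(r'^(\s*bootloader\s*=\s*)(["\'])([^"\']*)\2', re.MULTILINE)


def retarget_bootloader(config_text: str, old: str, new: str) -> str:
    """Hypothetical helper: swap the bootloader path in an xm-style domU config.

    Only a bootloader value exactly equal to `old` is rewritten; every other
    line of the config is returned unchanged.
    """
    def _swap(match):
        prefix, quote, path = match.group(1), match.group(2), match.group(3)
        if path != old:
            return match.group(0)  # some other bootloader: leave it alone
        return f"{prefix}{quote}{new}{quote}"

    return _BOOTLOADER_RE.sub(_swap, config_text)


if __name__ == "__main__":
    linux_cfg = 'name = "guest1"\nbootloader = "/usr/bin/pygrub"\nmemory = 512\n'
    print(retarget_bootloader(linux_cfg, LINUX_PYGRUB, SOLARIS_PYGRUB))
```

Reversing the last two arguments moves a config in the other direction. Configs whose bootloader is anything else pass through untouched.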
Fajar A. Nugraha wrote:
> On Thu, Feb 5, 2009 at 7:24 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>>> I agree with you on some of them. Still, I wished we've reached the
>>> point where moving domUs between Linux <-> Opensolaris dom0 is a
>>> simple matter of copying data and config files.
>> You can do that today.. What problems do you have
>> moving a guest between a linux dom0 and a solaris
>> dom0?
>
> Most of them can be modified to work correctly, like :
> - clock issue (Linux use UTC on hwclock by default, while solaris uses
> local time. Can be fixed).

Yeah. Either of these can be changed. For Solaris, see man rtc, which will change /etc/rtc_config.

> - I can't use vncunused=0 (then again I have problem using this on
> Xen-3.3.1/Linux)

Oh, what happens? (We have a bunch of qemu-unstable patches backported into our 3.1.4, which probably explains why it behaves the same as what you see in 3.3.1.)

> - different bootloader location. /usr/bin/pygrub vs
> /usr/lib/xen/bin/pygrub (again, simple workaround)

Yeah, this is a problem. Hardcoding the path in the domain config is broken; it should really just be pygrub, etc. But I don't expect anyone to fix this anytime soon on Solaris or Linux. It should be a pretty easy fix if anyone wants to contribute it, though.

> - serial='pty' does not work on HVM guest on opensolaris

This was fixed somewhat recently. On Solaris, it's:

    (serial pty)

>> Migration "should" work too, but we don't test it.
>
> That would be hard, since migration requires :
> - same network setup (how can we set the same bridge setup on Linux
> and opensolaris?)

On Solaris, you can rename the NICs to match the bridges on the Linux side. This also requires a somewhat new version of OpenSolaris, i.e.:

    dladm rename-link

We have a bunch of new NIC stuff coming relatively soon: rate limiting and VLAN support. NAT should work now too, but as of today it requires manual setup on dom0.
I haven't played with it to see how or if it works yet.

> - same disk setup (using /dev/sda or /dev/disk/by-id obviously won't
> work on solaris, even when it's iscsi-imported disk).

This is where the real problem is. When I did a migration between Solaris and Fedora (a while ago), I used file:/.../, so it does work. But you wouldn't use file:/ in a real setup, for performance reasons.

We've diverged quite a bit here, unfortunately. XenSource did a closed-source fork in this area, and the current stuff upstream wasn't close to what we needed. So we have file:/... and phy:/.... But for tap, we have a different implementation, tap:vdisk:, which supports vmdk, vdi (VirtualBox disk), and soon vhd. It also supports snapshots, rollbacks, clones, etc. for each (man vdiskadm). It's all open source, but unlikely to be adopted upstream.

Coming soon, we will also have (format subject to change):

    phy:iscsi:alias/<iscsi-alias>
    phy:iscsi:static/<server IP>/<lun>/<target id>
    phy:iscsi:discover/<lun>/<alias or target id>

e.g.

    disk = ['phy:iscsi:alias/serv01-iscsi/winxp05-disk0,0,w']
    disk = ['phy:iscsi:static/192.168.0.70/0/iqn.1986-03.com.sun:02:17f34578-00a9-ef69-f3e9-b8a2896a4915,0,w']

and something like:

    phy:san:guid/<lun>/<guid>/

I'm not sure what Linux dom0s have in these areas these days?

> - same Xen version (is this required?)

They would have to be close, i.e. both 3.3.x.

MRJ
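Because xm domain configs are Python, the proposed iSCSI disk specifications can be built from their parts rather than typed by hand. The following is a sketch only. Mark says the format is subject to change, and the helper names are hypothetical. Each helper mirrors one of the forms listed above. `disk_entry` adds the trailing `,0,w` that both examples use.

```python
def iscsi_alias(alias: str) -> str:
    """phy:iscsi:alias/<iscsi-alias> (proposed format, subject to change)."""
    return f"phy:iscsi:alias/{alias}"


def iscsi_static(server_ip: str, lun: int, target_id: str) -> str:
    """phy:iscsi:static/<server IP>/<lun>/<target id> (proposed format)."""
    return f"phy:iscsi:static/{server_ip}/{lun}/{target_id}"


def iscsi_discover(lun: int, alias_or_target: str) -> str:
    """phy:iscsi:discover/<lun>/<alias or target id> (proposed format)."""
    return f"phy:iscsi:discover/{lun}/{alias_or_target}"


def disk_entry(spec: str, vdev: str = "0", mode: str = "w") -> str:
    """Append the device and access-mode fields, as in the ',0,w' examples."""
    if mode not in ("r", "w"):
        raise ValueError(f"unexpected access mode: {mode!r}")
    return f"{spec},{vdev},{mode}"


if __name__ == "__main__":
    target = "iqn.1986-03.com.sun:02:17f34578-00a9-ef69-f3e9-b8a2896a4915"
    disk = [disk_entry(iscsi_static("192.168.0.70", 0, target))]
    print(disk)
```

Keeping the format in one place means only these helpers need editing if the syntax changes before it ships.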
Fajar A. Nugraha wrote:
> On Thu, Feb 5, 2009 at 7:11 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> Fajar A. Nugraha wrote:
>>> I'm testing it on opensolaris snv_98 since all versions after that
>>> (including snv_106) refused to boot xen kernel on HP box with smart
>>> array and qlogic card (normal kernel works fine with
>>> disable-qlc=true). Filed a bug on
>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=5905 but no takers so
>>> far (guess it might be HW-specific issue).
>> You have a driver for the HP smart array (since it
>> boots in metal)?
>
> Yes.
>
>> If so, where is it installed to?
>
> I added CPQary3 driver to x86.microroot from the install CD, and
> rebuild the install CD afterwards. Opensolaris is installed on
> internal disk.
>
> On a side note, I just acquired another similar HP system but WITHOUT
> qlogic card. And guess what, xen kernel works perfectly!
>
> Somehow qlogic driver is causing problems on my system. On non-xen
> kernel using disable-qlc=true allows it to boot (I'm not using any
> external drive on this server). On xen kernel this workaround is not
> enough.

Hmm. It's either an NMI problem or a runaway interrupt. Unfortunately, it seems to be specific to your H/W, so it would be hard for us to debug it.

You can boot into the kernel debugger and see for sure. Add -kd to the unix line in the grub menu:

    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -kd

Set a breakpoint on deadman:

    [0]> ::bp deadman

Continue:

    [0]> :c

Run through a bunch of breakpoints and then see where we are. Do that a few times:

    [0]> ,10:c
    [0]> $c
    [0]> ,10:c
    [0]> $c
    [0]> ,10:c
    [0]> $c
    [0]> ,10:c
    [0]> ::threadlist -v

If you get this far, see if you can put the threadlist on a pastebin.

Then delete all breakpoints and continue:

    [0]> :z
    [0]> :c

MRJ
I said:
> In our case, we're running Sun hardware [2200M2] and thought
> OpenSolaris would have a better relationship with the RAID
> controller, multiple NIC's and LOM; all essential.

The joke's on me. I found this:

    Sun Fire™ X2200 M2 Server Operating System Installation Guide
    1.3.4 Configuring Your System for RAID

    You have two RAID support options for Sun Fire X2200 M2 server. You
    can use the onboard Nvidia NVRAID option, which only supports the
    Windows OS, or, if you have an LSI SAS2042E-R PCIe card installed in
    your server, you can use the LSI option.

How Sun can have a hardware function that is Windows-only baffles me, but there it is. Guess we'll be learning what ZFS software RAID is...
On Fri, Feb 6, 2009 at 9:52 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>
> You can boot into the kernel debugger and see for sure..
> Add a -kd to the unix line in the grub menu.

Hi Mark,

Looking at your instructions, it seems I will have to set up a serial console to capture the output. Is there a howto on setting up a serial console for Solaris, especially under xVM?

I have successfully set up a serial console for a Linux dom0, but on Xen kernels a special workaround is needed, since by default Xen assigns com1 as xencon. The particular settings for Linux are:

    "console=vga,com1" on the kernel line
    "console=ttyS0 console=tty0" on the module line

What is the correct setting on OpenSolaris?

Regards,

Fajar
Hi Fajar,

Fajar A. Nugraha wrote:
> On Fri, Feb 6, 2009 at 9:52 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> You can boot into the kernel debugger and see for sure..
>> Add a -kd to the unix line in the grub menu.
>
> Hi Mark,
>
> looking at your information, seems I will have to setup serial console
> to capture the output. Is there a howto on setting up serial console
> for Solaris, especially under xVM?
>
> I have successfully set up serial console for Linux dom0, but on Xen
> kernels there's a special workaround needed since Xen assigns com1 by
> default as xencon. The particular settings for linux are
>
> "console=vga,com1" on kernel line
> "console=ttyS0 console=tty0" on module line
>
> what is the correct setting on opensolaris?
>

Here's what I use:

    serial --unit=0 --speed=9600
    terminal --timeout=10 serial console

    title Solaris dom0
    kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g
    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -k -B console=ttya
    module$ /platform/i86pc/$ISADIR/boot_archive

MRJ
On Thu, Feb 12, 2009 at 4:29 AM, Mark Johnson <Mark.Johnson@sun.com> wrote:
> Here's what I use...
> serial --unit=0 --speed=9600
> terminal --timeout=10 serial console
>
> title Solaris dom0
> kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g
> module$ /platform/i86xpv/kernel/$ISADIR/unix
> /platform/i86xpv/kernel/$ISADIR/unix -k -B console=ttya
> module$ /platform/i86pc/$ISADIR/boot_archive
>

I tried that, and added "-v" for more verbose output. On the serial console I got as far as:

    (xVM) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
    (xVM) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
    (xVM) Freed 124kB init memory.

The [0]> prompt still appears on the VGA console (not serial), which is hard to capture even with the iLO GUI console.

Anyway, after the first

    [0]> ,10:c

it shows (screenshot at http://www.upload.mn/view/9fg9j91rkegyet6bx6em.png)

    [1]

and I can't type anything into it.

Any ideas?

Regards,

Fajar
Fajar A. Nugraha wrote:
> On Thu, Feb 12, 2009 at 4:29 AM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> Here's what I use...
>> serial --unit=0 --speed=9600
>> terminal --timeout=10 serial console
>>
>> title Solaris dom0
>> kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g
>> module$ /platform/i86xpv/kernel/$ISADIR/unix
>> /platform/i86xpv/kernel/$ISADIR/unix -k -B console=ttya
>> module$ /platform/i86pc/$ISADIR/boot_archive
>>
>
> I tried that, and added "-v" for more verbose output, and I got as far as
>
> (xVM) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
> (xVM) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> input to Xen).
> (xVM) Freed 124kB init memory.
>
> on serial console.
>
> The [0]> prompt still appear on VGA console (not serial), which is
> hard to capture even with ILO GUI console.

Really? You had -B console=ttya?

> Anyway,
>
> after the first
> [0]> ,10:c
>
> it shows (screenshot on http://www.upload.mn/view/9fg9j91rkegyet6bx6em.png)
> [1]
>
> and I can't type anything on it.
>
> Any ideas?

In the Xen console, you should be able to hit Ctrl-a three times. From there, try q, then Ctrl-a three times again. See if that causes anything to happen on the screen.

You can try adding nmi=ignore to the xen.gz line.

You could also try adding noapic to see if that helps narrow down the problem.

MRJ
On Wed, Feb 18, 2009 at 2:45 AM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> The [0]> prompt still appear on VGA console (not serial), which is
>> hard to capture even with ILO GUI console.
>
> really? You had -B console=ttya?

Ah, sorry, my bad. I used

    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS,disable-qlc=true console=ttya

when it should be

    module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS,disable-qlc=true,console=ttya

Anyway, here's the threadlist output:
http://pastebin.com/f6c5f5918

> You can try adding a nmi=ignore to the xen.gz line.

This doesn't have any effect.

> You could also try adding a noapic to see if that help
> narrow down the problem.

This one causes a panic:
http://pastebin.com/f53a20e6d

Let me know if you need more info.

Regards,

Fajar
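The mistake above is a single character. The -B boot properties have to form one comma-separated list. With a space in front of it, console=ttya was not picked up (the debugger prompt stayed on VGA). The following Python sketch builds the module$ line from a list of properties, so the separators come out right automatically. `unix_module_line` and `boot_properties` are hypothetical helpers. `$ZFS-BOOTFS` is passed through as the literal GRUB token, not expanded.

```python
from typing import Dict, List

UNIX_PATH = "/platform/i86xpv/kernel/$ISADIR/unix"


def boot_properties(raw_tokens: List[str], props: Dict[str, str]) -> str:
    """Join -B properties into one comma-separated argument with no spaces."""
    parts = list(raw_tokens) + [f"{key}={value}" for key, value in props.items()]
    for part in parts:
        if " " in part:
            # A space would split the list and drop everything after it.
            raise ValueError(f"boot property contains a space: {part!r}")
    return ",".join(parts)


def unix_module_line(raw_tokens: List[str], props: Dict[str, str], flags: str = "") -> str:
    """Build a GRUB module$ line for the xVM unix kernel (hypothetical helper)."""
    line = f"module$ {UNIX_PATH} {UNIX_PATH}"
    if flags:
        line += f" {flags}"
    bprops = boot_properties(raw_tokens, props)
    if bprops:
        line += f" -B {bprops}"
    return line


if __name__ == "__main__":
    print(unix_module_line(["$ZFS-BOOTFS"], {"disable-qlc": "true", "console": "ttya"}))
```

Called with the properties from the thread, it produces the corrected line shown above. The dict keeps insertion order, so properties come out in the order they are given.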
Fajar A. Nugraha wrote:
> On Wed, Feb 18, 2009 at 2:45 AM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>>> The [0]> prompt still appear on VGA console (not serial), which is
>>> hard to capture even with ILO GUI console.
>> really? You had -B console=ttya?
>
> Ah, sorry. My bad. I use
>
> module$ /platform/i86xpv/kernel/$ISADIR/unix
> /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS,disable-qlc=true
> console=ttya
>
> when it should be
>
> module$ /platform/i86xpv/kernel/$ISADIR/unix
> /platform/i86xpv/kernel/$ISADIR/unix -B
> $ZFS-BOOTFS,disable-qlc=true,console=ttya
>
> Anyway, here's threadlist output
> http://pastebin.com/f6c5f5918

Hmm, that's odd. If you continue a while longer ([0]> ,10:c), do you still see these two threads stuck at the same point?

    fffffffffbc601a0 fffffffffbc5f530 fffffffffbc61d90 0 96 0
      PC: _resume_from_idle+0xfa    CMD:
      stack pointer for thread fffffffffbc601a0: fffffffffbc92850
      [ fffffffffbc92850 _resume_from_idle+0xfa() ]
        mutex_enter+0x10()
        kmem_cache_alloc+0x13a()
        vmem_alloc+0x1bc()
        segkmem_xalloc+0x94()
        segkmem_alloc_vn+0xcd()
        segkmem_alloc+0x24()
        vmem_xalloc+0x547()
        vmem_alloc+0x161()
        kmem_slab_create+0x81()
        kmem_slab_alloc+0x5b()
        kmem_cache_alloc+0x13a()
        kmem_zalloc+0x6a()
        callout_hash_init+0x32()
        callout_cpu_online+0x157()
        callout_mp_init+0x25()
        main+0x2c3()
        _locore_start+0x80()

    ffffff0012ccbc60 fffffffffbc5f530 fffffffffbc61d90 0 109 0
      PC: thread_start    THREAD: thread_create_intr()
      stack pointer for thread ffffff0012ccbc60: ffffff0012ccb840
        dtrace_xpv_gethrtime+0x51()
        hilevel_intr_prolog+0x35()
        do_interrupt+0xd4()
        xen_callback_handler+0x36e()
        xen_callback+0xcd()
        cyclic_timer+0x168()
        cyclic_softint+0xdc()
        cbe_softclock+0x1a()
        av_dispatch_softvect+0x5f()
        dispatch_softint+0x38()
        switch_sp_and_call+0x13()
        dosoftint+0x59()
        do_interrupt+0xf7()
        xen_callback_handler+0x36e()
        xen_callback+0xcd()
        mutex_enter+0x10()
        kmem_cache_alloc+0x13a()
        vmem_alloc+0x1bc()
        segkmem_xalloc+0x94()
        segkmem_alloc_vn+0xcd()
        segkmem_alloc+0x24()
        vmem_xalloc+0x547()
        vmem_alloc+0x161()
        kmem_slab_create+0x81()
        kmem_slab_alloc+0x5b()
        kmem_cache_alloc+0x13a()
        kmem_zalloc+0x6a()
        callout_hash_init+0x32()
        callout_cpu_online+0x157()
        callout_mp_init+0x25()
        main+0x2c3()
        _locore_start+0x80()

Also, check whether you see the THREAD: attach_drivers() thread stuck too:

    ffffff001300dc60 fffffffffbc5f530 0 0 60 ffffff001300da40
      PC: _resume_from_idle+0xfa    THREAD: attach_drivers()
      stack pointer for thread ffffff001300dc60: ffffff001300d9c0
      [ ffffff001300d9c0 _resume_from_idle+0xfa() ]
        swtch+0x160()
        sema_p+0x1cf()
        mod_load+0xe6()
        mod_hold_installed_mod+0x75()
        modrload+0xcd()
        modload+0x18()
        mod_hold_dev_by_major+0x8e()
        ddi_hold_driver+0x15()
        ddi_hold_installed_driver+0x1c()
        attach_drivers+0x37()
        thread_start+8()

Take the upper-left address and dump the stack with params, e.g.:

    ffffff001300dc60::findstack -v

Then dump the two strings passed to modload(), e.g.:

    modload+0x10(fffffffffbb8ef38, ffffff014667f1e8)
    mod_hold_dev_by_major+0x8e(94)

    [0]> fffffffffbb8ef38/s
    0xfffffffffbb8ef38:             drv
    [0]> ffffff014667f1e8/s
    0xffffff014667f1e8:             rsm

What driver is trying to attach?

>> You can try adding a nmi=ignore to the xen.gz line.
>
> This doesn't have any effect
>
>> You could also try adding a noapic to see if that help
>> narrow down the problem.
>
> This one cause panic
> http://pastebin.com/f53a20e6d
>
> Let me know if you need more info.

Assuming it hangs at the above, adding nosmp should allow the machine to boot. Assuming it does, we'll have to start looking at why.

If it's stuck at a driver trying to attach, you can add disable-<drivername> to the -B line to skip over that driver and see if the system will boot without it (like you did with qlc).

MRJ
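The string-dumping step can also be prepared outside the debugger. The following Python sketch parses a frame in the `function+offset(args)` form that `::findstack -v` prints above, for example `modload+0x10(fffffffffbb8ef38, ffffff014667f1e8)`. It then emits the `addr/s` commands to paste back at the mdb prompt. It is a hypothetical convenience script, not an mdb feature. It assumes each frame sits on one line, so a frame wrapped across lines is not recognised.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    function: str   # may carry a module prefix, e.g. pcie`pcie_scan_mps
    offset: int
    args: List[str]


# function+offset(args); the offset is hex, with or without a 0x prefix (e.g. thread_start+8()).
_FRAME_RE = re.compile(r"([\w`]+)\+((?:0x)?[0-9a-fA-F]+)\(([^)]*)\)")


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one findstack -v frame; return None if the line has no complete frame."""
    m = _FRAME_RE.search(line)
    if m is None:
        return None
    args = [a.strip() for a in m.group(3).split(",") if a.strip()]
    return Frame(m.group(1), int(m.group(2), 16), args)


def string_dump_commands(line: str, function: str = "modload") -> List[str]:
    """Return 'addr/s' mdb commands for each argument of the named function's frame."""
    frame = parse_frame(line)
    if frame is None or frame.function != function:
        return []
    return [f"{arg}/s" for arg in frame.args]


if __name__ == "__main__":
    for cmd in string_dump_commands("modload+0x10(fffffffffbb8ef38, ffffff014667f1e8)"):
        print(cmd)
```

For frames other than modload() it returns nothing. That matches the later capture, where no modload() frame appears at all.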
On Wed, Feb 18, 2009 at 7:43 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
> Assuming it hangs at the above, adding nosmp should allow
> the machine to boot.. Assuming it does, we'll have to start
> looking at why...

Whoa, you totally nailed it!
Adding nosmp DOES allow the machine to boot!
At least now we know where to start looking for the problem :)

>
> if it stuck at a driver trying to attach, you can add
> a disable-<drivername> to the -B line to skip over that
> driver to see if the system will boot without that driver
> (like you did with qlc).
>

Here's the weird part. Even though I booted with disable-qlc=true, booting with "-v" shows numerous error messages about "Unable to install/attach driver 'qlc'".
http://pastebin.com/f2c6f465c

And now another weird part: booting the Xen kernel with nosmp but WITHOUT "disable-qlc=true" WORKS!
http://pastebin.com/f40c861b9

Is this enough, or do you still need the output of "findstack" to find which driver is trying to attach?

For comparison, on the normal kernel (snv_99+), disable-qlc=true is enough to allow the system to boot. So I'm guessing there's a serious bug, most likely in the qlc driver, which manifests on this particular hardware configuration (HP blade, QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)). It works perfectly on snv_98.

Regards,

Fajar
Fajar A. Nugraha wrote:
> On Wed, Feb 18, 2009 at 7:43 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> Assuming it hangs at the above, adding nosmp should allow
>> the machine to boot.. Assuming it does, we'll have to start
>> looking at why...
>
> Whoa, you totally nailed it!
> Adding nosmp DOES allow the machine to boot!
> At least now we know where to start looking for the problem :)
>
>> if it stuck at a driver trying to attach, you can add
>> a disable-<drivername> to the -B line to skip over that
>> driver to see if the system will boot without that driver
>> (like you did with qlc).
>>
>
> Here's the weird part. Even though I booted with disable-qlc=true,
> booting with "-v" shows numerous error message about "Unable to
> install/attach driver 'qlc'".
> http://pastebin.com/f2c6f465c

Probably just a side effect of the disable.

> And now, another weird part: booting xen kernel, with nosmp, but
> WITHOUT "disable-qlc=true", WORKS!
> http://pastebin.com/f40c861b9

Yeah, I see that qlc attaches fine too:

    qlc0 is /pci@0,0/pci8086,25e5@5/pci103c,1705@0
    PCIE-device: pci103c,1705@0,1, qlc1
    qlc1 is /pci@0,0/pci8086,25e5@5/pci103c,1705@0,1

Can you do a

    mdb -k
    [0]> ::interrupts
    [0]> $q

and see if anything is sharing the interrupt with qlc?

> Is this enough, or do you still need the output of "findstack" to find
> which driver is trying to attach?

Yeah, I'd like to see what driver is causing it.

> As a comparison, on normal kernel (snv_99+), disable-qlc=true is
> enough to allow the system to boot. So I'm guessing there's a serious
> bug, most likely on qlc driver, which manifests on this particular
> hardware configuration (HP blade, QLogic Corp. ISP2432-based 4Gb Fibre
> Channel to PCI Express HBA (rev 02)). It works perfectly on snv_98.

It might be another device that is interacting with qlc because it shares the interrupt. On Xen, interrupts are routed differently, so the set of devices sharing a given interrupt differs from bare metal.
I'm guessing it's a USB driver that is the actual culprit.

MRJ
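The "is anything sharing the interrupt with qlc?" check can be done mechanically once the `::interrupts` output has been reduced to (IRQ, device) pairs. The following Python sketch makes two assumptions. First, the real `::interrupts` layout is not reproduced in this thread (it is only on a pastebin), so parsing it is left out. Second, device names are assumed to look like `driver#instance`. The function name and the sample values in the usage line are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def devices_sharing_with(pairs: Iterable[Tuple[int, str]], driver: str) -> Dict[int, List[str]]:
    """For each IRQ used by `driver`, list the other devices on that IRQ.

    `pairs` holds (irq, device) tuples, with device names assumed to look like
    'driver#instance'. IRQs where `driver` is alone are omitted.
    """
    by_irq: Dict[int, List[str]] = defaultdict(list)
    for irq, device in pairs:
        by_irq[irq].append(device)

    shared: Dict[int, List[str]] = {}
    for irq, devices in sorted(by_irq.items()):
        names = [d.split("#")[0] for d in devices]
        if driver in names:
            others = [d for d, n in zip(devices, names) if n != driver]
            if others:
                shared[irq] = others
    return shared


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = [(1, "qlc#0"), (1, "uhci#0"), (2, "qlc#1"), (2, "cpqary3#0"), (3, "e1000g#0")]
    print(devices_sharing_with(sample, "qlc"))
```

Any driver named in the result is a candidate for the disable-<drivername> experiment Mark describes.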
On Wed, Feb 18, 2009 at 9:27 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
> Can you do a
> mdb -k
> [0]> ::interrupts
> [0]> $q
>

http://pastebin.com/f4d9f5f5e

>
> and see if anything is sharing the interrupt with qlc?
>
>

Looks like Smart Array and USB. This is from booting with nosmp. I'm using CPQary3-1.91-solaris10-i386. I REALLY hope that's not the one causing problems :(

>
>> Is this enough, or do you still need the output of "findstack" to find
>> which driver is trying to attach?
>
> Yeah, I'd like to see what driver is causing it.
>

That may be hard, as I'm unable to reproduce it.

Booting with "-B $ZFS-BOOTFS,disable-qlc=true,console=ttya -kd", without nosmp, this is what I get after several more ",10:c":

http://pastebin.com/f649adb83

I have no idea how to "see these two threads stuck at the same point" (what is the reference that stays the same across reboots anyway? CMD? THREAD?)

Searching for "THREAD: attach_drivers()", I found only one:

    ffffff001300dc60 fffffffffbc5f530 0 0 60 ffffff0347578244
      PC: _resume_from_idle+0xfa    THREAD: attach_drivers()
      stack pointer for thread ffffff001300dc60: ffffff001300d7a0
      [ ffffff001300d7a0 _resume_from_idle+0xfa() ]
        swtch+0x160()
        cv_wait+0x61()
        ndi_devi_enter+0x68()
        walk_devs+0xe2()
        ddi_walk_devs+0x22()
        pcie`pcie_scan_mps+0x51()
        pcie`pcie_get_fabric_mps+0x1d()
        pcie`pcie_init_bus+0x3b3()
        npe`npe_initchild+0x111()
        npe`npe_ctlops+0x233()
        init_node+0x78()
        i_ndi_config_node+0xfa()
        i_ddi_attachchild+0x40()
        i_ddi_attach_node_hierarchy+0x61()
        attach_driver_nodes+0x59()
        ddi_hold_installed_driver+0x116()
        attach_drivers+0x37()
        thread_start+8()

But there's no mod_load there. And then:

    [4]> ffffff001300dc60::findstack -v
    stack pointer for thread ffffff001300dc60: ffffff001300d7a0
    [ ffffff001300d7a0 _resume_from_idle+0xfa() ]
      ffffff001300d7d0 swtch+0x160()
      ffffff001300d800 cv_wait+0x61(ffffff0347578244, ffffff0347578158)
      ffffff001300d840 ndi_devi_enter+0x68(ffffff03475780f0, ffffff001300d854)
      ffffff001300d8b0 walk_devs+0xe2(ffffff034757b008, fffffffff7a11dc0, ffffff001300d900, 1)
      ffffff001300d8e0 ddi_walk_devs+0x22(ffffff034757b008, fffffffff7a11dc0, ffffff001300d900)
      ffffff001300d940 pcie`pcie_scan_mps+0x51(ffffff034757b4d8, ffffff034757b008, ffffff001300d984)
      ffffff001300d970 pcie`pcie_get_fabric_mps+0x1d(ffffff034757b4d8, ffffff034757b008, ffffff001300d984)
      ffffff001300d9b0 pcie`pcie_init_bus+0x3b3(ffffff034757b008)
      ffffff001300da40 npe`npe_initchild+0x111(ffffff034757b008)
      ffffff001300dac0 npe`npe_ctlops+0x233(ffffff034757b4d8, ffffff034757b4d8, 1, ffffff034757b008, 0)
      ffffff001300db00 init_node+0x78(ffffff034757b008)
      ffffff001300db40 i_ndi_config_node+0xfa(ffffff034757b008, 6, 0)
      ffffff001300db60 i_ddi_attachchild+0x40(ffffff034757b008)
      ffffff001300dba0 i_ddi_attach_node_hierarchy+0x61(ffffff034757b008)
      ffffff001300dbe0 attach_driver_nodes+0x59(b8)
      ffffff001300dc20 ddi_hold_installed_driver+0x116(b8)
      ffffff001300dc40 attach_drivers+0x37()
      ffffff001300dc50 thread_start+8()

So I decided to reboot again, this time using the same number of ",10:c" as in your initial mail. I got another THREAD: attach_drivers(), but still no mod_load:

http://pastebin.com/f1faf9e29

Any ideas?

Regards,

Fajar
Fajar A. Nugraha wrote:
> On Wed, Feb 18, 2009 at 9:27 PM, Mark Johnson <Mark.Johnson@sun.com> wrote:
>> Can you do a
>> mdb -k
>> [0]> ::interrupts
>> [0]> $q
>>
>
> http://pastebin.com/f4d9f5f5e
>
>> and see if anything is sharing the interrupt with qlc?
>>
>
> Looks like smart array and usb. This is from booting with nosmp.
> I'm using CPQary3-1.91-solaris10-i386. I REALLY hope that's not the
> one causing problems :(

Try disable-<usb>=true, for whatever USB device is sharing that interrupt, just to see if it will boot MP with only the USB driver disabled.

MRJ
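Finding which handlers share the qlc interrupt means reading the IRQ column and the handler list of the `::interrupts` table. A minimal Python sketch of that lookup follows. It assumes an aligned table whose header row starts with "IRQ" and whose last column is titled "ISR(s)" and holds a comma-separated handler list; that layout, and the handler names used in the test, are illustrative assumptions rather than the contents of the pastebin.

```python
from collections import defaultdict
from typing import Dict, List


def isrs_by_irq(table: str) -> Dict[str, List[str]]:
    """Map each IRQ number to the interrupt handlers registered on it.

    Assumes an aligned table whose header row starts with "IRQ" and whose
    last column is titled "ISR(s)", holding a comma-separated handler list.
    """
    lines = table.splitlines()
    hdr_idx = next(i for i, ln in enumerate(lines) if ln.lstrip().startswith("IRQ"))
    col = lines[hdr_idx].index("ISR(s)")  # handler list starts at this offset
    irqs: Dict[str, List[str]] = defaultdict(list)
    for ln in lines[hdr_idx + 1:]:
        if not ln.strip():
            continue
        irq = ln.split()[0]
        irqs[irq].extend(h.strip() for h in ln[col:].split(",") if h.strip())
    return dict(irqs)


def sharing_with(table: str, prefix: str) -> Dict[str, List[str]]:
    """Return {irq: other handlers} for IRQs where a handler named prefix* is shared."""
    shared = {}
    for irq, handlers in isrs_by_irq(table).items():
        if any(h.startswith(prefix) for h in handlers):
            others = [h for h in handlers if not h.startswith(prefix)]
            if others:
                shared[irq] = others
    return shared
```

Calling sharing_with(table, "qlc") would then list the other drivers on the same line, which are the candidates for a disable-<driver>=true test boot.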
On Fri, Feb 20, 2009 at 6:49 AM, Mark Johnson <Mark.Johnson@sun.com> wrote:
> try disable-<usb>=true... Whatever USB device is sharing
> that interrupt. Just to see if it will boot MP with just
> the usb driver disabled.

Hi Mark,

Good news and bad news. Good news: I updated to snv_107, and the boot problem went away. It boots correctly. Bad news: since I updated using "beadm mount" and "pkg -R /a image-update", I don't have snv_106 anymore to test your suggestion :D But at least one problem is closed.

So here's another good-news/bad-news item, related to Xen and Crossbow on OpenSolaris.

In the past (snv_98), I used bnx1 as a trunk. Since I couldn't bridge/share the VLAN interface (bnx128001) with a domU, I shared the trunk instead, and the domU used that trunk and performed the VLAN operation (creating eth0.102) inside the guest. It works.

snv_107 has Crossbow, which changes how VLANs are created. The good news: it has dladm rename-link, which lets me create interfaces with names similar to my Linux dom0's (eth0 or br102, for example).

The bad news: using VLANs for a domU does NOT work at all. I tried:
- sharing the trunk -> doesn't work. The domU can't see any traffic. At all.
- sharing the VLAN -> doesn't work; the domU can't start because "Error: Device 0 (vif) could not be connected. Backend device not found."

When not using VLANs, the domU works as expected. Should I file this as a bug?

Regards,

Fajar
On 20 Feb 2009, at 12:49, Fajar A. Nugraha wrote:
> snv_107 has crossbow, which change how vlans are created. The good
> news, it has rename-link which makes me able to create interfaces with
> names similar to my linux dom0's (eth0 or br102, for example).
>
> Bad news, using vlans for domU does NOT work at all. I tried :
> - sharing the trunk -> doesn't work. domU can't see any traffic. At
> all.
> - sharing the vlan -> doesn't work, domU can't start because "Error:
> Device 0 (vif) could not be connected. Backend device not found."
>
> When not using vlan, domU works as expected. Should I file this as a
> bug?

This is fixed in 109.

You should create the VNIC with VLAN in dom0 (dladm create-vlan) and then pass that device through to the guest.
On 5 Feb 2009, at 03:29, Fajar A. Nugraha wrote:
> http://www.nabble.com/Another-GPLPV-pre-release-0.9.11-pre20-td20336594.html
>
> With the latest driver 0.9.12-pre13 from
> http://www.meadowcourt.org/downloads/ installation completed
> successfully, the PV driver works correctly, but the network
> interface doesn't work.
>
> I'm testing it on opensolaris snv_98 since all versions after that
> (including snv_106) refuse to boot the xen kernel on an HP box with smart
> array and qlogic card (normal kernel works fine with
> disable-qlc=true). Filed a bug on
> http://defect.opensolaris.org/bz/show_bug.cgi?id=5905 but no takers so
> far (guess it might be a HW-specific issue).
>
> Let me know if I can help with some testing on this (somewhat) old
> opensolaris version.

I made some changes to the Solaris backend driver in 109 which should improve things. I'd be interested if you have another chance to test with that.
On Fri, Feb 20, 2009 at 8:46 PM, David Edmondson <dme@sun.com> wrote:
>> Bad news, using vlans for domU does NOT work at all. I tried :
>
> This is fixed in 109.

Thanks for the update. So if everything goes normally, the IPS dev repository will have it in about four weeks from now, right?

> You should create the VNIC with VLAN in dom0 (dladm create-vlan) and then
> pass that device through to the guest.

Does it work the same way as VLANs and bridges do in Linux? In Linux, I can choose whether to bridge the link or the VLAN, performing the VLAN operation in either dom0 or domU, and both will work.