Florian Manschwetus
2009-Mar-09 11:21 UTC
SXCE b108 spontaneous reboots caused by java in Linux PV guest
Just the short version.

We have a Sun Fire X4150 with a Sun STK 2540 FC array. Currently we use Nevada SXCE b108 (the problem has occurred in the past with other builds, but we were unable to track it down) with the xVM core (just the plain stuff, without the BUI and so on). Our guests are based on raw devices on our SAN.

We have several Gentoo Linux PV guests (kernel 2.6.27). One of them uses 8 logical CPUs and has half the scheduling weight, so it can be used for resource-hungry, low-priority work (e.g. some simulations coded in Java). One of those simulations (started five to seven times to use the eight-way multiprocessing) causes the whole machine to hard reset after running for some time. That is really BAD.

I would be glad to give you more details if needed, just ask (with some detail on where to look).

The system has the latest BIOS (the one which removes APM), which, in my impression, has reduced the occurrence of this strange behavior.

thanks,
florian
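Editorial note on "half the scheduling weight": the sketch below is only an illustration, assuming the host uses Xen's credit scheduler, where the default weight is 256 and busy domains share CPU time in proportion to their weights. The domain names and the weight of 128 are hypothetical and do not come from the poster's configuration; caps and VCPU counts are ignored.

```python
def cpu_shares(weights):
    """Return each domain's fraction of CPU time when every domain is busy.

    Assumes pure proportional sharing by weight (no caps, no idle domains).
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical domains: 256 is the credit scheduler default weight,
# 128 stands in for a guest configured with "half" weight.
weights = {"dom0": 256, "guest-a": 256, "guest-sim": 128}
shares = cpu_shares(weights)

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under these assumptions the half-weight guest gets half the share of a default-weight domain only while the others are busy; on an otherwise idle host it can still consume all the CPU time it asks for.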
Mark Johnson
2009-Mar-09 11:37 UTC
Re: SXCE b108 spontaneous reboots caused by java in Linux PV guest
Florian Manschwetus wrote:
> Just the short version.
> We have a sun fire x4150 with a sun stk 2540 fc array.
> Currently we use Nevada SXCE b108 (the problem has occurred in the past
> with other builds but we were unable to track it down) with xvm core
> (just the plain stuff without the bui and so) our guest are based on raw
> devices on our san. We have several Gentoo Linux PV guests (kernel
> 2.6.27) one of them uses 8 logical cpus and has half scheduling weight
> so it could be used for system hungry low priority stuff (e.g. some
> simulations coded in java) so one of those simulation (started five to
> seven times to use the eight way multiprocessing) makes the whole
> machine after some time running to fire a hardreset.

What do you mean by hard reset?

> That is really BAD.
>
> So I would be glad to give you more details if needed, just ask (with
> some detail where to look).

Yes please... :-) What are the guest configurations (i.e. what are you using for disks, NICs, etc.)? Do you have the Xen and Solaris consoles redirected to the serial port? Do you see anything on the serial console before the reboot?

Have you tried to load the kernel debugger?

> System has latest bios (the one which removes apm), what, so my feeling,
> has reduced the occurrence of this strange behavior.

Reduced or removed?

Thanks,
MRJ
Florian Manschwetus
2009-Mar-09 11:49 UTC
Re: SXCE b108 spontaneous reboots caused by java in Linux PV guest
Mark Johnson wrote:
> Florian Manschwetus wrote:
>> Just the short version.
>> We have a sun fire x4150 with a sun stk 2540 fc array.
>> Currently we use Nevada SXCE b108 (the problem has occurred in the past
>> with other builds but we were unable to track it down) with xvm core
>> (just the plain stuff without the bui and so) our guest are based on raw
>> devices on our san. We have several Gentoo Linux PV guests (kernel
>> 2.6.27) one of them uses 8 logical cpus and has half scheduling weight
>> so it could be used for system hungry low priority stuff (e.g. some
>> simulations coded in java) so one of those simulation (started five to
>> seven times to use the eight way multiprocessing) makes the whole
>> machine after some time running to fire a hardreset.
>
> What do you mean by hard reset?

Look at the attached emails (sent by the ILOM).

>> That is really BAD.
>>
>> So I would be glad to give you more details if needed, just ask (with
>> some detail where to look).
>
> Yes please... :-) What are the guest configurations? (i.e. what
> are you using for disks, NICs, etc). Do you have the xen and solaris
> consoles redirected to the serial port? Do you see anything on the
> serial console before the reboot?

Not tested so far, but I'll try to get something.

> Have you tried to load the kernel debugger?

Sorry, I'm really new to Solaris, so I will definitely need some help here.

>> System has latest bios (the one which removes apm), what, so my feeling,
>> has reduced the occurrence of this strange behavior.
>
> reduced or removed?

As far as I know, the latest BIOS for the X4150 has removed APM support from the machines, but review the release notes on Sun's website for more details.

It has definitely made these machines able to run xVM Server EA3 (using the mentioned workaround).

Thanks,
florian
Mark Johnson
2009-Mar-09 14:55 UTC
Re: SXCE b108 spontaneous reboots caused by java in Linux PV guest
Florian Manschwetus wrote:
> Mark Johnson wrote:
>> Florian Manschwetus wrote:
>>> Just the short version.
>>> We have a sun fire x4150 with a sun stk 2540 fc array.
>>> Currently we use Nevada SXCE b108 (the problem has occurred in the past
>>> with other builds but we were unable to track it down) with xvm core
>>> (just the plain stuff without the bui and so) our guest are based on raw
>>> devices on our san. We have several Gentoo Linux PV guests (kernel
>>> 2.6.27) one of them uses 8 logical cpus and has half scheduling weight
>>> so it could be used for system hungry low priority stuff (e.g. some
>>> simulations coded in java) so one of those simulation (started five to
>>> seven times to use the eight way multiprocessing) makes the whole
>>> machine after some time running to fire a hardreset.
>> What do you mean by hard reset?
> Look at the attached emails (sent by ilom)

I don't see anything in the attachments?

>>> That is really BAD.
>>>
>>> So I would be glad to give you more details if needed, just ask (with
>>> some detail where to look).
>> Yes please... :-) What are the guest configurations? (i.e. what
>> are you using for disks, NICs, etc). Do you have the xen and solaris
>> consoles redirected to the serial port? Do you see anything on the
>> serial console before the reboot?
> Not tested so far, but I'll try to get something
>> Have you tried to load the kernel debugger?
> Sorry, I'm really new to solaris so I will need definitely some help here.

In the menu.lst entry, you want something like...

kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g msi
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -k -B console=hypervisor

>>> System has latest bios (the one which removes apm), what, so my feeling,
>>> has reduced the occurrence of this strange behavior.
>> reduced or removed?
> Afaik, latest bios for x4150 has removed apm support from the machines,
> but review release notes online @suns website for more details.
>
> Definitely it has made these machines able to run XVM Server EA3 (using
> the mentioned workaround)

Does it make a difference on Nevada SXCE b108?

Thanks,
MRJ