Yuvraj Agarwal
2010-Apr-27 07:41 UTC
[Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Hi All,

We are setting up a system with a large number of very small VMs for a project. We worked through a number of limitations, including those imposed by blktap2 devices and the number of dynamic IRQs (set kernel config NR_CPUS), etc. After these changes we were able to get to 154 domUs (!), but as soon as we start up the 155th domU the system crashes. We edited each domU config to have one blktap2 device (the disk image) and two virtual network interfaces.

We are using the standard XEN 4.0 (stable/release version) with the 2.6.31.13 (pvops) kernel.

Attached is the output of /var/log/daemon.log and /var/log/xen/xend.log, but as far as we can see nothing in them explains what is causing the system to crash (there is no console access anymore, the system becomes unresponsive and it needs to be power-cycled). I have pasted only the relevant bits of information (the last domU that did successfully start and the next one that failed). It may be the case that not all the log messages were flushed before the system crashed.

Does anyone know where this limit of 155 domUs is coming from and how we can fix/increase it?

thanks
Yuvraj

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2010-Apr-27 09:02 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Tue, Apr 27, 2010 at 12:41:07AM -0700, Yuvraj Agarwal wrote:
> Does anyone know where this limit of 155 domU is coming from and how we
> can fix/increase it?

Please paste your dom0 grub.conf.
Are you using memory ballooning?
-- Pasi

> thanks
>
> Yuvraj

> Apr 26 20:08:08 BlackBox tapdisk2[10074]: Created /dev/xen/blktap-2/blktap152 device
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: Created /dev/xen/blktap-2/tapdev152 device
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: new interface: ring: 251, device: 253, minor: 152
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: I/O queue driver: lio
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: block-aio open('/home/xen/domains/testing-ss-155.ucsd.edu/disk.img')
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: open(/home/xen/domains/testing-ss-155.ucsd.edu/disk.img) with O_DIRECT
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: Image size: #012#011pre sector_shift [1073741824]#012#011post sector_shift [2097152]
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: opened image /home/xen/domains/testing-ss-155.ucsd.edu/disk.img (1 users, state: 0x00000001, type: 0)
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: VBD CHAIN:
> Apr 26 20:08:08 BlackBox tapdisk2[10074]: /home/xen/domains/testing-ss-155.ucsd.edu/disk.img: 0
> Apr 26 20:08:09 BlackBox logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/153/770
> Apr 26 20:08:10 BlackBox logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/153/0
> Apr 26 20:08:10 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:08:10 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:08:10 BlackBox logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif153.0, bridge testbr.
> Apr 26 20:08:10 BlackBox logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/153/0/hotplug-status connected to xenstore.
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/block: Writing backend/vbd/153/770/physical-device fd:98 to xenstore.
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/block: Writing backend/vbd/153/770/hotplug-status connected to xenstore.
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/153/1
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif153.1, bridge dummy.
> Apr 26 20:08:21 BlackBox logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/153/1/hotplug-status connected to xenstore.
>
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: Created /dev/xen/blktap-2/blktap153 device
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: Created /dev/xen/blktap-2/tapdev153 device
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: new interface: ring: 251, device: 253, minor: 153
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: I/O queue driver: lio
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: block-aio open('/home/xen/domains/testing-ss-156.ucsd.edu/disk.img')
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: open(/home/xen/domains/testing-ss-156.ucsd.edu/disk.img) with O_DIRECT
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: Image size: #012#011pre sector_shift [1073741824]#012#011post sector_shift [2097152]
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: opened image /home/xen/domains/testing-ss-156.ucsd.edu/disk.img (1 users, state: 0x00000001, type: 0)
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: VBD CHAIN:
> Apr 26 20:10:17 BlackBox tapdisk2[12749]: /home/xen/domains/testing-ss-156.ucsd.edu/disk.img: 0
> Apr 26 20:10:17 BlackBox logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/154/770
> Apr 26 20:10:29 BlackBox logger: /etc/xen/scripts/block: Writing backend/vbd/154/770/physical-device fd:99 to xenstore.
> Apr 26 20:10:29 BlackBox logger: /etc/xen/scripts/block: Writing backend/vbd/154/770/hotplug-status connected to xenstore.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/154/0
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif154.0, bridge testbr.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/154/0/hotplug-status connected to xenstore.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/154/1
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif154.1, bridge dummy.
> Apr 26 20:10:30 BlackBox logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/154/1/hotplug-status connected to xenstore.
Konrad Rzeszutek Wilk
2010-Apr-27 13:59 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Tue, Apr 27, 2010 at 12:41:07AM -0700, Yuvraj Agarwal wrote:
> After these changes we were able to get to 154 domUs
> (!), but as soon as we start up the 155th domU the system crashes. We
> edited each domU config to have one blktap2 device (the disk image) and
> two virtual network interfaces each.

This sounds strangely familiar. I believe somebody posted a question about this a couple of months ago on the xen-devel mailing list and found the answer. I think part of it was that CONFIG_LEGACY_PTY_COUNT had to be high. But I don't remember the exact details - you might want to search the archive.
Yuvraj Agarwal
2010-Apr-27 17:14 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
I am not using grub2; using grub-legacy instead.

Also, once the dom0 boots up I do have to set the dom0 mem to:

(1) xm mem-set 0 12000 (otherwise when starting up a lot of domU it would run out of memory)
and
(2) echo 1548576 > /proc/sys/fs/aio-max-nr

My /boot/grub/menu.lst (pasted relevant lines)
**************************************************

root@MESL-BlackBox:/usr/src# cat /boot/grub/menu.lst
# menu.lst - See: grub(8), info grub, update-grub(8)
# grub-install(8), grub-floppy(8),
# grub-md5-crypt, /usr/share/doc/grub
# and /usr/share/doc/grub-doc/.

## default num
# Set the default entry to the entry number NUM. Numbering starts from 0, and
# the entry number 0 is the default if the command is not used.
#
# You can specify 'saved' instead of a number. In this case, the default entry
# is the entry saved with the command 'savedefault'.
# WARNING: If you are using dmraid do not use 'savedefault' or your
# array will desync and will not let you boot your system.
default 0

## timeout sec
# Set a timeout, in SEC seconds, before automatically booting the default entry
# (normally the first entry defined).
timeout 20

### BEGIN AUTOMAGIC KERNELS LIST
## lines between the AUTOMAGIC KERNELS LIST markers will be modified
## by the debian update-grub script except for the default options below

## DO NOT UNCOMMENT THEM, Just edit them to your needs

## ## Start Default Options ##
## default kernel options
## default kernel options for automagic boot options
## If you want special options for specific kernels use kopt_x_y_z
## where x.y.z is kernel version. Minor versions can be omitted.
## e.g. kopt=root=/dev/hda1 ro
## kopt_2_6_8=root=/dev/hdc1 ro
## kopt_2_6_8_2_686=root=/dev/hdc2 ro
# kopt=root=UUID=909f7c32-639a-469d-b34b-b418d2b6a2dc ro

## default grub root device
## e.g. groot=(hd0,0)
# groot=909f7c32-639a-469d-b34b-b418d2b6a2dc

## should update-grub create alternative automagic boot options
## e.g. alternative=true
## alternative=false
# alternative=true

## should update-grub lock alternative automagic boot options
## e.g. lockalternative=true
## lockalternative=false
# lockalternative=false

## additional options to use with the default boot option, but not with the
## alternatives
## e.g. defoptions=vga=791 resume=/dev/hda5
# defoptions=quiet splash

## should update-grub lock old automagic boot options
## e.g. lockold=false
## lockold=true
# lockold=false

## Xen hypervisor options to use with the default Xen boot option
# xenhopt=dom0_max_vcpus=1 dom0_mem=8192

## Xen Linux kernel options to use with the default Xen boot option
# xenkopt=console=tty0

## altoption boot targets option
## multiple altoptions lines are allowed
## e.g. altoptions=(extra menu suffix) extra boot options
## altoptions=(recovery) single
# altoptions=(recovery mode) single

## controls how many kernels should be put into the menu.lst
## only counts the first occurence of a kernel, not the
## alternative kernel options
## e.g. howmany=all
## howmany=7
# howmany=all

## specify if running in Xen domU or have grub detect automatically
## update-grub will ignore non-xen kernels when running in domU and vice versa
## e.g. indomU=detect
## indomU=true
## indomU=false
# indomU=detect

## should update-grub create memtest86 boot option
## e.g. memtest86=true
## memtest86=false
# memtest86=true

## should update-grub adjust the value of the default booted system
## can be true or false
# updatedefaultentry=false

## should update-grub add savedefault to the default options
## can be true or false
# savedefault=false

## ## End Default Options ##

title Xen 4.0.0 / Debian GNU/Linux, kernel 2.6.31.13
root (hd0,0)
kernel /boot/xen-4.0.0.gz
module /boot/vmlinuz-2.6.31.13 root=/dev/sda1 ro

title Xen 3.4.2 / Debian GNU/Linux, kernel 2.6.31.8-xenapr2010
root (hd0,0)
kernel /boot/xen-3.4.2.gz
module /boot/vmlinuz-2.6.31.8-xenapr2010 root=/dev/sda1 ro console=tty0
module /boot/initrd.img-2.6.31.8-xenapr2010

### END DEBIAN AUTOMAGIC KERNELS LIST

> Please paste your dom0 grub.conf.
> Are you using memory ballooning?
>
> -- Pasi
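As a rough illustration (not something posted in the thread), the aio-max-nr adjustment above can be sanity-checked by comparing it with the kernel's running counter. The sketch below assumes the standard Linux files /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr; the function name aio_headroom and its optional file arguments are hypothetical and exist only so the logic can be tried on sample files.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print how many more AIO
# events can be reserved before the system-wide limit is reached.
# The two arguments default to the kernel's own counters.
aio_headroom() {
    nr_file=${1:-/proc/sys/fs/aio-nr}
    max_file=${2:-/proc/sys/fs/aio-max-nr}
    nr=$(cat "$nr_file") || return 1
    max=$(cat "$max_file") || return 1
    echo $((max - nr))
}
```

Calling aio_headroom with no arguments after each batch of domU starts would show whether the headroom is shrinking toward zero as blktap2 devices are added.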
Yuvraj Agarwal
2010-Apr-27 17:18 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Thank you for the pointer. Unfortunately, I was unable to find the message you mentioned after looking through the archives...

In my current kernel config (default 2.6.31.13 pvops kernel) I don't have CONFIG_LEGACY_PTY_COUNT defined, perhaps because I have the following in the .config?

# CONFIG_LEGACY_PTYS is not set

Do you think setting the variable above and setting CONFIG_LEGACY_PTY_COUNT to a high value will help in this case?

--Yuvraj
Pasi Kärkkäinen
2010-Apr-27 17:18 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Tue, Apr 27, 2010 at 10:14:40AM -0700, Yuvraj Agarwal wrote:
> I am not using grub2; using grub-legacy instead.
>
> Also, once the dom0 boots up I do have to set the dom0 mem to:
>
> (1) xm mem-set 0 12000 (otherwise when starting up a lot of domU it
> would run out of memory)
> and
> (2) echo 1548576 > /proc/sys/fs/aio-max-nr

You might want to use dom0_mem=<X>M option for xen.gz instead.

See:
http://wiki.xensource.com/xenwiki/XenBestPractices

-- Pasi
Jeremy Fitzhardinge
2010-Apr-27 18:51 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On 04/27/2010 12:41 AM, Yuvraj Agarwal wrote:
> After these changes we were able to get to 154
> domUs (!), but as soon as we start up the 155th domU the system crashes.

How does the system crash? You mean the dom0 kernel crashes?

> Does anyone know where this limit of 155 domU is coming from and how
> we can fix/increase it?

What does /proc/interrupts look like before the crash? How many network and block devices do you have in dom0?

J
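One way to answer the /proc/interrupts question even when dom0 hangs hard is to save a copy before each domU start and flush it to disk. The following is only a sketch under that assumption, not something from the thread; the function names count_irqs and snapshot_irqs, the output directory and the label are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for recording dom0
# interrupt state before each domU start.

# Count the numbered IRQ rows in a /proc/interrupts-style file.
# Named rows such as NMI: or LOC: are not counted.
count_irqs() {
    grep -cE '^[[:space:]]*[0-9]+:' "${1:-/proc/interrupts}"
}

# Copy the interrupt table into directory $1 under label $2 and flush
# it to disk, so the snapshot survives a hang that needs a power cycle.
snapshot_irqs() {
    out_dir=$1
    label=$2
    src=${3:-/proc/interrupts}
    cat "$src" > "$out_dir/interrupts.$label" && sync
}
```

For example, running snapshot_irqs /var/tmp before-155 just before starting the domU that triggers the crash would leave the last known interrupt table on disk, and count_irqs on that file gives the number of IRQs in use at that point.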
Yuvraj Agarwal
2010-Apr-27 18:58 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
I did make that change (dom0_mem=8192M), but I still get the same error. When I started the 154th domU (last time it was the 155th) the dom0 crashes (it kills network connections and I have to manually go and reboot it).

I did get a little bit more information in xend.log and daemon.log (attached). After the last successful VM startup I did check and make sure the dom0 did indeed have enough memory.

root@MESL-BlackBox:/home/xen/noswap-configs# xm list | grep testing | wc -l
153
root@MESL-BlackBox:/home/xen/noswap-configs# xm info | grep mem
total_memory : 24490
free_memory : 9254
node_to_memory : node0:2076
node_to_dma32_mem : node0:2076
xen_commandline : dom0_mem=8192M
root@MESL-BlackBox:/home/xen/noswap-configs# xm create testing-ss-157.ucsd.edu
Using config file "./testing-ss-157.ucsd.edu".
Started domain testing-ss-157.ucsd.edu (id=154)

/var/log/daemon.log --> daemon.log <attached>
/var/log/xend.log --> xend.log <attached>

We'd appreciate any pointers to fix this...

Thank you
--Yuvraj
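To track the hypervisor's free memory across many domU starts without reading the "xm info" output by eye, a small filter can pull out a single field. This is a minimal sketch, assuming the "name : value" layout shown in the output above; the function name xm_info_field is hypothetical and not part of the xm tool.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read "xm info"-style text
# on stdin and print the value of the field named in $1.
xm_info_field() {
    awk -F':' -v key="$1" '
        # Field name is everything before the first colon, minus spaces.
        { name = $1; gsub(/[[:space:]]/, "", name) }
        # Print the rest of the line (which may itself contain colons).
        name == key { sub(/^[^:]*:[[:space:]]*/, ""); print; exit }
    '
}
```

Used as "xm info | xm_info_field free_memory", it would print only the number, which can then be logged after each "xm create" to see how free memory changes as the domU count approaches the failure point.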
Yuvraj Agarwal
2010-Apr-27 19:10 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Dom0 crashes (the network connections are all killed, and I can't log in locally on the machine anymore either).

I'll start up the 154 domUs again and report back /proc/interrupts and the number of blktap2 devices. I do recall that when I last checked (cat /sys/class/blktap2/*/name | wc -l) I got 1 blktap2 device per domU (which means that I had 154 blktap devices). Each domU has two network interfaces. I'll report back what we find when we start up all the domUs again.

On another related note, it takes a LONG time to start up all these 150 domains (>20-30mins), and I believe the culprit is xenstored, since it has to write the entire xenstore for each domU. We tried to build the OCaml version instead, which is supposed to be faster (?), but xenstored did not start in that case. We edited the Config.mk file to

CONFIG_OCAML_XENSTORED ?= y

We did clean out the "dist" directory and rebuild and reinstall xen, but xenstored did not start up. Are we missing something obvious?

--Yuvraj
Jeremy Fitzhardinge
2010-Apr-27 19:27 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On 04/27/2010 12:10 PM, Yuvraj Agarwal wrote:
> Dom0 crashes (the network connections are all killed, and I cant log in
> locally on the machine anymore either).

Does it display anything on the console? Do you have a serial console setup?

> On another related note it takes a LONG time to start up all these 150
> domains (>20-30mins), and I believe the culprit is the xenstored since it
> has to write the entire xenstore for each domU.

That should be fairly easy to verify with "top" in dom0.

> We tried to edit to build
> the OCAML version instead which is supposed to be faster (?), but
> xenstored did not start in that case. We edited the Config.mk file to
>
> CONFIG_OCAML_XENSTORED ?= y
>
> We did clean out the "dist" directory and rebuild and reinstall xen, but
> xenstored did not start up. Are we missing obvious?

What happens if you manually start xenstored (before xend)? Does it complain, crash, something else?

I think this is a very under-tested configuration, so it wouldn't surprise me if there are problems with it.

J
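To put numbers on the slow startup alongside watching "top", each domU start can be timed individually; if the per-domain time grows with the domain count, that would be consistent with the xenstored theory, though it does not prove it. The sketch below is an illustration only and not from the thread: the function name timed_start and the START_CMD variable are hypothetical, and only "xm create" itself comes from the messages above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): start one domU and print
# its config name followed by the seconds the start command took.
# START_CMD defaults to "xm create", the command used in this thread;
# set it to something harmless (e.g. "true") to dry-run a loop.
START_CMD=${START_CMD:-xm create}

timed_start() {
    cfg=$1
    begin=$(date +%s)
    # START_CMD is left unquoted on purpose so "xm create" splits
    # into command and argument.
    $START_CMD "$cfg" >/dev/null || return 1
    end=$(date +%s)
    echo "$cfg $((end - begin))"
}
```

A loop such as "for cfg in testing-ss-*.ucsd.edu; do timed_start "$cfg" >> start-times.log; done" (file names assumed from the config names seen earlier) would record how the start time changes from the first domain to the last.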
Pasi Kärkkäinen
2010-Apr-27 19:29 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Tue, Apr 27, 2010 at 11:58:42AM -0700, Yuvraj Agarwal wrote:
> I did make that change (dom0_mem=8192M), but I still get the same error.
> When I started the 154th domU (last time it was 155th) the dom0 crashes
> (it kills network connections and I have to manually go and reboot it).

Hmm.. interesting. It should never crash..

Do you have a serial console so you could capture the error/crash messages?

See:
http://wiki.xensource.com/xenwiki/XenSerialConsole

> I did get a little bit more information on xend.log and daemon.log
> (attached). After the last successful VM startup I did check and make
> sure the dom0 did indeed have enough memory.
>
> root@MESL-BlackBox:/home/xen/noswap-configs# xm list | grep testing | wc -l
> 153
> root@MESL-BlackBox:/home/xen/noswap-configs# xm info | grep mem
>
> total_memory : 24490
> free_memory : 9254

Ok so over 9 GB of free memory in the hypervisor. How's the memory in dom0? It still has 8 GB and most of it is free?

> node_to_memory : node0:2076
> node_to_dma32_mem : node0:2076
> xen_commandline : dom0_mem=8192M
> root@MESL-BlackBox:/home/xen/noswap-configs# xm create testing-ss-157.ucsd.edu
> Using config file "./testing-ss-157.ucsd.edu".
> Started domain testing-ss-157.ucsd.edu (id=154)

How much memory does that ss-157 domU have configured?

> /var/log/daemon.log --> daemon.log <attached>
> /var/log/xend.log --> xend.log <attached>
>
> We'd appreciate any pointers to fix this...

Did you disable dom0 ballooning from xend-config.sxp? Did you make sure dom0 is NOT ballooned? Or does this crash happen when dom0 is ballooned?

Please capture Xen and dom0 messages when it crashes..
-- Pasi

> Thank you
> --Yuvraj
>
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Pasi Kärkkäinen
> Sent: Tuesday, April 27, 2010 10:19 AM
> To: Yuvraj Agarwal
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
>
> On Tue, Apr 27, 2010 at 10:14:40AM -0700, Yuvraj Agarwal wrote:
> > I am not using grub2; using grub-legacy instead.
> >
> > Also, once the dom0 boots up I do have to set the dom0 mem to:
> >
> > (1) xm mem-set 0 12000 (otherwise when starting up a lot of domU it
> > would run out of memory)
> > and
> > (2) echo 1548576 > /proc/sys/fs/aio-max-nr
>
> You might want to use dom0_mem=<X>M option for xen.gz instead.
>
> See:
> http://wiki.xensource.com/xenwiki/XenBestPractices
>
> -- Pasi
>
> > My /boot/grub/menu.lst (pasted relevant lines)
> > **************************************************
> >
> > root@MESL-BlackBox:/usr/src# cat /boot/grub/menu.lst
> > # menu.lst - See: grub(8), info grub, update-grub(8)
> > # grub-install(8), grub-floppy(8),
> > # grub-md5-crypt, /usr/share/doc/grub
> > # and /usr/share/doc/grub-doc/.
> >
> > ## default num
> > # Set the default entry to the entry number NUM. Numbering starts from 0, and
> > # the entry number 0 is the default if the command is not used.
> > #
> > # You can specify 'saved' instead of a number. In this case, the default entry
> > # is the entry saved with the command 'savedefault'.
> > # WARNING: If you are using dmraid do not use 'savedefault' or your
> > # array will desync and will not let you boot your system.
> > default 0
> >
> > ## timeout sec
> > # Set a timeout, in SEC seconds, before automatically booting the default entry
> > # (normally the first entry defined).
> > timeout 20
> >
> > ### BEGIN AUTOMAGIC KERNELS LIST
> > ## lines between the AUTOMAGIC KERNELS LIST markers will be modified
> > ## by the debian update-grub script except for the default options below
> >
> > ## DO NOT UNCOMMENT THEM, Just edit them to your needs
> >
> > ## ## Start Default Options ##
> > ## default kernel options
> > ## default kernel options for automagic boot options
> > ## If you want special options for specific kernels use kopt_x_y_z
> > ## where x.y.z is kernel version. Minor versions can be omitted.
> > ## e.g. kopt=root=/dev/hda1 ro
> > ## kopt_2_6_8=root=/dev/hdc1 ro
> > ## kopt_2_6_8_2_686=root=/dev/hdc2 ro
> > # kopt=root=UUID=909f7c32-639a-469d-b34b-b418d2b6a2dc ro
> >
> > ## default grub root device
> > ## e.g. groot=(hd0,0)
> > # groot=909f7c32-639a-469d-b34b-b418d2b6a2dc
> >
> > ## should update-grub create alternative automagic boot options
> > ## e.g. alternative=true
> > ## alternative=false
> > # alternative=true
> >
> > ## should update-grub lock alternative automagic boot options
> > ## e.g. lockalternative=true
> > ## lockalternative=false
> > # lockalternative=false
> >
> > ## additional options to use with the default boot option, but not with the
> > ## alternatives
> > ## e.g. defoptions=vga=791 resume=/dev/hda5
> > # defoptions=quiet splash
> >
> > ## should update-grub lock old automagic boot options
> > ## e.g. lockold=false
> > ## lockold=true
> > # lockold=false
> >
> > ## Xen hypervisor options to use with the default Xen boot option
> > # xenhopt=dom0_max_vcpus=1 dom0_mem=8192
> >
> > ## Xen Linux kernel options to use with the default Xen boot option
> > # xenkopt=console=tty0
> >
> > ## altoption boot targets option
> > ## multiple altoptions lines are allowed
> > ## e.g. altoptions=(extra menu suffix) extra boot options
> > ## altoptions=(recovery) single
> > # altoptions=(recovery mode) single
> >
> > ## controls how many kernels should be put into the menu.lst
> > ## only counts the first occurence of a kernel, not the
> > ## alternative kernel options
> > ## e.g. howmany=all
> > ## howmany=7
> > # howmany=all
> >
> > ## specify if running in Xen domU or have grub detect automatically
> > ## update-grub will ignore non-xen kernels when running in domU and vice versa
> > ## e.g. indomU=detect
> > ## indomU=true
> > ## indomU=false
> > # indomU=detect
> >
> > ## should update-grub create memtest86 boot option
> > ## e.g. memtest86=true
> > ## memtest86=false
> > # memtest86=true
> >
> > ## should update-grub adjust the value of the default booted system
> > ## can be true or false
> > # updatedefaultentry=false
> >
> > ## should update-grub add savedefault to the default options
> > ## can be true or false
> > # savedefault=false
> >
> > ## ## End Default Options ##
> >
> > title Xen 4.0.0 / Debian GNU/Linux, kernel 2.6.31.13
> > root (hd0,0)
> > kernel /boot/xen-4.0.0.gz
> > module /boot/vmlinuz-2.6.31.13 root=/dev/sda1 ro
> >
> > title Xen 3.4.2 / Debian GNU/Linux, kernel 2.6.31.8-xenapr2010
> > root (hd0,0)
> > kernel /boot/xen-3.4.2.gz
> > module /boot/vmlinuz-2.6.31.8-xenapr2010 root=/dev/sda1 ro console=tty0
> > module /boot/initrd.img-2.6.31.8-xenapr2010
> >
> > ### END DEBIAN AUTOMAGIC KERNELS LIST
> >
> > Please paste your dom0 grub.conf.
> > Are you using memory ballooning?
> >
> > -- Pasi
Yuvraj Agarwal
2010-Apr-27 19:38 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Does it display anything on the console? Do you have a serial console setup?

>> YA: No, this machine does not have a dedicated serial console, but I can
>> try and find another machine that does.

That should be fairly easy to verify with "top" in dom0.

>> YA: Sorry I was unclear. We have already used top to see that it is indeed
>> xenstored that takes up a lot of CPU cycles as we increase the number of domUs.
>> That is why we wanted to try out the OCAML version since it is supposed to be better.

What happens if you manually start xenstored (before xend)? Does it complain, crash, something else?

>> YA: I will test this once we have the other problem worked out (number of VMs) and report back.
>> Right now the system automatically starts up xend, so I can make that manual and test it.

I think this is a very under-tested configuration, so it wouldn't surprise me if there are problems with it.

>> YA: That's what I figured. I can help with the testing. :)
Keir Fraser
2010-Apr-28 00:43 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On 27/04/2010 08:41, "Yuvraj Agarwal" <yuvraj@cs.ucsd.edu> wrote:

> Attached is the output of /var/log/daemon.log and /var/log/xen/xend.log, but
> as far as we can see we don't quite know what might be going causing the
> system to crash (no console access anymore and system becomes unresponsive and
> needs to be power-cycled). I have pasted only the relevant bits of
> information (the last domU that did successfully start and the next one that
> failed). It may be the case that all the log messages weren't flushed before
> the system crashed...
>
> Does anyone know where this limit of 155 domU is coming from and how we can
> fix/increase it?

Get a serial line on a test box, and capture Xen logging output on it. You can both see if any crash messages come from Xen when the 155th domain is created, and also try the serial debug keys (e.g., try 'h' to get help to start with) to see whether Xen itself is still alive.

 -- Keir
Yuvraj Agarwal
2010-Apr-28 01:02 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Actually, I did identify the problem (don't know the fix) at least from the console logs. It's related to running out of nr_irqs (attached JPG for the console log).
Keir Fraser
2010-Apr-28 03:45 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
I think nr_irqs is specifiable on the command line on newer kernels. You may be able to do nr_irqs=65536 as a kernel boot parameter, or something like that, without needing to rebuild the kernel.

 -- Keir
Keir Fraser
2010-Apr-28 03:53 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
Cc'ing Ian and Jeremy -- one of them should be able to answer definitively on this. I'm *pretty* sure there's an easy way to bump the irq limit you're hitting, on the pv_ops kernels, but 'nr_irqs=' on dom0 cmdline clearly isn't it as it had no effect at all!

 -- Keir
John McCullough
2010-Apr-28 06:47 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
I did a little testing.

With no kernel option:
# dmesg | grep -i nr_irqs
[ 0.000000] nr_irqs_gsi: 88
[ 0.000000] NR_IRQS:4352 nr_irqs:256

w/nr_irqs=65536:
# dmesg | grep -i nr_irqs
[ 0.000000] Command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
[ 0.000000] nr_irqs_gsi: 88
[ 0.000000] Kernel command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
[ 0.000000] NR_IRQS:4352 nr_irqs:256

Tweaking the NR_IRQS macro in the kernel will change the NR_IRQS output, but unfortunately that doesn't change nr_irqs and I run into the same limit (36 domUs on a less-beefy dual core machine).

I did find this:
http://blogs.sun.com/fvdl/entry/a_million_vms
which references NR_DYNIRQS, which is in 2.6.18, but not in the pvops kernel.

Watching /proc/interrupts, the domain irqs seem to be getting allocated from 248 downward until they hit some other limit:
...
 64:      59104   xen-pirq-ioapic-level   ioc0
 89:          1   xen-dyn-event   evtchn:xenconsoled
 90:          1   xen-dyn-event   evtchn:xenstored
 91:          6   xen-dyn-event   vif36.0
 92:        140   xen-dyn-event   blkif-backend
 93:         97   xen-dyn-event   evtchn:xenconsoled
 94:        139   xen-dyn-event   evtchn:xenstored
 95:          7   xen-dyn-event   vif35.0
 96:        301   xen-dyn-event   blkif-backend
 97:        261   xen-dyn-event   evtchn:xenconsoled
 98:        145   xen-dyn-event   evtchn:xenstored
 99:          7   xen-dyn-event   vif34.0
...
Perhaps the xen irqs are getting allocated out of the nr_irqs pool, while they could be allocated from the NR_IRQS pool?

-John
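[Editorial sketch, not part of John's mail.] The /proc/interrupts excerpt in John's message can be tallied mechanically to see how many dynamic IRQs each backend type is holding. The Python below is only an illustration: the function names dyn_irqs and summarize are made up here, the sample text is the excerpt posted in the thread, and the parser assumes the handler name is the last column of each line.

```python
import re
from collections import Counter

# Excerpt of /proc/interrupts as posted in the thread (single count column).
SAMPLE = """\
 64:      59104   xen-pirq-ioapic-level   ioc0
 89:          1   xen-dyn-event   evtchn:xenconsoled
 90:          1   xen-dyn-event   evtchn:xenstored
 91:          6   xen-dyn-event   vif36.0
 92:        140   xen-dyn-event   blkif-backend
 93:         97   xen-dyn-event   evtchn:xenconsoled
 94:        139   xen-dyn-event   evtchn:xenstored
 95:          7   xen-dyn-event   vif35.0
 96:        301   xen-dyn-event   blkif-backend
 97:        261   xen-dyn-event   evtchn:xenconsoled
 98:        145   xen-dyn-event   evtchn:xenstored
 99:          7   xen-dyn-event   vif34.0
"""


def dyn_irqs(text):
    """Return (irq_number, handler_name) for every xen-dyn-event line."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # Expect "<irq>:", one or more counters, the chip name, the handler.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if "xen-dyn-event" not in fields:
            continue
        irq = fields[0].rstrip(":")
        if irq.isdigit():
            found.append((int(irq), fields[-1]))
    return found


def summarize(text):
    """Count dynamic IRQs overall and per handler type."""
    irqs = dyn_irqs(text)
    # Strip the trailing domain/device index so vif36.0 and vif35.0 group as "vif".
    kinds = Counter(re.sub(r"\d+(\.\d+)?$", "", name) for _, name in irqs)
    numbers = [n for n, _ in irqs]
    return {
        "total": len(irqs),
        "lowest": min(numbers) if numbers else None,
        "highest": max(numbers) if numbers else None,
        "by_kind": dict(kinds),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On a live dom0 one would presumably pass the contents of /proc/interrupts instead of SAMPLE and watch the lowest allocated number approach nr_irqs_gsi as guests are started.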
Konrad Rzeszutek Wilk
2010-Apr-28 14:04 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Tue, Apr 27, 2010 at 11:47:30PM -0700, John McCullough wrote:
> I did a little testing.
>
> With no kernel option:
> # dmesg | grep -i nr_irqs
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> w/nr_irqs=65536:
> # dmesg | grep -i nr_irqs
> [ 0.000000] Command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] Kernel command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> tweaking the NR_IRQS macro in the kernel will change the NR_IRQS output,
> but unfortunately that doesn't change nr_irqs and I run into the same
> limit (36 domus on a less-beefy dual core machine).

If you have CONFIG_SPARSE_IRQ defined in your .config, it gets overwritten by some code that figures out how many IRQs you need based on your CPU count.

So can you change NR_VECTORS in arch/x86/include/asm/irq_vectors.h to a higher value and see what happens?

> I did find this:
> http://blogs.sun.com/fvdl/entry/a_million_vms
> which references NR_DYNIRQS, which is in 2.6.18, but not in the pvops
> kernel.
>
> Watching /proc/interrupts, the domain irqs seem to be getting allocated
> from 248 downward until they hit some other limit:

Yeah. They hit the nr_irqs_gsi and don't go below that.

> ...
>  64:      59104   xen-pirq-ioapic-level   ioc0
>  89:          1   xen-dyn-event   evtchn:xenconsoled
>  90:          1   xen-dyn-event   evtchn:xenstored
>  91:          6   xen-dyn-event   vif36.0
>  92:        140   xen-dyn-event   blkif-backend
>  93:         97   xen-dyn-event   evtchn:xenconsoled
>  94:        139   xen-dyn-event   evtchn:xenstored
>  95:          7   xen-dyn-event   vif35.0
>  96:        301   xen-dyn-event   blkif-backend
>  97:        261   xen-dyn-event   evtchn:xenconsoled
>  98:        145   xen-dyn-event   evtchn:xenstored
>  99:          7   xen-dyn-event   vif34.0
> ...
> Perhaps the xen irqs are getting allocated out of the nr_irqs pool,
> while they could be allocated from the NR_IRQS pool?
>
> -John
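[Editorial sketch, not part of Konrad's mail.] If dynamic IRQs are handed out from the top of the range down to nr_irqs_gsi, as described above, the room available to guests is roughly nr_irqs minus nr_irqs_gsi. The Python below turns that into a rough upper bound on the number of domUs. The function names, the dom0_reserved parameter, and the figure of four dynamic IRQs per guest (xenconsoled, xenstored, one vif, one blkif-backend, as in John's listing) are assumptions made for illustration, not something the thread states as a rule.

```python
def dyn_irq_budget(nr_irqs, nr_irqs_gsi):
    """IRQ numbers left for dynamic (event-channel) IRQs: nr_irqs_gsi up to nr_irqs."""
    return max(nr_irqs - nr_irqs_gsi, 0)


def max_domus(nr_irqs, nr_irqs_gsi, irqs_per_domu, dom0_reserved=0):
    """Rough upper bound on guests before the dynamic IRQ range is exhausted.

    dom0_reserved is a hypothetical allowance for dynamic IRQs that dom0
    itself consumes; the thread gives no figure for it.
    """
    if irqs_per_domu <= 0:
        raise ValueError("irqs_per_domu must be positive")
    budget = dyn_irq_budget(nr_irqs, nr_irqs_gsi) - dom0_reserved
    return max(budget, 0) // irqs_per_domu


if __name__ == "__main__":
    # John's dmesg output: nr_irqs:256, nr_irqs_gsi: 88.
    print(dyn_irq_budget(256, 88))
    print(max_domus(256, 88, irqs_per_domu=4))
```

With John's values this gives 168 slots and at most 42 guests at four IRQs each, which sits above the 36 he observed; the gap would be consistent with dom0 holding some dynamic IRQs of its own, though that is a guess. Guests with a second vif would lower the bound further.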
Ian Campbell
2010-Apr-28 16:57 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Wed, 2010-04-28 at 15:04 +0100, Konrad Rzeszutek Wilk wrote:
> If you have CONFIG_SPARSE_IRQ defined in your .config, it gets
> overwritten by some code that figures out how many IRQs you need based
> on your CPU count.
>
> So can you change NR_VECTORS in arch/x86/include/asm/irq_vectors.h to a
> higher value and see what happens?

Jeremy applied a patch of mine which added some extra space for dynamic IRQs at the start of March:

    commit 6d4a9168207ade237098a401270959ecc0bdd1e9
    Author: Ian Campbell <ian.campbell@citrix.com>
    Date:   Mon Mar 1 11:21:15 2010 +0000

        xen: allow some overhead in IRQ space for dynamic IRQs

If you have this patch then you can edit NR_DYNAMIC_IRQS in arch/x86/include/asm/irq_vectors.h to increase the number of extra IRQs.

Ian.
Jeremy Fitzhardinge
2010-Apr-28 18:13 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On 04/27/2010 11:47 PM, John McCullough wrote:
> I did a little testing.
>
> With no kernel option:
> # dmesg | grep -i nr_irqs
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> w/nr_irqs=65536:
> # dmesg | grep -i nr_irqs
> [ 0.000000] Command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] Kernel command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> tweaking the NR_IRQS macro in the kernel will change the NR_IRQS
> output, but unfortunately that doesn't change nr_irqs and I run into
> the same limit (36 domus on a less-beefy dual core machine).

Yes, NR_IRQS is the hard limit (for any statically defined irq arrays, which are deprecated now), but nr_irqs is the amount it decides to actually allocate for dynamic irq arrays, and so represents the actual runtime limit.

nr_irqs is computed in arch_probe_nr_irqs(), and it's a function of the number of cpus, with a bump to deal with dynamically allocated MSI interrupts. I should probably add something to specifically add more if we're running under Xen, at least as a workaround (ultimately the plan is to make all irqs completely dynamically allocated so there is no hard limit).

> I did find this:
> http://blogs.sun.com/fvdl/entry/a_million_vms
> which references NR_DYNIRQS, which is in 2.6.18, but not in the pvops
> kernel.

I'm pretty sure that's referring to Solaris dom0, so the fact that there's a similarly named symbol is coincidence. (But the root problem is the same.)

    J
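[Editorial sketch, not part of Jeremy's mail.] The calculation Jeremy describes can be modelled in a few lines. The formula below is an assumption reconstructed from memory of x86 kernels of this era (a cap of NR_VECTORS times the CPU count, then nr_irqs_gsi plus 8 per CPU plus 16 times nr_irqs_gsi for MSI/HT); it is not quoted anywhere in the thread, and the Python function name and parameters are made up for illustration. It does reproduce the nr_irqs:944 that Yuvraj reports later in the thread for a 16-CPU machine with nr_irqs_gsi 48.

```python
def probe_nr_irqs(nr_irqs_static, nr_vectors, nr_cpu_ids, nr_irqs_gsi,
                  msi_or_ht=True):
    """Model of how nr_irqs appears to be derived at boot (assumed formula).

    nr_irqs_static: compile-time NR_IRQS
    nr_vectors:     compile-time NR_VECTORS
    nr_cpu_ids:     number of possible CPUs seen by the kernel
    nr_irqs_gsi:    number of GSI (hardware) IRQs
    msi_or_ht:      whether PCI MSI or HT IRQ support is built in
    """
    # Upper cap: never more than NR_VECTORS per CPU, nor more than NR_IRQS.
    cap = min(nr_irqs_static, nr_vectors * nr_cpu_ids)
    # Estimated need: hardware IRQs plus a small per-CPU allowance...
    wanted = nr_irqs_gsi + 8 * nr_cpu_ids
    # ...plus the "bump" for dynamically allocated MSI/HT interrupts.
    if msi_or_ht:
        wanted += nr_irqs_gsi * 16
    return min(wanted, cap)


if __name__ == "__main__":
    # 16 CPUs, nr_irqs_gsi 48, NR_IRQS 5120 after raising NR_VECTORS to 1024.
    print(probe_nr_irqs(5120, 1024, 16, 48))
    # Same machine with the stock NR_VECTORS of 256 (NR_IRQS 4352).
    print(probe_nr_irqs(4352, 256, 16, 48))
```

Under this model both calls give the same result, because the estimate (48 + 8*16 + 48*16) is far below either cap. That would explain why raising NR_VECTORS alone leaves nr_irqs unchanged on that machine, but treat it as a plausible reading rather than a confirmed one.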
Jeremy Fitzhardinge
2010-Apr-28 18:18 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On 04/28/2010 09:57 AM, Ian Campbell wrote:
> Jeremy applied a patch of mine which added some extra space for dynamic
> IRQs at the start of march:
> commit 6d4a9168207ade237098a401270959ecc0bdd1e9
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date: Mon Mar 1 11:21:15 2010 +0000
>
> xen: allow some overhead in IRQ space for dynamic IRQs
>
> If you have this patch then you can edit NR_DYNAMIC_IRQS in
> arch/x86/include/asm/irq_vectors.h to increase the number of extra IRQs.

That's only present in 2.6.32, not .31. But it would be easy to backport.

J
Yuvraj Agarwal
2010-Apr-28 22:51 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
I tried making the change in linux-2.6-pvops.git/arch/x86/include/asm/irq_vectors.h

It was:
#define NR_VECTORS 256
I changed it to:
#define NR_VECTORS 1024

I still get the same number of nr_irqs (dmesg | grep -i nr_irq) before and after the change.

[ 0.000000] nr_irqs_gsi: 48
[ 0.500076] NR_IRQS:5120 nr_irqs:944

Also, as before, it crashes at the same number of domUs (154). I didn’t mention earlier that this is a dual-socket Nehalem machine: 2 (sockets) * 4 cores per CPU * 2 (hyperthreading) = 16 logical CPUs.

--Yuvraj

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
Sent: Wednesday, April 28, 2010 7:05 AM
To: John McCullough
Cc: Keir Fraser; xen-devel@lists.xensource.com; Yuvraj Agarwal
Subject: Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU

On Tue, Apr 27, 2010 at 11:47:30PM -0700, John McCullough wrote:
> I did a little testing.
>
> With no kernel option:
> # dmesg | grep -i nr_irqs
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> w/nr_irqs=65536:
> # dmesg | grep -i nr_irqs
> [ 0.000000] Command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] nr_irqs_gsi: 88
> [ 0.000000] Kernel command line: root=/dev/sda1 ro quiet console=hvc0 nr_irqs=65536
> [ 0.000000] NR_IRQS:4352 nr_irqs:256
>
> tweaking the NR_IRQS macro in the kernel will change the NR_IRQS output,
> but unfortunately that doesn't change nr_irqs and I run into the same
> limit (36 domus on a less-beefy dual core machine).

If you have CONFIG_SPARSE_IRQ defined in your .config, it gets overwritten by some code that figures out how many IRQs you need based on your CPU count. So can you change NR_VECTORS in arch/x86/include/asm/irq_vectors.h to a higher value and see what happens?

>
> I did find this:
> http://blogs.sun.com/fvdl/entry/a_million_vms
> which references NR_DYNIRQS, which is in 2.6.18, but not in the pvops
> kernel.
>
> Watching /proc/interrupts, the domain irqs seem to be getting allocated
> from 248 downward until they hit some other limit:

Yeah. They hit the nr_irqs_gsi and don't go below that.

> ...
> 64: 59104 xen-pirq-ioapic-level ioc0
> 89: 1 xen-dyn-event evtchn:xenconsoled
> 90: 1 xen-dyn-event evtchn:xenstored
> 91: 6 xen-dyn-event vif36.0
> 92: 140 xen-dyn-event blkif-backend
> 93: 97 xen-dyn-event evtchn:xenconsoled
> 94: 139 xen-dyn-event evtchn:xenstored
> 95: 7 xen-dyn-event vif35.0
> 96: 301 xen-dyn-event blkif-backend
> 97: 261 xen-dyn-event evtchn:xenconsoled
> 98: 145 xen-dyn-event evtchn:xenstored
> 99: 7 xen-dyn-event vif34.0
> ...
> Perhaps the xen irqs are getting allocated out of the nr_irqs pool,
> while they could be allocated from the NR_IRQS pool?
>
> -John
>
> On 04/27/2010 08:45 PM, Keir Fraser wrote:
>> I think nr_irqs is specifiable on the command line on newer kernels. You may
>> be able to do nr_irqs=65536 as a kernel boot parameter, or something like
>> that, without needing to rebuild the kernel.
>>
>> -- Keir
>>
>> On 28/04/2010 02:02, "Yuvraj Agarwal"<yuvraj@cs.ucsd.edu> wrote:
>>
>>> Actually, I did identify the problem (don’t know the fix) at least from
>>> the console logs. Its related to running out of nr_irq's (attached JPG
>>> for the console log).
>>>
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com
>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>>> Sent: Tuesday, April 27, 2010 5:44 PM
>>> To: Yuvraj Agarwal; xen-devel@lists.xensource.com
>>> Subject: Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes
>>> on starting 155th domU
>>>
>>> On 27/04/2010 08:41, "Yuvraj Agarwal"<yuvraj@cs.ucsd.edu> wrote:
>>>
>>>> Attached is the output of /var/log/daemon.log and /var/log/xen/xend.log, but
>>>> as far as we can see we don't quite know what might be going causing the
>>>> system to crash (no console access anymore and system becomes unresponsive and
>>>> needs to be power-cycled). I have pasted only the relevant bits of
>>>> information (the last domU that did successfully start and the next one that
>>>> failed). It may be the case that all the log messages weren't flushed before
>>>> the system crashed...
>>>>
>>>> Does anyone know where this limit of 155 domU is coming from and how we can
>>>> fix/increase it?
>>>
>>> Get a serial line on a test box, and capture Xen logging output on it. You
>>> can both see if any crash messages come from Xen when the 155th domain is
>>> created, and also try the serial debug keys (e.g., try 'h' to get help to
>>> start with) to see whether Xen itself is still alive.
>>>
>>> -- Keir
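[Editor's sketch] The /proc/interrupts excerpt quoted in this message can be tallied mechanically to see how many dynamic IRQs each guest consumes. The snippet below is a minimal sketch; the function name is hypothetical, and it assumes the two-column tail layout seen in the excerpt (IRQ type in the second-to-last field, handler name in the last).

```python
import re
from collections import Counter

# Excerpt of /proc/interrupts as quoted in the thread (single count column).
SAMPLE = """\
 64: 59104 xen-pirq-ioapic-level ioc0
 89: 1 xen-dyn-event evtchn:xenconsoled
 90: 1 xen-dyn-event evtchn:xenstored
 91: 6 xen-dyn-event vif36.0
 92: 140 xen-dyn-event blkif-backend
 93: 97 xen-dyn-event evtchn:xenconsoled
 94: 139 xen-dyn-event evtchn:xenstored
 95: 7 xen-dyn-event vif35.0
 96: 301 xen-dyn-event blkif-backend
 97: 261 xen-dyn-event evtchn:xenconsoled
 98: 145 xen-dyn-event evtchn:xenstored
 99: 7 xen-dyn-event vif34.0
"""


def count_dynamic_irqs(text):
    """Count xen-dyn-event IRQs by handler name, folding vifN.M into 'vif'."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header row and anything that is not "<irq>: ..." shaped.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        # Using the last two fields tolerates any number of per-CPU count columns.
        if fields[-2] != "xen-dyn-event":
            continue
        name = re.sub(r"^vif\d+\.\d+$", "vif", fields[-1])
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_dynamic_irqs(SAMPLE).items()):
        print(name, n)
```

On the excerpt, each guest with one vif appears to hold four dynamic IRQs (console, xenstore, vif, block backend). To run it against a live dom0, pass the contents of /proc/interrupts instead of SAMPLE.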
Konrad Rzeszutek Wilk
2010-Apr-29 14:56 UTC
Re: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
On Wed, Apr 28, 2010 at 03:51:19PM -0700, Yuvraj Agarwal wrote:
> I tried making the change in
> linux-2.6-pvops.git/arch/x86/include/asm/irq_vectors.h
>
> It was:
> #define NR_VECTORS 256
> I changed it to
> #define NR_VECTORS 1024
>
> I still get the same number of nr_irqs (dmesg | grep -i nr_irq) before and
> after the change.
>
> [ 0.000000] nr_irqs_gsi: 48
> [ 0.500076] NR_IRQS:5120 nr_irqs:944

That looks to be different from the previous bootup:
[ 0.000000] NR_IRQS:4352 nr_irqs:256
?

>
> Also, as earlier it crashes on the same number of domU (154). I didn’t
> mention earlier, this a dual core Nehalem machine -- 2 (sockets) * 4 cores
> per CPU * 2 (hyperthreading)

Lots of logical CPUs, weird that your nr_irqs initially was that much lower.

Anyhow, you mentioned that you narrowed it down to not being enough IRQs. How did you find that out? Was there a kernel message when you started the 155th guest?

Oh, also you say that /proc/interrupts showed the number descending from 255 down to 89. With this it should have started at 944 and gone down to 49? Which roughly means 175 guests?
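[Editor's sketch] The back-of-the-envelope guest estimate above can be written out explicitly. Everything here is a sketch under stated assumptions: usable dynamic IRQs are taken to be those strictly above nr_irqs_gsi and below nr_irqs, and each guest in the 154-domU setup is assumed to hold five of them (console, xenstore, one block backend, two vifs). Function names are hypothetical.

```python
def usable_dynamic_irqs(nr_irqs, nr_irqs_gsi):
    # Assumed usable range: nr_irqs_gsi + 1 .. nr_irqs - 1 inclusive.
    return max(nr_irqs - 1 - nr_irqs_gsi, 0)


def max_guests(nr_irqs, nr_irqs_gsi, irqs_per_guest, dom0_overhead=0):
    """Upper bound on guests if dom0 itself holds dom0_overhead dynamic IRQs."""
    free = usable_dynamic_irqs(nr_irqs, nr_irqs_gsi) - dom0_overhead
    return max(free, 0) // irqs_per_guest


def implied_overhead(nr_irqs, nr_irqs_gsi, irqs_per_guest, guests_started):
    """Dynamic IRQs not accounted for by guests when the pool ran dry."""
    return usable_dynamic_irqs(nr_irqs, nr_irqs_gsi) - irqs_per_guest * guests_started


if __name__ == "__main__":
    print(max_guests(944, 48, irqs_per_guest=5))
    print(implied_overhead(944, 48, irqs_per_guest=5, guests_started=154))
```

With nr_irqs = 944 and nr_irqs_gsi = 48 the ceiling is 179 guests, close to the rough figure of 175. Running out at 154 guests would then imply about 125 dynamic IRQs held by something other than guest devices. One plausible source is dom0's own per-CPU event channels, but the thread does not confirm this.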
Yuvraj Agarwal
2010-Apr-30 03:12 UTC
RE: [Xen-devel] XEN 4.0 + 2.6.31.13 pvops kernel : system crashes on starting 155th domU
> [ 0.000000] nr_irqs_gsi: 48
> [ 0.500076] NR_IRQS:5120 nr_irqs:944

That looks to be different from the previous bootup:
[ 0.000000] NR_IRQS:4352 nr_irqs:256

>> YA: I think you may have mixed up the post by John. He has a different dual core machine
>> and he is trying the same edits as me. He was getting nr_irqs as 256 while I was getting 944.
>> [ 0.000000] nr_irqs_gsi: 48
>> [ 0.500270] NR_IRQS:5120 nr_irqs:944
>> Also, it did not matter whether I changed NR_VECTORS to 1024 instead of the original 256.

> Also, as earlier it crashes on the same number of domU (154). I didn’t
> mention earlier, this a dual core Nehalem machine -- 2 (sockets) * 4 cores
> per CPU * 2 (hyperthreading)

Lots of logical CPUs, weird that your nr_irqs initially was that much lower.

>> No, as mentioned above, my machine has the larger number of CPUs while
>> John McCullough's machine has fewer CPUs.

Anyhow, you mentioned that you narrowed it down to not being enough IRQs - how did you find that out? Was there an kernel message when you started the 155th guest?

>> Yes, as I mentioned in an earlier email, right after I started up the 154th guest
>> a log message was printed out, as below:
>> Kernel Panic - not syncing: No available IRQs to bind to: increase nr_irqs!

Oh, also you say that the /proc/interrupts showed the number descending from 255 down to 89. With this it should have started at 944 and gone down to 49? Which roughly means 175 guests?

>> Yes, the interrupts started counting down from 944 as I increased the number of domUs.
>> I am attaching the /proc/interrupts log file for three cases (no domU, after 1 domU,
>> and then after 150 domUs). The machine crashed after the 154th domU was started.

Thank you all for helping out to debug this. As I mentioned at the XEN summit, once I do fix these issues I’d really like to create a wiki stub for anyone else trying to do this. It looks like in the original 2.6.18 kernel people did indeed try this, but I guess some of the edits/changes didn’t make it to the 2.6.31 pvops branch.
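[Editor's sketch] The behaviour reported in this thread (dynamic IRQs handed out from the top of the range downward, stopping above nr_irqs_gsi, then a panic) can be illustrated with a toy allocator. This is a model of the observed behaviour, not the kernel's code: the function and class names are hypothetical, and the exclusive lower bound is an assumption based on the remark that the count should go "down to 49" with nr_irqs_gsi = 48.

```python
class IrqSpaceExhausted(RuntimeError):
    """Stands in for the kernel panic seen on the console."""


def find_unbound_irq(in_use, nr_irqs, nr_irqs_gsi):
    """Return the highest free IRQ number strictly above nr_irqs_gsi."""
    for irq in range(nr_irqs - 1, nr_irqs_gsi, -1):
        if irq not in in_use:
            return irq
    raise IrqSpaceExhausted("No available IRQs to bind to: increase nr_irqs!")


def bind_until_exhausted(nr_irqs, nr_irqs_gsi):
    """Bind IRQs one at a time until the pool is empty; return them in order."""
    in_use = set()
    order = []
    while True:
        try:
            irq = find_unbound_irq(in_use, nr_irqs, nr_irqs_gsi)
        except IrqSpaceExhausted:
            return order
        in_use.add(irq)
        order.append(irq)


if __name__ == "__main__":
    bound = bind_until_exhausted(944, 48)
    print(len(bound), bound[0], bound[-1])
```

The point of the model is that the pool is bounded by the runtime nr_irqs, not by the compile-time NR_IRQS, so raising only the macro does not move the point at which binding fails.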