Eric Tessler
2007-Jul-13 01:19 UTC
[Xen-users] XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
We have found a critical problem with the Xen 3.1 release (for those who are running 15-20 VMs on a single server). We are using the official Xen 3.1 release on a rackable server (Dual-Core AMD Opteron, 8GB RAM).

The problem we are seeing is that vifs intermittently fail to work properly in VMs after we create around 15-17 VMs on our server (all running at the same time, created one by one). Sometimes we can create up to 40 VMs without a problem; other times vifs begin to fail on the 15th-17th VM (each VM has 4 vifs, 1 block device, and 64MB of memory). We see the following error message on the VM's (domU's) console:

"XENBUS: Timeout connecting to device: device/vif/3 (state 6)"

At the same time in dom0, we see the following error message in /var/log/messages:

"vif vif-16-3: 1 mapping shared-frames 2310/2311 port 11"

(The error message above means that netif_map failed for some reason in XenBus.)

If we repeat this exact same test using Xen 3.0.4, we never have any problems: all vifs in all VMs work correctly. This problem must be specific to Xen 3.1.

I have searched the web and this list and have not been able to find out whether anyone else has observed this problem or whether a fix already exists (if there is a fix, please post information about it here). If there is no fix yet, I will be looking into this bug myself; any pointers on where to concentrate my debugging efforts would be appreciated (I don't know the Xen code that well).

One other strange note about this issue: if we leave the failed VM alone, we can actually create another VM without any problem (its vifs come up correctly). Afterwards, we can destroy and re-create the VM that used to fail, and it now boots without any problems (its vif comes up correctly). This smells like a race condition in the Xen code (and it suggests the failure is not due to low resources or the like).

Any help on this issue would be greatly appreciated.

Thank you,

Eric
Keir Fraser
2007-Jul-13 06:29 UTC
[Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Can you try the 3.0.4 domU kernel against the 3.1 dom0 kernel, and vice versa? Also, turn on debug tracing in Xen (boot options 'loglvl=all guest_loglvl=all') and see what appears at the end of 'xm dmesg'.

 -- Keir

On 13/7/07 02:19, "Eric Tessler" <maiden1134@yahoo.com> wrote:

> At the same time in dom0, we see the following error message in
> /var/log/messages:
>
> "vif vif-16-3: 1 mapping shared-frames 2310/2311 port 11"
>
> (The error message above means that netif_map failed for some reason in
> XenBus.)
>
> If we repeat this exact same test using Xen 3.0.4, we never have any problems.
> All vifs in all VMs work correctly. This problem must be specific to Xen 3.1.
Eric Tessler
2007-Jul-14 02:32 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
I was able to get some debugging in on this problem and here is what I have found.

I re-ran my test with the Xen debug options enabled as Keir suggested (I also put some debug output in netif_map and map_frontend_pages to find out exactly what was failing). The 16th VM's vif timed out again and here is what I saw in the dmesg log:

(XEN) grant_table.c:557:d1 Expanding dom (1) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d2 Expanding dom (2) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d3 Expanding dom (3) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d4 Expanding dom (4) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d5 Expanding dom (5) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d6 Expanding dom (6) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d7 Expanding dom (7) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d8 Expanding dom (8) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d9 Expanding dom (9) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d10 Expanding dom (10) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d11 Expanding dom (11) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d12 Expanding dom (12) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d13 Expanding dom (13) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d14 Expanding dom (14) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d15 Expanding dom (15) grant table from (4) to (5) frames.
(XEN) grant_table.c:557:d16 Expanding dom (16) grant table from (4) to (5) frames.
(XEN) mm.c:2605:d0 Could not find L1 PTE for address d1400000

You can see from the above that the first 15 VMs are OK, and the 16th VM fails with the last error message from mm.c. I attempted to trace upwards what exactly was failing, so I enabled debug output in linux-2.6-xen-sparse/drivers/xen/netback/interface.c (this is where netif_map() is located). I then observed the following output in /var/log/messages when the 16th VM's vif timed out:

(map_frontend_pages:227) Gnttab failure mapping rx_ring_ref!
(netif_map:274) map frontend pages failed   [I added this debug output]
vif vif-16-3: 1 mapping shared-frames 2310/2311 port 11

The error message from mm.c in the dmesg log is coming from the function create_grant_va_mapping (a call to guest_map_l1e() is returning NULL).

In summary, it looks like the mapping of the RX shared-memory ring is failing (the TX mapping succeeds; it always fails on the mapping of the RX ring). Another interesting note is that the address dumped in the dmesg log is always the same: d1400000 (I saw the failure about 10 times today and the address never changed).

Also, at Keir's suggestion, I tried the Xen 3.0.4 kernel (2.6.16.33) in my 16th VM; it failed the same way. The only difference is that instead of expanding the grant table from 4 to 5 frames, it was expanded from 4 to 16 frames:

(XEN) grant_table.c:557:d18 Expanding dom (18) grant table from (4) to (16) frames.
(XEN) mm.c:2605:d0 Could not find L1 PTE for address d1400000

I believe the following stack trace represents the failure path (starting from within XenBus, traced by hand):

connect_rings               linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
netif_map                   linux-2.6-xen-sparse/drivers/xen/netback/interface.c
map_frontend_pages          linux-2.6-xen-sparse/drivers/xen/netback/interface.c
__gnttab_map_grant_ref      (hypercall) xen/common/grant_table.c
create_grant_host_mapping   xen/arch/x86/mm.c
create_grant_va_mapping     xen/arch/x86/mm.c
guest_map_l1e               xen/arch/x86/mm.c  (this is the function that is ultimately failing)

Any clue as to what is causing this failure or how to fix it? Is there any other debug info I can provide that would help in resolving this issue? I have some free time tomorrow to debug this, but I need some direction; this is an area of Xen I don't understand very well.

I am also thinking about downloading the Xen unstable tree and trying that to see if the problem exists there as well.

Thanks,

Eric

Keir Fraser <keir@xensource.com> wrote:

> Can you try the 3.0.4 domU kernel against the 3.1 dom0 kernel, and vice versa?
> Also, turn on debug tracing in Xen (boot options 'loglvl=all guest_loglvl=all')
> and see what appears at the end of 'xm dmesg'.
>
> -- Keir
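[For readers following the trace: below is a simplified sketch of the failure point, modelled on the call chain and log messages Eric lists above. The function names and the error string come from the trace; the body is illustrative, not the verbatim Xen 3.1 source.]

    /* Sketch of the tail of the grant-mapping path in xen/arch/x86/mm.c.
     * guest_map_l1e() walks the currently-running page tables to find the
     * L1 entry covering the given virtual address. If dom0's page tables
     * have no L1 table present for that address yet, it returns NULL and
     * the grant mapping fails with the "Could not find L1 PTE" message. */
    static int create_grant_va_mapping(unsigned long va, l1_pgentry_t nl1e,
                                       struct vcpu *v)
    {
        unsigned long gl1mfn;
        l1_pgentry_t *pl1e = guest_map_l1e(v, va, &gl1mfn);

        if ( pl1e == NULL )
        {
            gdprintk(XENLOG_WARNING,
                     "Could not find L1 PTE for address %lx\n", va);
            return GNTST_general_error;
        }
        /* ... otherwise write nl1e into *pl1e, flush, and unmap ... */
        return GNTST_okay;
    }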
Keir Fraser
2007-Jul-14 06:43 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
What dom0 kernel image are you running? It looks like vmalloc_sync_all(), called from alloc_vm_area(), has not caused the PTE that will map the RX ring to be made present in the currently-running page tables. The code looks okay on inspection, though.

 -- Keir

On 14/7/07 03:32, "Eric Tessler" <maiden1134@yahoo.com> wrote:

> Also, at Keir's suggestion, I tried the Xen 3.0.4 kernel (2.6.16.33) in my
> 16th VM; it failed the same way. The only difference is that instead of
> expanding the grant table from 4 to 5 frames, it was expanded from 4 to 16
> frames:
>
> (XEN) grant_table.c:557:d18 Expanding dom (18) grant table from (4) to (16) frames.
> (XEN) mm.c:2605:d0 Could not find L1 PTE for address d1400000
>
> I believe the following stack trace represents the failure path (starting
> from within XenBus, traced by hand):
>
> connect_rings               linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
> netif_map                   linux-2.6-xen-sparse/drivers/xen/netback/interface.c
> map_frontend_pages          linux-2.6-xen-sparse/drivers/xen/netback/interface.c
> __gnttab_map_grant_ref      (hypercall) xen/common/grant_table.c
> create_grant_host_mapping   xen/arch/x86/mm.c
> create_grant_va_mapping     xen/arch/x86/mm.c
> guest_map_l1e               xen/arch/x86/mm.c
>
> Any clue as to what is causing this failure or how to fix it? Is there any
> other debug info I can provide that would help in resolving this issue?
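[For context, the dom0-side allocation Keir refers to looks roughly like the sketch below, simplified from the linux-2.6-xen-sparse helper: netback reserves a kernel virtual-address range for each shared ring with alloc_vm_area(), and vmalloc_sync_all() is supposed to make the new page-table entries visible in every set of page tables before Xen maps the frontend's grant into that range.]

    #include <linux/vmalloc.h>

    /* Sketch of alloc_vm_area() as used by netback for the TX/RX rings
     * (simplified). */
    struct vm_struct *alloc_vm_area(unsigned long size)
    {
        struct vm_struct *area;

        area = get_vm_area(size, VM_IOREMAP);
        if (area == NULL)
            return NULL;

        /* Ensure page tables are constructed for this region of kernel
         * virtual address space and propagated to all page tables. If
         * this synchronisation is incomplete, Xen's guest_map_l1e()
         * finds no L1 PTE for the ring address, which is exactly the
         * failure seen above. */
        vmalloc_sync_all();

        return area;
    }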
Keir Fraser
2007-Jul-14 09:01 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Now fixed in the staging tree. The patch (for your dom0 kernel) is also attached to this email. Thanks for your help in tracking this one down!

 -- Keir

On 14/7/07 07:43, "Keir Fraser" <keir@xensource.com> wrote:

> What dom0 kernel image are you running? It looks like vmalloc_sync_all(),
> called from alloc_vm_area(), has not caused the PTE that will map the RX
> ring to be made present in the currently-running page tables. The code
> looks okay on inspection, though.
>
> -- Keir
Eric Tessler
2007-Jul-15 06:15 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
I applied the patch and rebuilt Xen - this did indeed resolve the problem. My test can now create 40 VMs without any failures. I will leave the test running for a few days to make sure.

Thanks for the help,

Eric

Keir Fraser <keir@xensource.com> wrote:

> Now fixed in the staging tree. The patch (for your dom0 kernel) is also
> attached to this email. Thanks for your help in tracking this one down!
>
> -- Keir
Thomas Ronner
2007-Jul-24 11:06 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Hi Keir,

Keir Fraser wrote:

> Now fixed in the staging tree. The patch (for your dom0 kernel) is also
> attached to this email.

I have a similar problem with vbds instead of vifs:

(domU:)
XENBUS: Timeout connecting to device: device/vbd/2049 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2052 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2050 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2051 (state 6)

Does your patch also fix this (in theory)? This is a production machine so I'm somewhat reluctant to try things before knowing what they do. I'll attach the full domU output below. This is using a custom kernel without modules (I hate having to deploy modules in all domUs) and kernel-level IP autoconfiguration (I like having this info in the Xen config file).

There are other domUs on this machine with similar configs having no problem at all.

Regards, Thomas

---8<--[ domU output ]------------------------------------------
[root@diana ~]# xm create vechtstreek_test -c
Using config file "/etc/xen/vechtstreek_test".
Started domain vechtstreek_test
Linux version 2.6.18-tr01 (root@diana.zoo.cs.uu.nl) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #2 SMP Fri Jul 20 12:14:40 CEST 2007
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 0000000010800000 (usable)
0MB HIGHMEM available.
264MB LOWMEM available.
NX (Execute Disable) protection: active
Allocating PCI resources starting at 20000000 (gap: 10800000:ef800000)
Detected 3200.282 MHz processor.
Built 1 zonelists.  Total pages: 67584
Kernel command line: root=/dev/sda1 ro ip=131.211.84.207:1.2.3.4:131.211.84.193:255.255.255.192:vechtstreek_test:eth0:off
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Xen reported: 3200.112 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Software IO TLB disabled
vmalloc area: d1000000-f51fe000, maxmem 2d7fe000
Memory: 251648k/270336k available (3953k kernel code, 10220k reserved, 1648k data, 216k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6403.14 BogoMIPS (lpj=32015708)
Security Framework v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 20k freed
Brought up 1 CPUs
migration_cost=0
checking if image is initramfs... it is
Freeing initrd memory: 588k freed
NET: Registered protocol family 16
Brought up 1 CPUs
xen_mem: Initialising balloon driver.
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 16384 bind 8192)
TCP reno registered
audit: initializing netlink socket (disabled)
audit(1185274517.008:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver 2.1.27 [Flags: R/O].
fuse init (API version 7.7)
OCFS2 1.3.3
OCFS2 Node Manager 1.3.3
OCFS2 DLM 1.3.3
OCFS2 DLMFS 1.3.3
OCFS2 User DLM kernel interface loaded
seclvl: seclvl_init: seclvl: Failure registering with the kernel.
seclvl: seclvl_init: seclvl: Failure registering with primary security module.
seclvl: Error during initialization: rc = [-22]
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
rtc: IRQ 8 is not free.
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
loop: loaded (max 8 devices)
nbd: registered device at major 43
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
Xen virtual console successfully installed as tty1
Event-channel device installed.
netfront: Initialising virtual ethernet driver.
Loading iSCSI transport class v1.1-646.
iscsi: registered transport (tcp)
register_blkdev: cannot get major 8 for sd
vbd vbd-2049: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2049
i8042.c: No controller found.
mice: PS/2 mouse device common for all mice
register_blkdev: cannot get major 8 for sd
vbd vbd-2049: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2049
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
device-mapper: multipath: version 1.0.4 loaded
device-mapper: multipath round-robin: version 1.0.0 loaded
register_blkdev: cannot get major 8 for sd
vbd vbd-2052: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2052
dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-2)
netem: version 1.2
u32 classifier
 Performance counters on
 OLD policer on
Netfilter messages via NETLINK v0.30.
IPv4 over IPv4 tunneling driver
register_blkdev: cannot get major 8 for sd
vbd vbd-2052: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2052
GRE over IPv4 tunneling driver
ip_conntrack version 2.4 (2112 buckets, 16896 max) - 228 bytes per conntrack
register_blkdev: cannot get major 8 for sd
vbd vbd-2050: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2050
register_blkdev: cannot get major 8 for sd
vbd vbd-2050: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2050
register_blkdev: cannot get major 8 for sd
vbd vbd-2051: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2051
register_blkdev: cannot get major 8 for sd
vbd vbd-2051: 19 xlvbd_add at /local/domain/0/backend/vbd/17/2051
netfront: device eth0 has copying receive path.
ctnetlink v0.90: registering with nfnetlink.
ip_conntrack_pptp version 3.1 loaded
ip_nat_pptp version 3.0 loaded
ip_tables: (C) 2000-2006 Netfilter Core Team
ClusterIP Version 0.8 loaded successfully
arp_tables: (C) 2002 David S. Miller
IPVS: Registered protocols (TCP, UDP, AH, ESP)
IPVS: Connection hash table configured (size=4096, memory=32Kbytes)
IPVS: ipvs loaded.
IPVS: [rr] scheduler registered.
IPVS: [wrr] scheduler registered.
IPVS: [lc] scheduler registered.
IPVS: [wlc] scheduler registered.
IPVS: [lblc] scheduler registered.
IPVS: [lblcr] scheduler registered.
IPVS: [dh] scheduler registered.
IPVS: [sh] scheduler registered.
IPVS: [sed] scheduler registered.
IPVS: [nq] scheduler registered.
IPVS: ftp: loaded support on port[0] = 21
TCP bic registered
TCP cubic registered
TCP westwood registered
TCP highspeed registered
TCP hybla registered
TCP htcp registered
TCP vegas registered
TCP veno registered
TCP scalable registered
TCP lp registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
ip6_tables: (C) 2000-2006 Netfilter Core Team
NET: Registered protocol family 17
NET: Registered protocol family 15
Bridge firewalling registered
Ebtables v2.0 registered
ebt_ulog: not logging via ulog since somebody else already registered for PF_BRIDGE
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
ieee80211: 802.11 data/management/control stack, git-1.1.13
ieee80211: Copyright (C) 2004-2005 Intel Corporation <jketreno@linux.intel.com>
Using IPI No-Shortcut mode
XENBUS: Timeout connecting to device: device/vbd/2049 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2052 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2050 (state 6)
XENBUS: Timeout connecting to device: device/vbd/2051 (state 6)
XENBUS: Device with no driver: device/console/0
IP-Config: Complete:
      device=eth0, addr=131.211.84.207, mask=255.255.255.192, gw=131.211.84.193,
      host=vechtstreek_test, domain=, nis-domain=(none),
      bootserver=1.2.3.4, rootserver=1.2.3.4, rootpath
Freeing unused kernel memory: 216k freed
Red Hat nash version 4.1.18 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
--------------------------------------------------------

---8<--[ /etc/xen/vechtstreek_test ]--------------------
kernel = "/boot/vmlinux-stripped"
ramdisk = "/boot/initrd-xenU-tr01"
memory = 256
name = "vechtstreek_test"
vif = [ 'mac=00:00:6C:00:00:0D' ]
disk = [ 'phy:sata/vechtstreek_root,sda1,w',
         'phy:sata/vechtstreek_swap,sda4,w',
         'phy:sata/vechtstreek_var,sda2,w',
         'phy:sata/vechtstreek_home,sda3,w' ]
ip="131.211.84.207"
netmask="255.255.255.192"
gateway="131.211.84.193"
hostname="vechtstreek_test"
root = "/dev/sda1 ro"
--------------------------------------------------------
Keir Fraser
2007-Jul-24 11:46 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Hi Thomas,

This problem is entirely different. The problem is visible earlier in your console output: the Xen block-device driver is unable to acquire the device-number space for SCSI devices (sda, sdb, etc.). Hence it is failing to initialise the vbd connections to the backend and is ending up in state 6 (which is XenbusStateClosed).

The solutions you have are:

1. Do not build the generic SCSI subsystem into your dom0 kernels. It is this subsystem which (quite reasonably) is allocating the sd* number space to the exclusion of the Xen block-device driver.

2. Call your devices hd* instead of sd* (i.e., hijack the IDE device numbers instead of the SCSI ones), or even use the xvd* number space, which is exclusively reserved for Xen VBDs.

Hope this helps,
Keir

On 24/7/07 12:06, "Thomas Ronner" <thomas@cs.uu.nl> wrote:

> I have a similar problem with vbds instead of vifs:
>
> (domU:)
> XENBUS: Timeout connecting to device: device/vbd/2049 (state 6)
> XENBUS: Timeout connecting to device: device/vbd/2052 (state 6)
> XENBUS: Timeout connecting to device: device/vbd/2050 (state 6)
> XENBUS: Timeout connecting to device: device/vbd/2051 (state 6)
>
> Does your patch also fix this (in theory)? This is a production machine
> so I'm somewhat reluctant to try things before knowing what they do.
> I'll attach the full domU output below.
>
> There are other domUs on this machine with similar configs having no
> problem at all.
>
> [...]
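[The repeated "register_blkdev: cannot get major 8 for sd" lines in Thomas's log correspond to a collision like the one sketched below. This is illustrative only: the helper name is hypothetical, but register_blkdev() and SCSI_DISK0_MAJOR (major 8) are the real kernel interfaces. The config-side alternative Keir mentions is just a rename in the disk lines, e.g. 'phy:sata/vechtstreek_root,xvda1,w' instead of ',sda1,w'.]

    #include <linux/fs.h>      /* register_blkdev() */
    #include <linux/major.h>   /* SCSI_DISK0_MAJOR == 8 */

    /* Hypothetical sketch of the collision: the in-kernel SCSI disk
     * driver registers major 8 first, so blkfront's attempt to register
     * the same major for a vbd named sda* fails, the device never
     * connects, and xenbus is left in state 6 (XenbusStateClosed). */
    static int xlvbd_register_sd_major(void)
    {
        int err = register_blkdev(SCSI_DISK0_MAJOR, "sd");
        if (err) {
            printk(KERN_WARNING
                   "register_blkdev: cannot get major %d for sd\n",
                   SCSI_DISK0_MAJOR);
            return err;  /* caller falls back or gives up on this vbd */
        }
        return 0;
    }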
Thomas Ronner
2007-Jul-24 12:34 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Hi Keir,

Thanks for your quick reply!

Keir Fraser wrote:

> This problem is entirely different. The problem is visible earlier in your
> console output: the Xen block-device driver is unable to acquire the
> device-number space for SCSI devices (sda, sdb, etc.). Hence it is failing to
> initialise the vbd connections to the backend and is ending up in state 6
> (which is XenbusStateClosed).

I don't understand. Which Xen block-device driver is unable to? The frontend or the backend? This never happened on Xen 2, at least I never encountered it.

> The solutions you have are:
> 1. Do not build the generic SCSI subsystem into your dom0 kernels. It is
> this subsystem which (quite reasonably) is allocating the sd* number space
> to the exclusion of the Xen block-device driver.

This is not possible, as the physical machine has SCSI disks and a SATA disk (which also uses the SCSI subsystem).

> 2. Call your devices hd* instead of sd* (i.e., hijack the IDE device
> numbers instead of the SCSI ones), or even use the xvd* number space, which
> is exclusively reserved for Xen VBDs.

I tried hd*, which works. I'm used to making sd* devices as there used to be some Xen version (I forget which one) that was more stable when using sd* devices in domUs.

> Hope this helps,
> Keir

Thanks, Thomas
Keir Fraser
2007-Jul-24 13:26 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
>> This problem is entirely different. The problem is visible earlier in your
>> console output: the Xen block-device driver is unable to acquire the
>> device-number space for SCSI devices (sda, sdb, etc.). Hence it is failing
>> to initialise the vbd connections to the backend and is ending up in state
>> 6 (which is XenbusStateClosed).
>
> I don't understand. Which Xen block-device driver is unable to? The
> frontend or the backend? This never happened on Xen 2, at least I never
> encountered it.

The driver in domU. If you never saw this problem with Xen 2, that's because the domU kernels you used at that time did not have the normal SCSI subsystem compiled into them.

>> The solutions you have are:
>> 1. Do not build the generic SCSI subsystem into your dom0 kernels. It is
>> this subsystem which (quite reasonably) is allocating the sd* number space
>> to the exclusion of the Xen block-device driver.
>
> This is not possible, as the physical machine has SCSI disks and a SATA
> disk (which also uses the SCSI subsystem).

Sorry, that was a typo. I meant you should not build it into your *domU* kernels. It is of course fine to have ordinary SCSI compiled into dom0.

>> 2. Call your devices hd* instead of sd* (i.e., hijack the IDE device
>> numbers instead of the SCSI ones), or even use the xvd* number space,
>> which is exclusively reserved for Xen VBDs.
>
> I tried hd*, which works. I'm used to making sd* devices as there used
> to be some Xen version (I forget which one) that was more stable when
> using sd* devices in domUs.

Sounds unlikely to me; sd* and hd* are just names. Anyhow, if you stop compiling SCSI into your domU kernel then you can continue to use sd* names for your VBDs.

 -- Keir
Thomas Ronner
2007-Jul-25 08:12 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Keir Fraser wrote:

> The driver in domU. If you never saw this problem with Xen 2, that's
> because the domU kernels you used at that time did not have the normal
> SCSI subsystem compiled into them.

Other domUs with similar configs and the same kernel run fine. It was only when I started the 7th domU that the problems began. But I'll try your suggestion of compiling a domU kernel without SCSI. I wonder what I was thinking including it in the first place.

Thanks, Thomas
Keir Fraser
2007-Jul-25 08:15 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
On 25/7/07 09:12, "Thomas Ronner" <thomas@cs.uu.nl> wrote:

> Other domUs with similar configs and the same kernel run fine. It was
> only when I started the 7th domU that the problems began.

I have to be skeptical about that. If you run the same kernel and basically the same configuration, this problem should occur deterministically for all such domains. So something must be different in domUs 1 through 6!

 -- Keir
Thomas Ronner
2007-Jul-25 09:55 UTC
Re: [Xen-users] Re: XEN 3.1: critical bug: vif init failure after creating 15-17 VMs (XENBUS: Timeout connecting to device: device/vif)
Hi Keir,

Keir Fraser wrote:

> I have to be skeptical about that. If you run the same kernel and basically
> the same configuration, this problem should occur deterministically for all
> such domains. So something must be different in domUs 1 through 6!

No. I copied the config file in /etc/xen from a working config every time I created a new domU: change names and IPs, and go. Block devices for the different domUs are LVM-backed ext3 file systems which I created by unpacking a tar - the same tar every time.

But... there is some spooky shit going on here. I did

xm shutdown wikitest
xm create wikitest -c

(wikitest is another domain, one of the 6 mentioned earlier.) It didn't come up; it behaved the same as the other domain (the 'vechtstreek_test' domain I originally reported about). It boots fine with the newer kernel without SCSI support. Yet it started fine a couple of days ago using the kernel with SCSI support built in; I'm 100% certain about that.

Thomas