With current xen-unstable 25733:353bc0801b11 the attached hvm.cfg does
not start anymore with a SLES11SP2 dom0 kernel, but it starts if I run a
3.5 pvops dom0 kernel. I have no modifications other than the stubdom -j
patch.

The output from this command is attached:
xl -vvvv create -d -f /root/xenpaging/sles11sp2_full_xenpaging_local.cfg 2>&1 | tee xl-create-`uname -r`.txt &

Any ideas how to fix this timeout error?

Olaf

...
libxl: debug: libxl_event.c:457:watchfd_callback: watch w=0x62bf88 wpath=/local/domain/0/backend/vif/2/0/state token=3/1: event epath=/local/domain/0/backend/vif/2/0/state
libxl: debug: libxl_event.c:600:devstate_watch_callback: backend /local/domain/0/backend/vif/2/0/state wanted state 2 still waiting state 1
libxl: debug: libxl_event.c:614:devstate_timeout: backend /local/domain/0/backend/vif/2/0/state wanted state 2 timed out
libxl: debug: libxl_event.c:549:libxl__ev_xswatch_deregister: watch w=0x62bf88 wpath=/local/domain/0/backend/vif/2/0/state token=3/1: deregister slotnum=3
libxl: debug: libxl_event.c:561:libxl__ev_xswatch_deregister: watch w=0x62bf88: deregister unregistered
libxl: error: libxl_device.c:858:device_backend_callback: unable to disconnect device with path /local/domain/0/backend/vif/2/0
libxl: error: libxl_create.c:1070:domcreate_attach_pci: unable to add nic devices
...

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

On Mon, 2012-08-06 at 18:39 +0100, Olaf Hering wrote:
> With current xen-unstable 25733:353bc0801b11 the attached hvm.cfg does
> not start anymore with a SLES11SP2 dom0 kernel, but it starts if I run a
> 3.5 pvops dom0 kernel. I have no modifications other than the stubdom -j
> patch.
>
> The output from this command is attached:
> xl -vvvv create -d -f /root/xenpaging/sles11sp2_full_xenpaging_local.cfg 2>&1 | tee xl-create-`uname -r`.txt &
>
> Any ideas how to fix this timeout error?

The tools are waiting for the backend to move from state 1
(XenbusStateInitialising) to state 2 (XenbusStateInitWait). A backend
driver typically makes that transition at the end of its probe function
-- what is the SLES11SP2 netback waiting for? Or is it failing to init,
in which case perhaps there is an error node in XS?

> Olaf
>
> ...
> libxl: debug: libxl_event.c:457:watchfd_callback: watch w=0x62bf88 wpath=/local/domain/0/backend/vif/2/0/state token=3/1: event epath=/local/domain/0/backend/vif/2/0/state
> libxl: debug: libxl_event.c:600:devstate_watch_callback: backend /local/domain/0/backend/vif/2/0/state wanted state 2 still waiting state 1
> libxl: debug: libxl_event.c:614:devstate_timeout: backend /local/domain/0/backend/vif/2/0/state wanted state 2 timed out
> libxl: debug: libxl_event.c:549:libxl__ev_xswatch_deregister: watch w=0x62bf88 wpath=/local/domain/0/backend/vif/2/0/state token=3/1: deregister slotnum=3
> libxl: debug: libxl_event.c:561:libxl__ev_xswatch_deregister: watch w=0x62bf88: deregister unregistered
> libxl: error: libxl_device.c:858:device_backend_callback: unable to disconnect device with path /local/domain/0/backend/vif/2/0
> libxl: error: libxl_create.c:1070:domcreate_attach_pci: unable to add nic devices
> ...

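For readers following the state numbers in the log, here is a minimal
sketch of a helper that turns a numeric xenbus state into its name. The
function name xenbus_state_name is made up for this illustration; the
numeric values are the ones defined by the XenbusState enum in Xen's
public xenbus.h header, of which only 1 and 2 are mentioned in the
thread itself.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric xenbus state to its symbolic name.
xenbus_state_name() {
    case "$1" in
        0) echo XenbusStateUnknown ;;
        1) echo XenbusStateInitialising ;;
        2) echo XenbusStateInitWait ;;
        3) echo XenbusStateInitialised ;;
        4) echo XenbusStateConnected ;;
        5) echo XenbusStateClosing ;;
        6) echo XenbusStateClosed ;;
        7) echo XenbusStateReconfiguring ;;
        8) echo XenbusStateReconfigured ;;
        *) echo "unrecognised($1)" ;;
    esac
}

# Example use on a dom0 with the xenstore client tools installed:
#   xenbus_state_name "$(xenstore-read /local/domain/0/backend/vif/2/0/state)"
```

With the log from the original report, the backend would be shown as
stuck in XenbusStateInitialising while libxl waits for
XenbusStateInitWait.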
On Tue, Aug 07, Ian Campbell wrote:
> The tools are waiting for the backend to move from state 1
> (XenbusStateInitialising) to state 2 (XenbusStateInitWait). A backend
> driver typically makes that transition at the end of its probe function
> -- what is the SLES11SP2 netback waiting for? Or is it failing to init,
> in which case perhaps there is an error node in XS?

I think there is a difference between the two kernels. The pvops kernel
goes into state 2 right away (I can't tell from repeated xenstore-ls runs
whether it also passed through state 1).
The sles11 kernel remains in state 1. Did the expectations of libxl
change recently? xl create used to work not too long ago.
xm does not work either, so the change is most likely in the scripts.

Olaf

On Tue, 2012-08-07 at 16:25 +0100, Olaf Hering wrote:
> I think there is a difference between the two kernels. The pvops kernel
> goes into state 2 right away (I can't tell from repeated xenstore-ls runs
> if it had also state 1).
> The sles11 kernel remains in state 1.

What is it waiting for?

> Did the expectations of libxl
> change recently? xl create used to work not too long ago.

I don't think the expectation has changed but the implementation is
probably more picky since Roger's hotplug patches.

> xm does not work either, so the change is most likely in the scripts.

If you are switching from xl to xm then you should either reboot or
remove libxl/disable_udev in xenstore manually.

Other than that not much has changed in the scripts either. Are you sure
it isn't the kernel which has changed?

Ian.

On Tue, Aug 07, Ian Campbell wrote:
> > The sles11 kernel remains in state 1.
>
> What is it waiting for?

I have no idea, I have to browse the code and debug it.
A quick test with plain sles11sp2+xend and xm start -p shows that
/local/domain/0/backend/vif/1/0/state finally gets into state 2.
Looks like something to fix before 4.2.

> Other than that not much has changed in the scripts either. Are you sure
> it isn't the kernel which has changed?

The kernel is ok.

Olaf

On Wed, 2012-08-08 at 18:28 +0100, Olaf Hering wrote:
> > What is it waiting for?
>
> I have no idea, have to browse code debug it.
> A quick test with plain sles11sp2+xend and xm start -p shows that
> /local/domain/0/backend/vif/1/0/state finally gets into state 2.

When you say "finally" do you mean that it takes an unusually long time?

> Looks like something to fix before 4.2.

> > Other than that not much has changed in the scripts either. Are you sure
> > it isn't the kernel which has changed?
>
> The kernel is ok.

I think we should at least consider the possibility that this kernel has
a latent bug exposed by recent changes to libxl.

Is this kernel tree available somewhere convenient (i.e. which doesn't
involve unpacking .src.rpms and applying patches etc)?

I checked netback_probe in the linux-2.6.18-xen.hg tree (which I believe
relates at least somewhat to the SLES kernel) and it switches to
XenbusStateInitWait just before calling the function which triggers the
hotplug script -- so libxl's behaviour of waiting for
XenbusStateInitWait before running the hotplug scripts would seem to be
correct. I couldn't find anything before this point which would cause
the driver to block. So if your observation is that your kernel is
blocking in state 1 or taking an inordinate amount of time to get to
state 2 then that is what you need to dig into.

Have you reinstalled your udev rules etc? They changed recently and I
suspect they need to be up to date to work with the latest scripts.
Although you don't appear to be getting to that point so I don't think
it would matter (yet).

You didn't answer my question about error nodes in xenstore.

You could, experimentally, try increasing LIBXL_INIT_TIMEOUT to some
enormous time.

Ian.

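The two checks suggested here (watching the backend state node and
looking for an error node) can be sketched as a pair of shell functions.
This is only an illustration under stated assumptions: the function
names and the XS_READ override are hypothetical, xenstore-read is the
stock xenstore client, and the error path
/local/domain/0/error/<backend path>/error is an assumption about where
a dom0 backend driver reports errors, not something confirmed in this
thread.

```shell
#!/bin/sh
# Command used to read xenstore; overridable so the logic can be tried
# without a running Xen host.
XS_READ=${XS_READ:-xenstore-read}

# Poll <path>/state until it equals <wanted> or <timeout> seconds pass.
# Returns 0 when the state was reached, 1 on timeout.
wait_backend_state() {
    path=$1 wanted=$2 timeout=$3
    elapsed=0
    while :; do
        state=$($XS_READ "$path/state" 2>/dev/null) || state=""
        [ "$state" = "$wanted" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && break
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out: $path/state is '${state:-missing}', wanted $wanted" >&2
    return 1
}

# Print the error string for a dom0 backend, given its path relative to
# /local/domain/0 (e.g. backend/vif/2/0). Fails if no error node exists.
# ASSUMPTION: errors live under /local/domain/0/error/<path>/error.
backend_error() {
    $XS_READ "/local/domain/0/error/$1/error" 2>/dev/null
}

# Example use:
#   wait_backend_state /local/domain/0/backend/vif/2/0 2 60 ||
#       backend_error backend/vif/2/0
```

Polling from the shell like this is independent of libxl's own timeout,
so it can show whether the backend ever reaches state 2 without
rebuilding the tools with a larger LIBXL_INIT_TIMEOUT.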
On Thu, Aug 09, Ian Campbell wrote:
> > I have no idea, have to browse code debug it.
> > A quick test with plain sles11sp2+xend and xm start -p shows that
> > /local/domain/0/backend/vif/1/0/state finally gets into state 2.
>
> When you say "finally" do you mean that it takes an unusually long time?

'finally' is wrongly worded. It gets into state 2, I notice no delay.

> Is this kernel tree available somewhere convenient (i.e. which doesn't
> involves unpacking .src.rpms and applying patches etc).

It's available via git, see http://kernel.opensuse.org/git
The webui is here:
http://kernel.opensuse.org/cgit/kernel/tree/?h=SLE11-SP2

> I checked netback_probe in the linux-2.6.18-xen.hg tree (which I believe
> relates at least somewhat to the SLES kernel) and it switches to
> XenbusStateInitWait just before calling the function which triggers the
> hotplug script -- so libxl's behaviour of waiting for
> XenbusStateInitWait before running the hotplug scripts would seem to be
> correct. I couldn't find anything before this point which would cause
> the driver to block. So if your observation is that your kernel is
> blocking in state 1 or taking an inordinate amount of time to get to
> state 2 then that is what you need to dig into.

Indeed, netback_probe is apparently never called in my case. I will
check why that happens.

> Have you reinstalled your udev rules etc? They changed recently and I
> suspect they need to be up to date to work with the latest scripts.
> Although you don't appear to be getting to that point so I don't think
> it would matter (yet).

It's all coming from xen*.rpm packages, no manual install. The rules are
from xen-unstable.

> You didn't answer my question about error nodes in xenstore.

I don't see any error nodes in xenstore.

> You could, experimentally, try increasing LIBXL_INIT_TIMEOUT to some
> enormous time.

Thanks for the hint. I will see what I find.

Olaf

On Thu, Aug 09, Olaf Hering wrote:
> Indeed, netback_probe is apparently never called in my case. I will
> check why that happens.

What I have seen so far is that in 4.2+xl the vif driver is not
registered, while in 4.1+xm there is a vif driver registered. That's
the only difference I could spot so far. I will post more results on
why that happens later.

Olaf

root@satriani:~ # grep xen-backend xl-dmesg-3.0.34-sles11sp2_olh-xen.txt
[ 0.149879] bus: 'xen-backend': registered
[ 0.149879] device: 'xen-backend': device_add
[ 0.149879] PM: Adding info for No Bus:xen-backend
[ 90.055240] bus: 'xen-backend': add device qdisk-1-768
[ 90.055252] PM: Adding info for xen-backend:qdisk-1-768
[ 90.064575] bus: 'xen-backend': add device qdisk-1-5632
[ 90.064584] PM: Adding info for xen-backend:qdisk-1-5632
[ 90.073196] bus: 'xen-backend': add device console-1-0
[ 90.073205] PM: Adding info for xen-backend:console-1-0
[ 90.081771] bus: 'xen-backend': add device vkbd-1-0
[ 90.081776] PM: Adding info for xen-backend:vkbd-1-0
[ 90.378494] bus: 'xen-backend': add device vif-1-0
[ 90.378504] PM: Adding info for xen-backend:vif-1-0
[ 100.401586] PM: Removing info for xen-backend:console-1-0
[ 100.401596] bus: 'xen-backend': remove device console-1-0
[ 102.400202] PM: Removing info for xen-backend:qdisk-1-768
[ 102.400212] bus: 'xen-backend': remove device qdisk-1-768
[ 102.406016] PM: Removing info for xen-backend:qdisk-1-5632
[ 102.406025] bus: 'xen-backend': remove device qdisk-1-5632
[ 102.411464] PM: Removing info for xen-backend:vkbd-1-0
[ 102.411473] bus: 'xen-backend': remove device vkbd-1-0
[ 110.410600] PM: Removing info for xen-backend:vif-1-0
[ 110.410610] bus: 'xen-backend': remove device vif-1-0

root@satriani:~ # grep xen-backend xm-dmesg-3.0.34-sles11sp2_olh-xen.txt
[ 0.150119] bus: 'xen-backend': registered
[ 0.150119] device: 'xen-backend': device_add
[ 0.150119] PM: Adding info for No Bus:xen-backend
[ 44.319441] bus: 'xen-backend': add driver tap
[ 44.338383] bus: 'xen-backend': add driver vbd
[ 44.367501] bus: 'xen-backend': add driver vif
[ 44.378095] bus: 'xen-backend': add driver vusb
[ 204.002506] bus: 'xen-backend': add device vfb-1-0
[ 204.002514] PM: Adding info for xen-backend:vfb-1-0
[ 204.017641] bus: 'xen-backend': add device vbd-1-768
[ 204.017650] PM: Adding info for xen-backend:vbd-1-768
[ 204.017663] bus: 'xen-backend': driver_probe_device: matched device vbd-1-768 with driver vbd
[ 204.017667] bus: 'xen-backend': really_probe: probing driver vbd with device vbd-1-768
[ 204.018903] bus: 'xen-backend': really_probe: bound device vbd-1-768 to driver vbd
[ 204.032488] bus: 'xen-backend': add device vbd-1-5632
[ 204.032494] PM: Adding info for xen-backend:vbd-1-5632
[ 204.032502] bus: 'xen-backend': driver_probe_device: matched device vbd-1-5632 with driver vbd
[ 204.032504] bus: 'xen-backend': really_probe: probing driver vbd with device vbd-1-5632
[ 204.033534] bus: 'xen-backend': really_probe: bound device vbd-1-5632 to driver vbd
[ 204.043973] bus: 'xen-backend': add device vif-1-0
[ 204.043980] PM: Adding info for xen-backend:vif-1-0
[ 204.043988] bus: 'xen-backend': driver_probe_device: matched device vif-1-0 with driver vif
[ 204.043990] bus: 'xen-backend': really_probe: probing driver vif with device vif-1-0
[ 204.049398] bus: 'xen-backend': really_probe: bound device vif-1-0 to driver vif
[ 204.739981] bus: 'xen-backend': add device console-1-0
[ 204.739993] PM: Adding info for xen-backend:console-1-0
[ 340.548887] PM: Removing info for xen-backend:console-1-0
[ 340.548902] bus: 'xen-backend': remove device console-1-0
[ 340.570464] PM: Removing info for xen-backend:vfb-1-0
[ 340.570470] bus: 'xen-backend': remove device vfb-1-0
[ 340.577394] PM: Removing info for xen-backend:vbd-1-768
[ 340.577403] bus: 'xen-backend': remove device vbd-1-768
[ 340.578784] PM: Removing info for xen-backend:vbd-1-5632
[ 340.578791] bus: 'xen-backend': remove device vbd-1-5632
[ 340.581006] PM: Removing info for xen-backend:vif-1-0
[ 340.581014] bus: 'xen-backend': remove device vif-1-0

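The comparison of the two dmesg captures can be automated. The sketch
below, with the made-up function name unbound_backend_devices, reads
driver-core debug lines in the format shown in these logs and prints
every xen-backend device that was added but never reported as bound to
a driver. It assumes the same kernel debug output is enabled; devices
whose backend is not a kernel driver will presumably be listed too, so
the result still needs a human eye.

```shell
#!/bin/sh
# Hypothetical helper: list xen-backend devices that were added but
# never bound to a kernel driver, based on driver-core debug messages
# read from stdin.
unbound_backend_devices() {
    awk '
        # "... add device vif-1-0": remember each device once, in order.
        /xen-backend.*: add device / {
            dev = $NF
            if (!(dev in seen)) { seen[dev] = 1; order[++n] = dev }
        }
        # "... bound device vif-1-0 to driver vif": device is 4th from the end.
        /xen-backend.*: bound device / { bound[$(NF-3)] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in bound)) print order[i]
        }
    '
}

# Example use:
#   unbound_backend_devices < xl-dmesg-3.0.34-sles11sp2_olh-xen.txt
```

Run against the xl capture this would include vif-1-0, which is the
entry that matters here, since the xm capture shows the same device
being bound to the vif driver.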
On Fri, Aug 10, Olaf Hering wrote:
> What I have seen so far is that in 4.2+xl the vif driver is not
> registered, while in 4.1+xm there is a vif driver registered. That's so
> far the difference I could spot.

Argh, I was expecting that required kernel drivers are loaded when
needed. But that's not the case. There is a workaround or fix for pvops
in 25728:a6edbc39fc84. But this changeset misses at least netbk and
blkbk.

Any idea why that changeset is now needed?
Why did it work for everyone before?

Olaf

On Fri, 2012-08-10 at 13:59 +0100, Olaf Hering wrote:
> Argh, I was expecting that required kernel drivers are loaded when
> needed. But that's not the case. There is a workaround or fix for pvops
> in 25728:a6edbc39fc84. But this changeset misses at least netbk and
> blkbk.
>
> Any idea why that changeset is now needed?
> Why did it work for everyone before?

Backend driver autoloading is relatively new in pvops kernels at least
(I don't know if it was ever a feature of the older kernels). Perhaps
the SLES kernels used to build those drivers into the kernel statically?
(That was quite common in the classic-Xen kernel days, but now with
pvops, modular is becoming more common.)

None of which answers your question as to why it used to work for you
though.

I think we would accept an update to 25728:a6edbc39fc84 to add some new
aliases. It was discussed at the time but I think it petered out after a
short discussion about what the correct names were.

Ian.

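As a stop-gap on a dom0 where the backend drivers are modular and not
autoloaded, the modules can be loaded before starting a guest. The
following is a minimal sketch, not the content of changeset
25728:a6edbc39fc84: the function name and the MODPROBE override are
hypothetical, netbk and blkbk are the names mentioned in this thread,
and xen-netback and xen-blkback are the usual pvops module names.

```shell
#!/bin/sh
# Command used to load modules; overridable for dry runs.
MODPROBE=${MODPROBE:-modprobe}

# Try to load every module name given. A failure is ignored on purpose:
# the driver may be built in, or carry a different name on this kernel.
load_backend_modules() {
    for mod in "$@"; do
        $MODPROBE "$mod" 2>/dev/null || true
    done
}

# Example use (as root), pvops names first, then the SLES names:
#   load_backend_modules xen-netback xen-blkback netbk blkbk
```

Trying both sets of names keeps the same snippet usable on either kind
of dom0 kernel, at the cost of silently skipping names that do not
exist.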
On Tue, Aug 14, Ian Campbell wrote:
> I think we would accept an update to 25728:a6edbc39fc84 to add some new
> aliases, it was discussed at the time but I think it petered out after a
> short discussion about what the correct names were.

Perhaps the kernel should do a request_module(xen-backend:$type), but
that does not solve it for older kernels.

Olaf

On Tue, 2012-08-14 at 12:30 +0100, Olaf Hering wrote:
> Perhaps the kernel should do a request_module(xen-backend:$type), but
> that does not solve it for older kernels.

Modern kernels already do (effectively) that. That's the "Backend
driver autoloading is relatively new in pvops kernels" I referred to.
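To make the alias idea concrete, here is a small sketch that derives
the xen-backend:$type alias from a device name as it appears in the
dmesg output earlier in the thread (type-domid-devid, e.g. vif-1-0).
The function name backend_modalias is invented for this example, and
whether modprobe can resolve the alias depends on the backend module
actually declaring it, which is an assumption to verify on the kernel
in question.

```shell
#!/bin/sh
# Hypothetical helper: turn a xen-backend device name such as "vif-1-0"
# into the module alias "xen-backend:vif" by keeping the text before
# the first dash.
backend_modalias() {
    echo "xen-backend:${1%%-*}"
}

# Example use:
#   modprobe "$(backend_modalias vif-1-0)"
# To see which aliases a module declares (module name is an example):
#   modinfo -F alias xen-netback
```

On a kernel whose backend modules carry no such alias, the modprobe
call would simply fail, which matches the observation that this does
not help older kernels.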