Murillo Fernandes Bernardes
2005-Nov-21 15:43 UTC
[Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
Frontend devices are not being unregistered when in the Closed state. The following patch fixes that.

Fixes bug #420.

Makes the "05_attach_and_dettach_device_repeatedly_pos" and "09_attach_and_dettach_device_check_data_pos" tests pass.

Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>

--
Murillo Fernandes Bernardes
IBM Linux Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Stefan Berger
2005-Nov-21 16:07 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:

> Frontend devices are not being unregistered when in closed state. The
> following patch fix that.
>
> Fix bug #420.
>
> Makes "05_attach_and_dettach_device_repeatedly_pos" and
> "09_attach_and_dettach_device_check_data_pos" tests pass.

Did you test this with suspending / resuming a dom U? The reason I am asking is that when suspending the driver immediately gets into state 'Closed' and when resuming into state 'Connected', but now your device is unregistered.

Stefan

> Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
>
> [attachment "frontend_unregister_device.patch" deleted by Stefan
> Berger/Watson/IBM]
Murillo Fernandes Bernardes
2005-Nov-21 16:33 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Monday 21 November 2005 14:07, Stefan Berger wrote:

> Did you test this with suspending / resuming a dom U? The reason I am
> asking is that when suspending the driver immediately gets into state
> 'Closed' and when resuming into state 'Connected', but now your device is
> unregistered.

No, I did not test suspend/resume.

I really don't see why it should get into Closed on suspend. Is this really happening? I could not find any switch to Closed in the suspend code, nor in the resume code.

How do I test suspend/resume on a domU? It has neither /sys/power/state nor /proc/sleep.

--
Murillo Fernandes Bernardes
IBM Linux Technology Center
Stefan Berger
2005-Nov-21 16:49 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
Murillo Fernandes Bernardes <mfb@br.ibm.com> wrote on 11/21/2005 11:33:31 AM:

> No, I did not test suspend/resume.
>
> I really don't see why it should get into Closed on suspend, but anyway, is
> this really hapenning? I could not find any switch to Closed into suspend's
> code, neither on resume.

What I am seeing is that after a suspend / resume the interface 'eth0' is completely gone. 'ifconfig -a' shows everything, but no eth0.

You might only want to unregister if the domain was not suspended. So you probably need to implement the .suspend function in the frontend and set a state variable to know whether the domain is being hibernated, and you clear that variable in the .resume. You check that variable when the driver is going into the 'Closed' state and only unregister if not in 'suspend' mode.

> How to test suspend/resume on a domU? It does not have /sys/power/state
> neither /proc/sleep.

'xm save <dom id> <dom state filename>' lets you suspend a domain.
'xm restore <dom state filename>' lets you resume a domain.

I would only use the network driver for testing this by booting into a RAMDisk.

Stefan
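The suspend-flag idea described above can be sketched in a few lines of plain C. This is only an illustration of the control flow, not code from the patch or from netfront/blkfront: the struct, its fields and all three function names are hypothetical, and the real driver would call its own unregister routine where the comment indicates. Whether the Closed notification arrives before or after .resume runs is exactly what the thread is still trying to establish, so the sketch assumes it arrives while the flag is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a frontend's per-device state. */
struct frontend_info {
    bool suspended;   /* set in .suspend, cleared in .resume */
    bool registered;  /* true while the net/block device is registered */
};

/* Would be wired up as the driver's .suspend callback. */
void frontend_suspend(struct frontend_info *info)
{
    info->suspended = true;
}

/* Would be wired up as the driver's .resume callback. */
void frontend_resume(struct frontend_info *info)
{
    info->suspended = false;
}

/*
 * Called when the backend is observed in the Closed state.
 * Returns true if the device was unregistered, false if the
 * notification was ignored because a suspend is in progress
 * (or there was nothing left to unregister).
 */
bool frontend_backend_closed(struct frontend_info *info)
{
    if (info->suspended || !info->registered)
        return false;
    /* A real driver would unregister its device here. */
    info->registered = false;
    return true;
}
```

With this shape, a Closed seen during a save/restore cycle leaves the device registered, while a Closed seen in normal operation still removes it.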
Adam Heath
2005-Nov-21 17:48 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Mon, 21 Nov 2005, Murillo Fernandes Bernardes wrote:

> Frontend devices are not being unregistered when in closed state. The
> following patch fix that.
>
> Fix bug #420.
>
> Makes "05_attach_and_dettach_device_repeatedly_pos" and
> "09_attach_and_dettach_device_check_data_pos" tests pass.
>
> Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>

Hmm. I have a way to make dom0 in unstable kernel-oops. If I attempt to set up a virtual block device on a /dev/nbN (enbd), but never actually configure the /dev/nbN, I get a kernel oops in dom0 when the domU shuts down. I can then no longer reboot or shut down the dom0.

Would this be related?
Ewan Mellor
2005-Nov-21 18:31 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Mon, Nov 21, 2005 at 11:49:19AM -0500, Stefan Berger wrote:

> > I really don't see why it should get into Closed on suspend, but anyway, is
> > this really hapenning? I could not find any switch to Closed into suspend's
> > code, neither on resume.

xenbus_read_driver_state returns Closed if the backend path is no longer present. Maybe this is where the Closed has come from. However, xenbus_probe.c:otherend_changed is supposed to be protecting us from watches that have fired immediately after a resume.

Could you please enable the DPRINTK in xenbus_probe and see whether the Closed is coming through the test in otherend_changed? This would help diagnose the problem.

The intention with xenbus_read_driver_state returning Closed was that this was the correct way of forcing the driver to close down if the path goes away, as in normal use the backend path should not just disappear, and for resumption we have a way to detect that. Perhaps one or other of these things should change, but it's not clear to me which one it is, or if indeed this is the problem at all.

> What I am seeing is that after a suspend / resume the interface 'eth0' is
> completely gone. 'ifconfig -a' shows everything, but no eth0.
>
> You might only want to unregister if the domain was not suspended. So you
> probably need to implement the .suspend function in the frontend and set a
> state variable to know whether the domain is being hibernated, and you
> clear that variable in the .resume. You check that variable when the
> driver is going into the 'Closed' state and only unregister if not in
> 'suspend' mode.

If this is necessary, and it's not clear to me that it is, then this is a facility that Xenbus should provide in general, rather than each driver having to hack around the problem itself.

Returning to Murillo's patch, I assumed that the unregister_netdev in close_netdev would implicitly call device_unregister, and that this was the correct way to close down the device. Is this not the case?

My intention for closedown of the device was that the backend would move to state Closing, triggering a graceful shutdown of the frontend (in this case through netfront_closing, close_netdev, etc.). AFAIK, Xend is correctly setting the backend to state Closing, so I expect unregister_netdev to be being called.

There is the different issue that Xend does not check for the existence or state of a device before hotplugging a new one. This means that the frontend might not have time to see the Closing before having a chance to close down, for example. This is a problem with Xend that needs to be fixed there. Xend should refuse to hotplug a device if the frontend for the old one has not yet closed down. This is not to say that Murillo's patch is wrong, but simply to say that I expect wider issues than can be fixed by this patch alone.

Ewan.
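The behaviour Ewan describes, where a missing backend path is reported as Closed, can be illustrated with a small stand-alone sketch. It does not reproduce the signature or body of xenbus_read_driver_state; state_from_node is a hypothetical helper that takes the text read from a ".../state" node, or NULL when the node no longer exists. The enum values are the XenbusState numbering as I understand it from Xen's public headers and should be checked against the tree in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* XenbusState numbering as assumed here (0 through 6). */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/*
 * Hypothetical helper: node_value is the string stored at the other
 * end's "state" node, or NULL if that path has disappeared.
 * A vanished path is treated as Closed, which forces the driver
 * to shut down.
 */
enum xenbus_state state_from_node(const char *node_value)
{
    if (node_value == NULL)
        return XenbusStateClosed;
    return (enum xenbus_state)atoi(node_value);
}
```

This makes the failure mode under discussion visible: after a restore the old backend path is gone, so a stale watch on it would be read as Closed unless something filters it out first.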
Ewan Mellor
2005-Nov-21 18:40 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Mon, Nov 21, 2005 at 11:48:10AM -0600, Adam Heath wrote:

> Hmm. I have a way to make dom0 in unstable kernel-oops. If I attempt to
> setup a virtual block device to a /dev/nbN(enbd), but never actually configure
> the /dev/nbN, I get a kernel oops in dom0 when the domU shutdowns down. I can
> then no longer reboot or shutdown the dom0.
>
> Would this be related?

This doesn't seem very related, no. What does your oops look like?

Ewan.
Stefan Berger
2005-Nov-21 20:02 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 01:31:48 PM:

> xenbus_read_driver_state returns Closed if the backend path is no longer
> present. Maybe this is where the Closed has come from. However,
> xenbus_probe.c:otherend_changed is supposed to be protecting us from watches
> that have fired immediately after a resume.
>
> Could you please enable the DPRINTK in xenbus_probe and see whether the Closed
> is coming through the test in otherend_changed? This would help diagnose the
> problem.

Here's the log from domain 0's /var/log/messages:

Nov 21 14:54:43 jlfb-2 gpm[2667]: *** info [startup.c(95)]:
Nov 21 14:54:43 jlfb-2 gpm[2667]: Started gpm successfully. Entered daemon mode.
Nov 21 14:54:43 jlfb-2 gpm[2667]: *** info [mice.c(1766)]:
Nov 21 14:54:43 jlfb-2 gpm[2667]: imps2: Auto-detected intellimouse PS/2
Nov 21 14:54:52 jlfb-2 fstab-sync[2817]: removed all generated mount points
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend_unit:624) backend/vif/15/0
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend_unit:624) backend/vtpm/15/6
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend_unit:624) backend/vtpm/2/6
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (frontend_changed:763) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): starting (version 2.10.0), pid 2848 user 'root'
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Nov 21 14:54:56 jlfb-2 gconfd (root-2848): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Nov 21 14:55:03 jlfb-2 gconfd (root-2848): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0

below: starting user domain

Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_dev_probe:338) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:40 jlfb-2 last message repeated 5 times
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state is 1, /local/domain/1/device/vif/0/state, /local/domain/1/device/vif/0/state.
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:41 jlfb-2 kernel: device vif1.0 entered promiscuous mode
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering learning state
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: topology change detected, propagating
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering forwarding state
Nov 21 14:55:41 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:45 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state is 4, /local/domain/1/device/vif/0/state, /local/domain/1/device/vif/0/state.
Nov 21 14:55:45 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .

below: suspending user domain

Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state is 6, /local/domain/1/device/vif/0/state, /local/domain/1/device/vif/0/state.
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (xenbus_dev_remove:376) .
Nov 21 14:56:09 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering disabled state
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:09 jlfb-2 kernel: device vif1.0 left promiscuous mode
Nov 21 14:56:09 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering disabled state
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .

below: resuming user domain

Nov 21 14:56:30 jlfb-2 last message repeated 2 times
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_dev_probe:338) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 last message repeated 5 times
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state is 1, /local/domain/2/device/vif/0/state, /local/domain/2/device/vif/0/state.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state is 6, /local/domain/2/device/vif/0/state, /local/domain/2/device/vif/0/state.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_dev_remove:376) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 kernel: device vif2.0 entered promiscuous mode
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: port 1(vif2.0) entering learning state
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: topology change detected, propagating
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: port 1(vif2.0) entering forwarding state
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .

Result: eth0 is gone.

This is a user domain that is booting into a RAM disk. Memory of the user domain is 64M.

Stefan
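The "state is N" values in the otherend_changed lines are easier to follow with names attached. The lookup below is a small reading aid, assuming the XenbusState numbering from Xen's public headers (0 Unknown through 6 Closed); the function name xenbus_state_name is made up for this sketch and is not a kernel API.

```c
#include <assert.h>
#include <string.h>

/*
 * Map a numeric XenbusState, as printed by the DPRINTK output,
 * to a readable name. The numbering is an assumption and should
 * be checked against the headers of the tree being debugged.
 */
const char *xenbus_state_name(int state)
{
    static const char *const names[] = {
        "Unknown",       /* 0 */
        "Initialising",  /* 1 */
        "InitWait",      /* 2 */
        "Initialised",   /* 3 */
        "Connected",     /* 4 */
        "Closing",       /* 5 */
        "Closed",        /* 6 */
    };
    int count = (int)(sizeof(names) / sizeof(names[0]));

    if (state < 0 || state >= count)
        return "Invalid";
    return names[state];
}
```

Read this way, the dom0 log shows the frontend going 1 (Initialising), 4 (Connected), then 6 (Closed) on suspend, and 1 followed directly by 6 after the restore.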
Murillo Fernandes Bernardes
2005-Nov-21 21:11 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Monday 21 November 2005 16:31, Ewan Mellor wrote:
> On Mon, Nov 21, 2005 at 11:49:19AM -0500, Stefan Berger wrote:
>
> Could you please enable the DPRINTK in xenbus_probe and see whether the
> Closed is coming through the test in otherend_changed? This would help
> diagnose the problem.

on DomU:

xenbus_probe (xenbus_suspend:831) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (resume_dev:806) .
xenbus_probe (otherend_changed:301) state is 6, /local/domain/0/backend/vif/5/0/state, /local/domain/0/backend/vif/5/0/state.
xenbus_probe (otherend_changed:301) state is 6, /local/domain/0/backend/vbd/5/770/state, /local/domain/0/backend/vbd/5/770/state.
xenbus_probe (resume_dev:806) .
xenbus_probe (backend_changed:764) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (resume_dev:806) .
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vbd/6/769/state, /local/domain/0/backend/vbd/6/769/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vbd/6/769/state, /local/domain/0/backend/vbd/6/769/state.
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vbd/6/770/state, /local/domain/0/backend/vbd/6/770/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vbd/6/770/state, /local/domain/0/backend/vbd/6/770/state.
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vif/6/0/state, /local/domain/0/backend/vif/6/0/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is 4, /local/domain/0/backend/vif/6/0/state, /local/domain/0/backend/vif/6/0/state.

> > You might only want to unregister if the domain was not suspended. So you
> > probably need to implement the .suspend function in the frontend and set
> > a state variable to know whether the domain is being hibernated, and you
> > clear that variable in the .resume. You check that variable when the
> > driver is going into the 'Closed' state and only unregister if not in
> > 'suspend' mode.
>
> If this is necessary, and it's not clear to me that it is, then this is a
> facility that Xenbus should provide in general, rather than each driver
> having to hack around the problem itself.

What about a XenbusStateSuspended?

> Returning to Murillo's patch, I assumed that the unregister_netdev in
> close_netdev would implicitly call device_unregister, and that this was the
> correct way to close down the device. Is this not the case?

It is not happening. All references to device were cleared? I'm not sure if it is needed in this case.

--
Murillo Fernandes Bernardes
IBM Linux Technology Center
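The XenbusStateSuspended suggestion could look roughly like the following if it were handled once at the bus level rather than in each driver. This is purely a sketch of the proposal: XenbusStateSuspended is hypothetical, the value 7 is chosen here only for illustration, and should_unregister is an invented helper name, not an existing xenbus function.

```c
#include <assert.h>
#include <stdbool.h>

enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6,
    /* Hypothetical: proposed in this thread, value picked for the sketch. */
    XenbusStateSuspended    = 7
};

/*
 * Decide whether a frontend should unregister its device when it
 * sees the given backend state. Only a genuine Closed tears the
 * device down; a Suspended backend leaves it in place so that it
 * can reconnect after the domain is restored.
 */
bool should_unregister(enum xenbus_state backend_state)
{
    switch (backend_state) {
    case XenbusStateClosed:
        return true;
    case XenbusStateSuspended:
    default:
        return false;
    }
}
```

The design question raised in the thread remains open in this sketch: something would still have to report Suspended instead of Closed when the backend path vanishes during a save.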
Adam Heath
2005-Nov-21 21:30 UTC
Re: [Xen-devel] [PATCH] Fix device removal on net and block frontend drivers
On Mon, 21 Nov 2005, Ewan Mellor wrote:

> > Hmm. I have a way to make dom0 in unstable kernel-oops. If I attempt to
> > setup a virtual block device to a /dev/nbN(enbd), but never actually configure
> > the /dev/nbN, I get a kernel oops in dom0 when the domU shutdowns down. I can
> > then no longer reboot or shutdown the dom0.
> >
> > Would this be related?
>
> This doesn't seem very related, no. What does your oops look like?

From the config file:

    disk = [ 'phy:/dev/space/xen-0-16-swap-0,hda,w', 'phy:/dev/space/xen-0-16-tmp,hdb,w', 'phy:/dev/nda,hdc,w', 'phy:/dev/ndb,hdd,w' ]

/dev/nda and /dev/ndb have not been configured yet.

From the domU:

    [61776.756910] Registering block device major 3
    [61776.756999] hda: unknown partition table
    [61776.782867] hdb: unknown partition table
    [61776.805007] Registering block device major 22
    [61776.805096] hdc:end_request: I/O error, dev hdc, sector 0
    [61776.805225] Buffer I/O error on device hdc, logical block 0
    [61776.805372] end_request: I/O error, dev hdc, sector 0
    [61776.805379] Buffer I/O error on device hdc, logical block 0
    [61776.805392] unable to read partition table

And from dom0, the oops (plus leading lines from syslog):

    Nov 21 14:21:38 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/768
    Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/768/physical-device 0xfe01 backend/vbd/1/768/node /dev/space/xen-0-16-swap-0 to xenstore.
    Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/768/hotplug-status connected to xenstore.
    Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/832
    Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/832/physical-device 0xfe02 backend/vbd/1/832/node /dev/space/xen-0-16-tmp to xenstore.
    Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/832/hotplug-status connected to xenstore.
    Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/5632
    Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5632/physical-device 0x2b00 backend/vbd/1/5632/node /dev/nda to xenstore.
    Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5632/hotplug-status connected to xenstore.
    Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/5696
    Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5696/physical-device 0x2b10 backend/vbd/1/5696/node /dev/ndb to xenstore.
    Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5696/hotplug-status connected to xenstore.
    Nov 21 14:21:42 xen-3 logger: /etc/xen/scripts/vif-route: online XENBUS_PATH=backend/vif/1/0
    Nov 21 14:21:43 xen-3 kernel: [61773.038490] ip_tables: (C) 2000-2002 Netfilter core team
    Nov 21 14:21:43 xen-3 logger: /etc/xen/scripts/vif-route: Writing backend/vif/1/0/hotplug-status connected to xenstore.
    Nov 21 14:21:46 xen-3 kernel: [61776.805155] nbd0: Request when not-ready
    Nov 21 14:21:46 xen-3 kernel: [61776.805187] end_request: I/O error, dev nbd0, sector 0
    Nov 21 14:21:46 xen-3 kernel: [61776.805323] nbd0: Request when not-ready
    Nov 21 14:21:46 xen-3 kernel: [61776.805343] end_request: I/O error, dev nbd0, sector 0
    Nov 21 14:21:46 xen-3 kernel: [61776.820855] general protection fault: 0000 [#1]
    Nov 21 14:21:46 xen-3 kernel: [61776.820879] SMP
    Nov 21 14:21:46 xen-3 kernel: [61776.820898] Modules linked in: ipt_physdev iptable_filter ip_tables i2c_i801 i2c_core dm_mod nbd sd_mod ata_piix libata scsi_mod
    Nov 21 14:21:46 xen-3 kernel: [61776.820979] CPU: 0
    Nov 21 14:21:46 xen-3 kernel: [61776.820980] EIP: 0061:[<c0160b60>] Not tainted VLI
    Nov 21 14:21:46 xen-3 kernel: [61776.820982] EFLAGS: 00010282 (2.6.12.6-xen)
    Nov 21 14:21:46 xen-3 kernel: [61776.821039] EIP is at blkdev_put+0x9/0x13c
    Nov 21 14:21:46 xen-3 kernel: [61776.821059] eax: fffffffa ebx: fffffffa ecx: 00000000 edx: 00000106
    Nov 21 14:21:46 xen-3 kernel: [61776.821083] esi: c59c46a0 edi: c005a800 ebp: c59c4658 esp: c0427f40
    Nov 21 14:21:46 xen-3 kernel: [61776.821107] ds: 007b es: 007b ss: 0069
    Nov 21 14:21:46 xen-3 kernel: [61776.821126] Process events/0 (pid: 4, threadinfo=c0426000 task=c0057a20)
    Nov 21 14:21:46 xen-3 kernel: [61776.821137] Stack: 00000000 c59c467c c59c46a0 c005a800 c59c4658 c025e0bb c59c4658 c025df6c
    Nov 21 14:21:46 xen-3 kernel: [61776.821207] 00000000 c012c343 00000000 00000002 c1114c60 000de3dd c005a80c c005a814
    Nov 21 14:21:46 xen-3 kernel: [61776.821272] c0426000 c59c469c c025df4a 00000001 00000000 c1114c60 00010000 00000000
    Nov 21 14:21:46 xen-3 kernel: [61776.821337] Call Trace:
    Nov 21 14:21:46 xen-3 kernel: [61776.821366] [<c025e0bb>] vbd_free+0xf/0x18
    Nov 21 14:21:46 xen-3 kernel: [61776.821394] [<c025df6c>] free_blkif+0x22/0x4c
    Nov 21 14:21:46 xen-3 kernel: [61776.821421] [<c012c343>] worker_thread+0x175/0x242
    Nov 21 14:21:46 xen-3 kernel: [61776.821450] [<c025df4a>] free_blkif+0x0/0x4c
    Nov 21 14:21:46 xen-3 kernel: [61776.822757] [<c0118fd3>] default_wake_function+0x0/0xc
    Nov 21 14:21:46 xen-3 kernel: [61776.822786] [<c012c1ce>] worker_thread+0x0/0x242
    Nov 21 14:21:46 xen-3 kernel: [61776.822814] [<c012ff99>] kthread+0x93/0x97
    Nov 21 14:21:46 xen-3 kernel: [61776.822840] [<c012ff06>] kthread+0x0/0x97
    Nov 21 14:21:46 xen-3 kernel: [61776.822867] [<c01070b5>] kernel_thread_helper+0x5/0xb
    Nov 21 14:21:46 xen-3 kernel: [61776.822894] Code: 89 d8 5b 5e 5f c3 89 f2 89 f8 e8 b6 fa ff ff 89 c3 85 c0 74 e9 89 f8 e8 06 00 00 00 89 d8 5b 5e 5f c3 55 57 56 53 83 ec 04 89 c3 <8b> 70 04 8b 78 58 8d 40 0c 89 04 24 f0 ff 4b 0c 0f 88 3d 03 00

Doing an objdump of fs/block_dev.c (where blkdev_put exists), I get this:

    00000ca7 <blkdev_put>:
     ca7: 55                 push   %ebp
     ca8: 57                 push   %edi
     ca9: 56                 push   %esi
     caa: 53                 push   %ebx
     cab: 83 ec 04           sub    $0x4,%esp
     cae: 89 c3              mov    %eax,%ebx
     cb0: 8b 70 04           mov    0x4(%eax),%esi
     cb3: 8b 78 58           mov    0x58(%eax),%edi
     cb6: 8d 40 0c           lea    0xc(%eax),%eax
     cb9: 89 04 24           mov    %eax,(%esp)
     cbc: f0 ff 4b 0c        lock decl 0xc(%ebx)
     cc0: 0f 88 3d 03 00 00  js     1003 <.text.lock.block_dev+0x7e>
     cc6: e8 fc ff ff ff     call   cc7 <lock_kernel+0xcc7>
     ccb: 8b 43 08           mov    0x8(%ebx),%eax

So, address cb0 is at fault. The snippet from block_dev.c has this:

    int blkdev_put(struct block_device *bdev)
    {
        int ret = 0;
        struct inode *bd_inode = bdev->bd_inode;
        struct gendisk *disk = bdev->bd_disk;

bdev->bd_inode is at offset 4. Combined with eax being fffffffa, we get either a wrap-around, or overflow, on the addressing.

This all points, however, to vbd->bdev not being initialized properly. In fact, eax being what it is (-6 as a signed int) makes me believe some error condition isn't being checked.

However, after I read the log again, it looks like an error occurs once, something is freed, an error occurs again, and then the oops occurs. That may just be coincidence.

Anyway, that should be enough info for someone a bit more knowledgeable to debug this.
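The register arithmetic in the analysis above can be checked with a short user-space sketch. It models, loosely, the kernel convention of returning small negative errno values encoded in a pointer (ERR_PTR / IS_ERR); the helper names reg_is_err and reg_to_errno and the MAX_ERRNO bound are stand-ins written for this example, not the kernel's own definitions. Whether vbd->bdev really held such an error value is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define MAX_ERRNO 4095u

/*
 * A 32-bit register value is treated as an encoded error if it
 * falls in the top MAX_ERRNO values of the address space.
 */
int reg_is_err(uint32_t reg)
{
    return reg >= (uint32_t)(0u - MAX_ERRNO);
}

/*
 * Recover the negative errno from an encoded value using only
 * well-defined unsigned arithmetic: 0 - reg gives the magnitude.
 */
int32_t reg_to_errno(uint32_t reg)
{
    return -(int32_t)(0u - reg);
}
```

For the eax value in the oops, reg_to_errno(0xfffffffa) yields -6, which on Linux corresponds to -ENXIO ("No such device or address"). That would fit a block device that was never configured and an open result that was stored without an error check, then later handed to blkdev_put.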