Christopher S. Aker
2006-Mar-28 23:34 UTC
[Xen-devel] vbd devices stuck in Initialising/InitWait
Sorry for the cross-post. I'm looking for advice on where next to debug my problem outlined here:

http://lists.xensource.com/archives/html/xen-users/2006-03/msg00850.html

Not all vbds are coming up in my domU. This happens randomly, so rebooting a time or two will bring them back, but it's unpredictable. Normally it's only the even numbered devices, but I have seen an odd one missing.

This can happen regardless of if I specify the devices in a conf file, or start a domU paused and add them with block-attach.

Xen-unstable, changeset Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0

For a missing device:
dom0:~# xenstore-read /local/domain/0/backend/vbd/77/770/state
2
dom0:~# xenstore-read /local/domain/77/device/vbd/770/state
1

Normally:
dom0:~# xenstore-read /local/domain/0/backend/vbd/77/769/state
4
dom0:~# xenstore-read /local/domain/77/device/vbd/769/state
4

According to the xenbusState values, the frontend is Initialising, and the backend is in InitWait. From my (very basic) understanding of the code in the frontend and backend drivers, it looks to me like the backend is waiting for the frontend driver to finish initialising and switch to XenbusStateInitialised, but that's never happening...

Any help you could offer debugging this issue would be appreciated.

Thanks,
-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
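The state numbers read from xenstore in the message above map onto the XenbusState enumeration. What follows is a minimal sketch in C for decoding them. The numeric values are those of Xen's public io/xenbus.h header; the helper names xenbus_state_name, vbd_handshake_stalled and report_vbd are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* XenbusState values as found in Xen's public io/xenbus.h header. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical helper: turn a numeric state into a readable name. */
const char *xenbus_state_name(int state)
{
    switch (state) {
    case XenbusStateUnknown:      return "Unknown";
    case XenbusStateInitialising: return "Initialising";
    case XenbusStateInitWait:     return "InitWait";
    case XenbusStateInitialised:  return "Initialised";
    case XenbusStateConnected:    return "Connected";
    case XenbusStateClosing:      return "Closing";
    case XenbusStateClosed:       return "Closed";
    default:                      return "(unrecognised)";
    }
}

/* Hypothetical helper: the symptom in this report is a frontend still in
 * Initialising while the backend sits in InitWait. */
int vbd_handshake_stalled(int frontend, int backend)
{
    return frontend == XenbusStateInitialising &&
           backend == XenbusStateInitWait;
}

/* Hypothetical helper: print one device's handshake status. */
void report_vbd(int devid, int frontend, int backend)
{
    printf("vbd %d: frontend=%s backend=%s%s\n",
           devid,
           xenbus_state_name(frontend),
           xenbus_state_name(backend),
           vbd_handshake_stalled(frontend, backend) ? " (stalled)" : "");
}
```

Calling report_vbd(770, 1, 2) would flag the missing device from the report, while report_vbd(769, 4, 4) describes a device whose two ends are both Connected.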
Ewan Mellor
2006-Mar-29 08:53 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
On Tue, Mar 28, 2006 at 05:34:25PM -0600, Christopher S. Aker wrote:

> Sorry for the cross-post. I'm looking for advice on where next to debug
> my problem outlined here:
>
> http://lists.xensource.com/archives/html/xen-users/2006-03/msg00850.html
>
> Not all vbds are coming up in my domU. This happens randomly, so
> rebooting a time or two will bring them back, but it's unpredictable.
> Normally it's only the even numbered devices, but I have seen an odd one
> missing.
>
> This can happen regardless of if I specify the devices in a conf file,
> or start a domU paused and add them with block-attach.
>
> Xen-unstable, changeset Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0
>
> For a missing device:
> dom0:~# xenstore-read /local/domain/0/backend/vbd/77/770/state
> 2
> dom0:~# xenstore-read /local/domain/77/device/vbd/770/state
> 1
>
> Normally:
> dom0:~# xenstore-read /local/domain/0/backend/vbd/77/769/state
> 4
> dom0:~# xenstore-read /local/domain/77/device/vbd/769/state
> 4
>
> According to the xenbusState values, the frontend is Initialising, and
> the backend is in InitWait. From my (very basic) understanding of the
> code in the frontend and backend drivers, it looks to me like the
> backend is waiting for the frontend driver to finish initialising and
> switch to XenbusStateInitialised, but that's never happening...

That sounds to me like the guest kernel is crashing or deadlocking. Do you see anything on the guest console? Perhaps you could put

    #define DEBUG 1

at the top of blkfront/block.h and xenbus/xenbus_probe.c for your guest kernel, and try and figure out where it is getting stuck.

Ewan.
Christopher S. Aker
2006-Mar-29 23:15 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
Ewan Mellor wrote:

> That sounds to me like the guest kernel is crashing or deadlocking. Do you
> see anything on the guest console? Perhaps you could put #define DEBUG 1 at
> the top of blkfront/block.h and xenbus/xenbus_probe.c for your guest kernel,
> and try and figure out where it is getting stuck.

The domains aren't crashing since they will boot, provided the root device isn't among the missing...

I've added DEBUG and a bunch of printk's to talk_to_backend() and xenbus_switch_state(). The outputs are here:

http://www.theshore.net/~caker/xen/InitWait/dmesg-working.txt
http://www.theshore.net/~caker/xen/InitWait/dmesg-not_working.txt

At the bottom of both of those files, I've pasted in just the debugging messages in order. There are two main differences:

When devices are missing, talk_to_backend() is making duplicate calls for the same vbd to xenbus_switch_state(), and on the second call xenbus_switch_state() avoids writing to xenstore an identical value (which it's supposed to). Why the duplicate calls?

Second difference: even though there were two calls to xenbus_switch_state() to set the state to 3, later on xenbus_probe only detects the state as 2.

talk_to_backend - about to call xenbus_switch_state
xenbus_switch_state() nodename=device/vbd/770 state=3 - entering
xenbus_switch_state() nodename=device/vbd/770 state=3 - finished
talk_to_backend - about to call xenbus_switch_state
xenbus_switch_state() nodename=device/vbd/770 state=3 - entering
xenbus_switch_state() nodename=device/vbd/770 state=3 - state == dev->state

but then:

xenbus_probe (otherend_changed:302) state is 2, /local/domain/0/backend/vbd/151/770/state, /local/domain/0/backend/vbd/151/770/state.

So, is this a deadlock or locking issue like you suspected?

Thanks,
-Chris
Christopher S. Aker
2006-Mar-30 03:50 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
Christopher S. Aker wrote:

> When devices are missing, talk_to_backend() is making duplicate calls
> for the same vbd to xenbus_switch_state(), and on the second call
> xenbus_switch_state avoids writing to xenstore an identical value (which
> it's supposed to). Why the duplicate calls?

That got me thinking .. The duplicate calls are happening for some reason, but they're never attempting to write out the state value again because of this code in xenbus_switch_state():

    if (state == dev->state) {
        return 0;
    }

The initial call to xenbus_switch_state() _did_ set the value in dev->state, it just never made it into xenstore even though the xenbus_printf() call didn't return an error.

So, commenting out the code above makes everything work. 15 reboots and all the devices have shown up every time.

I don't know why the first call to xenbus_switch_state() isn't really writing out the value to xenstore. xenbus_printf() isn't returning an error, but still the value fails to make it into the store. That's the best of my understanding at the moment...

-Chris
Keir Fraser
2006-Mar-30 08:53 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
On 30 Mar 2006, at 04:50, Christopher S. Aker wrote:

> So, commenting out the code above makes everything work. 15 reboots
> and all the devices have show up every time.
>
> I don't know why the first call to xenbus_switch_state() isn't really
> writing out the value to xenstore. xenbus_printf() isn't returning an
> error, but still the value fails to make it into the store. That's
> the best of my understanding at the moment...

There is a logging mode you can turn on in xenstored and get a dump of all requests and actions taken by the daemon. I think it dumps to stdout. That might help determine whether the requests get as far as the daemon and, if so, what it is doing with them.

-- Keir
Ewan Mellor
2006-Mar-30 09:43 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
On Thu, Mar 30, 2006 at 09:53:47AM +0100, Keir Fraser wrote:

>
> On 30 Mar 2006, at 04:50, Christopher S. Aker wrote:
>
> >So, commenting out the code above makes everything work. 15 reboots
> >and all the devices have show up every time.
> >
> >I don't know why the first call to xenbus_switch_state() isn't really
> >writing out the value to xenstore. xenbus_printf() isn't returning an
> >error, but still the value fails to make it into the store. That's
> >the best of my understanding at the moment...
>
> There is a logging mode you can turn on in xenstored and get a dump of
> all requests and actions taken by the daemon. I think it dumps to
> stdout. That might help determine whether the requests get as far as
> the daemon and, if so, what it is doing with them.

Actually, you need to export XENSTORED_TRACE=1 before starting Xend (putting that in your init script and rebooting is the easiest way) and then you get the logs in /var/log/xenstored-trace.log. You need to reboot, rather than merely restart the daemons.

Ewan.
Ewan Mellor
2006-Mar-30 16:49 UTC
Re: [Xen-devel] vbd devices stuck in Initialising/InitWait
On Wed, Mar 29, 2006 at 09:50:02PM -0600, Christopher S. Aker wrote:

> Christopher S. Aker wrote:
> >When devices are missing, talk_to_backend() is making duplicate calls
> >for the same vbd to xenbus_switch_state(), and on the second call
> >xenbus_switch_state avoids writing to xenstore an identical value (which
> >it's supposed to). Why the duplicate calls?
>
> That got me thinking .. The duplicate calls are happening for some
> reason, but they're never attempting to write out the state value again
> because of this code in xenbus_switch_state():
>
>     if (state == dev->state) {
>         return 0;
>     }
>
> The initial call to xenbus_switch_state() _did_ set the value in
> dev->state, it just never made it into xenstore even though the
> xenbus_printf call didn't return an error.
>
> So, commenting out the code above makes everything work. 15 reboots and
> all the devices have show up every time.
>
> I don't know why the first call to xenbus_switch_state() isn't really
> writing out the value to xenstore. xenbus_printf() isn't returning an
> error, but still the value fails to make it into the store. That's the
> best of my understanding at the moment...

I've figured this one out. The duplicate calls are coming because the call to xenbus_switch_state is inside a transaction, which is then aborted and retried. This breaks, because of the test above -- the state has been cached locally, but it's not actually made it to the store, because the transaction fails.

xenbus_switch_state simply shouldn't be being called in a transaction. I am testing a patch at the moment that should solve this problem for you.

Cheers,

Ewan.
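The failure mode described in this message can be reproduced in a small userspace model. The sketch below is not the Xen code and not the patch under test: every toy_* name is hypothetical, the store is reduced to a single integer, and moving the state switch after the transaction is only an assumed reading of "shouldn't be called in a transaction". It shows a write staged inside a transaction being dropped on abort while the locally cached state survives, so that the retry hits the state == dev->state check and writes nothing.

```c
#include <stdio.h>

enum { StateInitialising = 1, StateInitialised = 3 };

/* Toy stand-in for xenstore: writes made inside a transaction are staged
 * and only reach committed_state when the transaction commits. */
struct toy_store {
    int committed_state;
    int staged_state;
    int has_staged;
    int in_txn;
};

/* Toy stand-in for the device's locally cached state (dev->state). */
struct toy_dev {
    int state;
};

void toy_txn_start(struct toy_store *s)
{
    s->in_txn = 1;
    s->has_staged = 0;
}

/* Returns 0 on commit, -1 on abort (staged writes are discarded). */
int toy_txn_end(struct toy_store *s, int abort)
{
    s->in_txn = 0;
    if (abort) {
        s->has_staged = 0;
        return -1;
    }
    if (s->has_staged)
        s->committed_state = s->staged_state;
    s->has_staged = 0;
    return 0;
}

/* Mirrors the early-return check quoted in the thread. */
int toy_switch_state(struct toy_store *s, struct toy_dev *dev, int state)
{
    if (state == dev->state)
        return 0;                   /* cached value matches: skip the write */
    if (s->in_txn) {
        s->staged_state = state;    /* may still be thrown away on abort */
        s->has_staged = 1;
    } else {
        s->committed_state = state; /* direct write, visible immediately */
    }
    dev->state = state;             /* cached regardless of the outcome */
    return 0;
}

/* State switch inside a transaction whose first attempt is aborted. */
int toy_run_inside_txn(void)
{
    struct toy_store s = { StateInitialising, 0, 0, 0 };
    struct toy_dev dev = { StateInitialising };
    int attempt;

    for (attempt = 0; attempt < 2; attempt++) {
        toy_txn_start(&s);
        toy_switch_state(&s, &dev, StateInitialised);
        if (toy_txn_end(&s, attempt == 0) == 0)
            break;
    }
    return s.committed_state;       /* the store never sees state 3 */
}

/* Assumed alternative: retry the transaction first, switch state after. */
int toy_run_outside_txn(void)
{
    struct toy_store s = { StateInitialising, 0, 0, 0 };
    struct toy_dev dev = { StateInitialising };
    int attempt;

    for (attempt = 0; attempt < 2; attempt++) {
        toy_txn_start(&s);
        /* other per-device writes would be staged here */
        if (toy_txn_end(&s, attempt == 0) == 0)
            break;
    }
    toy_switch_state(&s, &dev, StateInitialised);
    return s.committed_state;
}
```

In this model toy_run_inside_txn() leaves the store at Initialising even though the cached state says Initialised, which matches the frontend being stuck at state 1 in xenstore, while toy_run_outside_txn() ends with the store at Initialised.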