Hello,

I have noticed some issues with watches on XenStore, mainly that multiple watches on the same node in a hierarchy, or a watch on a node and on a child of that node, do not fire as one might expect.

I have been working on hvm domain forking, and I am using XenStore to communicate between xend and the qemu-dm. The first issue that I noticed was that you cannot use a single node to communicate state: only one of the watches on the node would fire, and no communication could occur. Using two nodes for bidirectional communication worked fine in normal operation; however, I discovered that during shutdown some other watch existed on the domain's path in the store, and it blocked the watches on the xend side.

Initially I was using a combination of xswatch with a Semaphore to perform blocking reads, and the xswatch function was never getting triggered. I changed to using the interface more directly via xs.watch and xs.read_watch. I could block and read data, but after my own function terminated the xswatch interface would try to execute my token as an xswatch token. Adding a no-op .fn and empty .args and .kwargs to my token let this pass through. Unfortunately, in general operation before guest destruction, the changes that I wanted to be caught by xs.read_watch were being consumed by an unrelated xs.watch.

What is the intended behavior of watches on the XenStore? Should only one watch be allowed on a given sub-hierarchy? Should the most specific watch be triggered alone? Should all watches be triggered?

Regards,
John McCullough
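As a sketch of the token workaround mentioned above (the attribute names .fn, .args and .kwargs, and the dispatch behaviour, follow xswatch's internals as described in this thread; treat the details as assumptions):

    # Hedged sketch: a token that xend's xswatch dispatch loop can
    # "execute" harmlessly. Assumes the loop calls
    # token.fn(path, *token.args, **token.kwargs) and unregisters the
    # watch when the callback returns a false value.
    class PassThroughToken:
        def __init__(self, path):
            self.path = path
            self.args = ()
            self.kwargs = {}

        def fn(self, path, *args, **kwargs):
            return True  # no-op; a true return keeps the watch registered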
On 26/8/06 9:32 pm, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:

> What is the intended behavior of watches on the XenStore? Should
> only one watch be allowed on a given sub-hierarchy? Should the most
> specific watch be triggered alone? Should all watches be triggered?

I believe it's all supposed to work in a very obvious and simple way: all watches registered on a prefix of the updated node's path should be fired. A single transaction can fire the same watch multiple times if that watch is on a common prefix of a number of nodes updated by that transaction (since each firing event specifies the full path of the modified node, events can't really be merged).

If you observe different behaviour from this then it is most likely a bug and we would love to receive patches!

 -- Keir
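A hedged illustration of these semantics with the Python bindings (the node paths are invented for the example, and the write() signature with "" meaning "no transaction" is an assumption about this era's binding):

    import xen.lowlevel.xs

    h = xen.lowlevel.xs.xs()
    h.watch("/tool/example", "prefix-token")       # watch on a prefix
    h.watch("/tool/example/child", "child-token")  # watch on the node itself

    # XenStore fires every watch once immediately upon registration, so
    # drain those two initial events first.
    for _ in range(2):
        h.read_watch()

    h.write("", "/tool/example/child", "hello")    # "" = no transaction (assumed)

    # A single write under both watched prefixes should fire both watches;
    # each event carries the full path of the modified node and the token
    # that registered the watch.
    events = [h.read_watch() for _ in range(2)]    # [(path, token), (path, token)]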
On Sun, Aug 27, 2006 at 03:57:06PM +0100, Keir Fraser wrote:
> I believe it's all supposed to work in a very obvious and simple way: all
> watches registered on a prefix of the updated node's path should be fired.
> [...]
> If you observe different behaviour from this then it is most likely a bug
> and we would love to receive patches!

I am attaching a band-aid style patch for xswatch. I haven't dug very far into the xenstore code yet, and I'm not sure how much time I have to dedicate to this quite yet.

What this patch addresses is xswatch's tendency to receive watch events for watches it did not create, and to try to dispatch on their tokens. Is the intended behavior of read_watch to pick up on all available watches and leave you to discriminate which to service based on token?

Something that has recently perplexed me: when using a watch during the save/restore process, my handler won't receive watch events where the value written in the store has an underscore. In the shutdown situation, the underscore value is passed. I am at a loss to guess why this is happening.

-John
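The attached patch itself is not reproduced here; the following is only a sketch of the approach described, assuming the dispatch-loop shape discussed earlier in this thread:

    registered = set()  # tokens (xswatch instances) this module created

    def watch_main(handle):
        while True:
            path, token = handle.read_watch()
            if token not in registered:
                continue  # event belongs to some other user of the handle
            keep = token.fn(path, *token.args, **token.kwargs)
            if not keep:
                handle.unwatch(token.path, token)
                registered.discard(token)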
On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
> Something that has recently perplexed me: when using a watch during
> the save/restore process, my handler won't receive watch events where
> the value written in the store has an underscore. In the shutdown
> situation, the underscore value is passed. I am at a loss to guess why
> this is happening.

This may be imagined.

-John
On Mon, Aug 28, 2006 at 05:48:31PM -0700, John McCullough wrote:
> I am attaching a band-aid style patch for xswatch. [...]
>
> What this patch addresses is xswatch's tendency to receive watch events
> for watches it did not create, and to try to dispatch on their tokens.
> Is the intended behavior of read_watch to pick up on all available
> watches and leave you to discriminate which to service based on token?

Recently I discovered that my watch and the xswatch were receiving alternating watch events (both in python). Looking at xs_read_watch in tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch calls to take turns. Given that the python interface shares a single xshandle, this prevents multiple independent watchers.

Creating an entirely new xshandle for each use of read_watch works. Moving to a model where the xsutil.xshandle() call creates a new xshandle seems easily supportable, given that xswatch is the primary user and it keeps a reference to its own handle.

Does anyone know of other xshandle() uses that warrant the current behavior?

Regards,
John
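A minimal sketch of the workaround described above, giving the blocking reader a private connection to xenstored so its read_watch() does not contend on the shared handle's mutex:

    import xen.lowlevel.xs

    def blocking_read_watch(path, token="channel"):
        h = xen.lowlevel.xs.xs()   # private handle: private watch queue,
                                   # no contention with xswatch's handle
        h.watch(path, token)
        h.read_watch()             # discard the initial firing on registration
        fired_path, _ = h.read_watch()  # blocks until the node next changes
        h.unwatch(path, token)
        return fired_path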
On 29/8/06 3:22 am, "John McCullough" <jmccullo@cs.ucsd.edu> wrote:

> Recently I discovered that my watch and the xswatch were receiving
> alternating watch events (both in python). Looking at xs_read_watch in
> tools/xenstore/xs.c, the mutex on the xshandle forces all xs_read_watch
> calls to take turns. Given that the python interface shares a single
> xshandle, this prevents multiple independent watchers.
>
> Creating an entirely new xshandle for each use of read_watch works.
> Moving to a model where the xsutil.xshandle() call creates a new
> xshandle seems easily supportable, given that xswatch is the primary
> user and it keeps a reference to its own handle.
>
> Does anyone know of other xshandle() uses that warrant the current
> behavior?

The current behaviour is broken (or, at least, the semantics really make no sense at all) if multiple people create 'xs' objects in the same python program. A good fix would be to move the handle allocation from xshandle_init to xshandle_new. The latter function will have to create a new container object to hold the handle value, rather than returning self. Watches will then be registered and read in the isolated context of a particular caller's object handle, rather than in a bogus shared global context of all users of the xs library.

This fix should then get things working for your code if you create yourself an xs object separate from xswatch's. It only raises the question of how you then implement a central select loop in your python program that waits on the various file handles or sockets created by the various xs objects.

 -- Keir
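One possible shape for such a central loop, assuming the Python xs object exposes the connection's file descriptor via fileno() (the C library has xs_fileno(); whether this era's binding wraps it is an assumption here, as is the dispatch callback):

    import select

    def central_loop(handles, dispatch):
        # handles: independent xs objects; dispatch: hypothetical callback
        while True:
            ready, _, _ = select.select(handles, [], [])  # uses h.fileno()
            for h in ready:
                path, token = h.read_watch()  # data is pending, so no block
                dispatch(h, path, token)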
On Mon, Aug 28, 2006 at 07:22:52PM -0700, John McCullough wrote:
> Recently I discovered that my watch and the xswatch were receiving
> alternating watch events (both in python). [...]
>
> Creating an entirely new xshandle for each use of read_watch works.
> Moving to a model where the xsutil.xshandle() call creates a new
> xshandle seems easily supportable, given that xswatch is the primary
> user and it keeps a reference to its own handle.

I'm confused as to what you're trying to do, so perhaps you could start again at the top.

xswatch starts a thread, and that thread handles all calls to xs.read_watch, and dispatches appropriate callbacks when the watch fires. I expect that you would simply create a new instance of xswatch, and then everything else would be handled for you. What's giving you problems?

Ewan.
Ewan Mellor wrote:
> I'm confused as to what you're trying to do, so perhaps you could start
> again at the top.
>
> xswatch starts a thread, and that thread handles all calls to
> xs.read_watch, and dispatches appropriate callbacks when the watch
> fires. I expect that you would simply create a new instance of xswatch,
> and then everything else would be handled for you. What's giving you
> problems?

From the top:

I am working on forking hvm domains. Part of this involves communicating with the qemu-dm via the xenstore, because it is the most readily available channel more capable than the process signals used for shutdown and save/restore (via Edwin Zhai's patch).

After getting an initial prototype working for the forking, I decided I would try to create a general-purpose communication channel that could be used to talk to qemu-dm. The general use case is sending a command ("shutdown") and waiting for a completion notification ("shutdown_done"). I am currently using a pair of nodes, one for each communication direction. I had initial difficulty in getting watches to trigger, but I am not trying to solve that right now.

I initially used xswatch in conjunction with a semaphore, so that I could set a watch and block on the semaphore until the watch had triggered. This worked in the general case.
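A condensed sketch of that pattern: one node per direction, an xswatch on the node written by qemu-dm, and a semaphore to turn the asynchronous watch into a blocking wait. The node names, the write()/read() signatures with a leading transaction argument, and the handling of the initial registration firing (ignored here for brevity) are assumptions for illustration:

    import threading
    from xen.xend.xenstore.xswatch import xswatch
    from xen.xend.xenstore.xsutil import xshandle

    class BlockingChannel:
        def __init__(self, cmd_node, ack_node):
            self.cmd_node = cmd_node   # xend -> qemu-dm direction
            self.ack_node = ack_node   # qemu-dm -> xend direction
            self.sem = threading.Semaphore(0)
            self.watch = xswatch(ack_node, self.fired)

        def fired(self, path):
            self.sem.release()   # wake any blocked waiter
            return True          # keep the watch registered

        def send_and_wait(self, command):
            xshandle().write("", self.cmd_node, command)  # signature assumed
            self.sem.acquire()   # block until the ack node changes
            return xshandle().read("", self.ack_node)     # signature assumed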
I decided that I would try to replace the current domain destruction signal with the "shutdown" command over the channel. I found that during the destruction sequence my xswatch watch was never getting triggered, so the semaphore would never get incremented and waiting for the completion notification would block indefinitely. At this point I started looking at xswatch, and I thought, unaware of the xshandle behavior, that I could just use xs.read_watch and achieve blocking without the use of a semaphore. So I followed that path and arrived at the problem with a single xshandle and multiple read_watch callers.

Keir Fraser wrote:
> The current behaviour is broken (or, at least, the semantics really make
> no sense at all) if multiple people create 'xs' objects in the same
> python program. [...]
>
> This fix should then get things working for your code if you create
> yourself an xs object separate from xswatch's. It only raises the
> question of how you then implement a central select loop in your python
> program that waits on the various file handles or sockets created by the
> various xs objects.

When I began, I had to try to extract the semantics from the code. I wrote the API section in http://wiki.xensource.com/xenwiki/XenStoreReference, which needs to be fixed and better explained. Once we establish what the correct usage pattern is, I will try to reproduce it on the wiki page.

If I use an independently created xshandle in my blocking communication channel code, it works in all cases. If I use the xswatch method, it fails in the destruction case.

If the desired usage model is a single xshandle per process, then we should change the semantics and/or document the relevant functions. Also, I would like to find out why my watch is not executing in the destruction case. A distilled version of the debugging log that I have is:

(XendDomainInfo:1424) XendDomainInfo.destroyDomain(6)
(xswatch:65) xswatch triggered on @releaseDomain
(image:397) hvm shutdown watch unregistered
(xsblockingchannel:79) waitFor executes and blocks

I haven't been able to get xswatch to trigger on any further writes to my node in the xenstore via xenstore-write. My only guess is that during domain destruction all watches within a domain's path are unwatched. The surface-level solution that I can think of is to move the qemu-dm/image destruction earlier in the domain destruction process. Are there other solutions?

Regards,
John McCullough
On Tue, Aug 29, 2006 at 12:12:11PM -0700, John McCullough wrote:
> When I began, I had to try to extract the semantics from the code.

Yes, that's quite a common thing to need to do at the moment! Thanks for all your efforts in documentation -- it's appreciated.

> I wrote the API section in
> http://wiki.xensource.com/xenwiki/XenStoreReference, which needs to be
> fixed and better explained. Once we establish what the correct usage
> pattern is, I will try to reproduce it on the wiki page.
>
> If I use an independently created xshandle in my blocking communication
> channel code, it works in all cases. If I use the xswatch method, it
> fails in the destruction case.
>
> If the desired usage model is a single xshandle per process, then we
> should change the semantics and/or document the relevant functions.
> Also, I would like to find out why my watch is not executing in the
> destruction case.
>
> A distilled version of the debugging log that I have is:
> (XendDomainInfo:1424) XendDomainInfo.destroyDomain(6)
> (xswatch:65) xswatch triggered on @releaseDomain
> (image:397) hvm shutdown watch unregistered
> (xsblockingchannel:79) waitFor executes and blocks

Can I see the code? This doesn't mean an awful lot without seeing what you've changed.

> I haven't been able to get xswatch to trigger on any further writes to
> my node in the xenstore via xenstore-write. My only guess is that during
> domain destruction all watches within a domain's path are unwatched.

You will certainly lose a watch on anything in the domain's path eventually, because Xend and the hotplug scripts will be cleaning up behind the domain. You should get one final watch fired when the path disappears.

> The surface-level solution that I can think of is to move the
> qemu-dm/image destruction earlier in the domain destruction process. Are
> there other solutions?

If you want to have data that outlive the domain (I presume in your case for just a short while) then you should put them somewhere other than /local/domain. There is a /tool/<yournamehere> hierarchy reserved for third-party tools, if that suits you better. You would then have to handle all the sweep-up yourself, of course.

In your case, couldn't you just release the semaphore off the @releaseDomain watch? Don't forget, domains can spontaneously self-destruct, maybe even half-way between your "shutdown" and "shutdown_done", so you need to be able to unconditionally abort and release locks.

Ewan.
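A sketch of that suggestion, building on the channel sketch earlier in this thread: also watch @releaseDomain and release the semaphore unconditionally if our domain is gone, so the waiter can abort instead of blocking forever. The domain_is_gone() helper is hypothetical:

    from xen.xend.xenstore.xswatch import xswatch

    def arm_abort_on_release(channel, domid):
        def released(path):
            # @releaseDomain does not say which domain died, so check ours.
            if domain_is_gone(domid):     # hypothetical helper
                channel.aborted = True
                channel.sem.release()     # unblock send_and_wait()
                return False              # drop this watch
            return True                   # some other domain; keep watching
        xswatch("@releaseDomain", released)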
Ewan Mellor wrote:
> If you want to have data that outlive the domain (I presume in your case
> for just a short while) then you should put them somewhere other than
> /local/domain. There is a /tool/<yournamehere> hierarchy reserved for
> third-party tools, if that suits you better. You would then have to
> handle all the sweep-up yourself, of course.
>
> In your case, couldn't you just release the semaphore off the
> @releaseDomain watch? Don't forget, domains can spontaneously
> self-destruct, maybe even half-way between your "shutdown" and
> "shutdown_done", so you need to be able to unconditionally abort and
> release locks.

I am getting the same behavior with xswatch when watching /tool/blah as with /local/domain/%u/blah. The watch I added to @releaseDomain is also not getting triggered. Removing the wait for "shutdown_done" allows it to come to completion.

I think what may be happening is that the initial @releaseDomain event triggers destroyDomain in XendDomain.py via refresh(), which then triggers destroy() in image.py. Then, by blocking there, we prevent the original xswatch callback from coming to completion and so block our own watch from ever being dispatched.

My initial thought is to return to using a separately created xshandle for my blocking channel. If this is the case, how do we want to develop the semantics?

-John
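In miniature, the failure mode described above, assuming (as discussed in this thread) that xswatch dispatches all callbacks from a single thread:

    import threading

    sem = threading.Semaphore(0)

    def on_release_domain(path):
        wait_for_shutdown_done()  # blocks the only xswatch dispatch thread...
        return True

    def on_shutdown_done(path):
        sem.release()             # ...so this callback is never dispatched
        return True

    def wait_for_shutdown_done():
        sem.acquire()             # deadlock: the dispatcher is parked here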