Roger Pau Monne
2012-Jan-14 13:41 UTC
[PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove
# HG changeset patch # User Roger Pau Monne <roger.pau@entel.upc.edu> # Date 1326548456 -3600 # Node ID 1b95948228b84c986764488cbb7683d0d109be90 # Parent 5b2676ac13218951698c49fa0350f2ac48220f3d libxl: Atomicaly check backend state and set it to 5 at device_remove libxl__device_remove was setting backend state to 5, which could create a race condition, since the previous check for state != 4 and setting state to 5 was not done inside the same transaction, so the kernel could change the state to 6 in the space between the check for state != 4 and setting it to 5. The state != 4 check and setting it to 5 should happen in the same transaction, to assure that nobody is modifying it behind our back. Changes since v1: * Do the check and set in the same transaction, instead of removing the set state to 5. Signed-off-by: Roger Pau Monne <roger.pau@entel.upc.edu> diff -r 5b2676ac1321 -r 1b95948228b8 tools/libxl/libxl_device.c --- a/tools/libxl/libxl_device.c Mon Jan 09 16:01:44 2012 +0100 +++ b/tools/libxl/libxl_device.c Sat Jan 14 14:40:56 2012 +0100 @@ -442,19 +442,23 @@ int libxl__device_remove(libxl__gc *gc, xs_transaction_t t; char *be_path = libxl__device_backend_path(gc, dev); char *state_path = libxl__sprintf(gc, "%s/state", be_path); - char *state = libxl__xs_read(gc, XBT_NULL, state_path); + char *state; int rc = 0; - if (!state) +retry_transaction: + t = xs_transaction_start(ctx->xsh); + state = libxl__xs_read(gc, t, state_path); + if (!state) { + xs_transaction_end(ctx->xsh, t, 0); goto out; + } if (atoi(state) != 4) { + xs_transaction_end(ctx->xsh, t, 0); libxl__device_destroy_tapdisk(gc, be_path); xs_rm(ctx->xsh, XBT_NULL, be_path); goto out; } -retry_transaction: - t = xs_transaction_start(ctx->xsh); xs_write(ctx->xsh, t, libxl__sprintf(gc, "%s/online", be_path), "0", strlen("0")); xs_write(ctx->xsh, t, state_path, "5", strlen("5")); if (!xs_transaction_end(ctx->xsh, t, 0)) {
Ian Jackson
2012-Feb-20 16:20 UTC
Re: [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove
Roger Pau Monne writes ("[Xen-devel] [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove"):> libxl: Atomicaly check backend state and set it to 5 at device_remove > > libxl__device_remove was setting backend state to 5, which could > create a race condition, since the previous check for state != 4 and > setting state to 5 was not done inside the same transaction, so the > kernel could change the state to 6 in the space between the check for > state != 4 and setting it to 5. > > The state != 4 check and setting it to 5 should happen in the same > transaction, to assure that nobody is modifying it behind our back.This looks like a correct fix to me. However, the code in libxl has changed now: libxl__device_remove is now libxl__initiate_device_remove. Also can you please arrange to use "goto out" style error handling ? Eg something like: + xs_transaction_t t = 0; ... out: + if (t) xs_transaction_end(ctx->xsh, t, 0); device_remove_cleanup(gc, aorm); TBH the meat of this function is in general rather unreconstructed. Lots of unchecked xs_write etc. I don''t know if you want to fix that while you''re there. I wimped out on doing that in my recent event series. Up to you. Really we should have some kind of more cooked xenstore transaction arrangements, probably in the gc, and certainly with a suitable loop macro so we can say: IN_XENSTORE_TRANSACTION { read, write, etc. } But that''s a task for another day and it''s not on the critical path for 4.2 so I''m putting it off. Thanks, Ian.
Roger Pau Monné
2012-Feb-21 09:46 UTC
Re: [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove
2012/2/20 Ian Jackson <Ian.Jackson@eu.citrix.com>:> Roger Pau Monne writes ("[Xen-devel] [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove"): >> libxl: Atomicaly check backend state and set it to 5 at device_remove >> >> libxl__device_remove was setting backend state to 5, which could >> create a race condition, since the previous check for state != 4 and >> setting state to 5 was not done inside the same transaction, so the >> kernel could change the state to 6 in the space between the check for >> state != 4 and setting it to 5. >> >> The state != 4 check and setting it to 5 should happen in the same >> transaction, to assure that nobody is modifying it behind our back. > > This looks like a correct fix to me. However, the code in libxl has > changed now: libxl__device_remove is now > libxl__initiate_device_remove. > > Also can you please arrange to use "goto out" style error handling ? > > Eg something like: > > + xs_transaction_t t = 0; > ... > out: > + if (t) xs_transaction_end(ctx->xsh, t, 0); > device_remove_cleanup(gc, aorm);I will commit an updated version of this patch with my hotplug series, that matches the current state of libxl__initiate_device_remove.> > TBH the meat of this function is in general rather unreconstructed. > Lots of unchecked xs_write etc. I don't know if you want to fix that > while you're there. I wimped out on doing that in my recent event > series. Up to you.Will decide that later, depending on the time frame.> Really we should have some kind of more cooked xenstore transaction > arrangements, probably in the gc, and certainly with a suitable loop > macro so we can say: > IN_XENSTORE_TRANSACTION { > read, write, etc. > } > But that's a task for another day and it's not on the critical path > for 4.2 so I'm putting it off.Also I've found myself using a lot a construction like the following: - check xenstore path - If path value is different than x - Set up a watch - Wait for events. And I'm afraid it's not concurrent safe, since someone can perform a bunch of xenstore changes in the space between setting the watch, and wait for events. The watcher will only get notified of the last xenstore change, but not all changes that happened between.> Thanks, > Ian._______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xensource.com/xen-devel
Ian Jackson
2012-Feb-21 12:28 UTC
Re: [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove
Roger Pau Monné writes ("Re: [Xen-devel] [PATCH v2] libxl: Atomicaly check backend state and set it to 5 at device_remove"):> Also I''ve found myself using a lot a construction like the following: > > - check xenstore path > - If path value is different than x > - Set up a watch > - Wait for events. > > And I''m afraid it''s not concurrent safe, since someone can perform a > bunch of xenstore changes in the space between setting the watch, and > wait for events. The watcher will only get notified of the last > xenstore change, but not all changes that happened between.xenstore is not a message-passing protocol in that way. There is no way to be sure that you see all the intermediate values of a particular xenstore node. The required approach is for the protocol not to care about intermediate values, or for the "sender" to coordinate with the "receiver" somehow. If all you mean is that you worry about the path changing between the first read and the setting up of the watch, then you''re fine: every watch is guaranteed to fire once right after you set it up, so your watch handler function will definitely read the path again and you won''t sit waiting for an event that never comes. Ian.