Ian Jackson
2010-Sep-10 16:54 UTC
[Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
If a multithreaded caller creates a thread which calls xs_read_watch, before it has set any watches with xs_watch, the thread in xs_read_watch will enter read_message and sit reading the xenstored fd without the appropriate locks held. Other threads can then concurrently read the xenstored fd, which naturally does not work very well. Symptoms of this bug which I have been able to reproduce include failure of xend startup to finish, due to a deadlock; results could also include reading corrupted data from xenstore. In this patch we arrange for xs_read_watch to always rely on the reader thread created by xs_watch. If no watches have been set, then xs_read_watch will block until one has been. If the library is compiled non-threaded xs_read_watch unconditionally does the reading in the current thread. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> diff -r 098790dd9327 tools/xenstore/xs.c --- a/tools/xenstore/xs.c Thu Sep 09 17:59:33 2010 +0100 +++ b/tools/xenstore/xs.c Fri Sep 10 17:46:43 2010 +0100 @@ -79,6 +79,20 @@ struct xs_handle { /* One request at a time. */ pthread_mutex_t request_mutex; + + /* Lock discipline: + * Only holder of the request lock may write to h->fd. + * Only holder of the request lock may access read_thr_exists. + * If read_thr_exists==0, only holder of request lock may read h->fd; + * If read_thr_exists==1, only the read thread may read h->fd. + * Only holder of the reply lock may access reply_list. + * Only holder of the watch lock may access watch_list. + * Lock hierarchy: + * The order in which to acquire locks is + * request_mutex + * reply_mutex + * watch_mutex + */ }; #define mutex_lock(m) pthread_mutex_lock(m) @@ -662,21 +676,27 @@ char **xs_read_watch(struct xs_handle *h struct xs_stored_msg *msg; char **ret, *strings, c = 0; unsigned int num_strings, i; - int read_from_thread; - - read_from_thread = read_thread_exists(h); - - /* Read from comms channel ourselves if there is no reader thread. */ - if (!read_from_thread && (read_message(h) == -1)) - return NULL; mutex_lock(&h->watch_mutex); - /* Wait on the condition variable for a watch to fire. */ #ifdef USE_PTHREAD - while (list_empty(&h->watch_list) && read_from_thread && h->fd != -1) + /* Wait on the condition variable for a watch to fire. + * If the reader thread doesn''t exist yet, then that''s because + * we haven''t called xs_watch. Presumably the application + * will do so later; in the meantime we just block. + */ + while (list_empty(&h->watch_list) && h->fd != -1) condvar_wait(&h->watch_condvar, &h->watch_mutex); -#endif +#else /* !defined(USE_PTHREAD) */ + /* Read from comms channel ourselves if there are no threads + * and therefore no reader thread. */ + + assert(!read_thread_exists(h)); /* not threadsafe but worth a check */ + if ((read_message(h) == -1)) + return NULL; + +#endif /* !defined(USE_PTHREAD) */ + if (list_empty(&h->watch_list)) { mutex_unlock(&h->watch_mutex); errno = EINVAL; @@ -900,6 +920,10 @@ char *xs_debug_command(struct xs_handle static int read_message(struct xs_handle *h) { + /* IMPORTANT: It is forbidden to call this function without + * acquiring the request lock and checking that h->read_thr_exists + * is false. See "Lock discipline" in struct xs_handle, above. */ + struct xs_stored_msg *msg = NULL; char *body = NULL; int saved_errno = 0; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Sep-10 16:58 UTC
[Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
I wrote:> If a multithreaded caller creates a thread which calls xs_read_watch, > before it has set any watches with xs_watch, the thread in > xs_read_watch will enter read_message and sit reading the xenstored fd > without the appropriate locks held. Other threads can then > concurrently read the xenstored fd, which naturally does not work very > well.While I was investigating this, I noticed that xs_fileno, which is used by numerous applications to be able to select() waiting for xenstore watches, always returns -1 and has done so since 7268:2144de6eabcc "Make libxenstore thread-safe" in October 2008. This is because nothing ever creates the pipe implied by watch_pipe. xs_fileno should do so (and care should be taken that locking is appropriate, which needs some thought). I won''t send a patch for this right away as I want to see whether my deadlock bugfix works, first. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Sep-10 17:36 UTC
Re: [Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
Nice catch! On Fri, 2010-09-10 at 17:54 +0100, Ian Jackson wrote:> @@ -662,21 +676,27 @@ char **xs_read_watch(struct xs_handle *h > struct xs_stored_msg *msg; > char **ret, *strings, c = 0; > unsigned int num_strings, i; > - int read_from_thread; > - > - read_from_thread = read_thread_exists(h); > - > - /* Read from comms channel ourselves if there is no reader thread. */ > - if (!read_from_thread && (read_message(h) == -1)) > - return NULL; > > mutex_lock(&h->watch_mutex); > > - /* Wait on the condition variable for a watch to fire. */ > #ifdef USE_PTHREAD > - while (list_empty(&h->watch_list) && read_from_thread && h->fd != -1) > + /* Wait on the condition variable for a watch to fire. > + * If the reader thread doesn''t exist yet, then that''s because > + * we haven''t called xs_watch. Presumably the application > + * will do so later; in the meantime we just block. > + */ > + while (list_empty(&h->watch_list) && h->fd != -1) > condvar_wait(&h->watch_condvar, &h->watch_mutex); > -#endif > +#else /* !defined(USE_PTHREAD) */ > + /* Read from comms channel ourselves if there are no threads > + * and therefore no reader thread. */ > + > + assert(!read_thread_exists(h)); /* not threadsafe but worth a check */ > + if ((read_message(h) == -1)) > + return NULL; > + > +#endif /* !defined(USE_PTHREAD) */ > + > if (list_empty(&h->watch_list)) { > mutex_unlock(&h->watch_mutex); > errno = EINVAL;read_reply contains a very similar pattern to the above. I assume it is safe on that occasion because xs_talkv (the only caller of read_reply) holds request_mutex? If so then: Acked-by: Ian Campbell <ian.campbell@citrix.com>> @@ -900,6 +920,10 @@ char *xs_debug_command(struct xs_handle > > static int read_message(struct xs_handle *h) > { > + /* IMPORTANT: It is forbidden to call this function without > + * acquiring the request lock and checking that h->read_thr_exists > + * is false. See "Lock discipline" in struct xs_handle, above. */ > +Took me ages to figure realise this didn''t apply ifndef USE_PTHREAD, which is pretty obvious. Time to call it a day I think. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Sep-10 17:36 UTC
[Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
Ian Jackson writes ("[Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang"):> While I was investigating this, I noticed that xs_fileno, which is > used by numerous applications to be able to select() waiting for > xenstore watches, always returns -1 and has done so since > 7268:2144de6eabcc "Make libxenstore thread-safe" in October 2008.I was wrong. Gianni has pointed me to the pipe() call which is cleverly hidden by the rather opaque coding style. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Sep-10 17:42 UTC
Re: [Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
On Fri, 2010-09-10 at 17:58 +0100, Ian Jackson wrote:> I wrote: > > If a multithreaded caller creates a thread which calls xs_read_watch, > > before it has set any watches with xs_watch, the thread in > > xs_read_watch will enter read_message and sit reading the xenstored fd > > without the appropriate locks held. Other threads can then > > concurrently read the xenstored fd, which naturally does not work very > > well. > > While I was investigating this, I noticed that xs_fileno, which is > used by numerous applications to be able to select() waiting for > xenstore watches, always returns -1 and has done so since > 7268:2144de6eabcc "Make libxenstore thread-safe" in October 2008. > > This is because nothing ever creates the pipe implied by watch_pipe. > xs_fileno should do soDoesn''t the call to pipe(2) in: if ((h->watch_pipe[0] == -1) && (pipe(h->watch_pipe) != -1)) do it?> (and care should be taken that locking is > appropriate, which needs some thought).It seems to only be accessed under watch_mutex, apart from get_handle() xs_daemon_destroy_postfork() and xs_daemon_close() which I guess are safe. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Sep-10 18:02 UTC
Re: [Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang
Ian Campbell writes ("Re: [Xen-devel] [PATCH] libxenstore: fix threading bug which cause xend startup hang"):> Nice catch!...> read_reply contains a very similar pattern to the above. I assume it is > safe on that occasion because xs_talkv (the only caller of read_reply) > holds request_mutex?Yes, exactly. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel