These two patches have been posted before. The fd reentrancy hazard fix (1/1) has been improved and simplified. NB I have NOT executed these against recent xen-unstable tip and I haven''t tested 1/1 since I changed it. These should not be applied without testing. But I thought it better to post them than leave them lurking in my tree. 1/2 libxl: fix reentrancy hazard in fd event processing 2/2 libxl: do not blunder on if bootloader fails
Ian Jackson
2012-Jul-20 18:18 UTC
[PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
In afterpoll_internal, the callback functions may register and deregister events arbitrarily. This means that we need to consider the reentrancy-safety of the event machinery state variables. Most of the code is safe but the fd handling is not. Fix this by arranging to restart the fd scan loop every time we call one of these callback functions. For this loop to terminate, we modify afterpoll_check_fd so that it returns only once for each of afterpoll''s efds. Another possible solution would be simply to return from afterpoll_internal after calling efd->func. That would be a small and more obviously correct change but would prevent the process from handling more than one fd event with a single call to poll. This is apropos of a report from Roger Pau Monne to me (pers.comm.) of this crash on NetBSD: Program terminated with signal 11, Segmentation fault. #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) at libxl_event.c:856 856 if (fds[slot].fd != fd) Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Reported-by: Roger Pau Monne <roger.pau@citrix.com> - Changes since initial posting: * Abolish the rindices[] return parameter from afterpoll_check_fd. Instead have afterpoll_check_fd invalidate the slot itself, so that it always only returns nonzero once for each slot. * Better commit message. --- tools/libxl/libxl_event.c | 90 +++++++++++++++++++++++++++++++++++------- tools/libxl/libxl_internal.h | 2 +- 2 files changed, 76 insertions(+), 16 deletions(-) diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c index 1957505..8902314 100644 --- a/tools/libxl/libxl_event.c +++ b/tools/libxl/libxl_event.c @@ -839,18 +839,26 @@ int libxl_osevent_beforepoll(libxl_ctx *ctx, int *nfds_io, static int afterpoll_check_fd(libxl__poller *poller, const struct pollfd *fds, int nfds, int fd, int events) - /* returns mask of events which were requested and occurred */ + /* Returns mask of events which were requested and occurred. Will + * return nonzero only once for each (poller,fd,events) + * combination, until the next beforepoll. If events from + * different combinations overlap, between one such combination + * and all distinct combinations will produce nonzero returns. */ { if (fd >= poller->fd_rindices_allocd) /* added after we went into poll, have to try again */ return 0; + events |= POLLERR | POLLHUP; + int i, revents = 0; for (i=0; i<3; i++) { - int slot = poller->fd_rindices[fd][i]; + int *slotp = &poller->fd_rindices[fd][i]; + int slot = *slotp; if (slot >= nfds) - /* stale slot entry; again, added afterwards */ + /* stale slot entry (again, added afterwards), */ + /* or slot for which we have already returned nonzero */ continue; if (fds[slot].fd != fd) @@ -858,11 +866,16 @@ static int afterpoll_check_fd(libxl__poller *poller, continue; assert(!(fds[slot].revents & POLLNVAL)); - revents |= fds[slot].revents; - } - /* we mask in case requested events have changed */ - revents &= (events | POLLERR | POLLHUP); + /* we mask in case requested events have changed */ + int slot_revents = fds[slot].revents & events; + if (!slot_revents) + /* this slot is for a different set of events */ + continue; + + revents |= slot_revents; + *slotp = INT_MAX; /* so that next time we''ll see slot >= nfds */ + } return revents; } @@ -876,16 +889,63 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller, EGC_GC; libxl__ev_fd *efd; - LIBXL_LIST_FOREACH(efd, &CTX->efds, entry) { - if (!efd->events) - continue; + /* + * Warning! Reentrancy hazards! + * + * Many parts of this function eventually call arbitrary callback + * functions which may modify the event handling data structures. + * + * Of the data structures used here: + * + * egc, poller, now + * are allocated by our caller and relate to the + * current thread and its call stack down into the + * event machinery; it is not freed until we return. + * So it is safe. + * + * fds is either what application passed into + * libxl_osevent_afterpoll (which, although this + * isn''t explicitly stated, clearly must remain + * valid until libxl_osevent_afterpoll returns) or + * it''s poller->fd_polls which is modified only by + * our (non-recursive) caller eventloop_iteration. + * + * CTX comes from our caller, and applications are + * forbidden from destroying it while we are running. + * So the ctx pointer itself is safe to use; now + * for its contents: + * + * CTX->etimes is used in a simple reentrancy-safe manner. + * + * CTX->efds is more complicated; see below. + */ - int revents = afterpoll_check_fd(poller,fds,nfds, efd->fd,efd->events); - if (revents) { - DBG("ev_fd=%p occurs fd=%d events=%x revents=%x", - efd, efd->fd, efd->events, revents); - efd->func(egc, efd, efd->fd, efd->events, revents); + for (;;) { + /* We restart our scan of fd events whenever we call a + * callback function. This is necessary because such + * a callback might make arbitrary changes to CTX->efds. + * We invalidate the fd_rindices[] entries which were used + * so that we don''t call the same function again. */ + int revents; + + LIBXL_LIST_FOREACH(efd, &CTX->efds, entry) { + + if (!efd->events) + continue; + + revents = afterpoll_check_fd(poller,fds,nfds, + efd->fd,efd->events); + if (revents) + goto found_fd_event; } + /* no ordinary fd events, then */ + break; + + found_fd_event: + DBG("ev_fd=%p occurs fd=%d events=%x revents=%x", + efd, efd->fd, efd->events, revents); + + efd->func(egc, efd, efd->fd, efd->events, revents); } if (afterpoll_check_fd(poller,fds,nfds, poller->wakeup_pipe[0],POLLIN)) { diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 2781398..e938660 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -272,7 +272,7 @@ struct libxl__poller { int fd_polls_allocd; int fd_rindices_allocd; - int (*fd_rindices)[3]; /* see libxl_osevent_beforepoll */ + int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */ int wakeup_pipe[2]; /* 0 means no fd allocated */ }; -- 1.7.2.5
Ian Jackson
2012-Jul-20 18:18 UTC
[PATCH 2/2] libxl: do not blunder on if bootloader fails
If the bootloader failed, we would call the creation failure callback but _also_ blunder on trying to recreate the domain, due to a missing "return". Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reported-by: Roger Pau Monne <roger.pau@citrix.com> --- tools/libxl/libxl_create.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 294a73f..0bfa8ba 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -672,7 +672,10 @@ static void domcreate_bootloader_done(libxl__egc *egc, libxl__srm_restore_autogen_callbacks *const callbacks &dcs->shs.callbacks.restore.a; - if (rc) domcreate_rebuild_done(egc, dcs, rc); + if (rc) { + domcreate_rebuild_done(egc, dcs, rc); + return; + } /* consume bootloader outputs. state->pv_{kernel,ramdisk} have * been initialised by the bootloader already. -- 1.7.2.5
Ian Jackson writes ("[PATCH 0/2] libxl: event fixes for Roger"):> These two patches have been posted before. The fd reentrancy hazard > fix (1/1) has been improved and simplified. > > NB I have NOT executed these against recent xen-unstable tip and I > haven''t tested 1/1 since I changed it. These should not be applied > without testing. But I thought it better to post them than leave them > lurking in my tree. > > 1/2 libxl: fix reentrancy hazard in fd event processing > 2/2 libxl: do not blunder on if bootloader failsRoger, do you want me to prepare a patch to fix the bootloader pty HUP bug which shows up on NetBSD ? Ian.
Ian Campbell
2012-Jul-23 11:09 UTC
Re: [PATCH 2/2] libxl: do not blunder on if bootloader fails
On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote:> If the bootloader failed, we would call the creation failure callback > but _also_ blunder on trying to recreate the domain, due to a missing > "return". > > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> > Reported-by: Roger Pau Monne <roger.pau@citrix.com>I have been running with this one applied locally since you first posted it so I think it is good to go in. Acked-by: Ian Campbell <ian.campbell@citrix.com>> --- > tools/libxl/libxl_create.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 294a73f..0bfa8ba 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -672,7 +672,10 @@ static void domcreate_bootloader_done(libxl__egc *egc, > libxl__srm_restore_autogen_callbacks *const callbacks > &dcs->shs.callbacks.restore.a; > > - if (rc) domcreate_rebuild_done(egc, dcs, rc); > + if (rc) { > + domcreate_rebuild_done(egc, dcs, rc); > + return; > + } > > /* consume bootloader outputs. state->pv_{kernel,ramdisk} have > * been initialised by the bootloader already.
Ian Campbell
2012-Jul-23 12:13 UTC
Re: [PATCH 2/2] libxl: do not blunder on if bootloader fails
On Mon, 2012-07-23 at 12:09 +0100, Ian Campbell wrote:> On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote: > > If the bootloader failed, we would call the creation failure callback > > but _also_ blunder on trying to recreate the domain, due to a missing > > "return". > > > > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> > > Reported-by: Roger Pau Monne <roger.pau@citrix.com> > > I have been running with this one applied locally since you first posted > it so I think it is good to go in. > > Acked-by: Ian Campbell <ian.campbell@citrix.com>I have applied this.> > > --- > > tools/libxl/libxl_create.c | 5 ++++- > > 1 files changed, 4 insertions(+), 1 deletions(-) > > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > > index 294a73f..0bfa8ba 100644 > > --- a/tools/libxl/libxl_create.c > > +++ b/tools/libxl/libxl_create.c > > @@ -672,7 +672,10 @@ static void domcreate_bootloader_done(libxl__egc *egc, > > libxl__srm_restore_autogen_callbacks *const callbacks > > &dcs->shs.callbacks.restore.a; > > > > - if (rc) domcreate_rebuild_done(egc, dcs, rc); > > + if (rc) { > > + domcreate_rebuild_done(egc, dcs, rc); > > + return; > > + } > > > > /* consume bootloader outputs. state->pv_{kernel,ramdisk} have > > * been initialised by the bootloader already. > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2012-Jul-25 16:05 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote:> In afterpoll_internal, the callback functions may register and > deregister events arbitrarily. This means that we need to consider > the reentrancy-safety of the event machinery state variables. > > Most of the code is safe but the fd handling is not. Fix this by > arranging to restart the fd scan loop every time we call one of these > callback functions. > > For this loop to terminate, we modify afterpoll_check_fd so that it > returns only once for each of afterpoll''s efds. > > Another possible solution would be simply to return from > afterpoll_internal after calling efd->func. That would be a small and > more obviously correct change but would prevent the process from > handling more than one fd event with a single call to poll. > > This is apropos of a report from Roger Pau Monne to me (pers.comm.) > of this crash on NetBSD: > > Program terminated with signal 11, Segmentation fault. > #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) > at libxl_event.c:856 > 856 if (fds[slot].fd != fd)Has Roger or you tested this now? It looks plausible to me.> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index 2781398..e938660 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -272,7 +272,7 @@ struct libxl__poller { > int fd_polls_allocd; > > int fd_rindices_allocd; > - int (*fd_rindices)[3]; /* see libxl_osevent_beforepoll */ > + int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */do you mean afterpoll here? Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson
2012-Jul-25 16:10 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"):> On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote:...> > Program terminated with signal 11, Segmentation fault. > > #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) > > at libxl_event.c:856 > > 856 if (fds[slot].fd != fd) > > Has Roger or you tested this now?I still haven''t tested the new version I''m afraid.> It looks plausible to me.:-).> > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > > index 2781398..e938660 100644 > > --- a/tools/libxl/libxl_internal.h > > +++ b/tools/libxl/libxl_internal.h > > @@ -272,7 +272,7 @@ struct libxl__poller { > > int fd_polls_allocd; > > > > int fd_rindices_allocd; > > - int (*fd_rindices)[3]; /* see libxl_osevent_beforepoll */ > > + int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */ > > do you mean afterpoll here?No. I mean this comment: /* * In order to be able to efficiently find the libxl__ev_fd for a * struct poll during _afterpoll, we maintain a shadow data * structure in CTX->fd_rindices: each fd corresponds to a slot in * fd_rindices, and each element in the rindices is three indices * into the fd array (for POLLIN, POLLPRI and POLLOUT). */ which is halfway down beforepoll_internal. Ian.
Ian Campbell
2012-Jul-25 16:33 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
On Wed, 2012-07-25 at 17:10 +0100, Ian Jackson wrote:> Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"): > > On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote: > ... > > > Program terminated with signal 11, Segmentation fault. > > > #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) > > > at libxl_event.c:856 > > > 856 if (fds[slot].fd != fd) > > > > Has Roger or you tested this now? > > I still haven''t tested the new version I''m afraid.OK, I''ll hold off on committing then.> > > It looks plausible to me. > > :-). > > > > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > > > index 2781398..e938660 100644 > > > --- a/tools/libxl/libxl_internal.h > > > +++ b/tools/libxl/libxl_internal.h > > > @@ -272,7 +272,7 @@ struct libxl__poller { > > > int fd_polls_allocd; > > > > > > int fd_rindices_allocd; > > > - int (*fd_rindices)[3]; /* see libxl_osevent_beforepoll */ > > > + int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */ > > > > do you mean afterpoll here? > > No. I mean this comment: > > /* > * In order to be able to efficiently find the libxl__ev_fd for a > * struct poll during _afterpoll, we maintain a shadow data > * structure in CTX->fd_rindices: each fd corresponds to a slot in > * fd_rindices, and each element in the rindices is three indices > * into the fd array (for POLLIN, POLLPRI and POLLOUT). > */ > > which is halfway down beforepoll_internal.Ah, OK. I thought it was a reference to one of the new comments. Nevermnd.> > Ian.
Roger Pau Monne
2012-Jul-25 16:49 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
Ian Campbell wrote:> On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote: >> In afterpoll_internal, the callback functions may register and >> deregister events arbitrarily. This means that we need to consider >> the reentrancy-safety of the event machinery state variables. >> >> Most of the code is safe but the fd handling is not. Fix this by >> arranging to restart the fd scan loop every time we call one of these >> callback functions. >> >> For this loop to terminate, we modify afterpoll_check_fd so that it >> returns only once for each of afterpoll''s efds. >> >> Another possible solution would be simply to return from >> afterpoll_internal after calling efd->func. That would be a small and >> more obviously correct change but would prevent the process from >> handling more than one fd event with a single call to poll. >> >> This is apropos of a report from Roger Pau Monne to me (pers.comm.) >> of this crash on NetBSD: >> >> Program terminated with signal 11, Segmentation fault. >> #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) >> at libxl_event.c:856 >> 856 if (fds[slot].fd != fd) > > Has Roger or you tested this now?This works ok.> It looks plausible to me. > >> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h >> index 2781398..e938660 100644 >> --- a/tools/libxl/libxl_internal.h >> +++ b/tools/libxl/libxl_internal.h >> @@ -272,7 +272,7 @@ struct libxl__poller { >> int fd_polls_allocd; >> >> int fd_rindices_allocd; >> - int (*fd_rindices)[3]; /* see libxl_osevent_beforepoll */ >> + int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */ > > do you mean afterpoll here? > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > >
Ian Jackson
2012-Jul-26 14:04 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"):> On Wed, 2012-07-25 at 17:10 +0100, Ian Jackson wrote: > > Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"): > > > On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote: > > ... > > > > Program terminated with signal 11, Segmentation fault. > > > > #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) > > > > at libxl_event.c:856 > > > > 856 if (fds[slot].fd != fd) > > > > > > Has Roger or you tested this now? > > > > I still haven''t tested the new version I''m afraid. > > OK, I''ll hold off on committing then.I have now tested these as part of my testing of Roger''s hotplug series. Ian.
Ian Campbell
2012-Jul-26 16:23 UTC
Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing
On Thu, 2012-07-26 at 15:04 +0100, Ian Jackson wrote:> Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"): > > On Wed, 2012-07-25 at 17:10 +0100, Ian Jackson wrote: > > > Ian Campbell writes ("Re: [PATCH 1/2] libxl: fix reentrancy hazard in fd event processing"): > > > > On Fri, 2012-07-20 at 19:18 +0100, Ian Jackson wrote: > > > ... > > > > > Program terminated with signal 11, Segmentation fault. > > > > > #0 0x00007f7ff743131b in afterpoll_check_fd (poller=<optimized out>, fds=0x7f7ff7b241c0, nfds=7, fd=-1, events=1) > > > > > at libxl_event.c:856 > > > > > 856 if (fds[slot].fd != fd) > > > > > > > > Has Roger or you tested this now? > > > > > > I still haven''t tested the new version I''m afraid. > > > > OK, I''ll hold off on committing then. > > I have now tested these as part of my testing of Roger''s hotplug > series.Thx. Added Roger''s Tested-by and my Acked-by and committed.