Bamvor Jian Zhang
2013-Nov-25 09:18 UTC
[PATCH] fix assert fail in libxl__sigchld_installhandler
when libxl_sigchld_owner_libxl_always is used in libvirt or other tools stack, there is a assertion fail in restore in libxl__sigchld_installhandler during the following sequence: create -> save -> restore. here is the backtrace: #0 0x00007f7a522453d5 in raise () from /lib64/libc.so.6 #1 0x00007f7a52246858 in abort () from /lib64/libc.so.6 #2 0x00007f7a5223e2e2 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007f7a5223e392 in __assert_fail () from /lib64/libc.so.6 #4 0x00007f7a48008113 in libxl__sigchld_installhandler ( gc=gc@entry=0x7f7a4f080480) at libxl_fork.c:225 #5 0x00007f7a480081f4 in perhaps_installhandler (gc=gc@entry=0x7f7a4f080480, creating=creating@entry=false) at libxl_fork.c:272 #6 0x00007f7a4800851f in libxl_childproc_setmode (ctx=0x7f7a400c0ee0, hooks=hooks@entry=0x7f7a482694a0 <childproc_hooks>, user=user@entry=0x7f7a48474f80 <child_info>) at libxl_fork.c:427 the sigchld_owner exists but it is not the same as the new CTX. meanwhile the old sigchild is not removed in perhaps_removehandler. Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com> --- tools/libxl/libxl_fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c index 4ae9f94..9f1a5ce 100644 --- a/tools/libxl/libxl_fork.c +++ b/tools/libxl/libxl_fork.c @@ -252,7 +252,7 @@ static bool chldmode_ours(libxl_ctx *ctx, bool creating) case libxl_sigchld_owner_mainloop: return 0; case libxl_sigchld_owner_libxl_always: - return 1; + return creating || !LIBXL_LIST_EMPTY(&ctx->children); } abort(); } -- 1.8.1.4
Ian Jackson
2013-Nov-25 11:56 UTC
Re: [PATCH] fix assert fail in libxl__sigchld_installhandler
Bamvor Jian Zhang writes ("[PATCH] fix assert fail in libxl__sigchld_installhandler"):> when libxl_sigchld_owner_libxl_always is used in libvirt or other > tools stack, there is a assertion fail in restore in > libxl__sigchld_installhandler during the following sequence: > create -> save -> restore. > > here is the backtrace: > #0 0x00007f7a522453d5 in raise () from /lib64/libc.so.6 > #1 0x00007f7a52246858 in abort () from /lib64/libc.so.6 > #2 0x00007f7a5223e2e2 in __assert_fail_base () from /lib64/libc.so.6 > #3 0x00007f7a5223e392 in __assert_fail () from /lib64/libc.so.6 > #4 0x00007f7a48008113 in libxl__sigchld_installhandler ( > gc=gc@entry=0x7f7a4f080480) at libxl_fork.c:225 > #5 0x00007f7a480081f4 in perhaps_installhandler (gc=gc@entry=0x7f7a4f080480, > creating=creating@entry=false) at libxl_fork.c:272 > #6 0x00007f7a4800851f in libxl_childproc_setmode (ctx=0x7f7a400c0ee0, > hooks=hooks@entry=0x7f7a482694a0 <childproc_hooks>, > user=user@entry=0x7f7a48474f80 <child_info>) at libxl_fork.c:427 > > the sigchld_owner exists but it is not the same as the new CTX. > meanwhile the old sigchild is not removed in perhaps_removehandler.I think the assertion has detected a bug in your use of libxl, not a bug in libxl. If you use libxl_sigchld_owner_libxl_always you''re telling libxl that this libxl_ctx should always own the entire application''s SIGCHLDs. That is obviously not compatible with having multiple different libxl_ctx''s. If you are running multiple different long-running libxl operations in parallel, then either: (a) they must all use the same libxl ctx or (b) you must specify libxl_sigchld_owner_mainloop and demultiplex SIGCHLD yourself This wasn''t 100% clear from the doc comment. I propose the patch below to fix this. Thanks, Ian. From 73b6484cbd5409101308c6c640eb7597f63256b9 Mon Sep 17 00:00:00 2001 From: Ian Jackson <ian.jackson@eu.citrix.com> Date: Mon, 25 Nov 2013 11:53:28 +0000 Subject: [PATCH] libxl: doc comment: clarify SIGCHLD demultiplexing requirements Update the comment to clarify that libxl_sigchld_owner_libxl_always implies having only one libxl_ctx, and that libxl_sigchld_owner_mainloop requires one call to libxl_childproc_exited per ctx. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> --- tools/libxl/libxl_event.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h index 27a65dc..5b2c689 100644 --- a/tools/libxl/libxl_event.h +++ b/tools/libxl/libxl_event.h @@ -446,13 +446,15 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl) * * libxl_sigchld_owner_mainloop: * The application must install a SIGCHLD handler and reap (at - * least) all of libxl''s children and pass their exit status - * to libxl by calling libxl_childproc_exited. + * least) all of libxl''s children and pass their exit status to + * libxl by calling libxl_childproc_exited. (If the application + * has multiple libxl ctx''s, it must call libxl_childproc_exited + * on each ctx.) * * libxl_sigchld_owner_libxl_always: * The application expects libxl to reap all of its children, * and provides a callback to be notified of their exit - * statues. + * statues. The application must have only one libxl_ctx. * * An application which fails to call setmode, or which passes 0 for * hooks, while it uses any libxl operation which might -- 1.7.10.4
Ian Campbell
2013-Nov-25 11:59 UTC
Re: [PATCH] fix assert fail in libxl__sigchld_installhandler
On Mon, 2013-11-25 at 11:56 +0000, Ian Jackson wrote:> Bamvor Jian Zhang writes ("[PATCH] fix assert fail in libxl__sigchld_installhandler"): > > when libxl_sigchld_owner_libxl_always is used in libvirt or other > > tools stack, there is a assertion fail in restore in > > libxl__sigchld_installhandler during the following sequence: > > create -> save -> restore. > > > > here is the backtrace: > > #0 0x00007f7a522453d5 in raise () from /lib64/libc.so.6 > > #1 0x00007f7a52246858 in abort () from /lib64/libc.so.6 > > #2 0x00007f7a5223e2e2 in __assert_fail_base () from /lib64/libc.so.6 > > #3 0x00007f7a5223e392 in __assert_fail () from /lib64/libc.so.6 > > #4 0x00007f7a48008113 in libxl__sigchld_installhandler ( > > gc=gc@entry=0x7f7a4f080480) at libxl_fork.c:225 > > #5 0x00007f7a480081f4 in perhaps_installhandler (gc=gc@entry=0x7f7a4f080480, > > creating=creating@entry=false) at libxl_fork.c:272 > > #6 0x00007f7a4800851f in libxl_childproc_setmode (ctx=0x7f7a400c0ee0, > > hooks=hooks@entry=0x7f7a482694a0 <childproc_hooks>, > > user=user@entry=0x7f7a48474f80 <child_info>) at libxl_fork.c:427 > > > > the sigchld_owner exists but it is not the same as the new CTX. > > meanwhile the old sigchild is not removed in perhaps_removehandler. > > I think the assertion has detected a bug in your use of libxl, not a > bug in libxl. > > If you use libxl_sigchld_owner_libxl_always you''re telling libxl that > this libxl_ctx should always own the entire application''s SIGCHLDs. > That is obviously not compatible with having multiple different > libxl_ctx''s. > > If you are running multiple different long-running libxl operations in > parallel, then either: > (a) they must all use the same libxl ctx > or > (b) you must specify libxl_sigchld_owner_mainloop and demultiplex > SIGCHLD yourselfI was going to ask if it were valid to use libxl_sigchld_owner_libxl_always for one ctx and libxl_sigchld_owner_mainloop for all the others (in essence delegating the requirements of _mainloop to the one special ctx). The answer is no, due to the requirement for users of _mainloop to call libxl_childproc_exited, which _libxl_always does not do for you, certainly not for other ctxts.> > This wasn''t 100% clear from the doc comment. I propose the patch > below to fix this. > > Thanks, > Ian. > > From 73b6484cbd5409101308c6c640eb7597f63256b9 Mon Sep 17 00:00:00 2001 > From: Ian Jackson <ian.jackson@eu.citrix.com> > Date: Mon, 25 Nov 2013 11:53:28 +0000 > Subject: [PATCH] libxl: doc comment: clarify SIGCHLD demultiplexing > requirements > > Update the comment to clarify that libxl_sigchld_owner_libxl_always > implies having only one libxl_ctx, and that > libxl_sigchld_owner_mainloop requires one call to > libxl_childproc_exited per ctx. > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>> --- > tools/libxl/libxl_event.h | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h > index 27a65dc..5b2c689 100644 > --- a/tools/libxl/libxl_event.h > +++ b/tools/libxl/libxl_event.h > @@ -446,13 +446,15 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl) > * > * libxl_sigchld_owner_mainloop: > * The application must install a SIGCHLD handler and reap (at > - * least) all of libxl''s children and pass their exit status > - * to libxl by calling libxl_childproc_exited. > + * least) all of libxl''s children and pass their exit status to > + * libxl by calling libxl_childproc_exited. (If the application > + * has multiple libxl ctx''s, it must call libxl_childproc_exited > + * on each ctx.) > * > * libxl_sigchld_owner_libxl_always: > * The application expects libxl to reap all of its children, > * and provides a callback to be notified of their exit > - * statues. > + * statues. The application must have only one libxl_ctx. > * > * An application which fails to call setmode, or which passes 0 for > * hooks, while it uses any libxl operation which might
Ian Jackson
2013-Nov-25 16:42 UTC
Re: [PATCH] fix assert fail in libxl__sigchld_installhandler
Ian Campbell writes ("Re: [PATCH] fix assert fail in libxl__sigchld_installhandler"):> On Mon, 2013-11-25 at 11:56 +0000, Ian Jackson wrote:...> > If you are running multiple different long-running libxl operations in > > parallel, then either: > > (a) they must all use the same libxl ctx > > or > > (b) you must specify libxl_sigchld_owner_mainloop and demultiplex > > SIGCHLD yourself > > I was going to ask if it were valid to use > libxl_sigchld_owner_libxl_always for one ctx and > libxl_sigchld_owner_mainloop for all the others (in essence delegating > the requirements of _mainloop to the one special ctx). The answer is no, > due to the requirement for users of _mainloop to call > libxl_childproc_exited, which _libxl_always does not do for you, > certainly not for other ctxts.I guess you _could_ do that, actually. What you would do is provide a reaped_callback to the _libxl_always ctx. And in that callback, you would iterate over all your other ctxs calling libxl_childproc_reaped. The locking would be tricky. Hmm. I guess you could have a special libxl ctx whose only purpose was to deal with the sigchld. Then its lock would be outside the lock protecting your list of contexts, and outside all of your libxl locks, and the reaped_callback could just do the obvious thing. You would have to call libxl_event_wait on that ctx on some thread. Is it best not to mention these possibilities ? Ian.
Ian Campbell
2013-Nov-25 16:52 UTC
Re: [PATCH] fix assert fail in libxl__sigchld_installhandler
On Mon, 2013-11-25 at 16:42 +0000, Ian Jackson wrote:> Ian Campbell writes ("Re: [PATCH] fix assert fail in libxl__sigchld_installhandler"): > > On Mon, 2013-11-25 at 11:56 +0000, Ian Jackson wrote: > ... > > > If you are running multiple different long-running libxl operations in > > > parallel, then either: > > > (a) they must all use the same libxl ctx > > > or > > > (b) you must specify libxl_sigchld_owner_mainloop and demultiplex > > > SIGCHLD yourself > > > > I was going to ask if it were valid to use > > libxl_sigchld_owner_libxl_always for one ctx and > > libxl_sigchld_owner_mainloop for all the others (in essence delegating > > the requirements of _mainloop to the one special ctx). The answer is no, > > due to the requirement for users of _mainloop to call > > libxl_childproc_exited, which _libxl_always does not do for you, > > certainly not for other ctxts. > > I guess you _could_ do that, actually. > > What you would do is provide a reaped_callback to the _libxl_always > ctx. And in that callback, you would iterate over all your other ctxs > calling libxl_childproc_reaped. > > The locking would be tricky. > > Hmm. I guess you could have a special libxl ctx whose only purpose > was to deal with the sigchld. Then its lock would be outside the lock > protecting your list of contexts, and outside all of your libxl locks, > and the reaped_callback could just do the obvious thing. You would > have to call libxl_event_wait on that ctx on some thread. > > Is it best not to mention these possibilities ?I think so!> > Ian.
Ian Jackson
2013-Nov-26 11:47 UTC
Re: [PATCH] fix assert fail in libxl__sigchld_installhandler
Ian Campbell writes ("Re: [PATCH] fix assert fail in libxl__sigchld_installhandler"):> On Mon, 2013-11-25 at 11:56 +0000, Ian Jackson wrote: > > From 73b6484cbd5409101308c6c640eb7597f63256b9 Mon Sep 17 00:00:00 2001 > > From: Ian Jackson <ian.jackson@eu.citrix.com> > > Date: Mon, 25 Nov 2013 11:53:28 +0000 > > Subject: [PATCH] libxl: doc comment: clarify SIGCHLD demultiplexing > > requirements > > > > Update the comment to clarify that libxl_sigchld_owner_libxl_always > > implies having only one libxl_ctx, and that > > libxl_sigchld_owner_mainloop requires one call to > > libxl_childproc_exited per ctx. > > > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > > Acked-by: Ian Campbell <ian.campbell@citrix.com>I have committed this with the slight clarification below. Thanks, Ian. diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h index 5b2c689..6261f99 100644 --- a/tools/libxl/libxl_event.h +++ b/tools/libxl/libxl_event.h @@ -454,7 +454,8 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl) * libxl_sigchld_owner_libxl_always: * The application expects libxl to reap all of its children, * and provides a callback to be notified of their exit - * statues. The application must have only one libxl_ctx. + * statues. The application must have only one libxl_ctx + * configured this way. * * An application which fails to call setmode, or which passes 0 for * hooks, while it uses any libxl operation which might