Richard W.M. Jones
2018-Nov-13 10:42 UTC
Re: [Libguestfs] collectd leaks SIGCHLD == SIG_IGN into plugins
On Tue, Nov 13, 2018 at 10:04:33AM +0000, Daniel P. Berrangé wrote:> On Fri, Nov 09, 2018 at 12:19:30PM +0000, Richard W.M. Jones wrote: > > Peter Dimitrov and myself were debugging a very peculiar bug when > > libguestfs is run as a plugin from collectd: > > > > https://www.redhat.com/archives/libguestfs/2018-November/thread.html#00023 > > > > The long story short is that collectd leaks SIGCHLD == SIG_IGN setting > > into plugins: > > > > https://www.redhat.com/archives/libguestfs/2018-November/msg00095.html > > > > This means that any plugin that does the usual pattern of: > > > > pid = fork (); > > ... > > if (waitpid (pid, NULL, 0) == -1) { > > perror ("waitpid"); > > exit (EXIT_FAILURE); > > } > > > > will fail, because the forked subprocess is automatically reaped > > before waitpid is called, resulting in the wait failing with errno => > ECHILD. > > > > It is possible to work around this by adding: > > > > signal (SIGCHLD, SIG_DFL); > > > > to the plugin. However I believe this is a bug in collectd, and it > > should sanitize signals (and maybe other things) before running > > plugins. > > Yes, I'm inclined to agree. If an app or library is spawning external > processes it should take care to restore signal setup to a "normal" state. > This means not only setting SIG_DFL for all signals, but also just as > importantly, the signal mask. It should be set to all ones before calling > fork, and set back to all zeroes immedaitely before execve, as illustrated > in libvirt to avoid races: > > https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=de937f6f9aa91abb518eac98bfac9dcf37e1f5df;hb=HEAD#l304 > > https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=de937f6f9aa91abb518eac98bfac9dcf37e1f5df;hb=HEAD#l360This is absolutely right. What's interesting and new here is that collectd is actually using dlopen to call external plugins (so not forking itself). See collectd.git/src/daemon/plugin.c However it's still not sanitizing the environment sufficiently around these calls, so it's the same sort of problem, but in a different context that I'd not seen/understood before. nbdkit has the same problem :-( We already dealt with umask a few years ago, now I've got to check our signal handlers too ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Daniel P. Berrangé
2018-Nov-13 10:53 UTC
Re: [Libguestfs] collectd leaks SIGCHLD == SIG_IGN into plugins
On Tue, Nov 13, 2018 at 10:42:20AM +0000, Richard W.M. Jones wrote:> On Tue, Nov 13, 2018 at 10:04:33AM +0000, Daniel P. Berrangé wrote: > > On Fri, Nov 09, 2018 at 12:19:30PM +0000, Richard W.M. Jones wrote: > > > Peter Dimitrov and myself were debugging a very peculiar bug when > > > libguestfs is run as a plugin from collectd: > > > > > > https://www.redhat.com/archives/libguestfs/2018-November/thread.html#00023 > > > > > > The long story short is that collectd leaks SIGCHLD == SIG_IGN setting > > > into plugins: > > > > > > https://www.redhat.com/archives/libguestfs/2018-November/msg00095.html > > > > > > This means that any plugin that does the usual pattern of: > > > > > > pid = fork (); > > > ... > > > if (waitpid (pid, NULL, 0) == -1) { > > > perror ("waitpid"); > > > exit (EXIT_FAILURE); > > > } > > > > > > will fail, because the forked subprocess is automatically reaped > > > before waitpid is called, resulting in the wait failing with errno => > > ECHILD. > > > > > > It is possible to work around this by adding: > > > > > > signal (SIGCHLD, SIG_DFL); > > > > > > to the plugin. However I believe this is a bug in collectd, and it > > > should sanitize signals (and maybe other things) before running > > > plugins. > > > > Yes, I'm inclined to agree. If an app or library is spawning external > > processes it should take care to restore signal setup to a "normal" state. > > This means not only setting SIG_DFL for all signals, but also just as > > importantly, the signal mask. It should be set to all ones before calling > > fork, and set back to all zeroes immedaitely before execve, as illustrated > > in libvirt to avoid races: > > > > https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=de937f6f9aa91abb518eac98bfac9dcf37e1f5df;hb=HEAD#l304 > > > > https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=de937f6f9aa91abb518eac98bfac9dcf37e1f5df;hb=HEAD#l360 > > This is absolutely right. What's interesting and new here is that > collectd is actually using dlopen to call external plugins (so not > forking itself). See collectd.git/src/daemon/plugin.c > > However it's still not sanitizing the environment sufficiently around > these calls, so it's the same sort of problem, but in a different > context that I'd not seen/understood before.Hmm, in the dlopen() plugin case I think it is less clear cut. I'm fuzzy on the collectd architecture, but if it is multi-threaded, then it is not practical to set SIGCHLD to a particular value around the plugin call, because signal handlers are process wide, not per thread. So from that POV, I think the plugin would have to be written to cope with whatever setup it receives. If you really need SIGCHLD to work, then the plugin would need to immediately fork and have the parent do a blocking waitpid. Then the real work can be done in the fork'd child, which can safely reset SIGCLD and spawn the real external processes as needed. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Florian Forster
2018-Nov-13 11:00 UTC
Re: [Libguestfs] [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
Hi, thank you very much for reporting this! Sounds like a bug in the exec plugin – it never ceases to amaze me how many issues a single plugin can have ;)> > > This means that any plugin that does the usual pattern of: > > > > > > pid = fork ();Note that the exec plugin is the *only* plugin that does this. All other plugins are forbidden to fork(), popen() or create new processes in any other way. The only plugin doing that, the exec plugin, has had enough issues over the years for me to feel justified in that decision. ;-) As mentioned before, a Github issue would be appreciated so we can properly track this problem. Thanks and best regards, —octo -- Florian octo Forster Hacker in training GnuPG: 0x0C705A15 http://octo.it/
Florian Forster
2018-Nov-13 11:07 UTC
Re: [Libguestfs] [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
A quick glance shows that the exec plugin actually is clearing the signal mask: https://github.com/collectd/collectd/blob/master/src/exec.c#L526 Can you give some more context when this problem comes up? Ideally in a Github issue (hint, hint ;). -- Florian octo Forster Hacker in training GnuPG: 0x0C705A15 http://octo.it/
Richard W.M. Jones
2018-Nov-13 11:34 UTC
Re: [Libguestfs] [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
On Tue, Nov 13, 2018 at 12:00:19PM +0100, Florian Forster wrote:> Hi, > > thank you very much for reporting this! Sounds like a bug in the exec plugin – > it never ceases to amaze me how many issues a single plugin can have ;) > > > > > This means that any plugin that does the usual pattern of: > > > > > > > > pid = fork (); > > Note that the exec plugin is the *only* plugin that does this. All other > plugins are forbidden to fork(), popen() or create new processes in any other > way. The only plugin doing that, the exec plugin, has had enough issues over > the years for me to feel justified in that decision. ;-) > > As mentioned before, a Github issue would be appreciated so we can properly > track this problem.I don't know if Peter is using the exec plugin or is trying to write an ordinary plugin. However the library he is using (libguestfs) certainly does fork subprocess(es). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Possibly Parallel Threads
- Re: [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
- Re: collectd leaks SIGCHLD == SIG_IGN into plugins
- Re: [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
- Re: [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins
- Re: [collectd] collectd leaks SIGCHLD == SIG_IGN into plugins