It seems that the system() and system2() functions don't close file
descriptors between the fork() and exec() (on Unix platforms, of course).
This means that the child processes inherit open files and socket
connections.

Running this (from a terminal) will result in the child process writing to
a file that was opened by R:

R
f <- file('foo.txt', 'w')
system('echo "abc" >&3')

You can also see the open files if you run the following:

f <- file('foo.txt', 'w')
system2('sleep', '100', wait=FALSE)

And then in another terminal:

lsof -c R -c sleep

It will show that both the R and sleep processes have the file open:

...
R      324 root  3w  REG  0,48  0  4259 /foo.txt
...
sleep  327 root  3w  REG  0,48  0  4259 /foo.txt

This behavior can cause problems if R spawns a child process that outlives
the R process but keeps some resources open.

Would it be possible to add an option to close file descriptors for child
processes? It would be nice if that were the default, but I suspect that
making that change would break a lot of existing code.

To take an example from the Python world, subprocess.Popen() has an
option, close_fds, which closes all file descriptors except 0, 1, and 2.
https://docs.python.org/2/library/subprocess.html#popen-constructor

-Winston
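For illustration, the kind of option requested above could look roughly like the following C sketch. It is not how R implements system(); spawn_wait() and its close_fds argument are hypothetical names, loosely modeled on Python's close_fds, and the 65536 cap on the descriptor scan is an arbitrary simplification made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not part of R): fork, optionally close every
 * descriptor above stderr in the child, then exec argv.
 * Returns the child's exit status, or -1 on failure. */
int spawn_wait(char *const argv[], int close_fds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: runs between fork() and exec(). */
        if (close_fds) {
            long max_fd = sysconf(_SC_OPEN_MAX);
            /* Assumption for this sketch: fall back to, or cap at,
             * 65536 so the loop stays short when the limit is
             * unknown or very large. */
            if (max_fd < 0 || max_fd > 65536)
                max_fd = 65536;
            for (int fd = 3; fd < max_fd; fd++)
                close(fd);   /* errors on unopened fds are ignored */
        }
        execvp(argv[0], argv);
        _exit(127);          /* only reached if exec failed */
    }

    /* Parent: wait for the child and report its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a file open on descriptor 3 in the parent, calling spawn_wait() with the arguments sh, -c, 'echo abc >&3' and close_fds set to 0 lets the shell write to that file, as in the R example. With close_fds set to 1, the redirection should fail because descriptor 3 is no longer open in the child.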
In addition to the issue of a child process holding onto open files, the
child process can also manipulate a file descriptor in a way that affects
the parent process. For example, calling lseek() in the child process will
move the file offset in the parent process.

Here is a set of commands that demonstrates it. They can be copied and
pasted in a terminal. What it does:
- Creates a C program that seeks to the beginning of a file descriptor, and
  compiles it to a program named "lseek".
- Creates a file with some text in it.
- Starts R. In R:
  - Opens the text file and reads the first line.
  - Runs lseek in a child process.
  - Reads the rest of the lines.


echo "#include <unistd.h>
int main(void) {
  lseek(3, 0, SEEK_SET);
}" > lseek.c

gcc lseek.c -o lseek

echo "line 1
line 2
line 3" > lines.txt

R
f <- file('lines.txt', 'r')
cat(readLines(f, n = 1), sep = "\n")
system('./lseek')
cat(readLines(f), sep = "\n")


Here's what it outputs:
> f <- file('lines.txt', 'r')
> cat(readLines(f, n = 1), sep = "\n")
line 1
> system('./lseek')
> cat(readLines(f), sep = "\n")
line 2
line 3
line 1
line 2
line 3

The child process has changed what the parent process reads from the file.
(I'm guessing that readLines() prints out "line 2" and "line 3" before
starting over because it had already buffered the whole file before lseek
was executed.)

This is obviously a highly contrived case, but it illustrates what's
possible. The other issue I mentioned, with child processes holding open
files after the R process exits, is more likely to cause problems in the
real world. That's actually how I encountered this issue in the first
place: when restarting R inside of RStudio on a Mac, if there are any
extant child processes started by system(), they keep some files open, and
this causes RStudio to hang. (There's a fix in progress for RStudio for
this particular issue.)

-Winston

On Tue, Apr 18, 2017 at 3:20 PM, Winston Chang <winstonchang1 at gmail.com> wrote:
> It seems that the system() and system2() functions don't close file
> descriptors between the fork() and exec() (on Unix platforms, of course).
> This means that the child processes inherit open files and socket
> connections.
> [...]
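The shared file offset described in this message can also be shown without R or a separate lseek binary. The following is a minimal C sketch under the assumption that the descriptor is inherited through fork(); offset_after_child_rewind() is a hypothetical name. It works on the raw descriptor with unbuffered calls, so the buffering effect guessed at above does not come into play.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: let a forked child rewind fd, then report the
 * offset the parent sees afterwards. Returns (off_t)-1 on failure. */
off_t offset_after_child_rewind(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return (off_t)-1;

    if (pid == 0) {
        /* Child: the inherited descriptor refers to the same open file
         * description as in the parent, so this seek moves the offset
         * that both processes share. */
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);
    }

    /* Parent: wait for the child, then ask where the offset is now. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return (off_t)-1;

    return lseek(fd, 0, SEEK_CUR);
}
```

If the parent has already read the first line of lines.txt through fd, the returned offset should be 0 rather than the position just past "line 1", and the next read() on fd starts from the beginning of the file again.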
In S+ on Unix-alikes we dealt with this issue by using
   fcntl(fd, F_SETFD, 1)
to set the close-on-exec flag on a file descriptor as soon as we
opened it.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Apr 19, 2017 at 8:40 PM, Winston Chang <winstonchang1 at gmail.com> wrote:
> In addition to the issue of a child process holding onto open files, the
> child process can also manipulate a file descriptor in a way that affects
> the parent process. For example, calling lseek() in the child process will
> move the file offset in the parent process.
> [...]

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
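The close-on-exec approach mentioned in the reply above can be sketched in C as follows. This is only an illustration, not the S+ code: set_cloexec() and open_cloexec() are hypothetical helper names, and the sketch uses the symbolic constant FD_CLOEXEC where the message writes the literal 1.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the close-on-exec flag on fd, preserving any other descriptor
 * flags. Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical wrapper: open a file and mark the descriptor
 * close-on-exec immediately, so programs started later with exec()
 * do not inherit it. Returns the descriptor, or -1 on failure. */
int open_cloexec(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);
    if (fd == -1)
        return -1;
    if (set_cloexec(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A descriptor marked this way stays usable in the process that opened it, and in a forked child up to the point of exec(), but is closed automatically when exec() succeeds. On systems that support it, passing O_CLOEXEC to open() sets the flag atomically and avoids the short window between open() and fcntl() in multithreaded programs.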