frederik at ofb.net
2017-Apr-26 04:13 UTC
[Rd] tempdir() may be deleted during long-running R session
On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
> Might this combination serve the purpose:
> * R session keeps an open handle on the tempdir it creates,
> * whatever tempdir harvesting cron job the user has, be made sensitive enough not to delete open files (including open directories)

Good suggestion, but it doesn't work with the (increasingly popular) "Systemd":

    $ mkdir /tmp/somedir
    $ touch -d "12 days ago" /tmp/somedir/
    $ cd /tmp/somedir/
    $ sudo systemd-tmpfiles --clean
    $ ls /tmp/somedir/
    ls: cannot access '/tmp/somedir/': No such file or directory

I would advocate just changing 'tempfile()' so that it recreates the directory where the file is (the "dirname") before returning the file path. This would have fixed the issue I ran into. Changing 'tempdir()' to recreate the directory is another option.

Thanks,
Frederick
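Frederick's proposal (tempfile() recreating the file's parent directory before returning the path) can be approximated in user code today. The following is a minimal sketch only: `tempfile_recreating()` is a hypothetical helper, not part of base R, and it restores just the directory, not any files that were deleted along with it.

```r
# Hypothetical helper (not part of base R): behaves like tempfile(), but
# first makes sure the directory the file would live in still exists.
tempfile_recreating <- function(pattern = "file", tmpdir = tempdir(),
                                fileext = "") {
  path <- tempfile(pattern = pattern, tmpdir = tmpdir, fileext = fileext)
  parent <- dirname(path)
  if (!dir.exists(parent)) {
    # Recreate with owner-only permissions; tolerate a concurrent creator
    created <- dir.create(parent, showWarnings = FALSE, recursive = TRUE,
                          mode = "0700")
    if (!created && !dir.exists(parent)) {
      stop("could not recreate temporary directory ", parent)
    }
  }
  path
}
```

Usage: `f <- tempfile_recreating("foo")` followed by `writeLines("x", f)` keeps working even if a cleanup job has removed the directory in the meantime.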
Martin Maechler
2017-Apr-26 08:21 UTC
>>>>> <frederik at ofb.net>
>>>>>     on Tue, 25 Apr 2017 21:13:59 -0700 writes:

> On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
>> Might this combination serve the purpose:
>> * R session keeps an open handle on the tempdir it creates,
>> * whatever tempdir harvesting cron job the user has, be made sensitive enough not to delete open files (including open directories)

I also agree that the above would be ideal - if possible.

> Good suggestion but doesn't work with the (increasingly popular)
> "Systemd":
>
>     $ mkdir /tmp/somedir
>     $ touch -d "12 days ago" /tmp/somedir/
>     $ cd /tmp/somedir/
>     $ sudo systemd-tmpfiles --clean
>     $ ls /tmp/somedir/
>     ls: cannot access '/tmp/somedir/': No such file or directory

Something like your example is what I'd expect is always a possibility on some platforms, all of course depending on low-level things such as root/sysadmin/... "permission" to clean up etc.

Jeroen mentioned the fact that tempdir()s can also disappear for other reasons {his was multicore child processes .. bugously(?) implemented}. Further reasons may be race conditions / user code bugs / user errors, etc. Note that the R process which created the tempdir on startup always has the permission to remove it again. But you can also think of a full file system, etc.

Current R-devel's tempdir(check = TRUE) would create a new one or give an error (and then the user should be able to use Sys.setenv(TMPDIR = ...) to point to a directory she has write permission for).

Gabe's point of course is important too: if you have a long-running process that uses a tempfile, and if "big brother" has removed the full tempdir(), you will be "unhappy" in any case. Trying to prevent big brother from doing that in all cases seems "not easy" in any case.

I did want to provide an easy solution to the OP situation: suddenly tempdir() is gone, and quite a few things stop working in the current R process {he mentioned help(), e.g.}.
With the new tempdir(check=TRUE) facility, code could be changed to replace

    tempfile("foo")

either by

    tempfile("foo", tmpdir=tempdir(check=TRUE))

or by something like

    tryCatch(tempfile("foo"),
             error = function(e)
                 tempfile("foo", tmpdir=tempdir(check=TRUE)))

or be even more sophisticated.

We could also consider allowing check = TRUE | NA | FALSE, make NA the default, and have that correspond to check = TRUE but additionally do the equivalent of

    warning("tempdir() has become invalid and been recreated")

in case the tempdir() had been invalid.

> I would advocate just changing 'tempfile()' so that it recreates the
> directory where the file is (the "dirname") before returning the file
> path. This would have fixed the issue I ran into. Changing 'tempdir()'
> to recreate the directory is another option.

In the end I had decided that

    tempfile("foo", tmpdir = tempdir(check = TRUE))

is actually better self-documenting than

    tempfile("foo", checkDir = TRUE)

which was my first inclination.

Note again that currently, the checking is _off_ by default. I've just provided a tool -- which was relatively easy and platform independent! --- to do more (real and thought) experiments.

Martin
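The proposed `check = NA` behaviour (recreate like `check = TRUE`, but warn) can be mimicked in user code as a wrapper. This is a sketch under stated assumptions: `tempdir_checked()` is a hypothetical name, and it requires an R version whose `tempdir()` has the `check` argument (R-devel at the time of this thread).

```r
# Hypothetical wrapper sketching the proposed check = NA semantics:
# recreate an invalid session tempdir via tempdir(check = TRUE), but
# signal a warning whenever that was necessary.
tempdir_checked <- function() {
  d <- tempdir()               # check = FALSE: returns the recorded path
  if (dir.exists(d)) return(d) # still valid, nothing to do
  d <- tempdir(check = TRUE)   # recreates (possibly under a new name) or errors
  warning("tempdir() has become invalid and been recreated")
  d
}
```

With this, the first replacement above would read `tempfile("foo", tmpdir = tempdir_checked())`; the warning makes the recreation visible instead of silent.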
Duncan Murdoch
2017-Apr-26 12:29 UTC
On 26/04/2017 4:21 AM, Martin Maechler wrote:
> [...]
> In the end I had decided that
>
>     tempfile("foo", tmpdir = tempdir(check = TRUE))
>
> is actually better self-documenting than
>
>     tempfile("foo", checkDir = TRUE)
>
> which was my first inclination.
>
> Note again that currently, the checking is _off_ by default.
> I've just provided a tool -- which was relatively easy and
> platform independent! --- to do more (real and thought)
> experiments.

This seems like the wrong approach. The problem occurs as soon as the tempdir() gets cleaned up: there could be information in temp files that gets lost at that point. So the solution should be to prevent the cleanup, not to continue on after it has occurred (as "check = TRUE" does). This follows the principle that it's better for the process to always die than to sometimes silently produce incorrect results.

Frederick posted the way to do this in systems using systemd.
We should be putting that in place, or the equivalent on systems using other tempfile cleanups. This looks to me like something that "make install" should do, or perhaps it should be done by people putting together packages for specific systems.

Duncan Murdoch
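The "better to die than silently produce incorrect results" principle can also be enforced on the R side, as a guard that refuses to recreate a vanished session directory. This is a sketch only: `require_tempdir()` and `write_scratch()` are hypothetical names, and the guard does not prevent the cleanup itself, which still has to be configured at the system level.

```r
# Hypothetical guard: instead of silently recreating a vanished session
# tempdir (which would hide the loss of files it held), stop with an error.
require_tempdir <- function() {
  d <- tempdir()  # default behaviour: never recreates the directory
  if (!dir.exists(d)) {
    stop("session temporary directory ", d, " has been removed; ",
         "temporary files written earlier may be lost", call. = FALSE)
  }
  d
}

# Hypothetical usage: route temp-file creation through the guard
write_scratch <- function(x) {
  f <- tempfile("scratch", tmpdir = require_tempdir())
  saveRDS(x, f)
  f
}
```

The design choice is to fail at the first point where the loss is detectable, rather than later with a confusing "cannot open file" error or, worse, with results computed from missing intermediate files.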