Duncan Murdoch
2017-Apr-26 12:29 UTC
[Rd] tempdir() may be deleted during long-running R session
On 26/04/2017 4:21 AM, Martin Maechler wrote:
>>>>>> <frederik at ofb.net>
>>>>>>     on Tue, 25 Apr 2017 21:13:59 -0700 writes:
>
>     > On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
>     >> Might this combination serve the purpose:
>     >> * R session keeps an open handle on the tempdir it creates,
>     >> * whatever tempdir harvesting cron job the user has be made
>     >>   sensitive enough not to delete open files (including open directories)
>
> I also agree that the above would be ideal - if possible.
>
>     > Good suggestion but doesn't work with the (increasingly popular)
>     > "Systemd":
>
>     > $ mkdir /tmp/somedir
>     > $ touch -d "12 days ago" /tmp/somedir/
>     > $ cd /tmp/somedir/
>     > $ sudo systemd-tmpfiles --clean
>     > $ ls /tmp/somedir/
>     > ls: cannot access '/tmp/somedir/': No such file or directory
>
> Something like your example is what I'd expect is always a
> possibility on some platforms, all of course depending on low-level
> things such as root/sysadmin/... "permission" to clean up etc.
>
> Jeroen mentioned the fact that tempdir()s also can disappear
> for other reasons {his was multicore child processes
> .. buggily(?) implemented}.
> Further reasons may be race conditions / user code bugs / user
> errors, etc.
> Note that the R process which created the tempdir on startup
> always has the permission to remove it again. But you can also
> think of a full file system, etc.
>
> Current R-devel's tempdir(check = TRUE) would create a new
> one or give an error (and then the user should be able to use
>     Sys.setenv("TEMPDIR" ...)
> to point to a directory she has write permission for)
>
> Gabe's point of course is important too: If you have a long
> running process that uses a tempfile,
> and if "big brother" has removed the full tempdir() you will
> be "unhappy" in any case.
> Trying to prevent big brother from doing that in all cases seems
> "not easy" in any case.
>
> I did want to provide an easy solution to the OP situation:
> Suddenly tempdir() is gone, and quite a few things stop working
> in the current R process {he mentioned help(), e.g.}.
> With the new tempdir(check=TRUE) facility, code could be changed
> to replace
>
>     tempfile("foo")
>
> either by
>
>     tempfile("foo", tmpdir=tempdir(check=TRUE))
>
> or by something like
>
>     tryCatch(tempfile("foo"),
>              error = function(e)
>                  tempfile("foo", tmpdir=tempdir(check=TRUE)))
>
> or be even more sophisticated.
>
> We could also consider allowing check = TRUE | NA | FALSE
>
> and make NA the default and have that correspond to
> check = TRUE but additionally do the equivalent of
>     warning("tempdir() has become invalid and been recreated")
> in case the tempdir() had been invalid.
>
>     > I would advocate just changing 'tempfile()' so that it recreates the
>     > directory where the file is (the "dirname") before returning the file
>     > path. This would have fixed the issue I ran into. Changing 'tempdir()'
>     > to recreate the directory is another option.
>
> In the end I had decided that
>
>     tempfile("foo", tmpdir = tempdir(check = TRUE))
>
> is actually better self-documenting than
>
>     tempfile("foo", checkDir = TRUE)
>
> which was my first inclination.
>
> Note again that currently, the checking is _off_ by default.
> I've just provided a tool -- which was relatively easy and
> platform independent! -- to do more (real and thought)
> experiments.

This seems like the wrong approach. The problem occurs as soon as the
tempdir() gets cleaned up: there could be information in temp files
that gets lost at that point. So the solution should be to prevent the
cleanup, not to continue on after it has occurred (as "check = TRUE"
does). This follows the principle that it's better for the process to
always die than to sometimes silently produce incorrect results.

Frederick posted the way to do this in systems using systemd. We should
be putting that in place, or the equivalent on systems using other
tempfile cleanups. This looks to me like something that "make install"
should do, or perhaps it should be done by people putting together
packages for specific systems.

Duncan Murdoch
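To make the proposal in the quoted message concrete, here is a minimal sketch of a wrapper around tempfile(). It assumes R >= 3.5.0, the release in which tempdir() gained the 'check' argument; the name safe_tempfile() is hypothetical, and the warning only imitates the check = NA behaviour floated above, which is a suggestion in this thread and not an existing option.

```r
# Hypothetical helper (not part of base R): like tempfile(), but makes sure
# the session temporary directory exists first.
# Assumes R >= 3.5.0, where tempdir() has the 'check' argument.
safe_tempfile <- function(pattern = "file", fileext = "") {
  existed <- dir.exists(tempdir())
  # Recreates the session directory if it is no longer valid; the returned
  # path is used as-is, since it need not equal the old one.
  dir <- tempdir(check = TRUE)
  if (!existed) {
    # Imitates the proposed check = NA behaviour
    warning("tempdir() has become invalid and been recreated")
  }
  tempfile(pattern, tmpdir = dir, fileext = fileext)
}

# Usage: the returned path can be written to even after a cleanup.
f <- safe_tempfile("foo")
writeLines("hello", f)
```

Note that this only lets the session carry on; anything that was stored in the old directory is still lost, which is exactly the objection raised in the reply above.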
Dirk Eddelbuettel
2017-Apr-26 13:40 UTC
[Rd] tempdir() may be deleted during long-running R session
On 26 April 2017 at 08:29, Duncan Murdoch wrote:
| This seems like the wrong approach. The problem occurs as soon as the
| tempdir() gets cleaned up: there could be information in temp files
| that gets lost at that point. So the solution should be to prevent the
| cleanup, not to continue on after it has occurred (as "check = TRUE"
| does). This follows the principle that it's better for the process to
| always die than to sometimes silently produce incorrect results.

That is generally true, but also "hard" as we don't have a handle on the OS.

| Frederick posted the way to do this in systems using systemd. We should

While that was a very helpful post, it may only apply to Arch Linux as
stated. My Ubuntu systems at home and work all run systemd too, but do _not_
automatically remove tempfiles.

Yet what he suggested is quite right: we should define a proper config file
for this facility and then possibly also use the /run directory, as many other
services now do, and (of course) also either set TEMPDIR or alter the code to
have /run be another fallback if TMP, TEMP, TMPDIR, ... are unset.

Distribution maintainers such as yours truly could then include this
configuration.

| be putting that in place, or the equivalent on systems using other
| tempfile cleanups. This looks to me like something that "make install"
| should do, or perhaps it should be done by people putting together
| packages for specific systems.

Doesn't 'make install' only write to $RHOME/ and below, plus $PREFIX/bin ?

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
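The fallback idea in this message (try the usual environment variables, then some other directory) can be sketched roughly as follows. This is an illustration only: pick_tmp_base() is a hypothetical name, the fallback list is an assumption, and nothing here describes how R itself chooses its session directory beyond consulting TMPDIR, TMP and TEMP.

```r
# Hypothetical sketch: return the first usable base directory for temporary
# files, trying the given environment variables in order and then a list of
# fallback directories (for example a /run-style directory, as floated above).
pick_tmp_base <- function(vars = c("TMPDIR", "TMP", "TEMP"),
                          fallbacks = "/tmp") {
  # Guard against an empty 'vars', which Sys.getenv() would treat specially
  env_vals <- if (length(vars)) Sys.getenv(vars, unset = NA_character_) else character(0)
  candidates <- c(env_vals, fallbacks)
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  for (d in candidates) {
    # file.access(d, 2) returns 0 when the path is writable
    if (dir.exists(d) && file.access(d, 2) == 0) return(d)
  }
  stop("no writable directory found for temporary files")
}

# Usage: prefer the per-user runtime directory, if set, over /tmp.
# pick_tmp_base(fallbacks = c(Sys.getenv("XDG_RUNTIME_DIR"), "/tmp"))
```

An empty or unset variable is simply skipped, so the function degrades to the fallbacks when none of the variables is defined.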
Tomas Kalibera
2017-Apr-26 14:39 UTC
[Rd] tempdir() may be deleted during long-running R session
I agree this should be solved in configuration of
systemd/tmpreaper/whatever tmp cleaner - the cleanup must be prevented
in configuration files of these tools. Moving session directories under
/var/run (XDG_RUNTIME_DIR) does not seem like a good solution to me;
sooner or later someone might come up with auto-cleaning that directory too.

It might still be useful if R could sometimes detect when automated
cleanup happened and warn the user. Perhaps a simple way could be to
always create an empty file inside the session directory, like
".tmp_cleaner_trap". R would never touch this file, but would check its
existence from time to time. If it gets deleted, R would issue a warning and
ask the user to check the tmp cleaner configuration. The idea is that this
file will be the oldest one in the session directory, so it would get
cleaned up first.

Tomas
Martin Maechler
2017-Apr-26 15:09 UTC
[Rd] tempdir() may be deleted during long-running R session
>>>>> Dirk Eddelbuettel <edd at debian.org>
>>>>>     on Wed, 26 Apr 2017 08:40:38 -0500 writes:

    > On 26 April 2017 at 08:29, Duncan Murdoch wrote:
    > | This seems like the wrong approach. The problem occurs as soon as the
    > | tempdir() gets cleaned up: there could be information in temp files
    > | that gets lost at that point. So the solution should be to prevent the
    > | cleanup, not to continue on after it has occurred (as "check = TRUE"
    > | does). This follows the principle that it's better for the process to
    > | always die than to sometimes silently produce incorrect results.

    > That is generally true, but also "hard" as we don't have a handle on the OS.

Indeed... and that was the reason I've proposed the simple
platform-agnostic tool, which does not entirely solve the problem (in
this sense I agree with "wrong approach") but allows one to mitigate it
and (by follow-up changes) to work around many use-case problems.

    > | Frederick posted the way to do this in systems using systemd. We should

    > While that was a very helpful post, it may only apply to Arch Linux as
    > stated. My Ubuntu systems at home and work all run systemd too, but do _not_
    > automatically remove tempfiles.

    > Yet what he suggested is quite right: we should define a proper config file
    > for this facility and then possibly also use the /run directory, as many other
    > services now do, and (of course) also either set TEMPDIR or alter the code to
    > have /run be another fallback if TMP, TEMP, TMPDIR, ... are unset.

    > Distribution maintainers such as yours truly could then include this
    > configuration.

    > | be putting that in place, or the equivalent on systems using other
    > | tempfile cleanups. This looks to me like something that "make install"
    > | should do, or perhaps it should be done by people putting together
    > | packages for specific systems.

    > Doesn't 'make install' only write to $RHOME/ and below, plus $PREFIX/bin ?

Also, 'make install' is optional for good reasons.
E.g., I never ever run 'make install': I typically always have
many R versions, all available in the shell and ESS (Emacs Speaks
Statistics) via symbolic links into a directory on PATH.

Dirk mentioned (as well) that this is all very platform specific,
which I do think is important.

From my typical OS point of view: Why should the user who runs R
not have the right to delete the tempdir which was created by
the process that she runs and hence owns?

I agree it would be an improvement if we made such deletion much
harder than it is now, and yes, there may be great (almost)
cross-platform tools available to manage this much better than
we do now, e.g., via open files.

Before we are there, I would find it useful to have a new
'tempdir' (i.e. folder/directory for R's temporary files) be
re-created manually or automagically in those cases where it has
disappeared, and that is within easy reach via the proposed
tempdir() functionality.

OTOH, I typically live very well by quickly killing and restarting R
(from inside ESS). The OP issue was to help newbies and
computer non-experts, the latter nowadays comprising more than
90% of R users (I'd guess ~ 98% looking at our otherwise smart
students).

These are typically "slightly" confused when they ask for help
and get a pretty severe error message:

  > ?lm
  Error in file(out, "wt") : cannot open the connection
  In addition: Warning message:
  In file(out, "wt") :
    cannot open file '/tmp/RtmpztK6f7/Rtxt36972b91938': No such file or directory

Martin
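For the "re-created manually" route mentioned in this message, a rough sketch using only long-standing base R functions might look like this. restore_tempdir() is a hypothetical helper, and it assumes the session is still allowed to create a directory at the old path.

```r
# Hypothetical helper: recreate the session temporary directory by hand if it
# has disappeared. Does not rely on tempdir(check = TRUE).
restore_tempdir <- function() {
  d <- tempdir()
  if (dir.exists(d)) return(invisible(FALSE))   # still there, nothing to do
  ok <- dir.create(d, recursive = TRUE, mode = "0700")
  if (!ok) stop("could not recreate ", d)
  warning("tempdir() had been removed and was recreated; its old contents are lost")
  invisible(TRUE)
}

# Usage: call before an operation that needs the directory, e.g. help().
# restore_tempdir(); ?lm
```

After a successful call, operations that write into the session directory, such as the help() call that produced the error above, should find the directory again. Files that lived in the old directory are of course not restored.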
Dirk Eddelbuettel
2017-Apr-26 15:14 UTC
[Rd] tempdir() may be deleted during long-running R session
On 26 April 2017 at 16:39, Tomas Kalibera wrote:
| I agree this should be solved in configuration of
| systemd/tmpreaper/whatever tmp cleaner - the cleanup must be prevented

Yep.

| in configuration files of these tools. Moving session directories under
| /var/run (XDG_RUNTIME_DIR) does not seem like a good solution to me,
| sooner or later someone might come with auto-cleaning that directory too.

(These days it seems /run is used too. I seem to have a screenful of
things in there.)

| It might still be useful if R could sometimes detect when automated
| cleanup happened and warn the user. Perhaps a simple way could be to
| always create an empty file inside session directory, like
| ".tmp_cleaner_trap". R would never touch this file, but check its
| existence time-to-time. If it gets deleted, R would issue a warning and
| ask the user to check tmp cleaner configuration. The idea is that this
| file will be the oldest one in the session directory, so would get
| cleaned up first.

That's a very good third idea.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
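The sentinel-file idea quoted above could be prototyped at user level along these lines. This is a sketch under assumptions: the three helper names are hypothetical, only the file name ".tmp_cleaner_trap" comes from the thread, and a real implementation would presumably live inside R's startup code rather than in user code.

```r
# Hypothetical helpers sketching the ".tmp_cleaner_trap" idea; none of these
# functions exist in base R.
trap_path <- function(dir = tempdir()) file.path(dir, ".tmp_cleaner_trap")

# Create the empty sentinel file, ideally right after the session directory
# is created so that it is the oldest file in it.
set_cleaner_trap <- function(dir = tempdir()) {
  invisible(file.create(trap_path(dir)))
}

# Return TRUE if the sentinel is still present; otherwise warn and return FALSE.
check_cleaner_trap <- function(dir = tempdir()) {
  ok <- file.exists(trap_path(dir))
  if (!ok) {
    warning("temporary files in ", dir, " appear to have been removed; ",
            "please check the configuration of your tmp cleaner")
  }
  invisible(ok)
}
```

One way to run the check "from time to time" would be a top-level task callback, for example addTaskCallback(function(...) { check_cleaner_trap(); TRUE }), which runs after each top-level command completes. Whether such a check is cheap enough to run that often is left open here.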
Duncan Murdoch
2017-Apr-26 17:47 UTC
[Rd] tempdir() may be deleted during long-running R session
On 26/04/2017 10:39 AM, Tomas Kalibera wrote:
>
> I agree this should be solved in configuration of
> systemd/tmpreaper/whatever tmp cleaner - the cleanup must be prevented
> in configuration files of these tools. Moving session directories under
> /var/run (XDG_RUNTIME_DIR) does not seem like a good solution to me,
> sooner or later someone might come with auto-cleaning that directory too.
>
> It might still be useful if R could sometimes detect when automated
> cleanup happened and warn the user. Perhaps a simple way could be to
> always create an empty file inside session directory, like
> ".tmp_cleaner_trap". R would never touch this file, but check its
> existence time-to-time. If it gets deleted, R would issue a warning and
> ask the user to check tmp cleaner configuration. The idea is that this
> file will be the oldest one in the session directory, so would get
> cleaned up first.

Yes, I like that idea, as long as checking for its existence doesn't
make some system think it is in use and therefore protected from deletion.

Duncan Murdoch
frederik at ofb.net
2017-Apr-27 20:44 UTC
[Rd] tempdir() may be deleted during long-running R session
> Frederick posted the way to do this in systems using systemd. We should be
> putting that in place, or the equivalent on systems using other tempfile
> cleanups. This looks to me like something that "make install" should do, or
> perhaps it should be done by people putting together packages for specific
> systems.

For Arch's 'apache' package it is just a file in the package directory
that gets installed by the PKGBUILD script:

    install -D -m644 "${srcdir}/apache.tmpfiles.conf" "${pkgdir}/usr/lib/tmpfiles.d/apache.conf"

Similarly for the 'screen' package.

I added a task in the Arch bug tracker to do this for R:

https://bugs.archlinux.org/task/53848

I don't get the sense that it is customary to put examples of such
files into the source distribution, so I think nothing more needs to
be done on your side (aside from alerting package maintainers for
other distributions).

Thanks,

Frederick
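For readers wondering what such a tmpfiles.d file might contain, here is a hedged sketch, written in R to stay with the language of the thread. It relies on the "x" line type described in tmpfiles.d(5), which excludes a path from cleaning. The "Rtmp*" glob is inferred from session directory names such as /tmp/RtmpztK6f7 seen earlier in the thread, and both the function name and the file name R.conf are assumptions. This is not the file that any distribution actually ships.

```r
# Sketch only: build a tmpfiles.d fragment asking systemd-tmpfiles to skip R
# session directories during cleaning. The "x" line type is described in
# tmpfiles.d(5); the "Rtmp*" pattern and the helper name are assumptions.
r_tmpfiles_conf <- function(base = "/tmp") {
  c("# Exclude R session temporary directories from automatic cleaning",
    paste("x", file.path(base, "Rtmp*")))
}

# Write to a scratch location for inspection. A packager would instead
# install the file under /usr/lib/tmpfiles.d/ (or an administrator under
# /etc/tmpfiles.d/), in the same way as the apache example above.
conf_file <- file.path(tempdir(), "R.conf")
writeLines(r_tmpfiles_conf(), conf_file)
```

If TMPDIR points somewhere other than /tmp, the 'base' argument would need to match that location.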