Mikko Korpela
2017-Apr-21 12:13 UTC
[Rd] tempdir() may be deleted during long-running R session
On 21/04/17 14:03, Prof Brian Ripley wrote:
> From the R-admin manual §5:
>
> 'Various environment variables can be set to determine where R creates
> its per-session temporary directory. The environment variables TMPDIR,
> TMP and TEMP are searched in turn and the first one which is set and
> points to a writable area is used. If none do, the final default is /tmp
> on Unix-alikes and the value of R_USER on Windows. The path should be an
> absolute path not containing spaces (and it is best to avoid
> non-alphanumeric characters such as +).
>
> Some Unix-alike systems are set up to remove files and directories
> periodically from /tmp, for example by a cron job running tmpwatch. Set
> TMPDIR to another directory before starting long-running jobs on such a
> system.'

I am sorry for having missed this part of the manual, where the issue indeed is clearly documented.

> On 21/04/2017 11:49, Mikko Korpela wrote:
>> Temporary files not accessed for a long time are automatically removed
>> in some Linux distributions and probably other operating systems too,
>> depending on system configuration. This may affect the per-session
>> temporary directory, the path of which is returned by tempdir(). I think
>
> Not for those who follow the manual and know that sysadmins have
> enabled such a script.
>
>> it would be nice if R automatically tried to recreate a missing
>> tempdir() but this could have some performance implications.

Despite my obvious failure to read the manual and report this properly, I will try to make a case. I understand that data stored in a temporary file may disappear, and for that reason using an alternative TMPDIR might be advisable. However, I think that creating a new temporary file is a different case, and it would be nice if `?` and `help` continued to work, for example. I understand if this will not be put on the R core list of things to do.

>> I ran the same test (below) on R 3.3.3 patched, R 3.4.0 beta, and
>> R-devel, all at r72499 (2017-04-09) and compiled by myself. The results
>> from the test were practically identical on all of those versions, the
>> test platform being Ubuntu 14.04.5 LTS. This system is configured for a
>> /tmp cleanup threshold of 7 days of inactivity (which is the default).
>> After a wait of roughly 10 days, the R temporary directory had been
>> deleted by an automatic cleanup procedure, and a call to `?` failed.
>> This StackExchange question has some answers about the Ubuntu /tmp
>> cleanup practice: https://askubuntu.com/q/20783
>>
>> a <- print(tempdir())
>> # [1] "/tmp/user/1069138/RtmpGc9M5z"
>> dir.exists(a) # TRUE
>> # [1] TRUE
>> Sys.time()
>> # [1] "2017-04-10 16:00:30 EEST"
>> ## Wait for one week (Ubuntu 14.04.5 LTS)
>> print(Sys.time()); ?regex
>> # [1] "2017-04-20 14:17:29 EEST"
>> # Error in file(out, "wt") : cannot open the connection
>> # In addition: Warning message:
>> # In file(out, "wt") :
>> #   cannot open file '/tmp/user/1069138/RtmpGc9M5z/Rtxt3dbb65870ad4': No such file or directory
>> b <- print(tempdir())
>> # [1] "/tmp/user/1069138/RtmpGc9M5z"
>> identical(a, b)
>> # [1] TRUE
>> dir.exists(b)
>> # [1] FALSE

-- 
Mikko Korpela
Department of Geosciences and Geography
University of Helsinki
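The request in this message (recreate a missing tempdir() so that `?` and `help` keep working) can be approximated at user level. The following is only a minimal sketch, not something proposed in the thread: the helper name `ensure_dir` is hypothetical, and it relies only on the base functions dir.exists() and dir.create(). Files that were stored in the deleted directory are of course not restored.

```r
# Hypothetical helper: recreate a directory (by default the session
# temporary directory) if some cleanup job has removed it.
# Returns TRUE (invisibly) if the directory had to be recreated,
# FALSE if it was still present.
ensure_dir <- function(path = tempdir()) {
  if (dir.exists(path)) {
    return(invisible(FALSE))
  }
  # recursive = TRUE also restores missing parent directories,
  # e.g. a per-user directory under /tmp
  ok <- dir.create(path, recursive = TRUE, showWarnings = FALSE,
                   mode = "0700")
  if (!ok && !dir.exists(path)) {
    stop("could not recreate directory: ", path)
  }
  invisible(TRUE)
}
```

In a long-running session one would call `ensure_dir()` before anything that writes to the temporary directory, for example before `?regex`. To my knowledge, later R releases (3.5.0 onward) accept `tempdir(check = TRUE)` for a similar purpose; the sketch above does not depend on that.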
Dirk Eddelbuettel
2017-Apr-21 12:42 UTC
On 21 April 2017 at 15:13, Mikko Korpela wrote:
| Despite my obvious failure to read the manual and report this properly,
| I will try to make a case. I understand that data stored in a temporary
| file may disappear, and for that reason using an alternative TMPDIR
| might be advisable. However, I think that creating a new temporary file
| is a different case, and it would be nice if `?` and `help` continued to
| work, for example. I understand if this will not be put on the R core
| list of things to do.

It's complicated, as it is clearly an interaction between the hosting OS and the R application running on it. "R cannot know" what policy the host OS may have.

You could also talk to your sysadmins and have the service configured. E.g. on my system the description for the tmpreaper package reads

  This package provides a program that can be used to clean out
  temporary-file directories. It recursively searches the directory,
  refusing to chdir() across symlinks, and removes files that haven't
  been accessed in a user-specified amount of time. You can specify a
  set of files to protect from deletion with a shell pattern. It will
  not remove files owned by the process EUID that have the `w' bit
  clear, unless you ask it to, much like `rm -f'. `tmpreaper' will not
  remove symlinks, sockets, fifos, or special files unless given a
  command line option enabling it to.
  .
  WARNING: Please do not run `tmpreaper' on `/'. There are no
  protections against this written into the program, as that would
  prevent it from functioning the way you'd expect it to in a
  `chroot(8)' environment.
  .
  The daily tmpreaper run can be configured through /etc/tmpreaper.conf
  .

which makes it clear that you can configure local behaviour.

Lastly, as the manual referenced in the initial reply says, you are in fact in full control of this, as you can set the environment variables for your R sessions.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
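To see whether a given session is exposed to the cleanup discussed here, one can inspect the three environment variables named in the manual excerpt and the directory R actually chose. This is a rough sketch only: the function name `tempdir_report` and the simple "/tmp/" prefix test are assumptions made for illustration, and the prefix test will not recognise symlinked or differently named system temporary directories.

```r
# Hypothetical helper: report the environment variables R consults for its
# per-session temporary directory, plus the directory actually in use.
tempdir_report <- function() {
  # unset = NA distinguishes "not set" from "set to an empty string"
  vars <- Sys.getenv(c("TMPDIR", "TMP", "TEMP"), unset = NA)
  session_tmp <- tempdir()
  list(
    vars = vars,
    tempdir = session_tmp,
    # crude check: is the session directory below /tmp ?
    under_system_tmp = startsWith(session_tmp, "/tmp/")
  )
}

str(tempdir_report())
```

If `under_system_tmp` is TRUE on a system with a /tmp cleaner, the manual's advice applies: set TMPDIR to another directory before starting the long-running R session, since the per-session directory is chosen at startup.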
frederik at ofb.net
2017-Apr-21 17:34 UTC
Hi Mikko,

I was bitten by this recently and I think some of the replies are missing the point. As I understand it, the problem consists of these elements:

1. When R starts, it creates a directory like /tmp/RtmpVIeFj4

2. Right after R starts I can create files in this directory with no error

3. After some hours or days I can no longer create files in this directory, because it has been deleted

If R expected the directory to be deleted at random, and if we expect users to call dir.create every time they access tempdir, then why did R create the directory for us at the beginning of the session? That's just setting people up to get weird bugs, which only appear in difficult-to-reproduce situations (i.e. after the session has been open for a long time).

I think before we dismiss this we should think about possible in-R solutions and why they are not feasible. Are there any packages which would break if a call to 'tempdir' automatically recreated this directory? (Or would it be too much of a performance hit to have 'tempdir' check and even just issue a warning when the directory is found not to exist?) Should we have a timer which periodically updates the modification time of tempdir()? What do other long-running programs do (e.g. screen, emacs)?

Thank you,

Frederick

P.S. I noticed that dir.create does not seem to update the access or modification time of the file. So there is also a remote possibility that the directory could be "cleaned up" in between calling 'dir.create()' and putting a file in it. Maybe this is nitpicky, but if we accept that the *really* correct practice is more complicated than just calling 'dir.create()', this also argues for putting the proper invocations into some kind of standard function - either 'tempdir()' or something else.
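The idea of refreshing the timestamp of tempdir(), and of warning when the directory has vanished, could look roughly like the sketch below. It is an illustration under stated assumptions rather than a tested remedy: `touch_dir` is a hypothetical name, it uses the base function Sys.setFileTime(), and it touches only the directory itself, so files inside it may still be removed by a cleaner that looks at each file's own access time. Base R has no built-in periodic timer, so the call would have to be made from the user's own code at suitable points.

```r
# Hypothetical helper: update the file time of a directory (by default the
# session temporary directory), or warn if the directory no longer exists.
# Returns TRUE (invisibly) if the time was updated, FALSE otherwise.
touch_dir <- function(path = tempdir(), time = Sys.time()) {
  if (!dir.exists(path)) {
    warning("directory no longer exists: ", path)
    return(invisible(FALSE))
  }
  # Sys.setFileTime() returns a logical indicating success
  invisible(Sys.setFileTime(path, time))
}
```

A long-running script might call `touch_dir()` once per iteration of its main loop, which is far more often than a cleanup threshold measured in days.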