Doran, Harold
2005-Jul-08 11:49 UTC
[R] Possible Solution to Tempfile error (for documentation)
Dear List:

I'm posting this to provide a possible solution and to document what appears to be an R limitation. The solution is more of a cheap hack that works for now. To provide a little background, I am looping through a dataframe and creating Sweave documents using data from each row in the dataframe. It appears that this technique does not scale to large dataframes without some changes to the way tempfiles are handled.

In the background, R generates tempfiles in a directory using a sequential number. In my case, the numbers ranged from Rf1 to Rf32767. The next in line would of course be Rf32768. The last number that worked, 32767, happens to coincide with 2^15-1, which my programmer colleagues tell me is the maximum positive value of a signed 16-bit integer. So, R recognizes that it cannot create a new tempfile and stops the loop.

According to my colleagues, programmers need to be aware of this particular number and often build some flexibility into the software so that temp file names can use an unsigned or 32-bit integer. However, I do not believe R has this capacity.

So, in the interim, I have created a call using shell() to go out to the OS and delete all temp files by brute force at certain intervals within the loop. Doing so after each iteration was prohibitively slow. I am still experimenting, but I have found that choosing some arbitrary interval makes this solution feasible.

Within my for() loop, I have included the following conditional, which goes out to DOS and deletes all temp files periodically. The "/Q" switch in the shell command is needed so that DOS does not prompt the user with "Are you sure you want to delete the files?". It bypasses that prompt and lets the files be deleted without confirmation.
Here is the conditional within the for loop:

    # Every 700 iterations (up to 4200), empty R's temp directory.
    if (i %in% seq(700, 4200, by = 700)) {
      setwd(tempdir())
      shell("del /Q *.*")  # /Q: delete without the confirmation prompt
      setwd("G:/path/where/I/want/Sweave/files")
    }

I tested this solution and it has extended the life of my loop and appears to solve the problem. Hence, in the absence of changes to the way tempfiles are handled inside the software, this has proven useful to me and maybe it will be to others as well.

I'm not sure this is the best way to handle this and would welcome any better ideas, but it seems to provide a route around the problem.

Best,
Harold

Windows XP
R 2.11
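The same periodic clean-up can be sketched in base R without shell() or setwd(). This is a minimal sketch, not part of the original post: `clear_dir()` and `run_loop()` are hypothetical helpers, and, like the `del /Q *.*` call, it assumes that nothing in the directory is still needed when it runs. Because it relies only on `list.files()` and `unlink()`, it should also work outside Windows.

```r
# Hypothetical helper (not part of R): remove every non-hidden file and
# subdirectory inside `dir`. It defaults to this session's tempdir(), the
# directory the "del /Q *.*" call empties, but needs no shell() or setwd().
# Assumes nothing in `dir` is still in use when it is called.
clear_dir <- function(dir = tempdir()) {
  paths <- list.files(dir, full.names = TRUE)  # never includes "." or ".."
  unlink(paths, recursive = TRUE)              # recursive so subdirs go too
  invisible(length(paths))                     # number of entries removed
}

# Hypothetical loop skeleton: the loop body stands in for the real
# per-row Sweave work; the directory is emptied every `every` iterations.
run_loop <- function(n_rows, every = 700, dir = tempdir()) {
  for (i in seq_len(n_rows)) {
    # ... create the Sweave document for row i here ...
    if (i %% every == 0) clear_dir(dir)
  }
}
```

Unlike the hard-coded list of iteration numbers, `i %% every == 0` keeps cleaning every 700 iterations for as long as the loop runs instead of stopping after 4200.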
Duncan Murdoch
2005-Jul-08 12:05 UTC
[R] Possible Solution to Tempfile error (for documentation)
Doran, Harold wrote:
> Dear List:
>
> I'm posting this to provide a possible solution and to document what
> appears to be an R limitation. The solution is more of a cheap hack that
> works for now. To provide a little background, I am looping through a
> dataframe and creating Sweave documents using data from each row in the
> dataframe. It appears that this technique does not scale to large
> dataframes without some changes to the way tempfiles are handled.
>
> In the background, R generates tempfiles in a directory using a
> sequential number. In my case, the numbers ranged from Rf1 to Rf32767.
> The next in line would of course be Rf32768. The last number that
> worked, 32767, happens to coincide with 2^15-1, which my programmer
> colleagues tell me is the maximum positive value of a signed 16-bit
> integer. So, R recognizes that it cannot create a new tempfile and stops
> the loop.

In Windows (which I think you're using), R uses the C function rand() to generate a "random" filename. This does have a maximum output of 32767. It would be easy to change, but Windows file systems aren't particularly good at handling such large directories; maybe this limit is a sign that you need to change the algorithm?

> According to my colleagues, programmers need to be aware of this
> particular number and often build some flexibility into the software so
> that temp file names can use an unsigned or 32-bit integer. However, I
> do not believe R has this capacity.
>
> So, in the interim, I have created a call using shell() to go out to
> the OS and delete all temp files by brute force at certain intervals
> within the loop. Doing so after each iteration was prohibitively slow. I
> am still experimenting, but I have found that choosing some arbitrary
> interval makes this solution feasible.
>
> Within my for() loop, I have included the following conditional, which
> goes out to DOS and deletes all temp files periodically.
> The "/Q" switch in the shell command is needed so that DOS does not
> prompt the user with "Are you sure you want to delete the files?". It
> bypasses that prompt and lets the files be deleted without confirmation.
>
> Here is the conditional within the for loop
>
> if (i==700|i==1400|i==2100|i==2800|i==3500|i==4200){
>   setwd(tempdir())
>   shell(paste("del /Q *.*"))
>   setwd("G:/path/where/I/want/Sweave/files")
> }

If this works, it indicates that the temp files are no longer needed by this point. So the real question is, why are they still there? Shouldn't they have been deleted after they were used? I wasn't following your previous posts, but did you create the tempfiles (in which case you should delete them once you were done with them), or did R?

Duncan Murdoch
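Duncan's point, that temp files should be deleted as soon as they have served their purpose, can be sketched as below. This assumes the temp files are created by your own code; the thread does not settle whether R or the user created them. `process_row()` is a hypothetical stand-in for the per-row Sweave work: `tempfile()` picks a fresh name in tempdir(), and `on.exit(unlink(tmp), add = TRUE)` removes the file when the function returns, even if it stops with an error, so files do not accumulate across iterations.

```r
# Hypothetical per-row worker (illustration only): the temporary file it
# creates is deleted when the function exits, normally or by error.
process_row <- function(value) {
  tmp <- tempfile(pattern = "row", fileext = ".tex")
  on.exit(unlink(tmp), add = TRUE)          # clean up as soon as we are done
  writeLines(paste("value:", value), tmp)   # stand-in for real Sweave output
  readLines(tmp)                            # use the file before it is removed
}

# Calling it many times should leave no extra files behind in tempdir().
for (r in 1:1000) process_row(r)
```

The design choice is to tie each file's lifetime to the function that created it, rather than sweeping the whole directory at arbitrary intervals.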