ripley@stats.ox.ac.uk
2005-May-19 18:47 UTC
[Rd] problems with truncate() with files > 2Gb under Windows (PR#7880)
__USE_LARGEFILE is a standard Unix way to allow > 2Gb files on 32-bit
OSes by using f{seek,tell}o.  Take a look at the definition of f_tell:

#if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
#define f_seek fseeko
#define f_tell ftello
#else
#ifdef Win32
#define f_seek fseeko64
#define f_tell ftello64
#else
#define f_seek fseek
#define f_tell ftell
#endif
#endif

Windows support for > 2Gb files seemed flaky, but we did not think it was
R's job to report OS deficiencies.

I've now used off64_t in file_seek under Windows.

On Thu, 19 May 2005 tplate@blackmesacapital.com wrote:

> This message relates to handling files > 2Gb under Windows.  (I use 2Gb
> as shorthand for 2^31-1 -- the largest integer representable in a signed
> 32 bit integer.)
>
> First issue: truncate() is not able to successfully truncate files at a
> position > 2Gb.  This appears to be due to the use of the Windows
> function chsize() in file_truncate() in main/connections.c (chsize()
> takes a long int specification of the file size, so we would not expect
> it to work for positions > 2Gb).
>
> The Windows API has the function SetEndOfFile(handle) that is
> supposed to truncate the file to the current position.  However, this
> function does not seem to function correctly when the current position
> is beyond 2Gb, so it is not an improvement on chsize() (at least under
> Windows 2000).  My explorations with Windows 2000 SP2 and XP Prof SP1
> indicate that SetEndOfFile() DOES successfully truncate files > 2Gb to
> sizes < 2Gb, but cannot truncate the same file to a position beyond 2Gb.
> So I have no suggestions on how to get this to work.  Probably, the
> best thing to do would be to stop with an error in the appropriate
> situations.
>
> Second issue: although the R function seek() can take a seek position
> specified as a double, which allows it to seek to a position beyond 2Gb,
> the return value from seek() appears to be a 32-bit signed integer,
> resulting in strange (incorrect) return values from seek(), though
> otherwise not affecting correct operation.
>
> Inspecting the code, I wonder whether the lines
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t pos = f_tell(fp);
> #else
>     long pos = f_tell(fp);
> #endif
>
> in the definition of file_seek() in main/connections.c should be more
> along the lines of the code defining struct fileconn in
> include/Rconnections.h:
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t rpos, wpos;
> #else
> #ifdef Win32
>     off64_t rpos, wpos;
> #else
>     long rpos, wpos;
> #endif
> #endif
>
> I compiled and tested a version of R devel 2.2.0 with the appropriate
> simple change to file_seek() in main/connections.c, and with it, seek()
> correctly returned file positions beyond 2Gb.  However, I don't know
> the purpose of the #define __USE_LARGEFILE (and I couldn't find any info
> about it by googling on r-project.org), so I'm hesitant to offer a
> patch.  Here's the new block of code I used in main/connections.c that
> worked ok under Windows:
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t pos = f_tell(fp);
> #else
> #ifdef Win32
>     off64_t pos = f_tell(fp);
> #else
>     long pos = f_tell(fp);
> #endif
> #endif
>
> I'll be happy to submit a patch that addresses these issues, if someone
> will explain the usage and purpose of __USE_LARGEFILE.
>
> The following transcript, which illustrates both issues (without my
> mods), was created from an installation based on the precompiled version
> of R for Windows (rw2010.exe).
>
> -- Tony Plate
>
>> options(digits=15)
>>
>> # can truncate a short file from 8 bytes to 4 bytes
>> # first create a file with 8 bytes
>> f <- file("tmp1.txt", "wb")
>> writeLines(c("abc", "def"), f)
>> close(f)
>> # check length then truncate to 4 bytes
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA)
> [1] 8
>> seek(f, 4)
> [1] 8
>> truncate(f)
> NULL
>> seek(f, 0, "end")
> [1] 4
>> seek(f, NA)
> [1] 4
>> close(f)
>> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
>> # first create a file with 2000000008 bytes (slightly < 2^31)
>> f <- file("tmp1.txt", "wb")
>> seek(f, 2000000000)
> [1] 0
>> writeLines(c("abc", "def"), f)
>> close(f)
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA)
> [1] 2000000008
>> seek(f, 2000000004)
> [1] 2000000008
>> truncate(f)
> NULL
>> seek(f, 0, "end")
> [1] 2000000004
>> seek(f, NA)
> [1] 2000000004
>> close(f)
>> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
>> # first create a file with 2200000008 bytes (slightly > 2^31)
>> f <- file("tmp1.txt", "wb")
>> seek(f, 2200000000)
> [1] 0
>> writeLines(c("abc", "def"), f)
>> close(f)
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA) # bad reported value of the current position of "2200000008"
> [1] -2094967288
>> 2200000008 - 2^32
> [1] -2094967288
>> seek(f, 2200000004)
> [1] -2094967288
>> truncate(f) # doesn't work!
> NULL
>> seek(f, 0, "end")
> [1] -2094967288
>> # see if we successfully truncated... (no -- same length as before
>> # can also verify this by watching file size with 'ls -l')
>> seek(f, NA) # file is same size as before the attempted truncation
> [1] -2094967288
>> close(f)
>> version
>          _
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    2
> minor    1.0
> year     2005
> month    04
> day      18
> language R
>>
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
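
The odd values that seek() reports in the quoted transcript are consistent with a file position beyond 2^31-1 being stored in a 32-bit signed integer. The following C sketch illustrates only that arithmetic. It is not taken from R's sources: the helper names offset_as_int32 and fits_in_int32 are hypothetical, and the sketch assumes a platform where long is 32 bits wide, as on 32-bit Windows.

```c
#include <stdint.h>

/* Hypothetical helper, not part of R: the value seen when a 64-bit file
   offset is stored in a 32-bit signed integer (e.g. a 32-bit long). */
int32_t offset_as_int32(int64_t pos)
{
    /* Keep only the low 32 bits; conversion to unsigned is modular. */
    uint32_t low = (uint32_t)pos;

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;

    /* Bit patterns above INT32_MAX read as negative in two's complement. */
    return (int32_t)((int64_t)low - 4294967296LL);
}

/* Hypothetical helper: nonzero if the offset survives storage in 32 bits. */
int fits_in_int32(int64_t pos)
{
    return pos >= INT32_MIN && pos <= INT32_MAX;
}
```

Under these assumptions, offset_as_int32(2200000008LL) yields -2094967288, the value shown in the transcript, while 2000000008 passes through unchanged. A check along the lines of fits_in_int32 is one possible way to stop with an error rather than return a wrapped position.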