tplate@blackmesacapital.com
2005-May-19 17:48 UTC
[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)
This message relates to handling files > 2Gb under Windows. (I use 2Gb as shorthand for 2^31-1 -- the largest integer representable in a signed 32 bit integer.)

First issue: truncate() is not able to successfully truncate files at a position > 2Gb. This appears to be due to the use of the Windows function chsize() in file_truncate() in main/connections.c (chsize() takes a long int specification of the file size, so we would not expect it to work for positions > 2Gb).

The Windows API has the function SetEndOfFile(handle) that is supposed to truncate the file to the current position. However, this function does not seem to work correctly when the current position is beyond 2Gb, so it is not an improvement on chsize() (at least under Windows 2000). My explorations with Windows 2000 SP2 and XP Prof SP1 indicate that SetEndOfFile() DOES successfully truncate files > 2Gb to sizes < 2Gb, but cannot truncate the same file to a position beyond 2Gb. So I have no suggestions on how to get this to work. Probably, the best thing to do would be to stop with an error in the appropriate situations.

Second issue: although the R function seek() can take a seek position specified as a double, which allows it to seek to a position beyond 2Gb, the return value from seek() appears to be a 32-bit signed integer, resulting in strange (incorrect) return values from seek(), though otherwise not affecting correct operation.
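To illustrate the second issue, here is a minimal C sketch of what a 64-bit file position looks like once it has been squeezed through a 32-bit signed integer. The function name as_seen_through_int32 is hypothetical and is not taken from the R sources; it only models the wraparound arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper (not from the R sources): model the value a 32-bit
   signed variable would report for a 64-bit file position. Keep the low
   32 bits, then reinterpret them as a two's complement number. */
int64_t as_seen_through_int32(int64_t pos)
{
    uint32_t low = (uint32_t) pos;   /* well-defined: reduces modulo 2^32 */

    if (low >= UINT32_C(0x80000000)) /* top bit set: the negative half */
        return (int64_t) low - INT64_C(4294967296);
    return (int64_t) low;
}
```

For a position of 2200000008 this yields 2200000008 - 2^32 = -2094967288, while positions below 2^31 come back unchanged.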
Inspecting the code, I wonder whether the lines

    #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
        off_t pos = f_tell(fp);
    #else
        long pos = f_tell(fp);
    #endif

in the definition of file_seek() in main/connections.c should be more along the lines of the code defining struct fileconn in include/Rconnections.h:

    #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
        off_t rpos, wpos;
    #else
    #ifdef Win32
        off64_t rpos, wpos;
    #else
        long rpos, wpos;
    #endif
    #endif

I compiled and tested a version of R devel 2.2.0 with the appropriate simple change to file_seek() in main/connections.c, and with it, seek() correctly returned file positions beyond 2Gb. However, I don't know the purpose of the #define __USE_LARGEFILE (and I couldn't find any info about it by googling on r-project.org), so I'm hesitant to offer a patch. Here's the new block of code I used in main/connections.c that worked ok under Windows:

    #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
        off_t pos = f_tell(fp);
    #else
    #ifdef Win32
        off64_t pos = f_tell(fp);
    #else
        long pos = f_tell(fp);
    #endif
    #endif

I'll be happy to submit a patch that addresses these issues, if someone will explain the usage and purpose of __USE_LARGEFILE.

The following transcript, which illustrates both issues (without my mods), was created from an installation based on the precompiled version of R for Windows (rw2010.exe).
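The idea behind the change above (hold the position in a type wide enough for it, whatever the platform) can be sketched outside of R as follows. This is only an illustration under stated assumptions: tell64 and position_as_double are hypothetical names, not R's f_tell or anything else in main/connections.c, and the sketch assumes a POSIX system providing ftello() or a Microsoft C runtime providing _ftelli64().

```c
#define _POSIX_C_SOURCE 200809L  /* expose ftello() under strict -std modes */
#define _FILE_OFFSET_BITS 64     /* ask for a 64-bit off_t where that is optional */
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper, not R's f_tell: report the current stream position
   as a 64-bit value regardless of how wide 'long' is on the platform. */
int64_t tell64(FILE *fp)
{
#ifdef _WIN32
    return (int64_t) _ftelli64(fp);  /* Microsoft C runtime */
#else
    return (int64_t) ftello(fp);     /* POSIX; the result has type off_t */
#endif
}

/* Hypothetical helper: hand the position back as a double, the way an R
   level function would. Values below 2^53 convert without loss. */
double position_as_double(int64_t pos)
{
    return (double) pos;
}
```

The point of the sketch is only that the position never passes through a 32-bit variable on its way back to the caller.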
-- Tony Plate

> options(digits=15)
>
> # can truncate a short file from 8 bytes to 4 bytes
> # first create a file with 8 bytes
> f <- file("tmp1.txt", "wb")
> writeLines(c("abc", "def"), f)
> close(f)
> # check length then truncate to 4 bytes
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA)
[1] 8
> seek(f, 4)
[1] 8
> truncate(f)
NULL
> seek(f, 0, "end")
[1] 4
> seek(f, NA)
[1] 4
> close(f)
> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
> # first create a file with 2000000008 bytes (slightly < 2^31)
> f <- file("tmp1.txt", "wb")
> seek(f, 2000000000)
[1] 0
> writeLines(c("abc", "def"), f)
> close(f)
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA)
[1] 2000000008
> seek(f, 2000000004)
[1] 2000000008
> truncate(f)
NULL
> seek(f, 0, "end")
[1] 2000000004
> seek(f, NA)
[1] 2000000004
> close(f)
> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
> # first create a file with 2200000008 bytes (slightly > 2^31)
> f <- file("tmp1.txt", "wb")
> seek(f, 2200000000)
[1] 0
> writeLines(c("abc", "def"), f)
> close(f)
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA) # bad reported value of the current position of "2200000008"
[1] -2094967288
> 2200000008 - 2^32
[1] -2094967288
> seek(f, 2200000004)
[1] -2094967288
> truncate(f) # doesn't work!
NULL
> seek(f, 0, "end")
[1] -2094967288
> # see if we successfully truncated... (no -- same length as before
> # can also verify this by watching file size with 'ls -l')
> seek(f, NA) # file is same size as before the attempted truncation
[1] -2094967288
> close(f)
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R
>
Prof Brian Ripley
2005-May-20 11:21 UTC
[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)
To follow up on the truncate() part of this, Windows does not use chsize directly any more, but ftruncate like all other platforms. However, truncate() was limited to files < 2Gb on all platforms. I have changed the latter and your example now works both on 32-bit Windows and on 64-bit Linux.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
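For readers unfamiliar with ftruncate, which the reply mentions, the following is a rough sketch of truncating a file at the current stream position using a 64-bit offset. It is not the code in main/connections.c; truncate_at_current_position is a hypothetical name, and the sketch assumes a POSIX environment where ftello(), fileno() and ftruncate() are available.

```c
#define _POSIX_C_SOURCE 200809L  /* expose ftello(), fileno(), ftruncate() */
#define _FILE_OFFSET_BITS 64     /* ask for a 64-bit off_t where that is optional */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not R's file_truncate: cut the file off at the
   stream's current position. Returns 0 on success and -1 on failure. */
int truncate_at_current_position(FILE *fp)
{
    off_t pos;

    if (fflush(fp) != 0)         /* push buffered writes to the descriptor first */
        return -1;
    pos = ftello(fp);            /* off_t, so not limited to the range of 'long' */
    if (pos < 0)
        return -1;
    return ftruncate(fileno(fp), pos);
}
```

Because the position stays in an off_t from ftello() through to ftruncate(), nothing in this path narrows it to 32 bits when off_t is 64 bits wide.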