Sean O'Riordain
2013-Apr-10  14:20 UTC
[Rd] Issue with Control-Z in a text file on Windows - readLines() appears to truncate
Working on Windows I have had to deal with CSV files that,
unfortunately, contain embedded Control-Zs, i.e. ASCII character 26 in
decimal, and the readLines() function in R on Windows (2.15.2 and
3.0.0) appears to truncate at the control-Z.  There is no problem at
all on Ubuntu Linux with R 3.0.0.
Am I mistaken or is this genuine?
# Create a small file with embedded Control-Z
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')
h3
#  "1,34,44.4,\" A\032A \",99"
writeLines(h3, 'h3.txt')
# now attempt to read the file back in
h3a <- readLines('h3.txt')
# but with R 2.15.2 and 3.0.0 on Windows I get the message
#Warning message:
#In readLines("h3.txt") : incomplete final line found on 'h3.txt'
h3a
# [1] "1,34,44.4,\" A"
# so it drops from the Control-Z onwards
####
# The following is my rough and ready workaround - I'm sure there is a cleaner way
fnam <- 'h3.txt'
tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
tmp.char <- rawToChar(tmp.bin)
txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
txt
# [1] "1,34,44.4,\" A\032A \",99"
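One possible tidier variant of this raw-bytes workaround, as a sketch only: it reads exactly the file's size in bytes and splits on either CRLF or LF. The helper name read_lines_raw and the temporary demo file are made up for illustration and are not part of base R or of the original workaround. It assumes the file contains no embedded nul bytes, since rawToChar() rejects those.

```r
# Hypothetical helper (not in base R): read a whole file as raw bytes and
# split it into lines, so Ctrl-Z (0x1A) is treated as ordinary data.
read_lines_raw <- function(path) {
  n_bytes <- file.info(path)$size          # exact file size in bytes
  bytes <- readBin(path, what = "raw", n = n_bytes)
  text <- rawToChar(bytes)                 # errors if the file holds nul bytes
  strsplit(text, "\r?\n")[[1]]             # accept CRLF or LF line endings
}

# Example: a CRLF-terminated file with an embedded Ctrl-Z
demo_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("1,34,44.4,\"A\032A\",99\r\n"), demo_file)
txt <- read_lines_raw(demo_file)
```

Because readBin() is given a file name, it opens the file in binary mode itself, so no text-mode EOF handling should get in the way.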
This was on 64-bit R on 64-bit Windows 7, but it also appears to be
the case in 32-bit R 2.15.2 on 32-bit Windows 7 inside
VirtualBox.
Kind regards,
Sean O'Riordain
Trinity College
Dublin
Duncan Murdoch
2013-Apr-10  19:47 UTC
[Rd] Issue with Control-Z in a text file on Windows - readLines() appears to truncate
On 10/04/2013 10:20 AM, Sean O'Riordain wrote:
> Working on Windows I have had to deal with CSV files that,
> unfortunately, contain embedded Control-Zs, i.e. ASCII character 26 in
> decimal, and the readLines() function in R on Windows (2.15.2 and
> 3.0.0) appears to truncate at the control-Z.  There is no problem at
> all on Ubuntu Linux with R 3.0.0.
>
> Am I mistaken or is this genuine?

Ctrl-Z is the old text file EOF marker from MSDOS.  readLines() normally
reads files in text mode using the Microsoft Visual C libraries, so I
wouldn't be surprised if they respect Ctrl-Z as EOF.

A simpler workaround than the one you used is to read the file in binary
mode, e.g.

f <- file("h3.txt", "rb")
readLines(f)
close(f)

See the ?file help topic for a discussion of the limitations this may
impose on you.

Duncan Murdoch
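As a minimal sketch of the binary-mode approach suggested in this reply, the following wraps it in a small helper and round-trips a line containing Ctrl-Z. The helper name read_lines_binary and the temporary file are assumptions made for illustration; they do not come from the thread or from base R.

```r
# Hypothetical helper (not in base R): read lines through a connection
# opened in binary mode, so the C runtime's text-mode EOF handling of
# Ctrl-Z (0x1A) is not involved.
read_lines_binary <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))                      # close even if readLines() fails
  readLines(con)
}

# Round-trip check: write one line with an embedded Ctrl-Z, read it back
demo_file <- tempfile(fileext = ".txt")
line_in <- paste0('1,34,44.4,"A', rawToChar(as.raw(26)), 'A",99')
writeLines(line_in, demo_file)
line_out <- read_lines_binary(demo_file)
```

The ?readLines help page states that LF, CRLF and CR are all accepted as line endings whatever mode the connection is opened in, so the CRLF endings that writeLines() produces on Windows should not leave a stray carriage return in the result.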