Heuvel, E.G. van den (Guido)
2025-Jun-25 06:59 UTC
[R] Potential bug in readLines when reading empty lines
Hi all,

I encountered some weird behaviour with readLines() recently, and I am wondering if this might be a bug, or, if it is not, how to resolve it. The issue is as follows:

If I have a text file where a line ends with just a carriage return (\r, CR) while the next line is empty and ends in a carriage return / linefeed (\r\n, CR LF), then the empty line is skipped when reading the file with readLines. The following code contains a test case:

---
print(R.version)
# platform       x86_64-w64-mingw32
# arch           x86_64
# os             mingw32
# crt            ucrt
# system         x86_64, mingw32
# status
# major          4
# minor          4.0
# year           2024
# month          04
# day            24
# svn rev        86474
# language       R
# version.string R version 4.4.0 (2024-04-24 ucrt)
# nickname       Puppy Cup

txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")

# Write txt_original as binary to avoid unwanted conversion of end of line markers
writeBin(charToRaw(txt_original), "test.txt")

txt_actual <- readLines("test.txt")
print(txt_actual)
# [1] "Line 1" "Line 3"
---

I included the output of this script on my machine in the comments. I would expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line is skipped.

Is this a bug? And if not, how should I read test.txt in such a way that the empty 2nd line is left intact?

Best regards,

Guido van den Heuvel
Statistics Netherlands
Duncan Murdoch
2025-Jun-25 14:02 UTC
[R] Potential bug in readLines when reading empty lines
On 2025-06-25 2:59 a.m., Heuvel, E.G. van den (Guido) via R-help wrote:
> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")

Doesn't that produce the same thing as

  "Line 1\r\r\nLine 3\r\n"

when you write it with writeBin? If I read the ?readLines page correctly, that string contains 3 lines:

  Line 1 CR
  CR LF
  Line 3 CR LF

On the other hand, when I use your construction or mine, I get 4 lines read by my Mac:

  readLines("test.txt")
  [1] "Line 1" ""       ""       "Line 3"

I'd guess it is processing it as

  Line 1 CR
  CR
  LF
  Line 3 CR LF

So I think there are definitely bugs or bad docs here.

Duncan Murdoch
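Whether the two constructions put the same bytes on disk can be checked directly by reading the file back in binary mode. The following is a minimal sketch, not part of the original thread; it writes to a temporary file instead of test.txt, and the names path, bytes, cr_pos and lf_pos are chosen here for illustration.

```r
# Build the string both ways and confirm they are the same.
txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
same_string <- identical(txt_original, "Line 1\r\r\nLine 3\r\n")

# Write the bytes unchanged, then read them back unchanged.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(txt_original), path)
bytes <- readBin(path, what = "raw", n = file.size(path))

# Print the raw bytes: CR shows up as 0d, LF as 0a.
print(bytes)

# Byte positions (1-based) of every CR and every LF in the file.
cr_pos <- which(bytes == as.raw(0x0d))
lf_pos <- which(bytes == as.raw(0x0a))
print(cr_pos)
print(lf_pos)
```

If the CR and LF positions come out as expected, any difference in the readLines() result comes from how the line-end markers are interpreted while reading, not from how the file was written.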
Jeff Newmiller
2025-Jun-25 14:10 UTC
[R] Potential bug in readLines when reading empty lines
As a longtime programmer, I would say that your file is at fault... there is no programming standard that says any software needs to handle this kind of data in any defined way.

More specifically, the only standards-based requirements I am aware of require the programmer to specify whether the file is a text file (per the convention driven by the OS) or a binary file. The fact that your file does not conform with a consistent line end mark convention means that any "automatic" identification of line end conventions is completely optional.

Looking at this from the perspective of a user, I think you have two options: fix the process that is feeding you invalid data, or use binary mode to implement the parsing behavior you wish to obtain for this file format.

In addition, I suppose you could develop a generic line end handling algorithm that you think would resolve this and submit a suggestion/patch to R and hope someone agrees that such a change won't cause more havoc than it avoids for other users. But that would be unlikely to happen in a timely fashion for your current needs.
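The "binary mode" option could look roughly like the sketch below. It is an illustration under stated assumptions, not an established recipe: the helper name read_lines_any_eol is hypothetical, the rule "CR LF, a lone CR, and a lone LF each end exactly one line" is one possible choice of parsing behaviour, and the file is assumed to be plain text without embedded NUL bytes (rawToChar() fails on those).

```r
# Hypothetical helper: read a file as raw bytes and split it into lines,
# treating CR LF, a lone CR, and a lone LF each as ONE line terminator.
read_lines_any_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(bytes)
  # "\r\n" is listed first so that a CR LF pair is consumed as a unit.
  strsplit(txt, "\r\n|\r|\n", perl = TRUE)[[1]]
}

# Recreate the test file from the original post in a temporary location.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("Line 1\r\r\nLine 3\r\n"), path)

lines <- read_lines_any_eol(path)
print(lines)
```

Under this rule the test file splits into "Line 1", an empty string, and "Line 3". A terminator at the very end of the file does not produce an extra empty element, because strsplit() drops a match at the end of the string.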
Ebert,Timothy Aaron
2025-Jun-25 14:39 UTC
[R] Potential bug in readLines when reading empty lines
The end of file is a problem. In my case I have data files that can end in one of several ways. A line can end with \r or \n.

1) No line feed at the end of the last row of data.
2) One line feed at the end of the last row of data.
3) Multiple line feeds at the end of the last row of data.
4) All of the above except with carriage return.
5) A file could end with a line feed and a carriage return.

Some of this is "self-inflicted." People can open the data files in some other program and "accidentally" add a line feed or several. They then save and close the file before sending it to me.

My workflow is:

1) Place all files in one folder with nothing else in the folder.
2) In R get the folder from the user. I used choose.dir()
3) Get a list of all files using list.files()
4) Loop through all of the files.
   a) Read the file in binary using readBin()
   b) Identify if the file uses \r\n, \n or \r.
      # This code will do the first step in counting the number of \r\n, then one removes \r\n from the file (if it exists) and counts \r and then \n.
      # gregexpr() returns -1 when there is no match, so count only positive positions.
      num_crlf <- sum(gregexpr("\r\n", content, fixed = TRUE)[[1]] > 0)
   c) Remove all \n and \r at the end of the file.
   d) Add one \n or \r to the end of the file as identified in 4b.
   e) Save file
   f) End loop

The exact code will depend on what sort of files you are dealing with. Unexpected files can generate errors unless you trap them: for example an empty file, or a file that has been edited by multiple users.

Tim
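The per-file part of the workflow described in this message could be sketched as follows. This is a guess at one reasonable implementation, not Tim's actual code: the helper names count_fixed and normalize_trailing_eol are hypothetical, the "most frequent terminator wins, ties go to \n" rule is an assumption, and empty or missing files are simply skipped.

```r
# Hypothetical helper: number of non-overlapping fixed matches (0 if none).
count_fixed <- function(pattern, text) {
  hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  sum(hits > 0)  # gregexpr() reports "no match" as -1
}

# Hypothetical helper: make a file end in exactly one line terminator.
normalize_trailing_eol <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(invisible(NA_character_))  # skip empty/missing
  content <- rawToChar(readBin(path, what = "raw", n = size))

  # Count CR LF first, then lone CR and lone LF in what remains.
  n_crlf <- count_fixed("\r\n", content)
  rest <- gsub("\r\n", "", content, fixed = TRUE)
  n_lf <- count_fixed("\n", rest)
  n_cr <- count_fixed("\r", rest)

  # Assumption: the most frequent terminator is the file's convention;
  # which.max() takes the first maximum, so ties fall back to "\n".
  eols <- c("\n", "\r\n", "\r")
  eol <- eols[which.max(c(n_lf, n_crlf, n_cr))]

  # Strip every CR/LF at the end of the file, then add one terminator back.
  body <- sub("[\r\n]+$", "", content)
  writeBin(charToRaw(paste0(body, eol)), path)
  invisible(eol)
}

# Demonstration on a temporary file with surplus trailing line ends.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("a\r\nb\r\n\r\n\n"), path)
eol_used <- normalize_trailing_eol(path)
result <- rawToChar(readBin(path, what = "raw", n = file.size(path)))
```

To apply it to a whole folder, something like `for (f in list.files(folder, full.names = TRUE)) normalize_trailing_eol(f)` would do, where folder is the directory chosen by the user. Note that the sketch overwrites files in place, so working on copies is safer.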
Enrico Schumann
2025-Jun-25 14:43 UTC
[R] Potential bug in readLines when reading empty lines
What would be your "rule" for identifying lines? From your desired output, it seems \r should be end-of-line, and \n is to be ignored. Then you could do something like that:

  raw <- readChar("test.txt", 1000)
  raw <- gsub("\n", "", raw)
  strsplit(raw, "\r")[[1]]
  ## [1] "Line 1" ""       "Line 3"

But it requires you to specify the number of characters to read (or write a loop).

-- 
Enrico Schumann
Lucerne, Switzerland
https://enricoschumann.net
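One way around the hard-coded character count is to ask the file system for the file size and read that many bytes. The sketch below is a variation on the approach in this message, not something proposed in the thread; it assumes the file fits comfortably in memory and uses a temporary file in place of test.txt.

```r
# Recreate the test file from the original post in a temporary location.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("Line 1\r\r\nLine 3\r\n"), path)

# With useBytes = TRUE, nchars is a byte count, so file.size() covers the file.
txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)

# Same rule as above: ignore LF entirely, treat CR as the end of a line.
txt <- gsub("\n", "", txt, fixed = TRUE)
lines_cr <- strsplit(txt, "\r", fixed = TRUE)[[1]]
print(lines_cr)
```

This keeps the "\r ends a line, \n is ignored" rule, so a file that uses bare LF line ends would come back as a single long line.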