Enrico Schumann
2025-Jun-25 14:43 UTC
[R] Potential bug in readLines when reading empty lines
Quoting "Heuvel, E.G. van den (Guido) via R-help" <r-help at r-project.org>:

> Hi all,
>
> I encountered some weird behaviour with readLines() recently, and I
> am wondering if this might be a bug, or, if it is not, how to
> resolve it. The issue is as follows:
>
> If I have a text file where a line ends with just a carriage return
> (\r, CR) while the next line is empty and ends in a carriage return
> / linefeed (\r\n, CR LF), then the empty line is skipped when
> reading the file with readLines. The following code contains a test
> case:
>
> ---
> print(R.version)
> # platform       x86_64-w64-mingw32
> # arch           x86_64
> # os             mingw32
> # crt            ucrt
> # system         x86_64, mingw32
> # status
> # major          4
> # minor          4.0
> # year           2024
> # month          04
> # day            24
> # svn rev        86474
> # language       R
> # version.string R version 4.4.0 (2024-04-24 ucrt)
> # nickname       Puppy Cup
>
> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")
>
> # Write txt_original as binary to avoid unwanted conversion of end
> # of line markers
> writeBin(charToRaw(txt_original), "test.txt")
>
> txt_actual <- readLines("test.txt")
> print(txt_actual)
> # [1] "Line 1" "Line 3"
> ---
>
> I included the output of this script on my machine in the comments.
> I would expect txt_actual to be equal to c("Line 1", "", "Line 3"),
> but the empty line is skipped.
>
> Is this a bug? And if not, how should I read test.txt in such a way
> that the empty 2nd line is left intact?
>
> Best regards,
>
> Guido van den Heuvel
> Statistics Netherlands

What would be your "rule" for identifying lines? From your desired
output, it seems \r should be end-of-line, and \n is to be ignored.
Then you could do something like this:

  raw <- readChar("test.txt", 1000)
  raw <- gsub("\n", "", raw)
  strsplit(raw, "\r")[[1]]
  ## [1] "Line 1" ""       "Line 3"

But it requires you to specify the number of characters to read (or
write a loop).

--
Enrico Schumann
Lucerne, Switzerland
https://enricoschumann.net
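The character count does not have to be guessed. A minimal sketch, assuming the file fits in memory and contains only single-byte (for example ASCII) text: file.size() reports the number of bytes, and that value can be passed to readChar(). The temporary file and the variable names are only for illustration.

```r
# Recreate the test file from the original post; a binary write keeps
# the end-of-line bytes exactly as given.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("Line 1\r\r\nLine 3\r\n"), path)

# file.size() returns the size in bytes; useBytes = TRUE tells readChar()
# to treat nchars as a byte count.
raw_txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)

# Same rule as above: \r ends a line, \n is ignored.
raw_txt <- gsub("\n", "", raw_txt, fixed = TRUE)
lines <- strsplit(raw_txt, "\r", fixed = TRUE)[[1]]
print(lines)
## [1] "Line 1" ""       "Line 3"
```

For very large files a chunked loop would still be needed; this sketch reads everything in one call.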
Heuvel, E.G. van den (Guido)
2025-Jun-25 14:56 UTC
[R] Potential bug in readLines when reading empty lines
-----Original message-----
From: Enrico Schumann <es at enricoschumann.net>
Sent: Wednesday, 25 June 2025 16:44
To: Heuvel, E.G. van den (Guido) <g.vandenheuvel at cbs.nl>
CC: 'r-help at R-project.org' <r-help at r-project.org>
Subject: Re: [R] Potential bug in readLines when reading empty lines

> What would be your "rule" for identifying lines? From your desired
> output, it seems \r should be end-of-line, and \n is to be ignored.
> Then you could do something like that:
>
>   raw <- readChar("test.txt", 1000)
>   raw <- gsub("\n", "", raw)
>   strsplit(raw, "\r")[[1]]
>   ## [1] "Line 1" ""       "Line 3"
>
> But it requires you to specify the number of characters to read (or
> write a loop).
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> https://enricoschumann.net

My preferred rule would be the one in the current documentation of the
readLines function. Specifically, the line "Whatever mode the
connection is opened in, any of LF, CRLF or CR will be accepted as
the EOL marker for a line."
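Read literally, the documented rule can be sketched as a splitter that accepts any of the three markers. The helper name split_eol is hypothetical and this is not how readLines() is implemented; it only shows what the rule would yield for the test string.

```r
# Hypothetical helper: split a string on CRLF, a lone CR, or a lone LF.
# "\r\n" comes first in the alternation so that a CR LF pair is consumed
# as one marker rather than two.
split_eol <- function(x) {
  strsplit(x, "\r\n|\r|\n", perl = TRUE)[[1]]
}

res <- split_eol("Line 1\r\r\nLine 3\r\n")
print(res)
## [1] "Line 1" ""       "Line 3"
```

Under this reading the lone CR ends "Line 1" and the CR LF pair ends the empty second line, which is the result the original post expected.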
Enrico Schumann
2025-Jun-25 15:17 UTC
[R] Potential bug in readLines when reading empty lines
Quoting "Heuvel, E.G. van den (Guido)" <g.vandenheuvel at cbs.nl>:

> My preferred rule would be the current documentation of the
> readLines function. Specifically, the line "Whatever mode the
> connection is opened in, any of LF, CRLF or CR will be accepted as
> the EOL marker for a line."

As a workaround, you could do something like this:

  raw <- readChar("test.txt", 1000)
  raw <- gsub("\r\n", "\n", raw)
  raw <- gsub("\r", "\n", raw)
  strsplit(raw, "\n")[[1]]
  ## [1] "Line 1" ""       "Line 3"

Of course, if the true file was created with

  txt <- paste0("Line 1\r", "\n", "\r\n", "Line 4\r")

then "Line 1\r" and "\n" will be merged into one line.

--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
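The caveat can be checked at the byte level. The sketch below (variable names are illustrative) builds the string two ways and shows that the resulting bytes are identical, so a reader has no way to tell a CR followed by an LF from a single CRLF.

```r
# CR and LF written as two separate markers ...
two_markers <- charToRaw(paste0("Line 1\r", "\n", "\r\n", "Line 4\r"))
# ... versus one CRLF marker: the bytes are the same.
one_marker <- charToRaw(paste0("Line 1\r\n", "\r\n", "Line 4\r"))
same_bytes <- identical(two_markers, one_marker)
print(same_bytes)
## [1] TRUE

# The workaround therefore returns three lines, not four.
txt <- rawToChar(two_markers)
txt <- gsub("\r\n", "\n", txt, fixed = TRUE)
txt <- gsub("\r", "\n", txt, fixed = TRUE)
res <- strsplit(txt, "\n", fixed = TRUE)[[1]]
print(res)
## [1] "Line 1" ""       "Line 4"
```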
Rui Barradas
2025-Jun-25 18:57 UTC
[R] Potential bug in readLines when reading empty lines
At 15:43 on 25/06/2025, Enrico Schumann wrote:

> What would be your "rule" for identifying lines? From your desired output,
> it seems \r should be end-of-line, and \n is to be ignored. Then you could
> do something like that:
>
>   raw <- readChar("test.txt", 1000)
>   raw <- gsub("\n", "", raw)
>   strsplit(raw, "\r")[[1]]
>   ## [1] "Line 1" ""       "Line 3"
>
> But it requires you to specify the number of characters to read (or write a
> loop).

Hello,

Related, the output is a mess:

  readChar("test.txt", n = file.size("test.txt")) |>
    textConnection() |>
    readLines()
  #> [1] "Line 1" ""       ""       "Line 3" ""

Is this specific to Windows and to the way it treats "\r"?
When I open the file in Notepad I only see two text lines, but when I
open it with vim, it's

  Line 1^M
  Line 3

so the carriage return is there. (And when I paste it here

  Line 1

  Line 3

the <Ctrl+M> becomes an empty line.)

Hope this helps,

Rui Barradas
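Since editors render the carriage return differently, looking at the raw bytes removes the guesswork. A small sketch, assuming the test file from the original post is recreated in a temporary location (the path and variable names are only for illustration):

```r
# Recreate the test file with a binary write so no EOL conversion happens.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("Line 1\r\r\nLine 3\r\n"), path)

# Read the file back as raw bytes and print them in hex.
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)

# 0d is CR and 0a is LF; locate each of them.
cr_pos <- which(bytes == as.raw(0x0d))
lf_pos <- which(bytes == as.raw(0x0a))
print(cr_pos)
## [1]  7  8 16
print(lf_pos)
## [1]  9 17
```

The two consecutive CR bytes at positions 7 and 8 are the lone CR that ends "Line 1" and the first half of the CR LF pair that ends the empty line.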