martin gregory
2024-Oct-27 12:13 UTC
[R] readLines on open connection reads only first write on MacOS
I was using readLines to read data from a file which is being written to by another process. The readLines documentation says "If the connection is open it is read from its current position". With R 4.4.1 on Linux 5.15.160 this is true, but it does not seem to be the case with R 4.4.1 on MacOS 12.7.6 (Intel). There, the first write to the file is read correctly, but subsequent reads return nothing. The minimal program below shows the behaviour and produces the following results:

Linux:
input 1: lines written/read: 4 / 4
input 2: lines written/read: 3 / 3
input 3: lines written/read: 2 / 2

MacOS:
input 1: lines written/read: 4 / 4
input 2: lines written/read: 3 / 0
input 3: lines written/read: 2 / 0

I have searched NEWS and found only https://bugs.r-project.org/show_bug.cgi?id=18555, but I am not specifying an encoding, so I am not sure whether it is relevant.

I also tried a 1-second delay between the flush and the read, but this had no effect.

I have found an alternative, scan with skip set to the number of lines already read, and this works on both Linux and MacOS. But I would still like to know how to make readLines work with open connections on MacOS.

Regards,
Martin

## Program to demonstrate the behaviour
## input data
rL <- list(paste0("Line ", 1:4), paste0("Line ", 1:3), paste0("Line ", 1:2))
## create an empty file and open write and read connections to the file
close(file("rL.log", open = "w"))
rLconn.w <- file("rL.log", open = "a")
rLconn.r <- file("rL.log", open = "r")
## write the test data and read it
for (i in 1:3) {
  writeLines(rL[[i]], rLconn.w)
  flush(rLconn.w)
  ## read whatever has been appended since the previous read
  out <- readLines(rLconn.r, warn = FALSE)
  writeLines(c(paste0("input ", i, ": lines",
                      " written/read: ", length(rL[[i]]),
                      " / ", length(out))))
}
close(rLconn.w)
close(rLconn.r)
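The scan-with-skip workaround mentioned in the post can be sketched as below. This is a minimal sketch, not Martin's actual code: it assumes the log lives in `tempdir()` (the variable `log_path` is hypothetical) and keeps a running count `nread` of lines already consumed. Because `scan()` is given a file name, it opens and closes the file on every call, so no end-of-file state is carried between reads.

```r
## Sketch of the scan()-with-skip workaround.
## Assumption: the log file lives in tempdir() instead of the working directory.
log_path <- file.path(tempdir(), "rL.log")
rL <- list(paste0("Line ", 1:4), paste0("Line ", 1:3), paste0("Line ", 1:2))

## create an empty file and open an append connection for the writer
close(file(log_path, open = "w"))
rLconn.w <- file(log_path, open = "a")

nread  <- 0L          # lines consumed so far
counts <- integer(0)  # lines read in each round
for (i in 1:3) {
  writeLines(rL[[i]], rLconn.w)
  flush(rLconn.w)
  ## scan() opens the file by name on every call; skip = nread jumps over
  ## the lines already consumed. sep = "\n" makes each line one field,
  ## quote = "" turns off quote handling, and blank.lines.skip = FALSE keeps
  ## empty lines so that nread stays in step with skip.
  out <- scan(log_path, what = character(), sep = "\n", quote = "",
              blank.lines.skip = FALSE, skip = nread, quiet = TRUE)
  nread  <- nread + length(out)
  counts <- c(counts, length(out))
  writeLines(paste0("input ", i, ": lines written/read: ",
                    length(rL[[i]]), " / ", length(out)))
}
close(rLconn.w)
unlink(log_path)
```

The trade-off is that each call rescans the file from the beginning to skip the old lines, which is fine for small logs but grows with the file size.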
Tomas Kalibera
2025-Jan-16 14:01 UTC
Thanks for the report. I've fixed this in R-devel so that, even on macOS, one can read data added to a file after the end of the file has once been reached. This also changes the behavior of your example on macOS to match Linux (and Windows).

The difference comes from the C library shipped with the operating system. According to the C standard, functions for reading from a file, such as fgetc() and fread(), should always fail to read data and indicate end of file when the end-of-file indicator has already been set for the stream. The C library on macOS behaves this way. On Linux and Windows, however, the read operations instead check again whether any new data is present, even if the end-of-file indicator has already been set. That seems to violate the standard, but it made the example program work without any special handling on the R side (and this behavior is not documented).

I've modified R-devel to clear the end-of-file indicator, so that the behavior observed in R on Windows and Linux can now also be observed on macOS. Unless problems are found with this change, it will be part of the next (minor or major) R release.

Strictly speaking, the example program is not portable: it opens the same file twice in the same process, which is implementation-defined behavior according to the C standard, yet it happens to work on all platforms R currently supports. I understand it is just an example and that the real use case is two processes communicating this way. I don't know the exact use case, but in principle it might be more reliable to use a different communication mechanism, such as a fifo, multiple files (each read only once), or a combination of both.

Best,
Tomas
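The "multiple files, each read only once" alternative can be sketched as follows. Everything here is a hypothetical illustration rather than an R or thread-provided API: the spool directory under `tempdir()`, the `batch-NNNNNN.log` naming scheme, and the helpers `write_batch()` and `read_batches()`. Because each batch is read in full by name and then deleted, the reader never keeps an open connection that could hit end of file.

```r
## Sketch of the "multiple files, each read only once" approach.
## Hypothetical: the spool directory, the batch-NNNNNN.log naming scheme,
## and the helpers write_batch() and read_batches().
spool <- file.path(tempdir(), "rL-spool")
dir.create(spool, showWarnings = FALSE)

## Writer: write each batch under a temporary name, then rename it, so a
## reader that only lists *.log files never picks up a partly written batch
## (rename within one directory is atomic on POSIX file systems).
write_batch <- function(lines, id) {
  tmp   <- file.path(spool, sprintf("batch-%06d.tmp", id))
  final <- file.path(spool, sprintf("batch-%06d.log", id))
  writeLines(lines, tmp)
  file.rename(tmp, final)
}

## Reader: read every complete batch once, in name order, then delete it.
## Zero-padded ids keep name order equal to write order.
read_batches <- function() {
  files <- sort(list.files(spool, pattern = "\\.log$", full.names = TRUE))
  out <- unlist(lapply(files, readLines), use.names = FALSE)
  file.remove(files)
  if (is.null(out)) character(0) else out
}

## Same loop as the demonstration program; writer and reader share one
## process here only for illustration.
rL <- list(paste0("Line ", 1:4), paste0("Line ", 1:3), paste0("Line ", 1:2))
counts <- integer(0)
for (i in 1:3) {
  write_batch(rL[[i]], i)
  out <- read_batches()
  counts <- c(counts, length(out))
  writeLines(paste0("input ", i, ": lines written/read: ",
                    length(rL[[i]]), " / ", length(out)))
}
unlink(spool, recursive = TRUE)
```

In a real two-process setup, the writer would call `write_batch()` and the reader would poll `read_batches()`. The fifo option is available in R through `fifo()` connections where `capabilities("fifo")` is TRUE, but it is not shown here.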