Henrik Bengtsson
2017-Jan-04 03:02 UTC
[Rd] cat(s, file): infinite loop of "invalid char string in output conversion" warnings with UTF-8 encoding
The code snippet below gives a single warning:

  Warning message:
  In cat(s, file = tempfile()) :
    invalid char string in output conversion

when n <= 10001, whereas with n >= 10002 it appears to generate the same warning in an infinite loop in the call to cat().

  n <- 10002L
  r <- raw(length = n)
  r[] <- charToRaw(" ")
  r[length(r)] <- as.raw(0xa9)
  s <- rawToChar(r)

  message("Encoding: native.enc")
  options(encoding = "native.enc")
  cat(s, file = tempfile())

  message("Encoding: UTF-8")
  options(encoding = "UTF-8")
  cat(s, file = tempfile())

  message("DONE")

Here cat() never returns. The R process runs at 100% CPU, it does not appear to increase its memory usage, and the call can be interrupted:

^C
There were 50 or more warnings (use warnings() to see the first 50)
> traceback()
8: "factor" %in% attrib[["class", exact = TRUE]]
7: structure(list(message = as.character(message), call = call),
       class = class)
6: simpleWarning(msg, call)
5: doWithOneRestart(return(expr), restart)
4: withOneRestart(expr, restarts[[1L]])
3: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
2: .signalSimpleWarning("invalid char string in output conversion",
       quote(cat(s, file = tempfile())))
1: cat(s, file = tempfile())

## SOME TROUBLESHOOTING

Using options(warn = 1) shows that the "invalid char string in output conversion" warning is output over and over in an infinite loop.

This warning is generated by dummy_vfprintf() defined in src/main/connections.c (https://github.com/wch/r-source/blob/R-3-3-branch/src/main/connections.c#L370):

  # define BUFSIZE 10000
  int dummy_vfprintf(Rconnection con, const char *format, va_list ap)
  {
      [...]
      if(ires == (size_t)(-1) && errno != E2BIG)
          /* is this safe? */
          warning(_("invalid char string in output conversion"));
      [...]
  }

Note BUFSIZE, note the comment /* is this safe? */ (by Brian Ripley on 2005-01-05).
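
Since the warning comes from the output conversion step, one possible way to stay clear of it is to make sure the string contains no bytes that fail conversion before it reaches cat(). The following is only a minimal sketch under that assumption, not a fix for the underlying problem in connections.c. The helper name sanitize_utf8() is hypothetical (it is not part of base R); validUTF8() requires R >= 3.3.0, and the sketch has not been checked against every affected R version.

```r
# Hypothetical helper (not part of base R): replace bytes that are not
# valid UTF-8 with an "<xx>" escape before the string is written.
sanitize_utf8 <- function(x) {
  bad <- !validUTF8(x)  # validUTF8() is available in R >= 3.3.0
  x[bad] <- iconv(x[bad], from = "UTF-8", to = "UTF-8", sub = "byte")
  x
}

# Same construction as in the report: n - 1 spaces followed by byte 0xa9.
n <- 10002L
r <- raw(length = n)
r[] <- charToRaw(" ")
r[length(r)] <- as.raw(0xa9)
s <- rawToChar(r)

validUTF8(s)            # FALSE: a lone 0xa9 byte is not valid UTF-8

clean <- sanitize_utf8(s)
validUTF8(clean)        # TRUE: the offending byte is now the text "<a9>"

# Write the sanitized string; it is plain ASCII at this point.
path <- tempfile()
cat(clean, file = path)
```

The sanitized string is not byte-identical to the input (the byte 0xa9 becomes the four characters "<a9>"), so this only suits cases where a visible escape is acceptable in the output.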
## SESSION DETAILS

I can reproduce this on R 2.11.0, R 3.3.2 and R devel on Linux. It does not occur on R 3.3.2 for Windows under Linux Wine.

> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> sessionInfo()
R Under development (unstable) (2017-01-02 r71875)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0