Dear R-help,

I am having trouble reading a UTF-16LE formatted file. The issue appears to be a byte order mark at the beginning of the file. I have tried readLines(file, encoding='utf-16LE') but got

[1] "\xff\xfe1" "" "" "" "" ""

Regards,
Tim

On Wed, 28 Feb 2024 13:44:49 +0000, "Ebert,Timothy Aaron" <tebert at ufl.edu> wrote:

> readLines(file, encoding='utf-16LE')

There are two ways you could encounter an encoding in R. The first is the encoding marker placed on every string object, which declares the string to be encoded in UTF-8, Latin-1, the native locale encoding, or ASCII, or to be "bytes". No other encodings are supported. The "encoding" argument of readLines() only sets this marker; it does not convert the input.

The second is conversion: to support other encodings, R can convert the text as part of the input/output connections. help(readLines) points you towards that: you need to set the UTF-16LE encoding on the connection object.

    con <- file(file, encoding = 'UTF16LE')
    lines <- readLines(con)
    close(con)

"UTF16LE" is not guaranteed to be supported, so see iconvlist() for the encodings that should work with your build of R.

-- 
Best regards,
Ivan
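
For anyone who wants to try the connection approach without the original file, here is a minimal, self-contained sketch. The file contents ("1" and "abc") and the variable names are made up for illustration, and it assumes that the spelling "UTF-16LE" is available in iconvlist() on your build of R.

```r
# Write a tiny UTF-16LE test file by hand: BOM (FF FE), then "1\nabc\n".
# The contents are invented for this example; substitute your own file.
demo_path <- tempfile(fileext = ".txt")
demo_bytes <- as.raw(c(
  0xff, 0xfe,                          # byte order mark
  0x31, 0x00, 0x0a, 0x00,              # "1"   + newline
  0x61, 0x00, 0x62, 0x00, 0x63, 0x00,  # "abc"
  0x0a, 0x00                           # newline
))
writeBin(demo_bytes, demo_path)

# Set the encoding on the connection, not on readLines(),
# so the text is converted while it is being read.
con <- file(demo_path, encoding = "UTF-16LE")
demo_lines <- readLines(con)
close(con)

print(demo_lines)
```

If the conversion works on your platform, the BOM and the embedded NUL bytes should no longer show up in the result. If file() complains about the encoding name, check iconvlist() for the spelling your build accepts.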

When you specify LE you are overriding any useful information that the BOM could convey. See https://softwareengineering.stackexchange.com/questions/370088/is-the-bom-optional-for-utf-16-and-utf-32 and the R help page ?Encoding

On February 28, 2024 5:44:49 AM PST, "Ebert,Timothy Aaron" <tebert at ufl.edu> wrote:
>Dear R-help,
>
>I am having trouble reading a UTF-16LE formatted file. The issue appears to be a byte order mark at the beginning of the file. I have tried readLines(file, encoding='utf-16LE') but got
>
>[1] "\xff\xfe1" "" "" "" "" ""
>
>Regards,
>Tim
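
One way to use the information in the BOM rather than override it is to look at the first two bytes before choosing an encoding. The sketch below is only an illustration: detect_utf16_bom() is a hypothetical helper, not part of R, and the test file is invented. It also ignores the fact that a UTF-32LE BOM starts with the same two bytes.

```r
# Hypothetical helper: guess the UTF-16 byte order from the BOM.
# Returns NA if the file does not start with a UTF-16 BOM.
detect_utf16_bom <- function(path) {
  bom <- readBin(path, what = "raw", n = 2L)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    "UTF-16LE"
  } else if (identical(bom, as.raw(c(0xfe, 0xff)))) {
    "UTF-16BE"
  } else {
    NA_character_
  }
}

# Invented test file: BOM (FF FE) followed by "1\n" in UTF-16LE.
bom_path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xff, 0xfe, 0x31, 0x00, 0x0a, 0x00)), bom_path)

bom_enc <- detect_utf16_bom(bom_path)
stopifnot(!is.na(bom_enc))  # no BOM found: fall back to another guess

# Pass the detected encoding to the connection.
con <- file(bom_path, encoding = bom_enc)
bom_lines <- readLines(con)
close(con)

print(bom_enc)
print(bom_lines)
```

Whether a leading BOM is dropped during conversion can depend on the encoding name and the iconv implementation, so it is worth checking the first element of the result on your own system.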