Sunny Singha
2016-Mar-28 13:46 UTC
[R] Please guide -- UTF-8 locale setting fails on Windows on writing
Hi,

I think I'm experiencing an issue related to the system locale. I have exported data frames gathered from various social media platforms (Facebook, Twitter, G+, etc.) in '.csv' format. I observe that many variables/columns consist of strings formatted like the one below:

"<U+0645><U+062D><U+0645><U+062F> <U+0627><U+0644><U+0633><U+0648><U+0627><U+062D>"

As expected, and as I confirmed, these are strings in different languages, which is common in social media data. Platform details are provided at the end of this mail.

The OS locale is set to English (United States), hence the R locale is 'English_United States.1252'. I attempted to change it to UTF-8 but received the warning below:

Warning message:
In Sys.setlocale("LC_ALL", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored

I have gone through the threads below, but found no resolution so far:
--- http://stackoverflow.com/questions/20571147/how-to-set-unicode-locale-in-r
--- https://stat.ethz.ch/pipermail/r-devel/2013-November/067940.html
--- http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
--- https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
--- http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/

I'm not sure whether the issue occurs while reading/extracting the data from the platforms or while writing/exporting it to a Windows directory, but I don't experience a similar issue on my personal Mac. I need some clarification here.

How can I export the data just as I see it on the web? Please guide.

Regards,
Sunny

Platform I'm using:
Operating System : Windows 7 Professional SP1
R version details:
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          2.3
year           2015
month          12
day            10
svn rev        69752
language       R
version.string R version 3.2.3 (2015-12-10)
nickname       Wooden Christmas-Tree
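If a file has already been written with literal <U+XXXX> escapes like the one above, the original text can usually be recovered, because each escape is just a hexadecimal Unicode code point. Below is a minimal base-R sketch; the helper name decode_u_escapes() is hypothetical (not from any package), and it assumes every escape has exactly the form <U+hex> seen in the sample.

```r
# Hypothetical helper (not from any package): turn literal "<U+XXXX>"
# escapes back into the characters they stand for, encoded as UTF-8.
decode_u_escapes <- function(x) {
  # Locate every "<U+" followed by 4 to 6 hex digits and a closing ">"
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    # Drop the leading "<U+" and the trailing ">" to keep only the hex digits
    hex <- substr(esc, 4L, nchar(esc) - 1L)
    # Convert each code point to a one-character UTF-8 string
    vapply(strtoi(hex, 16L), intToUtf8, character(1))
  })
  x
}

s <- "<U+0645><U+062D><U+0645><U+062F> <U+0627><U+0644><U+0633><U+0648><U+0627><U+062D>"
decoded <- decode_u_escapes(s)
```

To clean a whole imported data frame, the helper could be applied column by column, e.g. df[] <- lapply(df, function(col) if (is.character(col)) decode_u_escapes(col) else col). Whether the decoded text then displays correctly in the console still depends on the locale and font, so checking the written file in a UTF-8-aware editor may be more reliable.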
Milan Bouchet-Valat
2016-Mar-28 14:09 UTC
[R] Please guide -- UTF-8 locale setting fails on Windows on writing
On Monday 28 March 2016 at 19:16 +0530, Sunny Singha wrote:
> I attempted to change it to UTF-8 but received the warning below:
>
> Warning message:
> In Sys.setlocale("LC_ALL", "UTF-8") :
>   OS reports request to set locale to "UTF-8" cannot be honored

You don't need to set the locale. Just pass an appropriate value (e.g. "UTF-8") to the fileEncoding argument of read.csv() or write.csv().

You also didn't tell us what program you used to read these files. Some might guess the encoding incorrectly, or require you to choose it manually.

Regards

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
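The fileEncoding approach can be sketched as a small round trip. This is an illustration under stated assumptions, not a guaranteed fix: in a UTF-8 locale (such as the Mac mentioned above) the text should survive unchanged, but in a Windows 'English_United States.1252' locale with R 3.2.x, write.table() appears to translate strings to the native encoding before the connection re-encodes them, so characters outside cp1252 may still come out as <U+XXXX>, which seems to be what Sunny's Case 1 runs into.

```r
# Build a small data frame with non-Latin text. intToUtf8() is used so the
# example does not depend on the encoding of the script file itself.
df <- data.frame(
  id   = 1:2,
  text = c(intToUtf8(c(0x0645, 0x062D, 0x0645, 0x062F)),  # Arabic letters
           intToUtf8(c(0x4E16, 0x754C))),                   # Chinese characters
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# Declare the encoding of the file on disk instead of changing the locale
write.csv(df, file = path, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back, telling R that the file is UTF-8
back <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# In a UTF-8 locale the text should match the original exactly
all(back$text == df$text)
```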
Sunny Singha
2016-Mar-28 14:42 UTC
[R] Please guide -- UTF-8 locale setting fails on Windows on writing
Milan,

OK, let me take Facebook as an example. I used the Rfacebook package to get posts (getPost()), which returns a list() of data frames (post, comments, likes). Let me demonstrate two cases, writing and reading, using fileEncoding just as you suggested.

Case 1:
Let's say one of the Facebook comments has the string value below, in Chinese:
"世界餐福事工 - 餐廳員工沒精打采 ?????"

On the R console I assign the string above to a variable:
x <- "世界餐福事工 - 餐廳員工沒精打采 ?????"
and write it as below:
write.csv(x, file='x.csv', row.names=F, fileEncoding='UTF-8')

I get this string in the file:
"<U+4E16><U+754C><U+9910><U+798F><U+4E8B><U+5DE5> - <U+9910><U+5EF3><U+54E1><U+5DE5><U+6C92><U+7CBE><U+6253><U+91C7> "

Case 2:
I created a Notepad file 'x.txt', saved the same string in it, and read it as below:
read.table('x.txt', fileEncoding='UTF-8')
I get the output below:
  V1
1  ?
Warning messages:
1: In read.table("x.txt", fileEncoding = "UTF-8") :
  invalid input found on input connection 'x.txt'
2: In read.table("x.txt", fileEncoding = "UTF-8") :
  incomplete final line found by readTableHeader on 'x.txt'

The above was only for demonstration. In fact I'm reading extracted social media data, which is ultimately fetched via the httr package and returned as data frames. I'm not sure how I should handle this on Windows, as I don't observe this behavior on my Mac, where the system locale is set to 'en_US.UTF-8'.

Regards,
Sunny

On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> You don't need to set the locale. Just pass an appropriate value (e.g.
> "UTF-8") to read.csv() or write.csv()'s fileEncoding argument.
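One workaround sometimes used for Case 1 is to skip write.csv()'s conversion to the native encoding entirely: build the CSV lines yourself and write their UTF-8 bytes unchanged to a binary connection. The sketch below assumes the strings inside R are still intact (e.g. Encoding(x) reports "UTF-8"); if text typed at a cp1252 console has already been mangled, no writer can restore it, which is why the example builds its string with intToUtf8(). The helper name write_csv_utf8() is hypothetical, and unlike write.csv() it quotes every field, numbers included.

```r
# Hypothetical helper (not part of base R or any package): write a data
# frame as a UTF-8 CSV without passing through the native encoding.
write_csv_utf8 <- function(df, path) {
  # Quote values CSV-style, doubling embedded quotes; NA is written
  # unquoted as NA, matching write.csv()'s default.
  quote_field <- function(v) {
    v <- enc2utf8(as.character(v))
    ifelse(is.na(v), "NA", paste0('"', gsub('"', '""', v, fixed = TRUE), '"'))
  }
  header <- paste(quote_field(names(df)), collapse = ",")
  # unname() stops a column called "sep" or "collapse" being read as
  # an argument of paste()
  rows <- do.call(paste, c(unname(lapply(df, quote_field)), sep = ","))
  # Binary mode plus useBytes = TRUE writes the UTF-8 bytes as they are,
  # with no re-encoding on the way out
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(c(header, rows), con, useBytes = TRUE)
  invisible(path)
}

# Usage: the first six characters from Case 1, built from their code points
x <- intToUtf8(c(0x4E16, 0x754C, 0x9910, 0x798F, 0x4E8B, 0x5DE5))
out <- tempfile(fileext = ".csv")
write_csv_utf8(data.frame(x = x, stringsAsFactors = FALSE), out)
```

The resulting file is UTF-8 without a byte-order mark, so, as Milan noted, the program used to open it may still need to be told the encoding.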