Yong Wang
2013-Jul-29 03:32 UTC
[R] Chinese characters in html source captured by download.file() are garbled code , how to convert it readable
Dear list,

I am using R to download the HTML source of many pages and then extract data from it for further processing. The problem is that the Chinese characters in the HTML source are all garbled, and I can't find a way to convert them into something readable. The problem occurs on both Ubuntu 10 and Windows 7, each running in an English environment; I have not yet tried an operating system set up in Chinese. I know almost nothing about character encodings, and an extensive search online has not solved the problem.

# the code
download.file("https://www.google.com.hk/finance/company_news?q=SHA:601857&gl=cn&num=200",
              destfile = "tmp.txt")
test <- readLines("tmp.txt", encoding = "UTF-8")

# the garbled text in "tmp.txt" and "test" looks like this:
# ��国�۪o�ѵM�a�ѥ��������q�]�

Any help is highly appreciated.

yong
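A minimal R sketch of why the readLines() call above cannot repair the text by itself. It assumes (nothing in the thread confirms this) that the server sent the page in a legacy Chinese encoding such as GBK rather than UTF-8. The encoding argument of readLines() only marks the strings as UTF-8; it does not convert the bytes, whereas iconv() does. The four bytes below stand in for a downloaded file; for a real page, the source charset should be taken from its Content-Type header or its <meta charset> declaration.

```r
# Assumption for illustration: the page arrived as GBK bytes, not UTF-8.
# These four bytes encode U+4E2D U+56FD ("China") in GBK.
gbk_bytes <- as.raw(c(0xD6, 0xD0, 0xB9, 0xFA))
path <- tempfile(fileext = ".txt")
writeBin(gbk_bytes, path)

# readLines(encoding = "UTF-8") only labels the strings as UTF-8; the bytes
# are left untouched, so the GBK bytes end up mislabelled as invalid UTF-8.
mislabelled <- readLines(path, encoding = "UTF-8", warn = FALSE)
print(validUTF8(mislabelled))  # [1] FALSE

# iconv() actually converts the bytes from the assumed source encoding.
fixed <- iconv(readLines(path, warn = FALSE), from = "GBK", to = "UTF-8")
print(validUTF8(fixed))  # [1] TRUE
```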
Henrik Bengtsson
2013-Jul-29 16:03 UTC
Try adding mode = "wb" to download.file(), or just use downloadFile() from the R.utils package.

/Henrik
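The binary-mode download can be combined with an explicit conversion step in one helper. This is a hypothetical sketch: the name read_page_as_utf8() and its charset argument are illustrations, not part of base R or R.utils, and the right charset value still has to come from the page's own declaration. Saving with mode = "wb" keeps the response byte for byte (R's documentation warns that a text-mode transfer can corrupt files on Windows); iconv() then converts those bytes to UTF-8.

```r
# Hypothetical helper (not from the thread): download a page byte for byte,
# then convert it from an assumed source charset to UTF-8.
read_page_as_utf8 <- function(url, charset = "UTF-8") {
  dest <- tempfile(fileext = ".html")
  # mode = "wb" writes the raw bytes; text mode may alter files on Windows.
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  lines <- readLines(dest, warn = FALSE)
  unlink(dest)
  # iconv() converts the bytes; readLines(encoding = ) would only label them.
  iconv(lines, from = charset, to = "UTF-8")
}

# Usage (the charset here is an assumption; check the page's Content-Type
# header or <meta charset> tag for the real value):
# news <- read_page_as_utf8(
#   "https://www.google.com.hk/finance/company_news?q=SHA:601857&gl=cn&num=200",
#   charset = "GBK")
```

Lines that are not valid in the given charset come back as NA from iconv(), which is a quick sign that the charset guess is wrong.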