thr3ads.net - R help - [R] get only little part of html with htmlParse [Sep 2012]


水静流深

2012-Sep-04 05:49 UTC

[R] get only little part of html with htmlParse

Here is my code.

There are three ways to get the text that the htmlParse function parses.

1. File on my computer
options(encoding="gbk") 
library(XML) 
xmltext1 <- htmlParse("/home/tiger/Desktop/27174.htm" ) 

# /home/tiger/Desktop/27174.htm is a copy of
# http://www.jb51.net/article/27174.htm downloaded to my computer.

2. URL
options(encoding="gbk") 
library(XML) 
xmltext2 <- htmlParse("http://www.jb51.net/article/27174.htm" ) 

3. readLines
options(encoding="gbk") 
library(XML) 
txt=readLines("http://www.jb51.net/article/27174.htm") 
xmltext3 <- htmlParse(txt,asText=TRUE) 

Methods 1 and 2 are fine: both return the full content to be parsed.
When I run method 3, to my surprise, xmltext3 contains only a small part of
the HTML. Most of the content is gone, so the result is not the same as with
methods 1 and 2. Why?

> xmltext3
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="zh-cn"><head>
<meta http-equiv="Content-Type" content="text/html;
charset=gb2312">
<title>PYTHONæ£å</title>
</head></html>
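
One thing that may be worth checking for method 3 is what htmlParse actually receives: readLines returns a character vector with one element per line, not a single string. The sketch below is only an illustration under assumptions, not a confirmed explanation of the truncated output. The txt vector is a made-up stand-in for the downloaded page, and the encoding value "UTF-8" is an assumption that fits this ASCII-only demo (the real page declares gb2312). It collapses the lines into one string, passes an explicit encoding, and then counts nodes to see whether the body survived parsing.

```r
library(XML)

# Hypothetical stand-in for the lines returned by readLines();
# the real page content is not reproduced here.
txt <- c(
  "<html><head><title>demo</title></head>",
  "<body><p>first paragraph</p>",
  "<p>second paragraph</p></body></html>"
)

# Collapse the character vector into one string so the parser
# sees a single document.
html_string <- paste(txt, collapse = "\n")

# Pass the encoding explicitly instead of relying on options(encoding=).
# "UTF-8" is an assumption for this demo input.
doc <- htmlParse(html_string, asText = TRUE, encoding = "UTF-8")

# Extract the text of every <p> node to check that the body was parsed.
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(length(paragraphs))
```

Running the same node count on xmltext1, xmltext2 and xmltext3 would show how much each method really loses, which is easier to compare than the printed documents.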
