search for: htmlparse

Displaying 20 results from an estimated 48 matches for "htmlparse".

2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list. Looks like the same issue in Russian: library(RCurl) library(XML) u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1" a = getURL(u) a # Here - the Russian is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of these seem to fix it: htmlParse(a, encoding = "windows-1251") htmlParse(a, encoding = "CP1251") htmlParse(a, encoding = "cp1251") htmlParse(a, encoding = "iso8859-5") This is my locale: Sys.getlocale() &qu...
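A fix that sometimes works for pages served in a legacy Cyrillic or Hebrew charset is to decode at download time with RCurl's `.encoding` argument rather than at parse time. This is an untested sketch assuming the server really delivers windows-1251 (URL from the post):

```r
# Sketch, untested: let RCurl decode the bytes, then hand htmlParse
# text that is already in the native encoding.
library(RCurl)
library(XML)

u <- "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
a <- getURL(u, .encoding = "windows-1251")  # assumption: page is cp1251
doc <- htmlParse(a, asText = TRUE)          # parse the decoded text
```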
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometime...
2012 May 21
1
htmlParse Error
I am trying to parse a webpage using the htmlParse command in the XML package as follows: library(XML) u = "http://en.wikipedia.org/wiki/World_population" doc = htmlParse(u) I get the following error: Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_population I am using R 2.13.1 (32 bit version...
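"error in creating parser for <url>" usually means libxml2's own fetcher could not retrieve the page (redirects, user-agent filtering, or no network access from the parser), not that the HTML is malformed. A common workaround, sketched here, is to download the page with RCurl first and parse the text:

```r
# Sketch: fetch with RCurl, parse the downloaded text instead of the URL.
library(RCurl)
library(XML)

u <- "http://en.wikipedia.org/wiki/World_population"
page <- getURL(u, followlocation = TRUE)  # RCurl fetches, following redirects
doc  <- htmlParse(page, asText = TRUE)    # libxml2 never touches the network
```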
2012 Jan 30
1
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list. I wish to be able to have htmlParse work well with Hebrew, but it keeps scrambling the Hebrew text in pages I feed into it. For example: # why can't I parse the Hebrew correctly? library(RCurl) library(XML) u = "http://humus101.com/?p=2737" a = getURL(u) a # Here - the Hebrew is fine. a2 <- htmlParse(a) a2 # Here...
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues, each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it...
2009 Jun 30
1
How to pass parameters to htmlParse Bank of Canada html pages
...first go url <- "http://banqueducanada.ca/en/rates/exchange-avg.html" select 12 months for Rates for the past and click the "Get Rates" button. Then the page moves to address <- "http://banqueducanada.ca/cgi-bin/famecgi_fdps" and the rates show in the html page. htmlParse() can read the html document, but htmlParse(address) did not work, since we need to pass the selected field value and the "Get Rates" button click event parameters to http://banqueducanada.ca/en/rates/exchange-avg.html. I was wondering if you know how to load in data from such html pages. Thanks, -jame...
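Since the rates page is produced by a form submission, one approach is to POST the form fields to the CGI endpoint with RCurl::postForm(). The field names below are placeholders, not taken from the site; the real ones must be read from the page's <form> element (RHTMLForms::getHTMLFormDescription() can list them):

```r
# Sketch: submit the form by POST, then parse the returned HTML.
library(RCurl)
library(XML)

# NOTE: field names are hypothetical -- inspect the actual <form> for real names.
res <- postForm("http://banqueducanada.ca/cgi-bin/famecgi_fdps",
                rangeType  = "range",   # placeholder field
                rangeValue = "12")      # placeholder field (12 months)
doc <- htmlParse(res, asText = TRUE)
```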
2012 Sep 04
0
get only little part of html with htmlParse
Here is my code. There are three methods to get the text to be parsed by the htmlParse function. 1. file on my computer options(encoding="gbk") library(XML) xmltext1 <- htmlParse("/home/tiger/Desktop/27174.htm" ) #/home/tiger/Desktop/27174.htm is the file of http://www.jb51.net/article/27174.htm downloaded on my computer. 2. url options(encoding="gb...
2012 Sep 14
0
htmlParse pop ups over web pages
Hello All, I am trying to write a routine that loops over some links and parses them using htmlParse. The problem is that one of the links may display a pop-up window on top of that link's web page. If there is a pop-up, the routine bombs and I get an error message that the url doesn't exist. Does the XML package (or perhaps another package) provide a way to deal with this issue? I'...
2012 May 19
1
Try Giving Invalid Argument Type Error
...ge is parsed without a problem. exh<-"NASDAQ" tic<-"EGHT" URL<-paste("http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=", exh,"%3A",tic,"&istart_date=0", sep = "") doc <- htmlParse(URL) However, when I change the value of tic it will not. tic<-"AACOU" URL<-paste("http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=", exh,"%3A",tic,"&istart_date=0", sep = "") doc <-...
2010 Mar 11
1
parse an HTML page with verbose error message (using XML)
I'm using the function htmlParse() in the XML package, and I need a little bit of help on error handling while parsing an HTML page. So far I can use either the default way: # error = xmlErrorCumulator(), by default library(XML) doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/") # the error message is: # htm...
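Besides the default xmlErrorCumulator(), htmlParse() accepts any function through its error argument, so parser messages can be collected silently instead of printed. A minimal sketch (URL from the post):

```r
# Sketch: accumulate parser errors in a vector instead of printing them.
library(XML)

errs <- character()
collect <- function(msg, ...) errs <<- c(errs, msg)  # stash each message

doc <- htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
                 error = collect)
head(errs)  # inspect the accumulated messages after parsing
```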
2013 Feb 28
0
Scraping data from website---Error in htmlParse: error in creating parser
.../nfl-fantasy-sports/), it displays the QB projections. When I click on another position (e.g., RB) it displays a new URL ( http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB). When I enter this new URL into the readHTMLTable function, I receive the following error: Error in htmlParse(" http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB/") : error in creating parser for http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB/ What's going on? Might this have something to do with JavaScript? How can I scrape the RB project...
2018 May 23
0
Using R htmlParse() for manipulating URLs to access multiple pages
...com+ https://home.lala.com/bibi/blabla/chapter_vii_operational_modalities/701_wonderwall_18_oasis/701_wonderwall_18_oasis/ and so forth. Of course, I don't want to scrape the single URLs one by one. Hence, I am considering the base URL for parsing and to start from there onward. baseurl <- htmlParse( "https://home.lala.com/bibi/blabla/", encoding = "UTF-8") xpath <- "//div[@id='Page']/strong[2]" GetAllPages <- as.numeric(xpathSApply(baseurl, xpath, xmlValue)) Nevertheless, it does not work at all: > GetAllPages numeric(0) Any...
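numeric(0) here means the XPath matched nothing, not that parsing failed. Before coercing with as.numeric(), it helps to inspect the node set itself; note also that htmlParse() only sees the static HTML, so anything injected by JavaScript will never match. A sketch using the URL and XPath from the post:

```r
# Sketch: check whether the XPath matches anything before converting.
library(XML)

baseurl <- htmlParse("https://home.lala.com/bibi/blabla/", encoding = "UTF-8")
nodes <- getNodeSet(baseurl, "//div[@id='Page']/strong")
length(nodes)  # 0 => the XPath (or the fetched HTML) is the problem;
               # loosen the expression before calling as.numeric()
```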
2011 Aug 29
1
reading tables from multiple HTML pages
...g data from tables in html using the XML package. I have included some code below. I am trying to loop through a series of html pages, each of which contains a single table from which I want to scrape data. However, some of the pages are blank, and so it throws an error message when it gets to htmlParse(). The loop then closes out and I get the error message below: Error in htmlParse(url) : error in creating parser for http://www.szrd.gov.cn/viewcommondbfc.do?id=728 What would be the best way to keep the loop running so I can parse the rest? ***********************************************...
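Wrapping the parse in tryCatch() lets the loop skip blank or unreachable pages instead of aborting. A sketch with illustrative id values (only id=728 appears in the post):

```r
# Sketch: skip pages that fail to parse, keep the rest of the loop alive.
library(XML)

ids  <- 720:730  # illustrative range, not from the post
urls <- sprintf("http://www.szrd.gov.cn/viewcommondbfc.do?id=%d", ids)

tables <- list()
for (u in urls) {
  doc <- tryCatch(htmlParse(u), error = function(e) NULL)  # NULL on failure
  if (is.null(doc)) next            # blank/unparseable page: move on
  tables[[u]] <- readHTMLTable(doc)
}
```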
2004 Jun 28
2
[Fwd: Irix install of omega fails.]
OK, I'll try again. Thanks, Jim. -------------- next part -------------- An embedded message was scrubbed... From: Jim Lynch <jwl@sgi.com> Subject: Irix install of omega fails. Date: Mon, 28 Jun 2004 14:16:46 -0400 Size: 2057 Url: http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20040628/212669c1/Irixinstallofomegafails.eml
2012 Oct 17
0
postForm() in RCurl and library RHTMLForms
...alues. > > thanking you > veepsirtt > options(RCurlOptions = list(useragent = "R")) > library(RCurl) > url <- "http://www.bseindia.com/histdata/categorywise_turnover.asp" > wp = getURLContent(url) > > library(RHTMLForms) > library(XML) > doc = htmlParse(wp, asText = TRUE) > form = getHTMLFormDescription(doc)[[1]] > fun = createFunction(form) > o = fun(mmm = "9", yyy = "2012", url="http://www.bseindia.com/histdata/categorywise_turnover.asp") > > table = readHTMLTable(htmlParse(o, asText = TRUE), &g...
2012 Jun 08
0
XML htmlTreeParse fails with no obvious error
...ects/bioinformatics/Custom_Chip_Definition_File.html" htmlTreeParse(getURL(url)) The issue seems to originate in htmlTreeParse, as getURL alone works and returns the expected content. I checked that it could not be an encoding issue, and as far as I can tell it is not. Moreover, using htmlParse(paste("http://",url,sep="")) works. Note that htmlTreeParse(getURL(paste("http://",url,sep=""))) fails too; the "http://" is important only for htmlParse, so that it identifies it as a URL. This issue is rather new, and as I've been using the s...
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)
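One way to strip embedded NUL bytes, sketched here, is to download the page as raw bytes and drop the zero bytes before converting to character (the URL is a placeholder, since the post's page is not shown):

```r
# Sketch: fetch raw bytes, remove NUL (0x00) bytes, then convert to text.
library(RCurl)

raw <- getURLContent("http://example.com/page.html", binary = TRUE)  # placeholder URL
txt <- rawToChar(raw[raw != as.raw(0)])  # txt can now be passed to htmlParse(asText = TRUE)
```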
2009 May 12
2
import HTML tables
Hello, I was wondering if there is a function in R that imports tables directly from an HTML document. I know there are functions (say, getURL() from {RCurl}) that download the entire page source, but here I refer to something like Google Docs' importHTML() function (if you don't know this function, go check it, it's very useful). Anyway, if someone knows of something that does this, please let me know.
2010 Nov 04
3
postForm() in RCurl and library RHTMLForms
Hi RUsers, Suppose I want to see the data on the website url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm" for the index "S&P CNX NIFTY" for dates "FromDate"="01-11-2010","ToDate"="02-11-2010" then read the html table from the page using readHTMLtable() I am using this code webpage <-
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following are my code and the error message. Error in htmlParse(doc) : error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team" tables <- readHTMLTable(theurl) Regards, Kiung
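As with the earlier Wikipedia report, "error in creating parser" points at libxml2's own fetcher rather than the table code; downloading with RCurl first and handing the text to readHTMLTable() is a workaround worth trying:

```r
# Sketch: fetch with RCurl instead of letting libxml2 hit the network.
library(RCurl)
library(XML)

theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
page   <- getURL(theurl)       # RCurl does the download
tables <- readHTMLTable(page)  # readHTMLTable also accepts HTML text
```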