similar to: parse an HTML page with verbose error message (using XML)

Displaying 20 results from an estimated 200 matches similar to: "parse an HTML page with verbose error message (using XML)"

2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometimes the following code works n<-readHTMLTable(htmlParse(url)) But most of the
2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list. Looks like the same issue in Russian: library(RCurl) library(XML) u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1" a = getURL(u) a # Here - the Russian is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of these seem to fix it: htmlParse(a, encoding = "windows-1251") htmlParse(a, encoding =
2012 May 21
1
htmlParse Error
I am trying to parse a webpage using the htmlParse command in the XML package as follows: library(XML) u = "http://en.wikipedia.org/wiki/World_population" doc = htmlParse(u) I get the following error: Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_population I am using R 2.13.1 (32-bit version) on 64-bit Windows. (I tried installing it in
2012 Jan 30
1
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list. I would like htmlParse to work well with Hebrew, but it keeps scrambling the Hebrew text in pages I feed into it. For example: # why can't I parse the Hebrew correctly? library(RCurl) library(XML) u = "http://humus101.com/?p=2737" a = getURL(u) a # Here - the Hebrew is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of
2012 May 19
1
Try Giving Invalid Argument Type Error
Dear R Helpers, I am getting an error message from the try function that I don't understand so I am hoping that someone can help. I am scraping from web pages, but sometimes they disappear. When that happens I need to control for it with some sort of function. This web page is parsed without a problem. exh<-"NASDAQ" tic<-"EGHT"
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues, each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the
2011 Aug 29
1
reading tables from multiple HTML pages
Hi, beginner to R and was having some problems scraping data from tables in html using the XML package. I have included some code below. I am trying to loop through a series of html pages, each of which contains a single table from which I want to scrape data. However, some of the pages are blank - and so it throws me an error message when it gets to htmlParse(). The loop then closes out and I
2009 Jun 30
1
How to pass parameters to htmlParse Bank of Canada html pages
To get USDCAD rates from Bank of Canada, we first go url <- "http://banqueducanada.ca/en/rates/exchange-avg.html" select 12 months for Rates for the past and click "Get Rates" button. Then the page moves to address <- "http://banqueducanada.ca/cgi-bin/famecgi_fdps" and the rates show in the html page. htmlParse() can read the html document but
2004 Feb 15
4
father and son heights
Faraway's book titled "Practical Regression and Anova using R", with full text available online at: http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf refers to a data set, stat500, which compares midterm and final grades. It can be used to illustrate similar concepts. A google search for faraway.zip will locate the actual data. --- Date: Sun, 15 Feb 2004 10:37:08 -0800
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following is my code and error message. Error in htmlParse(doc) : error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team" tables <- readHTMLTable(theurl) Regards, Kiung
2012 Mar 21
1
Trouble installing the XML package
Hello everyone, I am probably not the only one having trouble with this package but here goes. I want to install XML on Ubuntu. I installed libxml2-dev and everything works out fine until I get the following: Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses, : No definition was found for superclass "namedList" in the specification of class
2014 Dec 16
2
Replace atoi and atol with strtol strtoul:Need Help
Hello, I came across the function *HtmlParser::decode_entities(string &s)* in *xapian-application/omega/htmlparse.cc*, which basically extracts a hex value, if any, or a number. To extract the number, atoi is used and the value it returns is stored in the variable "val". I think replacing atoi with strtoul would be useful here, as the number can have a larger value, although the
2013 Jan 15
1
readHTMLTable (XML package)
Hi, I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that. > library(XML) > wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1) Error in htmlParse(doc) : File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist Dan
2010 Mar 18
1
Do colClasses in readHTMLTable (XML Package) work?
Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment: require("XML") doc <- "http://www.nber.org/cycles/cyclesmain.html" table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table. xt
2012 Mar 22
2
trouble for parsing HTML files
Hi all, Using the XML package, I'm not able to parse some html webpages. Here is my code and the error message: library("XML") url <- "http://www.huffingtonpost.com/social/GraniteSkyline?action=fans" doc <- htmlParse(url) Error: Namespace prefix ??? of attribute (null) is not defined I've searched a lot on the Internet, but it's really difficult to find
2012 Apr 16
1
grep and XML
Hi all: I struggle a lot scraping web data. I still haven't got a handle on the XML package. I'd like to get particular exchange rates from this table: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with: library(RCurl) library(XML)
2012 May 28
1
Rcurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
2003 Sep 09
1
Building XML package for MacOS X
I am working to build the XML package for R on MacOS X. I have installed libxml2-2.5.9 into /usr/local. I set the LIBXML_INCDIR=/usr/local/include/libxml2. I use R INSTALL, I get the following: R INSTALL -c -l /usr/local/R/library XML_0.94-1.tar.gz : : : gcc -bundle -flat_namespace -undefined suppress -L/sw/lib -L/usr/local/lib -o XML.so DocParse.o EventParse.o ExpatParse.o HTMLParse.o
2009 Jun 23
1
How to find b entries using xPath?
We got all rows by: library(XML) doc = htmlParse('http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm') rows = xpathSApply(doc, '//table/tbody/tr') The last row is: row_last = rows[15] row_last [[1]] <tr><td id="t1stub17" class="stub1 RGBShade"><b>Unsmoothed composite leading indicator</b></td>&#13; <td
2003 Oct 09
2
building XML-0.95-1 on MacOS
I am trying to build the XML package on MacOS. I am using the fink installation of libxml-1.8.17. The configuration information is: Configuration information: Libxml settings libxml include directory: /sw/include/gnome-xml libxml library directory: -L/sw/lib -lxml -lz -lz -lxml libxml 2: no Compilation flags: -I/sw/include/gnome-xml -I/sw/include/gnome-xml/libxml