thr3ads.net - similar to: "Scraping data from website---Error in htmlParse: error in creating parser"

Displaying 20 results from an estimated 2000 matches similar to: "Scraping data from website---Error in htmlParse: error in creating parser"

htmlParse (from XML library) working sporadically in the same code

2013 Mar 20

I am using htmlParse from the XML library on a particular website. Sometimes the code fails and sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0. Sometimes the following code works: n<-readHTMLTable(htmlParse(url)) But most of the
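If the failures really are intermittent (a busy server or a flaky connection, which the post does not confirm), one workaround is simply to retry the call a few times. A minimal sketch in base R; `with_retry` is a hypothetical helper, not part of the XML package:

```r
# Hypothetical helper: call f() up to `tries` times, pausing `wait` seconds
# between attempts, and re-raise the last error if every attempt fails.
with_retry <- function(f, tries = 3, wait = 1) {
  last_error <- NULL
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)  # success: stop retrying
    last_error <- result
    if (i < tries) Sys.sleep(wait)                  # back off before next try
  }
  stop(last_error)
}

# Usage with the call from the post (url defined as there):
# library(XML)
# n <- with_retry(function() readHTMLTable(htmlParse(url)))
```

If every attempt fails the original error surfaces unchanged, so a page that is genuinely broken is not silently hidden.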

scraping with session cookies

2012 Sep 19

Hi, I am starting to code in R, and one of the things I want to do is scrape some data from the web. The problem I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to agree to the disclaimer with postForm and write
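A common RCurl approach is to send every request through one curl handle with libcurl's cookie engine switched on, so the session cookie set by the disclaimer page is sent back on later requests. A sketch under assumptions: `fetch_after_disclaimer` is a hypothetical helper, and the URLs and form field names must be taken from the real disclaimer page (its form's action URL may differ from the page URL):

```r
library(RCurl)

# Hypothetical helper: accept a disclaimer form, then fetch a page in the
# same session. All three arguments are placeholders you must fill in.
fetch_after_disclaimer <- function(disclaimer_url, data_url, form_fields) {
  # cookiefile = "" enables libcurl's in-memory cookie engine without
  # reading a file; cookies the server sets are kept in this handle.
  h <- getCurlHandle(cookiefile = "", followlocation = TRUE)
  getURL(disclaimer_url, curl = h)                            # get session cookie
  postForm(disclaimer_url, .params = form_fields, curl = h)   # accept disclaimer
  getURL(data_url, curl = h)                                  # cookies resent here
}

# Usage (hypothetical URLs and field name):
# html <- fetch_after_disclaimer("http://www.example.com/disclaimer",
#                                "http://www.example.com/data",
#                                list(agree = "yes"))
```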

reading tables from multiple HTML pages

2011 Aug 29

Hi, I'm a beginner to R and was having some problems scraping data from HTML tables using the XML package. I have included some code below. I am trying to loop through a series of HTML pages, each of which contains a single table from which I want to scrape data. However, some of the pages are blank, so it throws an error message when it gets to htmlParse(). The loop then closes out and I
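One way to keep the loop going is to wrap each page in tryCatch() and return NULL for pages that fail or have no table. A minimal sketch; `read_first_table` is a hypothetical helper and assumes you want the first table on each page:

```r
library(XML)

# Hypothetical helper: read the first table from one page. Returns NULL,
# instead of stopping the loop, when the page fails to parse or has no table.
# Extra arguments (e.g. asText = TRUE) are passed on to htmlParse().
read_first_table <- function(src, ...) {
  tryCatch({
    tables <- readHTMLTable(htmlParse(src, ...))
    if (length(tables) == 0) NULL else tables[[1]]
  }, error = function(e) {
    message("Skipping ", substr(src, 1, 40), ": ", conditionMessage(e))
    NULL
  })
}

# Usage (urls is your vector of page addresses):
# results <- lapply(urls, read_first_table)
# results <- Filter(Negate(is.null), results)  # drop the skipped pages
```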

Scraping a web page.

2012 May 14

Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact, all I want is the (exactly) 7-digit number before ".html". The good news is that as far as I can tell the <a>
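XPath can pick out the href attributes and a regular expression can then pull out the digits. A sketch assuming the page structure described in the post (an <a> inside a <div> inside a table); the inline HTML is a made-up stand-in for a real page:

```r
library(XML)

# Made-up page with the structure described in the post.
html <- '<table><tr><td><div>
  <a href="/en/Ships/A-8605507.html">A</a>
  <a href="/en/Ships/Aalborg-8122830.html">Aalborg</a>
</div></td></tr></table>'

doc <- htmlParse(html, asText = TRUE)
# href attribute of every <a> that sits inside a <div> inside a table
hrefs <- xpathSApply(doc, "//table//div//a", xmlGetAttr, "href")
# seven digits immediately followed by ".html" at the end of the string
ids <- regmatches(hrefs, regexpr("[0-9]{7}(?=\\.html$)", hrefs, perl = TRUE))
ids  # "8605507" "8122830"
```

For a real page, replace the htmlParse() call with htmlParse(url) or parse the downloaded source.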

How to suppress errors generated by readHTMLTable?

2009 Nov 26

library(XML) download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit','index.html') tables=readHTMLTable("index.html",error=function(...){}) tables readHTMLTable gives me the following errors. Could somebody let me know how to suppress them? Opening and ending tag mismatch: center and table htmlParseEntityRef: expecting
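It is not clear that readHTMLTable() forwards its error argument to the parser. One approach that should work is to parse the file yourself with htmlParse(), which does take an error handler, and then pass the parsed document to readHTMLTable(). A sketch using a made-up snippet with an unterminated entity (AT&T), which triggers the same kind of "htmlParseEntityRef: expecting ';'" message:

```r
library(XML)

# Made-up malformed HTML: "&T" is not a complete entity reference.
bad_html <- "<html><body><table><tr><td>AT&T</td><td>2</td></tr></table></body></html>"

# A handler that does nothing discards libxml2's parser messages
# instead of reporting them; the parser still recovers the table.
doc <- htmlParse(bad_html, asText = TRUE, error = function(...) NULL)
tables <- readHTMLTable(doc)

# For the file in the post:
# doc <- htmlParse("index.html", error = function(...) NULL)
# tables <- readHTMLTable(doc)
```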

readHTMLTable (XML package)

2013 Jan 15

Hi, I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that. > library(XML) > wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1) Error in htmlParse(doc) : File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist Dan
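As far as I know, htmlParse() fetches URLs through libxml2, which cannot download https pages, so the usual workaround is to download the page with RCurl and parse the resulting text. The same approach may also help with "error in creating parser" errors for http URLs that redirect to https. A sketch; `parse_downloaded` and `read_tables_https` are hypothetical helper names:

```r
library(RCurl)
library(XML)

# Hypothetical helper: parse HTML that is already in memory and return its
# tables; extra arguments go to readHTMLTable().
parse_downloaded <- function(txt, ...) {
  readHTMLTable(htmlParse(txt, asText = TRUE), ...)
}

# Hypothetical helper: download with RCurl (which handles https), then parse.
read_tables_https <- function(url, ...) {
  txt <- getURL(url, followlocation = TRUE)
  parse_downloaded(txt, ...)
}

# Usage with the page from the post:
# wampage <- read_tables_https("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")
```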

Do colClasses in readHTMLTable (XML Package) work?

2010 Mar 18

Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment: require("XML") doc <- "http://www.nber.org/cycles/cyclesmain.html" table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table. xt

postForm() in RCurl and library RHTMLForms

2012 Oct 17

Hi R Users, I want to get the data from the URL given, from 10/09/2012 to 15/10/2012. I don't know how to pass the parameters. library(RHTMLForms) > > ff = getHTMLFormDescription("

Using R htmlParse() for manipulating URLs to access multiple pages

2018 May 23

I am trying to scrape a manual from the web. For privacy reasons, I cannot write the exact URL here; anyway, the structure is as follows: https://home.lala.com/bibi/blabla/chapter_i_organization/101_contracts/whatever/,DanaInfo=intranet.lala.com+ https://home.lala.com/bibi/blabla/chapter_i_organization/125_bills/,DanaInfo=intranet.lala.com+

read htm table error

2012 Aug 09

Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following is my code and error message. Error in htmlParse(doc) : error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team" tables <- readHTMLTable(theurl) Regards, Kiung

Scraping a web page

2009 Dec 03

I would like to be able to submit a list of URLs of various webpages and extract the "content" i.e. not the mark-up of those pages. I can find plenty of examples in the XML library of extracting links from pages but I cannot seem to find a way to extract the text. Any help would be greatly appreciated - I will not know the structure of the URLs I would submit in advance. Any
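One way to get the readable text without knowing a page's structure in advance is to select every text node under <body>, skip the contents of <script> and <style>, and paste the pieces together. A sketch using XPath with the XML package; the inline page is made up, and `page_text` is just an illustrative variable name:

```r
library(XML)

# Made-up page; for real use, htmlParse(url) or parse downloaded text.
html <- "<html><head><title>T</title><style>p {color: red}</style></head>
<body><h1>Heading</h1><p>First paragraph.</p>
<script>var x = 1;</script><p>Second <b>bold</b> text.</p></body></html>"

doc <- htmlParse(html, asText = TRUE)
# every text node under <body> that is not inside <script> or <style>
nodes <- xpathSApply(doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue)
# trim whitespace, drop empty pieces, join into one string
pieces <- trimws(nodes)
page_text <- paste(pieces[nzchar(pieces)], collapse = " ")
```

Because the XPath only relies on <body>, <script> and <style>, it does not need to know anything else about the submitted URLs.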

My First Attempt at Screen Scraping with R

2011 May 06

Hello Folks, I'm working on trying to scrape my first web site and ran into an issue because I really don't know anything about regular expressions in R. library(XML) library(RCurl) site <- "http://thisorthat.com/leader/month" site.doc <- htmlParse(site, ?, xmlValue) At the ?, I realize that I need to insert a regex command which will decipher the contents of the

postForm() in RCurl and library RHTMLForms

2010 Nov 04

Hi RUsers, Suppose I want to see the data on the website url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm" for the index "S&P CNX NIFTY" for dates "FromDate"="01-11-2010","ToDate"="02-11-2010", then read the HTML table from the page using readHTMLTable(). I am using this code webpage <-

How to set cookies in RCurl

2012 Jun 07

Hi, I am trying to access a website and read its content. The website is a restricted-access website that I access through a proxy server (which therefore requires me to enable cookies). I am having problems allowing RCurl to receive and send cookies. The following lines give me: library(RCurl) library(XML) url <- "http://www.theurl.com" content <- readHTMLTable(url) content

htmlParse Error

2012 May 21

I am trying to parse a webpage using the htmlParse command in the XML package as follows: library(XML) u = "http://en.wikipedia.org/wiki/World_population" doc = htmlParse(u) I get the following error: Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_population I am using R 2.13.1 (32-bit version) on 64-bit Windows. (I tried installing it in

Getting htmlParse to work with Hebrew? (on windows)

2012 Jan 30

Hello, dear R-help mailing list. I want htmlParse to work well with Hebrew, but it keeps scrambling the Hebrew text in the pages I feed into it. For example: # why can't I parse the Hebrew correctly? library(RCurl) library(XML) u = "http://humus101.com/?p=2737" a = getURL(u) a # Here - the hebrew is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of
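One thing worth trying, assuming the page really is UTF-8, is to state the encoding explicitly for both steps, getURL(u, .encoding = "UTF-8") and htmlParse(a, asText = TRUE, encoding = "UTF-8"), rather than letting libxml2 guess. A self-contained sketch with a made-up page; the Hebrew word is written with \u escapes so the code itself is plain ASCII:

```r
library(XML)

# Hebrew sample ("shalom") written with Unicode escapes.
hebrew <- "\u05e9\u05dc\u05d5\u05dd"
html <- paste0("<html><body><p>", hebrew, "</p></body></html>")

# asText = TRUE: the string is HTML, not a file name or URL.
# encoding = "UTF-8": tells the parser how to decode the bytes instead of
# leaving it to guess (a wrong guess scrambles non-Latin text).
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
value <- xpathSApply(doc, "//p", xmlValue)

# For the page in the post:
# a <- getURL(u, .encoding = "UTF-8")
# a2 <- htmlParse(a, asText = TRUE, encoding = "UTF-8")
```

This only addresses decoding; how the text then prints on a Windows console is a separate question.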

How to pass parameters to htmlParse Bank of Canada html pages

2009 Jun 30

To get USDCAD rates from the Bank of Canada, we first go to url <- "http://banqueducanada.ca/en/rates/exchange-avg.html", select 12 months for Rates for the past and click the "Get Rates" button. Then the page moves to address <- "http://banqueducanada.ca/cgi-bin/famecgi_fdps" and the rates show in the HTML page. htmlParse() can read the HTML document but

get only little part of html with htmlParse

2012 Sep 04

Here is my code. There are three methods to get the text to be parsed by the htmlParse function. 1. File on my computer: options(encoding="gbk") library(XML) xmltext1 <- htmlParse("/home/tiger/Desktop/27174.htm" ) #/home/tiger/Desktop/27174.htm is the file of http://www.jb51.net/article/27174.htm downloaded on my computer. 2. URL: options(encoding="gbk")

htmlParse pop ups over web pages

2012 Sep 14

Hello All, I am trying to write a routine that loops over some links and parses those links using htmlParse. The problem is that one of the links may display a pop-up window on top of that link's web page. If there is a pop-up, the routine bombs and I get an error message that the URL doesn't exist. Does the XML package (or perhaps another package) provide a way to deal with this

htmlParse hangs or crashes

2011 Sep 05

Dear colleagues, each time I use htmlParse, R crashes or hangs. The url I'd like to parse is included below as is the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the
