thr3ads.net - similar to: "How to suppress errors from htmlTreeParse() function in XML package?"

Displaying 20 results from an estimated 500 matches similar to: "How to suppress errors from htmlTreeParse() function in XML package?"

Extract Data from a Webpage

2008 Dec 17

Hi All: I would like to extract the provider name, address, and phone number from multiple webpages like this: http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490 Based on searching R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the url as an argument to

XML and RCurl: problem with encoding (htmlTreeParse)

2009 Dec 31

Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below:
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,

How to suppress errors generated by readHTMLTable?

2009 Nov 26

library(XML)
download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit','index.html')
tables=readHTMLTable("index.html",error=function(...){})
tables
readHTMLTable gives me the following errors. Could somebody let me know how to suppress them?
Opening and ending tag mismatch: center and table
htmlParseEntityRef: expecting

XML and RCurl: problem with encoding (htmlTreeParse)

2010 Jul 03

Hi All, First method:
> library(XML)
> theurl <- "http://home.sina.com"
> download.file(theurl, "tmp.html")
> txt <- readLines("tmp.html")
> txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
> g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
> head(grep(" ", g, value=T))
[1] " |

Extracting text from html code using the RCurl package.

2008 Oct 06

Dear R-help, I want to download the text from a web page; however, what I end up with is the HTML code. Is there some option that I am missing in the RCurl package? Or is there another way to achieve this? This is the code I am using:
> library(RCurl)
> my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help'
> html.file <- getURI(my.url, ssl.verifyhost = FALSE,

parsing Google search results

2009 Nov 16

Hi, how can I parse Google search results? The following code returns "integer(0)" instead of "1" although the results of the query clearly contain the regex "cran".
####
address <- url("http://www.google.com/search?q=cran")
open(address)
lines <- readLines(address)
grep("cran", lines[3])
####
Thanks, Philip
-- Philip Leifeld Max

Extracting data from a website

2016 Jan 18

Good afternoon, I want to extract data from a website that relates each week to the score obtained by a player. Right now I can get the node in which the week is related to the score obtained, but I am not able to extract that information into a two-column table (week, score), bearing in mind that there may be weeks with no score (in the example,

readPDF() -- unsure how to install xpdf to make this work?

2008 Nov 13

Dear R-Help, I need to convert a set of '.pdf' files into an equivalent set of '.txt' files so that I can do some text mining on the content. In the latest R News newsletter (http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf), the package 'tm' for text mining is mentioned. In that lovely package, there is a function called 'readPDF()'. In order to use

R hangs after htmlTreeParse

2011 Aug 25

Dear colleagues, I'm trying to parse the html content from this webpage:

XML package example code?

2009 Nov 25

I'm interested in parsing an HTML page. I should use XML, right? Could somebody show me some example code? Is there a tutorial for this package?

Using a FOR LOOP to name objects

2012 Feb 29

Hello, I am trying to use a for loop to name objects in each iteration, as in the following example (which doesn't work quite well):
my_list<-c("A","B","C","D","E","F")
for(i in c(1:length(my_list))){
url<- "http://finance.yahoo.com"
doc = htmlTreeParse(url, useInternalNodes = T)
tab_nodes = xpathApply(doc,

Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?

2011 Oct 26

Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv().
# this is the actual page I'm trying to acquire text from:
web.pg <- readLines("http://www.airweb.org/?page=574")
# then parsed in hopes of an easier structure to work with:
web.pg <-

Analyzing Publications from Pubmed via XML

2007 Dec 14

I would like to track which journals are publishing articles about a particular disease. Creating a PubMed search is trivial. The search provides data, but obviously not as an R data frame. I can get the search to export the data as an XML feed, and the XML package seems to be able to read it. xmlTreeParse("

XML: Slower parsing over time with htmlTreeParse()

2010 Mar 15

Sorry, I listed the wrong package in the header of my previous post!
Dear List, have any of you experienced a significant increase in the time it takes to parse a URL via "htmlTreeParse()" when this function is called

retrieve certain part from html

2009 Sep 23

Dear All, Can someone please show me how to extract a certain part of a long HTML string? e.g. "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a

Extracting data from a website

2016 Jan 19

Many thanks to you both!!!! Both solutions worked for me. Regards.
On 18 January 2016 at 18:35, Carlos Ortega <cof at qualityexcellence.es> wrote:
> Hi,
>
> But you almost have it... there are only a couple of steps left and you're done...
>
> You simply have to convert "puntos_nodo" either to a data.frame, although
> it will end up full of things that you don't

Downloading data from the internet

2009 Sep 24

Hi all, I want to download data from these two different sources directly into R:
http://www.rateinflation.com/consumer-price-index/usa-cpi.php
http://eaindustry.nic.in/asp2/list_d.asp
The first one is the CPI of the US and the second one is the WPI of India. Can anyone please give me a clue how to download them directly into R? I want to make them zoo objects for further analysis. Thanks,

XML htmlTreeParse fails with no obvious error

2012 Jun 08

Hi all, Sorry for the rather uninformative subject, but the error I get is not very informative either. When using the XML and RCurl packages to retrieve the content of an HTML page, htmlTreeParse fails, printing out the beginning of the HTML:
Error in htmlTreeParse(getURL(url)) : File <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

Extracting a website text content using R

2007 Aug 01

Dear useR, Just wondering whether there is any function in R that could let me get the text content of a certain website. Thanks a lot! Best, Leon

RMySQL: Slower parsing over time with htmlTreeParse()

2010 Mar 15

Dear List, have any of you experienced a significant increase in the time it takes to parse a URL via "htmlTreeParse()" when this function is called repeatedly every minute over a couple of hours? Initially, a single parse takes about 0.5 seconds on my machine (Quad Core, 2.67 GHz, 8 MB RAM, Windows 7 64 Bit). After some time, this can go up to 15 seconds or more.
