similar to: How to suppress errors from htmlTreeParse() function in XML package?

Displaying 20 results from an estimated 500 matches similar to: "How to suppress errors from htmlTreeParse() function in XML package?"

2008 Dec 17
1
Extract Data from a Webpage
Hi All: I would like to extract the provider name, address, and phone number from multiple webpages like this: http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490 Based on searching R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the url as an argument to
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
2009 Nov 26
1
How to suppress errors generated by readHTMLTable?
library(XML) download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit','index.html') tables=readHTMLTable("index.html",error=function(...){}) tables readHTMLTable gives me the following errors. Could somebody let me know how to suppress them? Opening and ending tag mismatch: center and table htmlParseEntityRef: expecting
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
2008 Oct 06
3
Extracting text from html code using the RCurl package.
Dear R-help, I want to download the text from a web page, but what I end up with is the html code. Is there some option that I am missing in the RCurl package? Or is there another way to achieve this? This is the code I am using: > library(RCurl) > my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help' > html.file <- getURI(my.url, ssl.verifyhost = FALSE,
2009 Nov 16
2
parsing Google search results
Hi, how can I parse Google search results? The following code returns "integer(0)" instead of "1" although the results of the query clearly contain the regex "cran". #### address <- url("http://www.google.com/search?q=cran") open(address) lines <- readLines(address) grep("cran", lines[3]) #### Thanks Philip -- Philip Leifeld Max
2016 Jan 18
3
Extracting data from a website
Good afternoon, I want to extract data from a website that relates the week to the score obtained by a player. Right now I can get the node that relates the week to the score obtained, but I am not able to extract that information into a two-column table (week, score), bearing in mind that there may be weeks in which the player did not score (in the example,
2008 Nov 13
1
readPDF() -- unsure how to install xpdf to make this work?
Dear R-Help, I need to convert a set of '.pdf' files into an equivalent set of '.txt' files. This is so that I can do some text mining on the content. In the latest R News newsletter (http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf), the package 'tm' for text mining is mentioned. In that lovely package, there is a function called 'readPDF()'. In order to use
2011 Aug 25
1
R hangs after htmlTreeParse
Dear colleagues, I'm trying to parse the html content from this webpage:
2009 Nov 25
2
XML package example code?
I'm interested in parsing an html page. I should use XML, right? Could somebody show me some example code? Is there a tutorial for this package?
2012 Feb 29
2
Using a FOR LOOP to name objects
Hello, I am trying to use a for loop to name objects in each iteration. As in the following example (which doesn't work quite well) my_list<-c("A","B","C","D","E","F") for(i in c(1:length(my_list))){ url<- "http://finance.yahoo.com" doc = htmlTreeParse(url, useInternalNodes = T) tab_nodes = xpathApply(doc,
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-
2007 Dec 14
6
Analyzing Publications from Pubmed via XML
I would like to track in which journals articles about a particular disease are being published. Creating a pubmed search is trivial. The search provides data but obviously not as an R dataframe. I can get the search to export the data as an xml feed and the xml package seems to be able to read it. xmlTreeParse("
2010 Mar 15
1
XML: Slower parsing over time with htmlTreeParse()
Sorry, I listed the wrong package in the header of my previous post! Dear List, has any of you experienced a significant increase in the time it takes to parse a URL via "htmlTreeParse()" when this function is called
2009 Sep 23
3
retrieve certain part from html
Dear All, Can someone please guide me on how to extract a certain part from a long HTML string? e.g. "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a
2016 Jan 19
2
Extracting data from a website
Thank you both very much!!!! Both solutions worked for me. Regards. On 18 January 2016 at 18:35, Carlos Ortega <cof en qualityexcellence.es> wrote: > Hi, > > But you almost have it... you only have a couple of steps left and that's it... > > You simply have to convert "puntos_nodo" to either a data.frame, although > it will end up full of things that you do not
2009 Sep 24
2
Downloading data from internet
Hi all, I want to download data from these two different sources directly into R: http://www.rateinflation.com/consumer-price-index/usa-cpi.php http://eaindustry.nic.in/asp2/list_d.asp The first one is the CPI of the US and the second one is the WPI of India. Can anyone please give any clue how to download them directly into R? I want to make them zoo objects for further analysis. Thanks,
2012 Jun 08
0
XML htmlTreeParse fails with no obvious error
Hi all, Sorry for the rather uninformative subject, but the error I get is not very informative either. When using the XML and RCurl packages to retrieve the content of an html page, htmlTreeParse fails, printing out the beginning of the HTML: Error in htmlTreeParse(getURL(url)) : File <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2007 Aug 01
4
Extracting a website text content using R
Dear useR, Just wondering whether there is any function in R that could let me get the text content of a certain website. Thanks a lot! Best, Leon
2010 Mar 15
0
RMySQL: Slower parsing over time with htmlTreeParse()
Dear List, has any of you experienced a significant increase in the time it takes to parse a URL via "htmlTreeParse()" when this function is called repeatedly every minute over a couple of hours? Initially, a single parse takes about 0.5 seconds on my machine (Quad Core, 2.67 GHz, 8 MB RAM, Windows 7 64 Bit). After some time, this can go up to 15 seconds or more.