Displaying 20 results from an estimated 200 matches similar to: "XML htmlTreeParse fails with no obvious error"
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding: I'm not able to get the
encoding right in the htmlTreeParse command. See below
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,
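A minimal sketch of one way to pin down the encoding, passing it both to RCurl and to the parser (the `ISO-8859-1` value is an assumption for this Finnish page, not something stated in the post):

```r
library(RCurl)
library(XML)

# Tell RCurl how to decode the bytes, and repeat the declaration for libxml2.
site <- getURL("http://www.aarresaari.net/jobboard/jobs.html",
               .encoding = "ISO-8859-1")
doc <- htmlTreeParse(site, asText = TRUE, useInternalNodes = TRUE,
                     encoding = "ISO-8859-1")
```

If the page declares its charset in a `<meta>` tag, checking that value first is usually the quickest way to find the right `encoding` argument.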
2008 Nov 04
2
How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help,
The following code downloads an html document into variable 'doc' and
then stores an internal representation into variable 'html.tree'. Even
if the html code is malformed, this still works which is fantastic.
However, as in the example below, I do get some output from R in the
console which I would like to suppress somehow, so I can keep my
window a bit cleaner.
I
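A sketch of the usual remedy: handing `htmlTreeParse()` a no-op `error` handler, which swallows libxml2's parser messages for malformed HTML (the URL here is a placeholder):

```r
library(XML)

# The empty error handler silences the "htmlParseEntityRef" style messages
# that otherwise land in the console when the HTML is malformed.
html.tree <- htmlTreeParse("http://example.org/page.html",
                           useInternalNodes = TRUE,
                           error = function(...) {})
```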
2010 Mar 15
0
RMySQL: Slower parsing over time with htmlTreeParse()
Dear List,
have any of you experienced a significant increase in the time it takes to
parse a URL via "htmlTreeParse()" when this function is called repeatedly
every minute over a couple of hours?
Initially, a single parse takes about 0.5 seconds on my machine (Quad Core,
2.67 GHz, 8 GB RAM, Windows 7 64 Bit). After some time, this can go up to
15 seconds or more.
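One common cause of this pattern (an assumption; the post is truncated before any diagnosis) is that documents parsed with `useInternalNodes = TRUE` live in C-level memory that `rm()` alone does not reclaim. A sketch of the explicit cleanup:

```r
library(XML)

parse_once <- function(url) {
  doc <- htmlTreeParse(url, useInternalNodes = TRUE,
                       error = function(...) {})
  on.exit({ free(doc); rm(doc); gc() })   # release the C-level document
  # ... extract what you need from doc before it is freed ...
  xpathSApply(doc, "//title", xmlValue)
}
```

Freeing each document after use keeps the per-call cost flat instead of growing as libxml2's memory accumulates.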
2011 Aug 25
1
R hangs after htmlTreeParse
Dear colleagues,
I'm trying to parse the html content from this webpage:
2010 Mar 15
1
XML: Slower parsing over time with htmlTreeParse()
Sorry, I listed the wrong package in the header of my previous post!
Dear List,
have any of you experienced a significant increase in the time it takes to
parse a URL via "htmlTreeParse()" when this function is called
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:-
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes =
TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
2009 Apr 22
0
make fails when using with-x=no on linux CentOS 5.3 (PR#13670)
Full_Name: Nicolas Delhomme
Version: 2.9.0
OS: Linux CentOS release 5.3 kernel 2.6.18-128.el5 arch x86_64
Submission from: (NULL) (194.94.44.4)
Hi,
The commands I used to compile R2.9.0 on CentOS
./configure --with-x=no
make
This fails with the following message:
make[2]: Leaving directory `/home/delhomme/R-2.9.0/src/modules/vfonts'
make[1]: Leaving directory
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues,
each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message.
The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the
2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML library.
something like:
xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td",
function(x) xmlValue(x))
should do it.
Gamma wrote:
>
> anyone care to explain how to read an html table? it's streaming data
> (updated every second) and I am looking for a suitable function.
>
> The imported html
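For whole tables, a shorter route than XPath over `//td` is `readHTMLTable()` from the same XML package. A sketch, with the placeholder URL from the reply kept as-is:

```r
library(XML)

# readHTMLTable() turns every <table> on the page into a data frame.
tables <- readHTMLTable("http://blabla")
str(tables)   # a named list, one data frame per table
```

For data that updates every second, this call would simply be re-run inside a loop with `Sys.sleep(1)` between fetches.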
2008 Oct 06
3
Extracting text from html code using the RCurl package.
Dear R-help,
I want to download the text from a web page; however, what I end up
with is the HTML code. Is there some option that I am missing in the
RCurl package? Or is there another way to achieve this? This is the
code I am using:
> library(RCurl)
> my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help'
> html.file <- getURI(my.url, ssl.verifyhost = FALSE,
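RCurl by itself only fetches the raw HTML; extracting the visible text is the XML package's job. A sketch combining the two (the `ssl.verifypeer` flag is an assumption added to match the truncated call):

```r
library(RCurl)
library(XML)

my.url    <- "https://stat.ethz.ch/mailman/listinfo/r-help"
html.file <- getURI(my.url, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE)

# Parse the fetched string, then pull out only the text nodes.
doc <- htmlTreeParse(html.file, asText = TRUE, useInternalNodes = TRUE)
txt <- xpathSApply(doc, "//body//text()", xmlValue)
head(txt)
```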
2006 May 22
1
rerender tcltk toplevel
Hi everybody,
I am trying to write a simple progress display based on a tcltk
toplevel. My first approach was to use the progressBar widget from the
BWidget library but since this is not available on every system (missing
on at least almost all windows systems, I guess...) I wanted to have a
backup there. So my second strategy was to use a simple toplevel with a
label and update the tclvariable
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings,
I am trying to get all of the text from a web page as if I "selected
all" on the page, pasted into a text file, and then read in the text
file with read.csv().
# this is the actual page I'm trying to acquire text from:
web.pg <- readLines("http://www.airweb.org/?page=574")
# then parsed in hopes of an easier structure to work with:
web.pg <-
2012 Feb 29
2
Using a FOR LOOP to name objects
Hello,
I am trying to use a for loop to name objects in each iteration, as in the
following example (which doesn't quite work):
my_list<-c("A","B","C","D","E","F")
for (i in seq_along(my_list)) {
url<- "http://finance.yahoo.com"
doc = htmlTreeParse(url, useInternalNodes = T)
tab_nodes = xpathApply(doc,
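A sketch of the two usual answers to this question: `assign()` creates objects named "A", "B", ... in the workspace, but filling a named list is generally cleaner (the scraped value here is a placeholder, since the original post is truncated before the extraction step):

```r
my_list <- c("A", "B", "C", "D", "E", "F")

# Preferred: collect results in a named list.
results <- list()
for (nm in my_list) {
  results[[nm]] <- paste("scraped value for", nm)  # placeholder
}

# Alternative: create one workspace object per name.
for (nm in my_list) {
  assign(nm, paste("scraped value for", nm))
}
```

The list form keeps all results in one object, which makes later iteration (`lapply(results, ...)`) much easier than juggling six separately named variables.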
2012 Apr 21
1
how to write html output (webscraped using RCurl package) into file?
I want the information shown at
"http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html"
to be written to a .txt file as-is (I don't want any
HTML tags).
i am using "RCurl" package
>marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html")
>marathi
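A sketch of one way to strip the tags and write only the text content to a file (the output filename is a choice, not from the post):

```r
library(XML)

url <- "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)

# xmlValue() on the <body> node drops all markup, leaving plain text.
txt <- xpathSApply(doc, "//body", xmlValue)
writeLines(txt, "d1fjgc2.txt")
```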
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
Dear list,
when iterating over a set of Rdata files that are loaded, analyzed and
then removed from memory again, I experience a *significant* increase in
an R process' memory consumption (killing the process eventually).
It just seems like removing the object via |rm()| and firing |gc()| do
not have any effect, so the memory consumption of each loaded R object
cumulates until
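A sketch of a common workaround (an assumption; the post is truncated before any resolution): loading each file into a throwaway environment, so its objects never enter the global workspace and can be collected after each pass:

```r
for (f in list.files(pattern = "\\.Rdata$")) {
  env <- new.env()
  load(f, envir = env)
  # ... analyze the file's objects via get("some_name", envir = env) ...
  rm(env)   # drop the only reference to everything that was loaded
  gc()
}
```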
2008 Dec 31
1
Chinese characters encoding problem with XML
XML is a good tool for reading data from the web within R. But I wonder how to get the encoding right.
library(XML)
url <- 'http://www.szitic.com/docc/jz-lmzq.html'
xml <- htmlTreeParse(url, useInternal=TRUE)
q <- "//tbody/tr/td"
dat <- unlist(xpathApply(xml, q, xmlValue))
df <- as.data.frame(t(matrix(dat, 4)))
dt<-as.character(df[15,1])
The first column of df
2016 Jan 18
3
Extracting data from a web page
Good afternoon,
I want to extract data from a web page that relates the week to the
score obtained by a player. Right now I manage to get the node that
relates the week to the score, but I am not able to extract that
information into a two-column table (week, score), bearing in mind that
there may be weeks in which the player did not score (in the example,
2011 May 30
1
Need help reading website info with XML package and XPath
Hi, I'm looking for help extracting some information from the zillow website.
I'd like to do this for the general case where I manually change the address
by modifying the url (see code below). With the url containing the address,
I'd like to be able to extract the same information each time. The specific
information I'd like to be able to extract includes the homedetails url,
price
2002 May 08
0
Problems with package XML
I'm having some difficulties with the package XML.
Namely, issuing the following commands:
> library(XML)
> hp <- htmlTreeParse('http://www.liacc.up.pt/~ltorgo/index.html',isURL=T)
I get a flood of messages like this :
Save workspace image? [y/n/c]: readline: warning: rl_prep_terminal: cannot
get terminal settings
My system is:
> version
_