search for: htmltreeparse

Displaying 20 results from an estimated 34 matches for "htmltreeparse".

2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML library. something like: xpathApply( htmlTreeParse("http://blabla", useInt=T), "//td", function(x) xmlValue(x)) should do it. Gamma wrote: > > anyone care to explain how to read a html table, it's streaming data > (updated every sec...
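A fuller sketch of the xpathApply approach suggested above; the URL is a placeholder, and the XML package must be installed from CRAN:

```r
# Sketch only: parse a page and pull out table cells.
# "http://example.com/table.html" is a placeholder URL.
library(XML)

doc <- htmlTreeParse("http://example.com/table.html",
                     useInternalNodes = TRUE)

# Text of every <td> cell, via XPath
cells <- xpathSApply(doc, "//td", xmlValue)

# readHTMLTable() goes one step further and returns data frames,
# one per <table> element on the page
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
```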
2012 Jun 08
0
XML htmlTreeParse fails with no obvious error
Hi all, Sorry for the rather uninformative subject, but the error I get is not very informative either. When using the XML and RCurl package to retrieve the content of an html page, htmlTreeParse fails, printing out the beginning of the HTML: Error in htmlTreeParse(getURL(url)) : File <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml&quo...
2011 Aug 25
1
R hangs after htmlTreeParse
...2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011") .x<-getURL(myurl) htmlTreeParse(.x, asText=T) This prints approximately 15 lines of the output from the html document and then mysteriously stops. The command line prompt does not reappear and force quit is the only option. I'm running R 2.13 on Mac OS X 10.6 and the latest versions of XML and RCurl are installed. Yours, Simo...
2008 Nov 04
2
How to suppress errors from htmlTreeParse() function in XML package?
...is just letting me know that the html code is malformed, but for my purposes I can ignore that output. Is there a way to achieve this? ### Example: library(RCurl); library(XML) doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project %22&as_qdr=d1&num=100') html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE) ### Output - this is what I would like to suppress Tag nobr invalid htmlParseEntityRef: expecting ';' htmlParseEntityRef: expecting ';' ### etc. I attempted to use try(expr, silent=TRUE) but that didn't work for me: > try(htmlTreeParse(doc, us...
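The messages in question come from libxml2 and are printed outside R's condition system, which is why try() and suppressWarnings() have no effect. A hedged sketch of the usual workaround, passing a no-op error handler:

```r
# Sketch: a no-op error handler silences libxml2's parser chatter.
# `doc` is assumed to hold the HTML source, as in the post above.
library(XML)

html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE,
                           error = function(...) {})
```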
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from web page and modify it in R. I have a problem with encoding. I'm not able to get encoding right in htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) > &...
2010 Mar 15
1
XML: Slower parsing over time with htmlTreeParse()
...eader of my previous post! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dear List, has anyone of you experienced a significant increase in the time it takes to parse an URL via "htmlTreeParse()" when this function is called repeatedly every minute over a couple of hours? Initially, a single parse takes about 0.5 seconds on my machine (Quad Core, 2.67 GHz, 8 MB RAM, Windows 7 64 Bit). After some time, this can go up to 15 seconds or more. I've tried garbage collec...
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " | | ENGLISH" " " [3] " ()"...
2010 Mar 15
0
RMySQL: Slower parsing over time with htmlTreeParse()
Dear List, has anyone of you experienced a significant increase in the time it takes to parse an URL via "htmlTreeParse()" when this function is called repeatedly every minute over a couple of hours? Initially, a single parse takes about 0.5 seconds on my machine (Quad Core, 2.67 GHz, 8 MB RAM, Windows 7 64 Bit). After some time, this can go up to 15 seconds or more. I've tried garbage collec...
2008 Oct 06
3
Extracting text from html code using the RCurl package.
...her way to achieve this? This is the code i am using: > library(RCurl) > my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help' > html.file <- getURI(my.url, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE, followlocation = TRUE) > print(html.file) I thought perhaps the htmlTreeParse() function from the XML package might help, but I just don't know what to do next with it: > library(XML) > htmlTreeParse(html.file) Many thanks for any help you can provide, Tony Breyal > sessionInfo() R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United...
2009 Nov 25
2
XML package example code?
I'm interested in parsing an html page. I should use XML, right? Could somebody show me some example code? Is there a tutorial for this package?
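A minimal sketch of what such example code might look like, assuming the XML package and a placeholder URL:

```r
library(XML)

# Parse the page into an internal document
doc <- htmlTreeParse("http://example.com", useInternalNodes = TRUE)

# XPath queries then extract whatever is needed
title <- xpathSApply(doc, "//title", xmlValue)        # page title
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")  # link targets
```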
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
...on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <- htmlTreeParse(file=web.pg, ignoreBlanks=TRUE) Now I have a lovely html tree, but don't know the best way to get just the text components (job descriptions, job titles, etc...) as they appear on the web site. I'd like to do a little text mining and make a wordcloud using the text. Can anybody suggest...
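One hedged way to approximate "text as copied from the page" is to collect all text nodes under the body element and drop the whitespace-only ones; a sketch using the URL from the post:

```r
library(XML)

doc <- htmlTreeParse("http://www.airweb.org/?page=574",
                     useInternalNodes = TRUE)

# All text nodes under <body>, stripped of surrounding whitespace
txt <- xpathSApply(doc, "//body//text()", xmlValue)
txt <- gsub("^\\s+|\\s+$", "", txt)
txt <- txt[nzchar(txt)]   # keep only non-empty strings
```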
2016 Jan 18
3
Extracting data from a web page
...column (semana, puntuacion), bearing in mind that there may be weeks with no score (in the example, the second week). For the moment I am obtaining it as follows: url_jugador<-"http://localhost:8080/jugadores/Luis" txt_jugador <- getURL(url_jugador) doc<-htmlTreeParse(txt_jugador, useInternalNodes = TRUE) puntos_nodo<- xpathApply(doc, "//table[@class='points']/tr") puntos_nodo [[1]] <tr> <td class="semana">1</td> <td class="neg"/> <td> <div class="bar">6</div&gt...
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues, each time I use htmlParse, R crashes or hangs. The url I'd like to parse is included below as is the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the URLs of the articles linked to on this search page). Regardless, I'd still like to understand why htmlParse doesn't work. Thank you for any insight. Yours, Simon Kiss myurl<-c("http:...
2009 May 12
2
import HTML tables
Hello, I was wondering if there is a function in R that imports tables directly from an HTML document. I know there are functions (say, getURL() from {RCurl}) that download the entire page source, but here I am referring to something like Google Docs' importHTML() function (if you don't know this function, go check it, it's very useful). Anyway, if someone knows of something that does this
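The closest equivalent in R is readHTMLTable() from the XML package; a sketch with an illustrative URL:

```r
library(XML)

# Returns a list with one data frame per <table> element on the page
tabs <- readHTMLTable("http://example.com/page-with-tables.html",
                      stringsAsFactors = FALSE)
```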
2012 Feb 29
2
Using a FOR LOOP to name objects
...a for loop to name objects in each iteration. As in the following example (which doesn't work quite well) my_list<-c("A","B","C","D","E","F") for(i in c(1:length(my_list))){ url<- "http://finance.yahoo.com" doc = htmlTreeParse(url, useInternalNodes = T) tab_nodes = xpathApply(doc, "//table[@cellpadding = '3']") *my_list[i]*=lapply(tab_nodes, readHTMLTable) #problem is in this line names(*my_list[i]*)=c("Ins","outs") } The problem is that in iteration #1, I need the info...
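The idiomatic fix is usually not to create variables named A, B, C, ... at all, but to collect each iteration's result in a named list; a sketch under that assumption, keeping the XPath from the post:

```r
library(XML)

my_list <- c("A", "B", "C", "D", "E", "F")
results <- setNames(vector("list", length(my_list)), my_list)

for (i in seq_along(my_list)) {
  url <- "http://finance.yahoo.com"   # placeholder, as in the post
  doc <- htmlTreeParse(url, useInternalNodes = TRUE)
  tab_nodes <- xpathApply(doc, "//table[@cellpadding = '3']")
  results[[i]] <- lapply(tab_nodes, readHTMLTable)
}
# Each iteration's tables are then results[["A"]], results[["B"]], ...
```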
2012 Apr 21
1
how to write html output (webscraped using RCurl package) into file?
i want "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html",showing information in webpage to be written in .txt file as it is(i don't want any html tag) i am using "RCurl" package >marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html") >marathi >kasam<-marathi$children$html[["body"]][["pre"]][["text"]] >kasam > write(kasam,"papita.txt") Error in cat(list(...), file, sep, fill, labels, append) :...
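The cat() error above usually means the object being written is an XML node rather than a character vector; converting it with xmlValue() first is a plausible fix (sketch, untested against that server):

```r
library(XML)

marathi <- htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html")
kasam   <- marathi$children$html[["body"]][["pre"]][["text"]]

# xmlValue() extracts the plain text; writeLines() then saves it
writeLines(xmlValue(kasam), "papita.txt")
```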
2008 Dec 17
1
Extract Data from a Webpage
...like this: http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490 Based on searching R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the url as an argument to htmlTreeParse(), but I don't know how to go from there. thanks, Chuck Cleland > sessionInfo() R version 2.8.0 Patched (2008-12-04 r47066) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=Englis...
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
...illing the process eventually). It just seems like removing the object via rm() and firing gc() do not have any effect, so the memory consumption of each loaded R object accumulates until there's no more memory left :-/ Possibly, this is also related to XML package functionality (mainly htmlTreeParse and getNodeSet), but I also experience the described behavior when simply iteratively loading and removing Rdata files. I've put together a little example that illustrates the memory ballooning mentioned above which you can find here: http://stackoverflow.com/questions/9220849/significan...
2008 Dec 31
1
Chinese characters encoding problem with XML
XML is a good tool reading data from web within R. But I wonder how could get the encoding correctly. library(XML) url <- 'http://www.szitic.com/docc/jz-lmzq.html' xml <- htmlTreeParse(url, useInternal=TRUE) q <- "//tbody/tr/td" dat <- unlist(xpathApply(xml, q, xmlValue)) df <- as.data.frame(t(matrix(dat, 4))) dt<-as.character(df[15,1]) The first column of df is dates in Chinese. dt is one of the Chinese dates. When I copied the content of dt into the ema...
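htmlTreeParse() accepts an encoding argument; declaring the page's charset explicitly is often enough to keep non-ASCII text intact. A sketch; "GB2312" is an assumption for a Chinese-language page, so check the page's meta charset declaration first:

```r
library(XML)

url <- 'http://www.szitic.com/docc/jz-lmzq.html'
# "GB2312" is a guess; verify against the page's <meta> declaration
xml <- htmlTreeParse(url, useInternal = TRUE, encoding = "GB2312")
```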
2011 May 30
1
Need help reading website info with XML package and XPath
...uessing my xpath statements are wrong or getNodeSet needs something else to get to information contained in a bubble on a webpage. Any suggestions or ideas would be GREATLY appreciated. library(XML) url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb" doc <- htmlTreeParse(url, useInternalNode=TRUE, isURL=TRUE) f1 <- getNodeSet(doc, "//a[contains(@href,'homedetails')]") f2 <- getNodeSet(doc, "//span[contains(@class,'price')]") f3 <- getNodeSet(doc, "//LIST[@Beds]") f4 <- getNodeSet(doc, "//LIST[@Baths]&qu...