Displaying 20 results from an estimated 200 matches similar to: "XML htmlTreeParse fails with no obvious error"
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding: I'm not able to get the
encoding right in the htmlTreeParse command. See below
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,
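A minimal sketch of one way to pin down the encoding, passing it both to RCurl and to the parser (the `ISO-8859-1` value is an assumption for this Finnish page, not something stated in the post):

```r
library(RCurl)
library(XML)

# Tell RCurl how to decode the bytes, and repeat the declaration for libxml2.
site <- getURL("http://www.aarresaari.net/jobboard/jobs.html",
               .encoding = "ISO-8859-1")
doc <- htmlTreeParse(site, asText = TRUE, useInternalNodes = TRUE,
                     encoding = "ISO-8859-1")
```

If the page declares its charset in a `<meta>` tag, checking that value first is usually the quickest way to find the right `encoding` argument.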
2008 Nov 04
2
How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help,
The following code downloads an html document into variable 'doc' and
then stores an internal representation into variable 'html.tree'. Even
if the html code is malformed, this still works which is fantastic.
However, as in the example below, I do get some output from R in the
console which I would like to suppress somehow, so I can keep my
window a bit cleaner.
I
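A sketch of the usual remedy: handing `htmlTreeParse()` a no-op `error` handler, which swallows libxml2's parser messages for malformed HTML (the URL here is a placeholder):

```r
library(XML)

# The empty error handler silences the "htmlParseEntityRef" style messages
# that otherwise land in the console when the HTML is malformed.
html.tree <- htmlTreeParse("http://example.org/page.html",
                           useInternalNodes = TRUE,
                           error = function(...) {})
```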
2010 Mar 15
0
RMySQL: Slower parsing over time with htmlTreeParse()
Dear List,
have any of you experienced a significant increase in the time it takes to
parse a URL via "htmlTreeParse()" when this function is called repeatedly
every minute over a couple of hours?
Initially, a single parse takes about 0.5 seconds on my machine (Quad Core,
2.67 GHz, 8 GB RAM, Windows 7 64 Bit). After some time, this can go up to
15 seconds or more.
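One common cause of this pattern (an assumption; the post is truncated before any diagnosis) is that documents parsed with `useInternalNodes = TRUE` live in C-level memory that `rm()` alone does not reclaim. A sketch of the explicit cleanup:

```r
library(XML)

parse_once <- function(url) {
  doc <- htmlTreeParse(url, useInternalNodes = TRUE,
                       error = function(...) {})
  on.exit({ free(doc); rm(doc); gc() })   # release the C-level document
  # ... extract what you need from doc before it is freed ...
  xpathSApply(doc, "//title", xmlValue)
}
```

Freeing each document after use keeps the per-call cost flat instead of growing as libxml2's memory accumulates.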
2011 Aug 25
1
R hangs after htmlTreeParse
Dear colleagues,
I'm trying to parse the html content from this webpage:
2010 Mar 15
1
XML: Slower parsing over time with htmlTreeParse()
Sorry, I listed the wrong package in the header of my previous post!
Dear List,
have any of you experienced a significant increase in the time it takes to
parse a URL via "htmlTreeParse()" when this function is called
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:-
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes =
TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
2009 Apr 22
0
make fails when using with-x=no on linux CentOS 5.3 (PR#13670)
Full_Name: Nicolas Delhomme
Version: 2.9.0
OS: Linux CentOS release 5.3 kernel 2.6.18-128.el5 arch x86_64
Submission from: (NULL) (194.94.44.4)
Hi,
The commands I used to compile R2.9.0 on CentOS
./configure --with-x=no
make
This fails with the following message:
make[2]: Leaving directory `/home/delhomme/R-2.9.0/src/modules/vfonts'
make[1]: Leaving directory
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues,
each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message.
The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the
2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML library.
something like:
xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td",
function(x) xmlValue(x))
should do it.
Gamma wrote:
>
> anyone care to explain how to read an html table? it's streaming data
> (updated every second) and I am looking for a suitable function.
>
> The imported html
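For whole tables, a shorter route than XPath over `//td` is `readHTMLTable()` from the same XML package. A sketch, with the placeholder URL from the reply kept as-is:

```r
library(XML)

# readHTMLTable() turns every <table> on the page into a data frame.
tables <- readHTMLTable("http://blabla")
str(tables)   # a named list, one data frame per table
```

For data that updates every second, this call would simply be re-run inside a loop with `Sys.sleep(1)` between fetches.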
2008 Oct 06
3
Extracting text from html code using the RCurl package.
Dear R-help,
I want to download the text from a web page; however, what I end up
with is the HTML code. Is there some option that I am missing in the
RCurl package? Or is there another way to achieve this? This is the
code I am using:
> library(RCurl)
> my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help'
> html.file <- getURI(my.url, ssl.verifyhost = FALSE,
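RCurl by itself only fetches the raw HTML; extracting the visible text is the XML package's job. A sketch combining the two (the `ssl.verifypeer` flag is an assumption added to match the truncated call):

```r
library(RCurl)
library(XML)

my.url    <- "https://stat.ethz.ch/mailman/listinfo/r-help"
html.file <- getURI(my.url, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE)

# Parse the fetched string, then pull out only the text nodes.
doc <- htmlTreeParse(html.file, asText = TRUE, useInternalNodes = TRUE)
txt <- xpathSApply(doc, "//body//text()", xmlValue)
head(txt)
```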
2006 May 22
1
rerender tcltk toplevel
Hi everybody,
I am trying to write a simple progress display based on a tcltk
toplevel. My first approach was to use the progressBar widget from the
BWidget library but since this is not available on every system (missing
on at least almost all windows systems, I guess...) I wanted to have a
backup there. So my second strategy was to use a simple toplevel with a
label and update the tclvariable
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings,
I am trying to get all of the text from a web page as if I "selected
all" on the page, pasted into a text file, and then read in the text
file with read.csv().
# this is the actual page I'm trying to acquire text from:
web.pg <- readLines("http://www.airweb.org/?page=574")
# then parsed in hopes of an easier structure to work with:
web.pg <-
2012 Feb 29
2
Using a FOR LOOP to name objects
Hello,
I am trying to use a for loop to name objects in each iteration, as in the
following example (which doesn't quite work):
my_list<-c("A","B","C","D","E","F")
for (i in seq_along(my_list)) {
url<- "http://finance.yahoo.com"
doc = htmlTreeParse(url, useInternalNodes = T)
tab_nodes = xpathApply(doc,
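A sketch of the two usual answers to this question: `assign()` creates objects named "A", "B", ... in the workspace, but filling a named list is generally cleaner (the scraped value here is a placeholder, since the original post is truncated before the extraction step):

```r
my_list <- c("A", "B", "C", "D", "E", "F")

# Preferred: collect results in a named list.
results <- list()
for (nm in my_list) {
  results[[nm]] <- paste("scraped value for", nm)  # placeholder
}

# Alternative: create one workspace object per name.
for (nm in my_list) {
  assign(nm, paste("scraped value for", nm))
}
```

The list form keeps all results in one object, which makes later iteration (`lapply(results, ...)`) much easier than juggling six separately named variables.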
2012 Apr 21
1
how to write html output (webscraped using RCurl package) into file?
I want the information shown at
"http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html"
to be written to a .txt file as-is (I don't want any
HTML tags).
i am using "RCurl" package
>marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html")
>marathi
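A sketch of one way to strip the tags and write only the text content to a file (the output filename is a choice, not from the post):

```r
library(XML)

url <- "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)

# xmlValue() on the <body> node drops all markup, leaving plain text.
txt <- xpathSApply(doc, "//body", xmlValue)
writeLines(txt, "d1fjgc2.txt")
```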
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
Dear list,
when iterating over a set of Rdata files that are loaded, analyzed and
then removed from memory again, I experience a *significant* increase in
an R process' memory consumption (killing the process eventually).
It just seems like removing the object via |rm()| and firing |gc()| do
not have any effect, so the memory consumption of each loaded R object
cumulates until
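A sketch of a common workaround (an assumption; the post is truncated before any resolution): loading each file into a throwaway environment, so its objects never enter the global workspace and can be collected after each pass:

```r
for (f in list.files(pattern = "\\.Rdata$")) {
  env <- new.env()
  load(f, envir = env)
  # ... analyze the file's objects via get("some_name", envir = env) ...
  rm(env)   # drop the only reference to everything that was loaded
  gc()
}
```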
2008 Dec 31
1
Chinese characters encoding problem with XML
XML is a good tool for reading data from the web within R. But I wonder how to get the encoding right.
library(XML)
url <- 'http://www.szitic.com/docc/jz-lmzq.html'
xml <- htmlTreeParse(url, useInternal=TRUE)
q <- "//tbody/tr/td"
dat <- unlist(xpathApply(xml, q, xmlValue))
df <- as.data.frame(t(matrix(dat, 4)))
dt<-as.character(df[15,1])
The first column of df
2016 Jan 18
3
Extracting data from a web page
Good afternoon,
I want to extract data from a web page that relates the week to the
score obtained by a player. Right now I manage to get the node that
relates the week to the score, but I am not able to extract that
information into a two-column table (week, score), bearing in mind that
there may be weeks in which the player did not score (in the example,
2011 May 30
1
Need help reading website info with XML package and XPath
Hi, I'm looking for help extracting some information from the zillow website.
I'd like to do this for the general case where I manually change the address
by modifying the url (see code below). With the url containing the address,
I'd like to be able to extract the same information each time. The specific
information I'd like to be able to extract includes the homedetails url,
price
2002 May 08
0
Problems with package XML
I'm having some difficulties with the package XML.
Namely, issuing the following commands:
> library(XML)
> hp <- htmlTreeParse('http://www.liacc.up.pt/~ltorgo/index.html',isURL=T)
I get a flood of messages like this :
Save workspace image? [y/n/c]: readline: warning: rl_prep_terminal: cannot
get terminal settings
My system is:
> version
_