Displaying 20 results from an estimated 800 matches similar to: "import HTML tables"
2011 Aug 25
1
R hangs after htmlTreeParse
Dear colleagues,
I'm trying to parse the html content from this webpage:
2011 Sep 05
2
htmlParse hangs or crashes
Dear colleagues,
Each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message.
The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the
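A workaround reported in similar threads (a sketch only; the URL from the post is not shown in the excerpt, so example.com stands in for it) is to fetch the page with RCurl and hand htmlParse the downloaded text, so libxml2 never performs the network access that appears to hang:
library(RCurl)
library(XML)
# placeholder URL standing in for the page from the original post
u <- "http://example.com/page.html"
# fetch the raw HTML with RCurl instead of letting htmlParse open the URL
raw.html <- getURL(u, followlocation = TRUE)
# parse the already-downloaded text; asText = TRUE avoids any further network access
doc <- htmlParse(raw.html, asText = TRUE)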
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi,
I'm trying to download some data from the web and am running into
problems with 'embedded null' characters. These seem to indicate to R
that it should stop processing the page so I'd like to remove them.
I've been looking around and can't seem to identify exactly what the
character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
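For the page that does not work, one approach (a sketch; the problem URL is not shown in the excerpt, so a placeholder stands in) is to download the content as raw bytes, drop the NUL bytes, and only then convert to a character string:
library(RCurl)
# placeholder URL for the page with embedded nulls
u <- "http://example.com/page-with-nulls.html"
# fetch the page as a raw vector rather than as a character string
raw.bytes <- getURLContent(u, binary = TRUE)
# drop the embedded NUL (0x00) bytes that make R stop processing the text
clean.bytes <- raw.bytes[raw.bytes != as.raw(0)]
# convert the cleaned bytes back into a single character string
txt <- rawToChar(clean.bytes)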
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
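A sketch of the usual next step with the XML package (the "//td" expression is only a placeholder, since the page's actual markup is not shown in the excerpt):
library(XML)
u <- "http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490"
doc <- htmlParse(u)
# if the provider details are laid out as HTML tables, this pulls them
# all into a list of data frames
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
# otherwise, XPath can target specific cells; "//td" is a placeholder --
# the real expression depends on the page's structure
cells <- xpathSApply(doc, "//td", xmlValue)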
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:-
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
2009 Sep 23
3
retrieve certain part from html
Dear All,
Can someone please guide me on how to extract a certain part from a long stretch of HTML?
e.g.
"<td><a href='2005-01.html'>2005-01</a></td><td><a
href='2006-01.html'>2006-01</a></td><td><a
href='2007-01.html'>2007-01</a></td><td><a
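One sketch of how to get at pieces like these: parse the fragment and use XPath to pull out the link text and href attributes (assuming the snippet above is representative of the full document):
library(XML)
# the fragment from the post, completed just enough to be well formed
html <- "<table><tr>
  <td><a href='2005-01.html'>2005-01</a></td>
  <td><a href='2006-01.html'>2006-01</a></td>
  <td><a href='2007-01.html'>2007-01</a></td>
</tr></table>"
doc <- htmlParse(html, asText = TRUE)
# link text: "2005-01" "2006-01" "2007-01"
xpathSApply(doc, "//td/a", xmlValue)
# href attributes: "2005-01.html" "2006-01.html" "2007-01.html"
xpathSApply(doc, "//td/a", xmlGetAttr, "href")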
2008 Oct 06
3
Extracting text from html code using the RCurl package.
Dear R-help,
I want to download the text from a web page; however, what I end up
with is the HTML code. Is there an option I am missing in the
RCurl package, or is there another way to achieve this? This is the
code I am using:
> library(RCurl)
> my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help'
> html.file <- getURI(my.url, ssl.verifyhost = FALSE,
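A hedged sketch of one way forward: getURI() is doing its job (it returns the raw HTML); to get the visible text you still need to parse that HTML and extract the text nodes. The second SSL flag below is an assumption, since the original call is cut off after the first one:
library(RCurl)
library(XML)
my.url <- "https://stat.ethz.ch/mailman/listinfo/r-help"
html.file <- getURI(my.url, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE)
# parse the downloaded HTML and keep only the human-readable text
doc <- htmlParse(html.file, asText = TRUE)
page.text <- xpathSApply(doc, "//body//text()", xmlValue)
# drop whitespace-only fragments
page.text <- page.text[nchar(trimws(page.text)) > 0]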
2009 Nov 25
2
XML package example code?
I'm interested in parsing an HTML page. I should use XML, right? Could
somebody show me some example code? Is there a tutorial for this
package?
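A minimal example of the usual workflow (a sketch only; the URL and the XPath expression depend entirely on the page you care about):
library(XML)
# parse an HTML page straight from a URL
doc <- htmlParse("http://www.r-project.org/")
# pull every link's text with an XPath expression
links <- xpathSApply(doc, "//a", xmlValue)
# pull all tables on the page into data frames
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
The help pages for htmlParse, xpathSApply and readHTMLTable are a reasonable place to start.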
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings,
I am trying to get all of the text from a web page as if I "selected
all" on the page, pasted into a text file, and then read in the text
file with read.csv().
# this is the actual page I'm trying to acquire text from:
web.pg <- readLines("http://www.airweb.org/?page=574")
# then parsed in hopes of an easier structure to work with:
web.pg <-
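A hedged sketch of one way to approximate "select all and paste": parse the page and take xmlValue() of the root node, which concatenates every text node on the page:
library(XML)
doc <- htmlParse("http://www.airweb.org/?page=574")
# xmlValue() on the root node concatenates all text on the page,
# roughly what select-all / paste would give
all.text <- xmlValue(xmlRoot(doc))
# split back into lines if a line-oriented structure is needed
text.lines <- strsplit(all.text, "\n")[[1]]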
2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.
Looks like the same issue in Russian:
library(RCurl)
library(XML)
u = " http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
a = getURL(u)
a # Here - the Russian is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...
None of these seem to fix it:
htmlParse(a, encoding = "windows-1251")
htmlParse(a, encoding =
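Another thing worth trying (a sketch, assuming the page really is served as windows-1251): re-encode the fetched string to UTF-8 yourself before parsing, so libxml2 never has to guess:
library(RCurl)
library(XML)
u <- "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
# tell RCurl what encoding the server uses for this page
a <- getURL(u, .encoding = "windows-1251")
# re-encode to UTF-8 before handing the text to libxml2
a.utf8 <- iconv(a, from = "windows-1251", to = "UTF-8")
a2 <- htmlParse(a.utf8, asText = TRUE, encoding = "UTF-8")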
2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML library.
something like:
xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td",
           function(x) xmlValue(x))
should do it.
Gamma wrote:
>
> Anyone care to explain how to read an HTML table? It's streaming data
> (updated every second) and I am looking for a suitable function.
>
> The imported html
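readHTMLTable() from the same package is often simpler for tables, and re-running the call on a timer covers the "updated every second" part. A sketch; the URL and the one-second interval are placeholders:
library(XML)
read.one.snapshot <- function(u) {
  # returns all tables on the page as data frames
  readHTMLTable(htmlParse(u), stringsAsFactors = FALSE)
}
# poll the page once per second (placeholder URL)
u <- "http://example.com/streaming-table.html"
for (i in 1:10) {
  snapshot <- read.one.snapshot(u)
  # ... do something with snapshot here ...
  Sys.sleep(1)
}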
2009 Jun 23
1
How to find <b> entries using XPath?
We got all rows by:
library(XML)
doc = htmlParse('http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm')
rows = xpathSApply(doc, '//table/tbody/tr')
The last row is:
row_last = rows[15]
row_last
[[1]]
<tr><td id="t1stub17" class="stub1 RGBShade"><b>Unsmoothed composite
leading indicator</b></td>
<td
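To target the <b> elements specifically, extend the XPath expression down to them. A sketch; it assumes the bold text of interest always sits inside a table cell, as in the row shown above:
library(XML)
doc <- htmlParse("http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm")
# text of every <b> element that sits inside a table cell
b.text <- xpathSApply(doc, "//table/tbody/tr/td/b", xmlValue)
# or, if the <b> elements of interest can appear anywhere in the table:
b.text.all <- xpathSApply(doc, "//table//b", xmlValue)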
2009 Jun 04
0
XSS (was Re: Centos 5.3 -> Apache - Under Attack ? Oh hell....)
Bob Hoffman wrote:
> Since each install basically uses the same pages, it is easy for an
> autobot to find them all and zero-day your forums, XSS your whatever,
> and so on.
>
> Dang scary to leave JS on at all... even though you basically have to.
Mozilla is beginning to address this issue with Content Security Policy
2012 Mar 21
1
Trouble installing the XML package
Hello everyone,
I am probably not the only one having trouble with this package but here goes.
I want to install XML on Ubuntu. I installed libxml2-dev and
everything works out fine until I get the following:
Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses, :
  No definition was found for superclass "namedList" in the specification of class
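In similar reports this error went away after reinstalling XML in a fresh R session once the system libraries were in place; a sketch of that sequence (not verified for this exact setup):
# system libraries first (run in a shell, not in R):
#   sudo apt-get install libxml2-dev libcurl4-openssl-dev
# then, in a fresh R session with no packages loaded:
install.packages("XML")
# if a partially installed copy is in the way, remove it first:
# remove.packages("XML"); install.packages("XML")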
2005 Oct 20
5
splitting an integer
Hi there,
From the vector X of integers,
X = c(11999, 122000, 81997)
I would like to make these two vectors:
Z= c(1999, 2000, 1997)
Y =c(1 , 12 , 8)
That is, each entry of Z receives the last four digits of the corresponding entry of X, and Y receives "the rest".
Any suggestions?
Thanks in advance,
Dimitri
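Integer division does this directly; a worked sketch using the vector from the post:
X <- c(11999, 122000, 81997)
# last four digits of each entry
Z <- X %% 10000   # 1999 2000 1997
# everything in front of the last four digits
Y <- X %/% 10000  # 1 12 8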
2011 May 30
1
Need help reading website info with XML package and XPath
Hi, I'm looking for help extracting some information from the Zillow website.
I'd like to do this for the general case where I manually change the address
by modifying the URL (see code below). With the URL containing the address,
I'd like to be able to extract the same information each time. The specific
information I'd like to be able to extract includes the homedetails url,
price
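A sketch of the general pattern (the address in the URL and both XPath expressions are placeholders; the real expressions have to be read off the page source, and Zillow's markup changes over time):
library(XML)
# placeholder address baked into the URL, as in the post
u <- "http://www.zillow.com/homes/1600-Pennsylvania-Ave-Washington-DC_rb/"
doc <- htmlParse(u)
# placeholder XPath: links whose href contains "homedetails"
details.url <- xpathSApply(doc, "//a[contains(@href, 'homedetails')]",
                           xmlGetAttr, "href")
# placeholder XPath: any element whose class mentions "price"
price <- xpathSApply(doc, "//*[contains(@class, 'price')]", xmlValue)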
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code works, but most of the time it doesn't, and I cannot see why. The file I am trying to parse is
http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0
Sometimes the following code works
n <- readHTMLTable(htmlParse(url))
But most of the
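When the very same call works only some of the time, an intermittent network or server-side issue is a plausible culprit. A hedged sketch that retries a few times and sends a browser-like User-Agent, which some sites require:
library(RCurl)
library(XML)
url <- "http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0"
read.with.retry <- function(u, tries = 5) {
  for (i in seq_len(tries)) {
    tab <- tryCatch(
      readHTMLTable(htmlParse(getURL(u, useragent = "Mozilla/5.0"),
                              asText = TRUE),
                    stringsAsFactors = FALSE),
      error = function(e) NULL)
    if (!is.null(tab)) return(tab)
    Sys.sleep(2)  # wait a little before retrying
  }
  stop("page could not be read after ", tries, " attempts")
}
n <- read.with.retry(url)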
2008 Nov 04
2
How to suppress errors from htmlTreeParse() function in XML package?
Dear R-help,
The following code downloads an HTML document into the variable 'doc' and
then stores an internal representation in the variable 'html.tree'. Even
if the HTML code is malformed, this still works, which is fantastic.
However, as in the example below, I do get some output from R in the
console which I would like to suppress somehow, so I can keep my
window a bit cleaner.
I
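Two things that usually quiet the console (a sketch; 'doc' below is a stand-in string, since the original document is not shown in the excerpt): give htmlTreeParse an error handler that discards the messages, and wrap the call in suppressWarnings() for anything R re-raises as a warning:
library(XML)
# stand-in malformed document
doc <- "<html><body><p>malformed<p>html</body>"
# the do-nothing error handler swallows libxml2's parse messages;
# suppressWarnings() catches anything that comes through as a warning
html.tree <- suppressWarnings(
  htmlTreeParse(doc, asText = TRUE, useInternalNodes = TRUE,
                error = function(...) {})
)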
2012 May 21
1
htmlParse Error
I am trying to parse a webpage using the htmlParse command in the XML package as
follows:
library(XML)
u = "http://en.wikipedia.org/wiki/World_population"
doc = htmlParse(u)
I get the following error:
Error in htmlParse(u) :
error in creating parser for http://en.wikipedia.org/wiki/World_population
I am using R 2.13.1 (32-bit) on 64-bit Windows. (I tried
installing it in
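A workaround often suggested for "error in creating parser for <url>" (a sketch; not verified on this exact setup) is to download the page with R's own tools and parse the local copy, so libxml2 never has to fetch the URL itself:
library(XML)
u <- "http://en.wikipedia.org/wiki/World_population"
# download first, then parse the local file
tmp <- tempfile(fileext = ".html")
download.file(u, tmp)
doc <- htmlParse(tmp)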
2012 Jan 30
1
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.
I wish to be able to have htmlParse work well with Hebrew, but it keeps
scrambling the Hebrew text in the pages I feed into it.
For example:
# why can't I parse the Hebrew correctly?
library(RCurl)
library(XML)
u = "http://humus101.com/?p=2737"
a = getURL(u)
a # Here - the Hebrew is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...
None of
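The same kind of encoding hint as in the Russian example above sometimes helps here as well. A sketch, assuming the page is served as UTF-8 (the <meta> tag on humus101.com should confirm that):
library(RCurl)
library(XML)
u <- "http://humus101.com/?p=2737"
# declare the encoding both when fetching and when parsing
a  <- getURL(u, .encoding = "UTF-8")
a2 <- htmlParse(a, asText = TRUE, encoding = "UTF-8")
# sanity check: the extracted text should come back marked as UTF-8
txt <- xpathSApply(a2, "//title", xmlValue)
Encoding(txt)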