Displaying 20 results from an estimated 400 matches similar to: "read htm table error"
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi,
I'm trying to download some data from the web and am running into
problems with 'embedded null' characters. These seem to indicate to R
that it should stop processing the page so I'd like to remove them.
I've been looking around and can't seem to identify exactly what the
character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
2012 Jun 07
1
How to set cookies in RCurl
Hi,
I am trying to access a website and read its content. The website is a
restricted access website that I access through a proxy server (which
therefore requires me to enable cookies). I have problems allowing RCurl
to receive and send cookies.
The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:-
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
2010 Oct 06
2
Converting scraped data
Dear Colleagues,
I used this code to scrape data from the URL contained within. This
code should be reproducible.
require("XML")
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
2010 Oct 10
1
Create single vector after looping through multiple data frames with GREP
Hello all,
I changed the subject line of the e-mail, because the question I'm posing now is different from the first one. I hope that this is proper etiquette. However, the original chain is included below.
I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming
2011 Nov 16
1
Checking for monotonic sequence
I am scraping data from a web page using XML (excellent package BTW - that's scraping data the easy way!).
So far, I've got the code:
tables <- readHTMLTable(theurl)
rhf <- tables$tabResHistFull
div1 <- rhf[which(rhf$V1=="Div ps"),]
div1
which is giving me the result:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
15
2009 Oct 14
2
puzzle using gsub (and encodings maybe)
Hello,
Below is some output that shows my issue.
I have a variable x that I read from a file (more on this below)
> x
[1] "NEW YORK NEW ENGLAND"
> gsub(" -", "-", x) # this does not work!
[1] "NEW YORK NEW ENGLAND"
> Encoding(x) # is x in a special encoding? no
[1] "unknown"
> y = "NEW YORK -NEW
2013 Mar 20
1
htmlParse (from XML library) working sporadically in the same code
I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is
http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0
Sometimes the following code works
n<-readHTMLTable(htmlParse(url))
But most of the
2009 Nov 26
1
How to suppress errors generated by readHTMLTable?
library(XML)
download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit','index.html')
tables=readHTMLTable("index.html",error=function(...){})
tables
readHTMLTable gives me the following errors. Could somebody let me
know how to suppress them?
Opening and ending tag mismatch: center and table
htmlParseEntityRef: expecting
2010 Nov 04
3
postForm() in RCurl and library RHTMLForms
Hi RUsers,
Suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
for the index "S&P CNX NIFTY" for
dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
then read the HTML table from the page using readHTMLTable()
I am using this code
webpage <-
2013 Jan 15
1
readHTMLTable (XML package)
Hi,
I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that.
> library(XML)
> wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1)
Error in htmlParse(doc) :
File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist
Dan
2010 Mar 18
1
Do colClasses in readHTMLTable (XML Package) work?
Hi,
I can't get the colClasses option to work in the readHTMLTable function
of the XML package. Here's a code fragment:
require("XML")
doc <- "http://www.nber.org/cycles/cyclesmain.html"
table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table.
xt
2011 Aug 29
1
reading tables from multiple HTML pages
Hi, beginner to R and was having some problems scraping data from tables in
html using the XML package. I have included some code below.
I am trying to loop through a series of html pages, each of which contains a
single table from which I want to scrape data. However, some of the pages
are blank - and so it throws me an error message when it gets to
htmlParse(). The loop then closes out and I
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
2009 May 12
2
import HTML tables
Hello,
I was wondering if there is a function in R that imports tables directly
from a HTML document. I know there are functions (say, getURL() from {RCurl}
) that download the entire page source, but here I refer to something like
google document's function importHTML() (if you don't know this function, go
check it, it's very useful). Anyway, if someone knows of something that does this
2012 Mar 27
1
readHTMLTable help
Hello to everyone.
I'm using this function to download some information from a website.
This is the URL:
http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980
If you go to that website you'll find a table with meteorological
information. One column is called "Intesidad Máxima Diaria", and that is
the one I need.
I've been trying to extract that
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".html".
The good news is that as far as I can tell the <a>
2012 May 26
3
Problem with readHTMLTable
Hello All,
I was trying to simply run readHTMLTable on the example published in the
package, and on a page I was working on. So running:
u = "http://en.wikipedia.org/wiki/List_of_countries_by_population"
tables = readHTMLTable(u)
returns the following error:
Error in tb[["thead"]] : subscript out of bounds
Looking up this error on the web didn't give me any hint. Is
2012 Jun 14
1
readHTMLTable function - unable to find an inherited method ~ for signature "NULL"
Hi R experts,
I have been playing with library(XML) recently and found out that
readHTMLTable works flawlessly for some websites, but it does give me an
error like below
... Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "readHTMLTable", for
signature "NULL"
let's say..for example, this code works fine
a
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding. I'm not able to get
encoding right in htmlTreeParse command. See below
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,