thr3ads.net - similar to: "read htm table error"

Displaying 20 results from an estimated 400 matches similar to: "read htm table error"

Removing Embedded Null characters from text/html

2009 Oct 15

Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)
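One way such stray bytes are often handled, sketched here on the assumption that the 'embedded null' characters are literal 0x00 bytes: keep the page as a raw vector, drop the zero bytes, and only then convert it to a character string. The vector `page_raw` and the helper `strip_nul` are invented for illustration; in practice the bytes might come from `readBin()` on a downloaded file.

```r
# Hypothetical page content containing two embedded NUL (0x00) bytes.
page_raw <- c(charToRaw("<html>a"), as.raw(0), charToRaw("b</html>"), as.raw(0))

# Drop every 0x00 byte, then convert the remaining bytes to a string.
# (strip_nul is an illustrative helper, not part of RCurl or XML.)
strip_nul <- function(bytes) {
  rawToChar(bytes[bytes != as.raw(0)])
}

page_txt <- strip_nul(page_raw)
print(page_txt)
```

Filtering at the raw level sidesteps the fact that R character strings cannot hold NUL bytes at all.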

How to set cookies in RCurl

2012 Jun 07

Hi, I am trying to access a website and read its content. The website is a restricted access website that I access through a proxy server (which therefore requires me to enable cookies). I have problems in allowing RCurl to receive and send cookies. The following lines give me: library(RCurl); library(XML); url <- "http://www.theurl.com"; content <- readHTMLTable(url); content

XML and RCurl: problem with encoding (htmlTreeParse)

2010 Jul 03

Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |

Converting scraped data

2010 Oct 06

Dear Colleagues, I used this code to scrape data from the URL contained within. This code should be reproducible. require("XML"); library(XML); theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"; tables <- readHTMLTable(theurl); n.rows <- unlist(lapply(tables, function(t) dim(t)[1])); class(tables); test <- data.frame(tables, stringsAsFactors=FALSE)

Create single vector after looping through multiple data frames with GREP

2010 Oct 10

Hello all, I changed the subject line of the e-mail, because the question I'm posing now is different than the first one. I hope that this is proper etiquette. However, the original chain is included below. I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming

Checking for monotonic sequence

2011 Nov 16

I am scraping data from a web page using XML (excellent package BTW - that's scraping data the easy way!). So far, I've got the code: tables <- readHTMLTable(theurl) rhf <- tables$tabResHistFull div1 <- rhf[which(rhf$V1=="Div ps"),] div1 which is giving me the result: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 15
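The subject line asks about checking for a monotonic sequence. A minimal base-R sketch of such a check follows, assuming the row of interest has been pulled out as text (columns returned by readHTMLTable are often character or factor, so a conversion step is shown). The values and the helper name `is_non_decreasing` are invented for illustration and are not taken from the thread.

```r
# Made-up dividend values as they might arrive from an HTML table: as text.
div_chr <- c("1.10", "1.25", "1.25", "1.40")
div_num <- as.numeric(div_chr)

# TRUE when every successive difference is non-negative.
is_non_decreasing <- function(x) all(diff(x) >= 0)

print(is_non_decreasing(div_num))       # sequence as given
print(is_non_decreasing(rev(div_num)))  # same values in reverse order
```

Replacing `>=` with `>` would test for a strictly increasing sequence instead.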

puzzle using gsub (and encodings maybe)

2009 Oct 14

Hello, Below is some output that shows my issue. I have a variable x that I read from a file (more on this below) > x [1] "NEW YORK NEW ENGLAND" > gsub(" -", "-", x) # this does not work! [1] "NEW YORK NEW ENGLAND" > Encoding(x) # is x in a special encoding? no [1] "unknown" > y = "NEW YORK -NEW

htmlParse (from XML library) working sporadically in the same code

2013 Mar 20

I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometimes the following code works: n <- readHTMLTable(htmlParse(url)) But most of the

How to suppress errors generated by readHTMLTable?

2009 Nov 26

library(XML) download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit','index.html') tables=readHTMLTable("index.html",error=function(...){}) tables readHTMLTable gives me the following errors. Could somebody let me know how to suppress them? Opening and ending tag mismatch: center and table htmlParseEntityRef: expecting

postForm() in RCurl and library RHTMLForms

2010 Nov 04

Hi RUsers, Suppose I want to see the data on the website url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm" for the index "S&P CNX NIFTY" for dates "FromDate"="01-11-2010","ToDate"="02-11-2010" and then read the HTML table from the page using readHTMLTable(). I am using this code webpage <-

readHTMLTable (XML package)

2013 Jan 15

Hi, I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that. > library(XML) > wampage<-readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html',1) Error in htmlParse(doc) : File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist Dan

Do colClasses in readHTMLTable (XML Package) work?

2010 Mar 18

Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment: require("XML") doc <- "http://www.nber.org/cycles/cyclesmain.html" table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table. xt

reading tables from multiple HTML pages

2011 Aug 29

Hi, beginner to R and was having some problems scraping data from tables in html using the XML package. I have included some code below. I am trying to loop through a series of html pages, each of which contains a single table from which I want to scrape data. However, some of the pages are blank - and so it throws me an error message when it gets to htmlParse(). The loop then closes out and I
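A common pattern for keeping a loop alive when one page fails is to wrap the failing call in `tryCatch()`. The sketch below is only an illustration of that pattern: `parse_page` is a hypothetical stand-in for a parser such as htmlParse(), and the page contents are invented, so nothing here reproduces the poster's actual code.

```r
# Hypothetical stand-in for a parser such as htmlParse(): fails on blank input.
parse_page <- function(txt) {
  if (!nzchar(trimws(txt))) stop("blank page")
  toupper(txt)  # placeholder for the real parsing work
}

# Invented page contents; the second page is blank.
pages <- c(p1 = "<table>a</table>", p2 = "", p3 = "<table>b</table>")

# tryCatch() turns an error on one page into NULL so the loop continues.
results <- lapply(pages, function(txt) {
  tryCatch(parse_page(txt), error = function(e) NULL)
})

# Keep only the pages that parsed successfully.
parsed <- Filter(Negate(is.null), results)
print(names(parsed))
```

Returning NULL for a failed page keeps the result list aligned with the input, so it stays clear which pages were skipped.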

Extract Data from a Webpage

2008 Dec 17

Hi All: I would like to extract the provider name, address, and phone number from multiple webpages like this: http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490 Based on searching R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the url as an argument to

import HTML tables

2009 May 12

Hello, I was wondering if there is a function in R that imports tables directly from an HTML document. I know there are functions (say, getURL() from {RCurl}) that download the entire page source, but here I refer to something like google document's function importHTML() (if you don't know this function, go check it, it's very useful). Anyway, if someone knows of something that does this

readHTLMTable help

2012 Mar 27

Hello to everyone. I'm using this function to download some information from a website. This is the URL: http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980 If you go to that website you'll find a table with meteorological information. One column is called "Intesidad Máxima Diaria", and that is the one I need. I've been trying to extract that

Scraping a web page.

2012 May 14

Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact all I want is the (exactly) 7-digit number before ".html". The good news is that as far as I can tell the <a>
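Once the href strings are in hand, the 7-digit number can be pulled out with a regular expression. The sketch below assumes, as in both quoted examples, that the number is preceded by a hyphen and followed directly by ".html" at the end of the string; that assumption and the variable names are not from the thread.

```r
# The two example hrefs quoted in the post.
hrefs <- c("/en/Ships/A-8605507.html", "/en/Ships/Aalborg-8122830.html")

# Match exactly seven digits sitting between a hyphen and a trailing ".html".
# The lookbehind and lookahead need perl = TRUE.
m <- regexpr("(?<=-)[0-9]{7}(?=\\.html$)", hrefs, perl = TRUE)
ids <- regmatches(hrefs, m)
print(ids)
```

Anchoring on both sides means a longer or shorter run of digits does not match, which is what "exactly 7-digit" asks for.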

Problem with readHTMLTable

2012 May 26

Hello All, I was trying to simply run readHTMLTable on the example published in the package, and on a page I was working on. So running: u = "http://en.wikipedia.org/wiki/List_of_countries_by_population" tables = readHTMLTable(u) returns the following error: Error in tb[["thead"]] : subscript out of bounds Looking up this error on the web didn't give me any hint. Is

readHTMLTable function - unable to find an inherited method ~ for signature "NULL"

2012 Jun 14

Hi R experts, I have been playing with library(XML) recently and found out that readHTMLTable works flawlessly for some websites, but it does give me an error like below ... Error in function (classes, fdef, mtable) : unable to find an inherited method for function "readHTMLTable", for signature "NULL" let's say..for example, this code works fine a

XML and RCurl: problem with encoding (htmlTreeParse)

2009 Dec 31

Hi, I'm trying to get data from web page and modify it in R. I have a problem with encoding. I'm not able to get encoding right in htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
