Displaying 20 results from an estimated 1000 matches similar to: "puzzle using gsub (and encodings maybe)"
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi,
I'm trying to download some data from the web and am running into
problems with 'embedded null' characters. These seem to indicate to R
that it should stop processing the page so I'd like to remove them.
I've been looking around and can't seem to identify exactly what the
character is and consequently how to remove it.
# THE CODE WORKS ON THIS PAGE
library(RCurl)
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following are my code and the error message.
Error in htmlParse(doc) :
error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team
theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
tables <- readHTMLTable(theurl)
Regards,
Kiung
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:-
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes =
TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value=T))
[1] " |
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello,
I am having trouble getting the curlPerform function to authenticate
using the .netrc file. From the documentation I've read it
certainly seems as though this function should be able to authenticate
via the .netrc file.
The example I am using here comes from the "R as a Web Client- the RCurl
package" paper and demonstrates using the .netrc file to access the
2012 Oct 30
2
RCurl - curlPerform - Time out?!?
Hi,
I am working with the RCurl package and I am using the curlPerform
function for a SOAP query.
The problem is that the code usually works well, but sometimes the
connection gets lost.
So I wrote a while loop that runs the same query again if anything
goes wrong, but if the query fails, the repetition takes a
very long time.
My question is if there
2010 Oct 06
2
Converting scraped data
Dear Colleagues,
I used this code to scrape data from the URL contained within. This
code should be reproducible.
require("XML")
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
2015 Feb 05
3
Rcurl crash in R-devel
Hello,
I don't know if the problem originates from R-devel 3.2 or RCurl itself.
I post this message to the R-devel list and to the author of RCurl
(duncan at r-project.org).
> library("RCurl")
Le chargement a nécessité le package : bitops
> print(sessionInfo())
R Under development (unstable) (2015-02-03 r67717)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under:
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding. I'm not able to get
encoding right in htmlTreeParse command. See below
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,
2010 Oct 10
1
Create single vector after looping through multiple data frames with GREP
Hello all,
I changed the subject line of the e-mail because the question I'm posing now is different from the first one. I hope that this is proper etiquette. However, the original chain is included below.
I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".html".
The good news is that as far as I can tell the <a>
2007 Nov 12
1
Microsoft SOAP - Help!!
Hello,
I am trying to access Microsoft Live Search Using SOAP through R.
In R I am using the RCurl package to make the calls.
I have the following situation that looks crazy, and I cannot figure out how to
solve it:
#SOAP Request
library(RCurl)
h = basicTextGatherer()
body='<?xml version="1.0" encoding="ISO-8859-15"?>
<SOAP-ENV:Envelope
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all,
I am having a problem downloading webpages through R when I run it in
the DOS window under Windows Vista Basic. I have downloaded the
batchfiles from http://code.google.com/p/batchfiles/ and have
successfully set the PATH.
I open up 'Command Prompt' in Vista and type (after the C:\...>
stuff):
### START ###
C:\Users\Karen>Rscript -e "library(RCurl);
2013 Jul 23
2
downloading web content
Hello,
I am trying to use R to download a bunch of .csv files such as:
http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia
I have tried the following and neither works:
a<- getURL("
http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
and
2012 Jun 07
1
How to set cookies in RCurl
Hi,
I am trying to access a website and read its content. The website is a
restricted access website that I access through a proxy server (which
therefore requires me to enable cookies). I have problems in allowing RCurl
to receive and send cookies.
The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
2011 Apr 29
1
RCurl and postForm()
Hi everybody,
I think that I am missing something fundamental in how strings are passed from a postForm() call in R to the curl or libcurl functions underneath. For example, I can do the following using curl from the command line:
$ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
2011 Nov 16
1
Checking for monotonic sequence
I am scraping data from a web page using XML (excellent package BTW - that's scraping data the easy way!).
So far, I've got the code:
tables <- readHTMLTable(theurl)
rhf <- tables$tabResHistFull
div1 <- rhf[which(rhf$V1=="Div ps"),]
div1
which is giving me the result:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
15
2008 Aug 28
1
RCurl: authentication when posting forms
Hi,
Has anyone successfully used RCurl for posting data to a
password-protected site? I
have tried using option netrc=1 with both postForm and curlPerform (with
postfields option) but can't authenticate.
I would happily provide more details if someone has had some experience
with this.
Thanks very much.
Valerie
2011 Nov 03
1
RGoogleTrends error in "getGTrends"
Hi all,
I've just installed RGoogleTrends version 0.2-1 (after compiling it for
Windows).
And when running the most basic command I get the following error:
> ans = getGTrends("coupon")
Error in curlPerform(url = url, curl = curl, .opts = .opts) :
embedded nul in string: '<ff><fe>Y'
In addition: Warning message:
RS-DBI driver warning: (closing pending
2010 Sep 16
2
FTP Download
Hi,
I have problems downloading complete folders via ftp with R. Single files
work fine.
I tried RCurl, but it does not work.
This is my code:
url =
"ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/Derived_Products/3B42_V6/Daily/2009/"
filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf =
TRUE)
filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep =