similar to: puzzle using gsub (and encodings maybe)

Displaying 20 results from an estimated 1000 matches similar to: "puzzle using gsub (and encodings maybe)"

2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)
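
A common workaround is to fetch the page as raw bytes and strip the nul bytes before converting to text. A minimal sketch, with a placeholder URL since the failing page isn't shown:

library(RCurl)
# Fetch as a raw vector so embedded nuls cannot truncate the string
raw_page <- getBinaryURL("http://www.example.com/page.html")  # placeholder URL
# Drop the nul (0x00) bytes, then convert to character
txt <- rawToChar(raw_page[raw_page != as.raw(0)])
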
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following is my code and error message. Error in htmlParse(doc) : error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team" tables <- readHTMLTable(theurl) Regards, Kiung [[alternative HTML version
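
When htmlParse cannot create a parser for a remote URL, a common workaround is to download the page with RCurl first and parse it as text. A sketch:

library(RCurl)
library(XML)
theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
# Fetch the HTML ourselves, then parse it as text rather than as a URL
page <- getURL(theurl)
tables <- readHTMLTable(htmlParse(page, asText = TRUE))
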
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
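
htmlTreeParse takes an encoding argument that usually resolves this. A sketch in which the charset is an assumption (GB2312 is common for sina.com pages, but the page's actual <meta> charset should be checked):

library(XML)
download.file("http://home.sina.com", "tmp.html")
txt <- readLines("tmp.html", warn = FALSE)
# Tell the parser the source encoding explicitly; GB2312 is a guess here
doc <- htmlTreeParse(paste(txt, collapse = "\n"), asText = TRUE,
                     encoding = "GB2312", useInternalNodes = TRUE,
                     error = function(...) {})
g <- xpathSApply(doc, "//p", xmlValue)
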
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello, I am having trouble getting the curlPerform function to authenticate using the .netrc file. From the documentation I've read it certainly seems as though this function should be able to authenticate via the .netrc file. The example I am using here comes from the "R as a Web Client- the RCurl package" paper and demonstrates using the .netrc file to access the
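
In RCurl the .netrc lookup is switched on per request with the netrc curl option. A minimal sketch, assuming a valid ~/.netrc entry for the host (the URL is a placeholder):

library(RCurl)
h <- basicTextGatherer()
# netrc = 1 tells libcurl to take the credentials from ~/.netrc
curlPerform(url = "http://www.example.com/protected",  # placeholder URL
            netrc = 1L,
            writefunction = h$update)
h$value()
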
2012 Oct 30
2
RCurl - curlPerform - Time out?!?
Hi, I am working with the RCurl package and I am using the curlPerform function for a SOAP query. The problem is that the code usually works well, but sometimes the connection gets lost. So I wrote a while-loop that repeats the query if anything goes wrong, but when the query fails it takes a very long time before the repetition starts. My question is if there
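
libcurl exposes timeout options that cap how long a hung request can block, which keeps a retry loop responsive. A sketch that assumes a plain GET for brevity (the real query is a SOAP POST, and the URL is a placeholder):

library(RCurl)
result <- NULL
attempt <- 0
while (is.null(result) && attempt < 5) {
  attempt <- attempt + 1
  result <- tryCatch(
    # connecttimeout caps connection setup, timeout the whole transfer (seconds)
    getURL("http://www.example.com/soap-endpoint",  # placeholder URL
           .opts = list(connecttimeout = 10, timeout = 30)),
    error = function(e) NULL
  )
}
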
2010 Oct 06
2
Converting scraped data
Dear Colleagues, I used this code to scrape data from the URL contained within. This code should be reproducible. require("XML") library(XML) theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm" tables <- readHTMLTable(theurl) n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) class(tables) test<-data.frame(tables, stringsAsFactors=FALSE)
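
readHTMLTable returns a list with one data frame per <table>, so wrapping the whole list in data.frame() glues all the tables together column-wise. A sketch that picks a single table and converts its text columns instead; which table is wanted is an assumption (the first is used here):

library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl, stringsAsFactors = FALSE)
# Inspect names(tables) and pick the table of interest; [[1]] is a guess
test <- tables[[1]]
# Convert columns that hold numbers as text into numeric columns
test[] <- lapply(test, function(col) type.convert(col, as.is = TRUE))
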
2015 Feb 05
3
Rcurl crash in R-devel
Hello, I don't know if the problem originates from R-devel 3.2 or RCurl itself. I post this message to the R-devel list and to the author of RCurl (duncan at r-project.org). > library("RCurl") Loading required package: bitops > print(sessionInfo()) R Under development (unstable) (2015-02-03 r67717) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under:
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
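
Declaring the encoding both when fetching and when parsing usually fixes this. In this sketch the charset is an assumption (ISO-8859-1 was typical for Finnish sites of that era; the page's <meta> tag should be checked):

library(RCurl)
library(XML)
site <- getURL("http://www.aarresaari.net/jobboard/jobs.html",
               .encoding = "ISO-8859-1")
# Hand the parser the same encoding so it does not have to guess
doc <- htmlTreeParse(site, asText = TRUE, encoding = "ISO-8859-1",
                     useInternalNodes = TRUE, error = function(...) {})
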
2010 Oct 10
1
Create single vector after looping through multiple data frames with GREP
Hello all, I changed the subject line of the e-mail, because the question I'm posing now is different from the first one. I hope that this is proper etiquette. However, the original chain is included below. I've incorporated bits of both Ethan's and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming
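
The usual pattern for this is to collect the per-data-frame matches in a list and flatten it once at the end, rather than growing a vector inside the loop. A sketch in which the data frame names, the column, and the pattern are all hypothetical:

# dfs, some_column and "pattern" are placeholders for the real objects
dfs <- list(df1, df2, df3)
hits <- lapply(dfs, function(d) grep("pattern", d$some_column, value = TRUE))
# A single vector holding the matches from every data frame
result <- unlist(hits, use.names = FALSE)
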
2012 May 14
3
Scraping a web page.
Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact all I want is the (exactly) 7-digit number before ".html". The good news is that as far as I can tell the <a>
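
Once the href values are pulled out with XPath, the seven digits can be isolated with a regular expression. A sketch, with page_source standing in for the downloaded HTML:

library(XML)
doc <- htmlParse(page_source, asText = TRUE)  # page_source is a placeholder
# Collect every href on <a> tags under a <div>-wrapped table
hrefs <- xpathSApply(doc, "//div//table//a", xmlGetAttr, "href")
ships <- grep("^/en/Ships/", hrefs, value = TRUE)
# Keep only the 7 digits immediately before ".html"
ids <- sub(".*-([0-9]{7})\\.html$", "\\1", ships)
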
2007 Nov 12
1
Microsoft SOAP - Help!!
Hello, I am trying to access Microsoft Live Search Using SOAP through R. In R I am using the RCurl packages to make the calls. I have the following situation that looks crazy and cannot figure out how to solve it: #SOAP Request library(RCurl) h = basicTextGatherer() body='<?xml version="1.0" encoding="ISO-8859-15"?> <SOAP-ENV:Envelope
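
A SOAP call through RCurl is typically a curlPerform with the envelope in postfields plus a Content-Type header (and SOAPAction where the service requires one). A minimal sketch with a placeholder endpoint, since the old Live Search endpoint is long gone:

library(RCurl)
h <- basicTextGatherer()
body <- '<?xml version="1.0" encoding="ISO-8859-15"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <!-- envelope body elided -->
</SOAP-ENV:Envelope>'
curlPerform(url = "http://soap.example.com/endpoint",  # placeholder endpoint
            httpheader = c("Content-Type" = "text/xml; charset=ISO-8859-15",
                           SOAPAction = ""),           # action is service-specific
            postfields = body,
            writefunction = h$update)
h$value()
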
2008 Dec 17
1
Extract Data from a Webpage
Hi All: I would like to extract the provider name, address, and phone number from multiple webpages like this: http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490 Based on searching R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the url as an argument to
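
The XML package handles this by parsing the page and addressing the fields with XPath. A sketch; the XPath expression is a placeholder, since the right one depends on the report page's actual markup:

library(XML)
u <- "http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490"
doc <- htmlParse(u)
# "//td" is a stand-in; inspect the page source to find the elements
# that actually hold the provider name, address and phone number
fields <- xpathSApply(doc, "//td", xmlValue)
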
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all, I am having a problem downloading webpages through R when I run it in the DOS window under Windows Vista Basic. I have downloaded the batchfiles from http://code.google.com/p/batchfiles/ and have successfully set the PATH. I open up 'Command Prompt' in Vista and type (after the C:\...> stuff): ### START ### C:\Users\Karen>Rscript -e "library(RCurl);
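
When the same code works inside the R GUI but not from the command line, a proxy that only the GUI session knows about is one frequent cause. RCurl can be given the proxy explicitly; host and port below are placeholders:

library(RCurl)
# Supply the proxy directly so the command-line session does not depend
# on environment variables; "proxy.example.com" and 8080 are placeholders
txt <- getURL("http://www.example.com",
              .opts = list(proxy = "proxy.example.com", proxyport = 8080))
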
2013 Jul 23
2
downloading web content
Hello, I am trying to use R to download a bunch of .csv files such as: http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia I have tried the following and neither work: a<- getURL(" http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia") Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : embedded nul in string: and
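
As in the embedded-null thread above, fetching the response as raw bytes sidesteps the error. A sketch; the output filename, and the guess that the service streams a file suitable for writeBin, are assumptions:

library(RCurl)
u <- "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia"
# Download as raw bytes so embedded nuls cannot break the character conversion
raw_body <- getBinaryURL(u)
writeBin(raw_body, "occurrences_download.bin")  # filename is a placeholder
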
2012 Jun 07
1
How to set cookies in RCurl
Hi, I am trying to access a website and read its content. The website is a restricted-access website that I access through a proxy server (which therefore requires me to enable cookies). I have problems in allowing RCurl to receive and send cookies. The following lines give me: library(RCurl) library(XML) url <- "http://www.theurl.com" content <- readHTMLTable(url) content
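
Cookie support in RCurl hangs off a reusable curl handle with the cookiefile/cookiejar options set. A minimal sketch:

library(RCurl)
library(XML)
# Naming a cookiefile switches the cookie engine on (the file may start empty);
# cookiejar is where libcurl writes back any cookies it receives
curl <- getCurlHandle(cookiefile = "cookies.txt",
                      cookiejar  = "cookies.txt",
                      followlocation = TRUE)
page <- getURL("http://www.theurl.com", curl = curl)
content <- readHTMLTable(htmlParse(page, asText = TRUE))
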
2011 Apr 29
1
RCurl and postForm()
Hi everybody, I think that I am missing something fundamental in how strings are passed from a postForm() call in R to the curl or libcurl functions underneath. For example, I can do the following using curl from the command line: $ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
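
One likely mismatch: curl -d sends the string as the raw request body, while postForm's default style builds a multipart form. A sketch that mirrors curl -d more closely via postfields:

library(RCurl)
h <- basicTextGatherer()
# postfields reproduces curl -d: the string becomes the request body
curlPerform(url = "http://www.datasciencetoolkit.org/text2people",
            postfields = "Archbishop Huxley",
            writefunction = h$update)
h$value()
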
2011 Nov 16
1
Checking for monotonic sequence
I am scraping data from a web page using XML (excellent package BTW - that's scraping data the easy way!). So far, I've got the code: tables <- readHTMLTable(theurl) rhf <- tables$tabResHistFull div1 <- rhf[which(rhf$V1=="Div ps"),] div1 which is giving me the result: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 15
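
Once the row is coerced to numbers, checking monotonicity reduces to a sign test on the successive differences. A sketch; dropping one leading label column is an assumption about the table's layout:

# Coerce the selected row to numeric; div1[-1] drops an assumed label column
x <- suppressWarnings(as.numeric(sapply(div1[-1], as.character)))
is_nondecreasing <- all(diff(x) >= 0, na.rm = TRUE)
is_increasing    <- all(diff(x) >  0, na.rm = TRUE)
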
2008 Aug 28
1
RCurl: authentication when posting forms
Hi, Has anyone successfully used RCurl for posting data to a password-protected site? I have tried using the option netrc=1 with both postForm and curlPerform (with the postfields option) but can't authenticate. I would happily provide more details if someone has had some experience with this. Thanks very much. Valerie
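
For HTTP authentication the credentials usually go through the userpwd option (or netrc) together with an httpauth scheme; note that a login form is a different case, where the credentials belong in the form fields themselves. A sketch with placeholder URL, fields and credentials:

library(RCurl)
res <- postForm("http://www.example.com/submit",   # placeholder URL
                field1 = "value1",                 # placeholder form field
                .opts = list(userpwd = "user:password",
                             httpauth = 1L))       # 1L = HTTP basic auth
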
2011 Nov 03
1
RGoogleTrends error in "getGTrends"
Hi all, I've just installed RGoogleTrends Version:0.2-1 (after compiling it for windows). And when running the most basic command I get the following error: > ans = getGTrends("coupon") Error in curlPerform(url = url, curl = curl, .opts = .opts) : embedded nul in string: '<ff><fe>Y' In addition: Warning message: RS-DBI driver warning: (closing pending
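
The <ff><fe> prefix is a UTF-16 little-endian byte-order mark, so the response is UTF-16 text and the embedded nuls are genuine. A sketch of converting such a response by hand; the URL is a placeholder, since RGoogleTrends builds the real one internally:

library(RCurl)
raw_resp <- getBinaryURL("http://trends.example.com/export")  # placeholder URL
# iconv accepts a list of raw vectors; recode UTF-16LE to UTF-8 text
txt <- iconv(list(raw_resp), from = "UTF-16LE", to = "UTF-8")
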
2010 Sep 16
2
FTP Download
Hi, I have problems downloading complete folders via ftp with R. Single files work fine. I tried Rcurl, but it does not work. This is my code: url = "ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/Derived_Products/3B42_V6/Daily/2009/" filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE) filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep =
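
libcurl transfers one file at a time, so downloading a folder means listing it first (as above) and then looping over the names. A sketch continuing that code:

library(RCurl)
url <- "ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/Derived_Products/3B42_V6/Daily/2009/"
filenames <- getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE)
filenames <- paste(url, strsplit(filenames, "\r*\n")[[1]], sep = "")
for (f in filenames) {
  # Fetch each file individually; basename() reuses the remote name locally
  download.file(f, destfile = basename(f), mode = "wb")
}
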