Displaying 20 results from an estimated 800 matches similar to: "Removing Embedded Null characters from text/html"
2009 Oct 14
2
puzzle using gsub (and encodings maybe)
Hello,
Below is some output that shows my issue.
I have a variable x that I read from a file (more on this below)
> x
[1] "NEW YORK NEW ENGLAND"
> gsub(" -", "-", x) # this does not work!
[1] "NEW YORK NEW ENGLAND"
> Encoding(x) # is x in a special encoding? no
[1] "unknown"
> y = "NEW YORK -NEW
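The thread is cut off before any answer. One common cause of a pattern like `" -"` failing on text that looks as if it should match is that the visible "space" is really a no-break space (U+00A0). That is only a guess here. A minimal base-R sketch that reproduces and fixes this hypothetical case:

```r
# Hypothetical value: the character before "-" is U+00A0 (no-break space),
# which prints like an ordinary space
x <- "NEW YORK\u00a0-NEW ENGLAND"

# The pattern contains an ordinary space (0x20), so nothing is replaced
unchanged <- gsub(" -", "-", x)

# Looking at the bytes shows c2 a0 (UTF-8 for U+00A0) just before 2d ("-")
charToRaw(x)

# Map no-break spaces to ordinary spaces; then the original substitution works
x_clean <- gsub("\u00a0", " ", x, fixed = TRUE)
y <- gsub(" -", "-", x_clean)
y
```

If the file was read in a single-byte Latin-1 locale, the same character would show up in `charToRaw()` as the single byte `a0` instead.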
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following are my code and the error message.
Error in htmlParse(doc) :
error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team
theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
tables <- readHTMLTable(theurl)
Regards,
Kiung
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All,
First method:
> library(XML)
> theurl <- "http://home.sina.com"
> download.file(theurl, "tmp.html")
> txt <- readLines("tmp.html")
> txt <- htmlTreeParse(txt, error = function(...) {}, useInternalNodes = TRUE)
> g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
> head(grep(" ", g, value = TRUE))
[1] " |
2008 Aug 27
1
RCurl: using netrc with curlPerform
Hello,
I am having trouble getting the curlPerform function to authenticate
using the .netrc file. From the documentation I've read, it
certainly seems as though this function should be able to authenticate
via the .netrc file.
The example I am using here comes from the "R as a Web Client - the RCurl
package" paper and demonstrates using the .netrc file to access the
2012 May 14
3
Scraping a web page.
Folks,
I want to scrape a series of web-page sources for strings like the following:
"/en/Ships/A-8605507.html"
"/en/Ships/Aalborg-8122830.html"
which appear in an href inside an <a> tag inside a <div> tag inside a table.
In fact all I want is the (exactly) 7-digit number before ".html".
The good news is that as far as I can tell the <a>
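Once the `href` values have been pulled out of the page (for example with the XML package's `getHTMLLinks()`), extracting the number is a plain regular-expression job. A base-R sketch using the two example strings from the question plus one hypothetical non-matching link:

```r
# Example hrefs from the question; the third is a hypothetical link that should be ignored
hrefs <- c("/en/Ships/A-8605507.html",
           "/en/Ships/Aalborg-8122830.html",
           "/en/Ships/index.html")

# A hyphen, exactly 7 digits (captured), then ".html" at the end of the string
pattern <- "-([0-9]{7})\\.html$"
matching <- grep(pattern, hrefs, value = TRUE)

# Replace the whole string with capture group 1, leaving only the number
ids <- sub(paste0(".*", pattern), "\\1", matching)
ids
```

Anchoring on the hyphen directly before the digits means an 8-digit number would not match, which fits the "exactly 7 digits" requirement.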
2012 Jun 07
1
How to set cookies in RCurl
Hi,
I am trying to access a website and read its content. The website is a
restricted access website that I access through a proxy server (which
therefore requires me to enable cookies). I am having problems getting RCurl
to receive and send cookies.
The following lines give me:
library(RCurl)
library(XML)
url <- "http://www.theurl.com"
content <- readHTMLTable(url)
content
2009 Jun 02
1
Problem downloading webpages using batchfiles and RCurl from command line in Vista Basic - couldn't connect to host
Dear all,
I am having a problem downloading webpages through R when I run it in
the DOS window under Windows Vista Basic. I have downloaded the
batchfiles from http://code.google.com/p/batchfiles/ and have
successfully set the PATH.
I open up 'Command Prompt' in Vista and type (after the C:\...>
stuff):
### START ###
C:\Users\Karen>Rscript -e "library(RCurl);
2013 Jul 23
2
downloading web content
Hello,
I am trying to use R to download a bunch of .csv files such as:
http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia
I have tried the following and neither works:
a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
and
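The error means the response contains NUL (0x00) bytes, which an R character string cannot hold. Below is a hedged sketch that assumes the body has been fetched as raw bytes, for instance with RCurl's `getBinaryURL()`. If the payload is really text with stray NULs, drop those bytes before converting. If it is a genuinely binary file, such as a zip archive, save the raw bytes with `writeBin()` instead, because stripping NULs would corrupt it. The helper name `strip_nul()` and the payload are hypothetical.

```r
# Convert a raw vector to a character string, discarding NUL (0x00) bytes
strip_nul <- function(bytes) {
  stopifnot(is.raw(bytes))
  rawToChar(bytes[bytes != as.raw(0)])
}

# Offline demonstration: a hypothetical text payload with embedded NULs
payload <- c(charToRaw("id,name"), as.raw(0), charToRaw("\n1,Banksia"), as.raw(0))
txt <- strip_nul(payload)
txt

# Against the live URL the bytes might come from RCurl instead, e.g.
#   payload <- RCurl::getBinaryURL(url)
# and a binary archive would be written to disk rather than stripped:
#   writeBin(payload, "download.zip")
```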
2008 Dec 17
1
Extract Data from a Webpage
Hi All:
I would like to extract the provider name, address, and phone number
from multiple webpages like this:
http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489&P2=11490
Based on searching R-help archives, it seems like the XML package
might have something useful for this task. I can load the XML package
and supply the url as an argument to
2012 Oct 30
2
RCurl - curlPerform - Time out?!?
Hi,
I am working with the RCurl package and I am using the curlPerform
function for a SOAP query.
The problem is that the code usually works well, but sometimes the
connection gets lost.
So I wrote a while loop that reruns the same query if anything goes wrong,
but when the query fails it takes a very long time before it is repeated.
My question is whether there
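The long wait may come from the failed request taking a long time to give up, since libcurl applies no overall timeout unless one is set. A minimal base-R sketch of a bounded retry loop follows. The helper `with_retries()` is hypothetical. The commented call passes the libcurl options `timeout` and `connecttimeout` through RCurl, and the values shown are illustrative only.

```r
# Call f() up to max_tries times, pausing `wait` seconds between failed attempts
with_retries <- function(f, max_tries = 3, wait = 1) {
  stopifnot(max_tries >= 1)
  for (i in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)  # success: hand back the value
    if (i < max_tries) Sys.sleep(wait)
  }
  stop("all ", max_tries, " attempts failed; last error: ", conditionMessage(result))
}

# With RCurl, each attempt could be capped so that a dead connection fails fast, e.g.
# with_retries(function() curlPerform(url = url, connecttimeout = 10, timeout = 60))
```

A fixed number of attempts, unlike an open-ended `while` loop, guarantees the script eventually stops and reports the last error.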
2013 Feb 21
4
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.
It looks like the same issue occurs with Russian:
library(RCurl)
library(XML)
u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
a = getURL(u)
a # Here - the Russian is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...
None of these seem to fix it:
htmlParse(a, encoding = "windows-1251")
htmlParse(a, encoding =
2010 Sep 16
2
FTP Download
Hi,
I have problems downloading complete folders via FTP with R. Single files
work fine.
I tried RCurl, but it does not work.
This is my code:
url = "ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/Derived_Products/3B42_V6/Daily/2009/"
filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE)
filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep =
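Once the directory listing is available as a single string, as returned by the `getURL()` call above, the remaining steps are base R. This sketch assumes the listing holds one file name per line. The file names are made up for an offline demonstration, and the download loop is commented out because it needs the live server.

```r
base_url <- "ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/Derived_Products/3B42_V6/Daily/2009/"

# Hypothetical listing text standing in for getURL(..., ftplistonly = TRUE)
listing <- "file_a.bin\r\nfile_b.bin\r\n"

# One entry per line; "\r*\n" copes with both CRLF and LF line endings
files <- strsplit(listing, "\r*\n")[[1]]
files <- files[nzchar(files)]   # drop any blank entries
file_urls <- paste0(base_url, files)

# Download each file in binary mode into the working directory
# for (i in seq_along(file_urls)) {
#   download.file(file_urls[i], destfile = files[i], mode = "wb")
# }
```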
2010 Jul 21
1
Command that is conditional upon file retrieval: is it possible?
Hi all,
I'm currently working on an R program where I have to access an FTP server
to download some of the data I need. However, the people who post the
files I access are at times inconsistent with regard to when they post, whether
they post at all, etc. Here's some of the code I use:
library(RCurl)
url1 = paste("ftp://user:password@a.great.website.com/",
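One way to make later steps conditional on the file having arrived is to wrap the download in `tryCatch()` and branch on the result. This is a base-R sketch. The helper `try_download()` and the file name `data.txt` are hypothetical, and `url1` refers to the URL built in the excerpt. The helper treats any warning or error from `download.file()` as "not available yet".

```r
# Try to download url to dest; TRUE on success, FALSE if the file could not be fetched
try_download <- function(url, dest) {
  tryCatch({
    status <- download.file(url, dest, mode = "wb", quiet = TRUE)
    status == 0   # download.file() returns 0 on success
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
}

# Only continue with the analysis when the file was actually retrieved
# if (try_download(url1, "data.txt")) {
#   dat <- read.table("data.txt")
# } else {
#   message("File not posted yet; skipping this run")
# }
```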
2010 Oct 06
2
Converting scraped data
Dear Colleagues,
I used this code to scrape data from the URL contained within. This
code should be reproducible.
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test <- data.frame(tables, stringsAsFactors = FALSE)
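`readHTMLTable()` returns a list with one data frame per HTML table, so `data.frame(tables, ...)` can only combine tables whose row counts are compatible. The hedged sketch below instead keeps the largest table, reusing the row-count idea behind `n.rows`. It then converts each column to character or numeric with `type.convert()`. The toy list stands in for the scraped result, and assuming the largest table is the one of interest is only a guess about the real page.

```r
# Toy stand-in for the list returned by readHTMLTable(theurl)
tables <- list(
  small = data.frame(V1 = "header only", stringsAsFactors = TRUE),
  main  = data.frame(V1 = c("a", "b", "c"), V2 = c("10", "20", "30"),
                     stringsAsFactors = TRUE)
)

# Row count of each table (NROW() also returns 0 for a NULL entry)
n.rows <- vapply(tables, NROW, integer(1))

# Keep the table with the most rows
main <- tables[[which.max(n.rows)]]

# Convert factor columns to character, or to numbers where every value is numeric
main[] <- lapply(main, function(col) type.convert(as.character(col), as.is = TRUE))
str(main)
```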
2010 Oct 10
1
Create single vector after looping through multiple data frames with GREP
Hello all,
I changed the subject line of the e-mail because the question I'm posing now is different from the first one. I hope that this is proper etiquette. The original chain is included below.
I've incorporated bits of both Ethan's and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming
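The excerpt stops before the details, so the following is only a sketch of the general pattern the subject line describes. It runs `grep()` on each data frame and lets `unlist()` merge the per-frame results into one vector, rather than growing a vector inside a loop. The data frames, the column name `text`, and the pattern `"apple"` are all hypothetical.

```r
# Hypothetical inputs: several data frames sharing a character column "text"
frames <- list(
  data.frame(text = c("apple pie", "banana"), stringsAsFactors = FALSE),
  data.frame(text = c("cherry", "apple tart"), stringsAsFactors = FALSE)
)

# grep() each frame's column, keeping the matching values
matches_per_frame <- lapply(frames, function(df) grep("apple", df$text, value = TRUE))

# Collapse the list of per-frame results into a single character vector
all_matches <- unlist(matches_per_frame, use.names = FALSE)
all_matches
```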
2012 Jan 30
1
Getting htmlParse to work with Hebrew? (on windows)
Hello dear R-help mailing list.
I would like htmlParse to work well with Hebrew, but it keeps
scrambling the Hebrew text in the pages I feed into it.
For example:
# why can't I parse the Hebrew correctly?
library(RCurl)
library(XML)
u = "http://humus101.com/?p=2737"
a = getURL(u)
a # Here - the Hebrew is fine.
a2 <- htmlParse(a)
a2 # Here it is a mess...
None of
2007 Nov 12
1
Microsoft SOAP - Help!!
Hello,
I am trying to access Microsoft Live Search using SOAP through R.
In R I am using the RCurl package to make the calls.
I have the following situation that looks crazy, and I cannot figure out how to
solve it:
#SOAP Request
library(RCurl)
h = basicTextGatherer()
body='<?xml version="1.0" encoding="ISO-8859-15"?>
<SOAP-ENV:Envelope
2008 Oct 01
1
changing 'https' to 'http' when using download.file(), any side effects or just use RCurl?
Dear R-Help,
From reading the help file, it is my understanding that the download.file()
function does not support HTTPS connections. Therefore, understandably,
the following produces an error:
### R Code
> url <- "https://stat.ethz.ch/pipermail/r-help/2008-October/thread.html"
> destfile <- "//PFO-SBS001/Redirected/tonyb/Desktop/R_web_test/tmp.txt"
>
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding: I'm not able to get the
encoding right in the htmlTreeParse command. See below:
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConnection(site)); close(tc)
> txt <- htmlTreeParse(txt,
2015 Feb 05
3
Rcurl crash in R-devel
Hello,
I don't know whether the problem originates from R-devel 3.2 or RCurl itself.
I am posting this message to the R-devel list and to the author of RCurl
(duncan at r-project.org).
> library("RCurl")
Loading required package: bitops
> print(sessionInfo())
R Under development (unstable) (2015-02-03 r67717)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: