similar to: Converting scraped data

Displaying 20 results from an estimated 1300 matches similar to: "Converting scraped data"

2010 Oct 10
1
Create single vector after looping through multiple data frames with GREP
Hello all, I changed the subject line of the e-mail, because the question I'm posing now is different than the first one. I hope that this is proper etiquette. However, the original chain is included below. I've incorporated bits of both Ethan and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming
2012 Aug 09
2
read htm table error
Hi, I am using R version 2.15 and I haven't been able to read an HTML table. Following is my code and error message. Error in htmlParse(doc) : error in creating parser for http://en.wikipedia.org/wiki/Brazil_national_football_team theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team" tables <- readHTMLTable(theurl) Regards, Kiung
2011 Mar 30
1
sampling design runs with no errors but returns empty data set
Dear colleagues, I'm working with the 2008 Canada Election Studies (http://www.queensu.ca/cora/_files/_CES/CES2008.sav.zip), trying to construct a weighted national sample using the survey package. Three weights are included in the national survey (a household weight, a provincial weight and a national weight which is a product of the first two). In the following code I removed variables with
2011 Nov 16
1
Checking for monotonic sequence
I am scraping data from a web page using XML (excellent package BTW - that's scraping data the easy way!). So far, I've got the code: tables <- readHTMLTable(theurl) rhf <- tables$tabResHistFull div1 <- rhf[which(rhf$V1=="Div ps"),] div1 which is giving me the result:        V1 V2    V3    V4    V5    V6    V7          V8    V9   V10   V11   V12   V13   V14  V15 15
2012 May 28
1
Rcurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
2012 Jun 07
1
How to set cookies in RCurl
Hi, I am trying to access a website and read its content. The website is a restricted access website that I access through a proxy server (which therefore requires me to enable cookies). I have problems in allowing Rcurl to receive and send cookies. The following lines give me: library(RCurl) library(XML) url <- "http://www.theurl.com" content <- readHTMLTable(url) content
2012 Apr 16
1
grep and XML
Hi all: I struggle a lot scraping web data. I still haven't got a handle on the XML package. I'd like to get particular exchange rates from this table: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with: library(RCurl) library(XML)
2012 Nov 09
5
using lapply with recode
Hello: Forgive me, this is surely a simple question but I can't figure it out, having consulted the help archives and "Data Manipulation With R" (Spector). I have a list of 11 data frames with one common variable in each (prov). I'd like to use lapply to go through and recode one particular level of that common variable. I can get the recode to work, but it only returns the
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
2009 Oct 15
1
Removing Embedded Null characters from text/html
Hi, I'm trying to download some data from the web and am running into problems with 'embedded null' characters. These seem to indicate to R that it should stop processing the page so I'd like to remove them. I've been looking around and can't seem to identify exactly what the character is and consequently how to remove it. # THE CODE WORKS ON THIS PAGE library(RCurl)
2009 Oct 14
2
puzzle using gsub (and encodings maybe)
Hello, Below is some output that shows my issue. I have a variable x that I read from a file (more on this below) > x [1] "NEW YORK NEW ENGLAND" > gsub(" -", "-", x) # this does not work! [1] "NEW YORK NEW ENGLAND" > Encoding(x) # is x in a special encoding? no [1] "unknown" > y = "NEW YORK -NEW
2010 Oct 19
2
separate elements of a character vector
Dear colleagues, this seems like an easy problem, and I found some suggestions which I've incorporated in the help list, but I can't quite get it right. I want to add a series of years to a second x-axis category label. I generate them with test and test_2 below, format them with some spacing (which is the suggestion I took from the R-list) and concatenate them and then write them with
2011 Jul 28
2
cycling from x11 window in RCommander to graphics device window: Mac Os 10.6.8
Dear Colleagues, I have recently installed R Commander on my Mac OS 10.6.8. I'd like to use it for an undergraduate class this year. Everything appears to be working fine, except for one thing. I cannot use Command-tab to cycle from the X11 window in which RCommander is running to any other window open in my workspace. This is particularly important because I cannot cycle to the graphics
2012 Aug 20
2
Changing line length in Sweave output works for numeric, but not for character vectors
Hi there: I'm preparing a report in RStudio 0.96.330 on a Mac OS. I'm running R 2.15.0. I understand from Ross Ihaka's document (http://www.stat.auckland.ac.nz/~stat782/downloads/Sweave-customisation.pdf) that you can modify the line length of Sweave output by a call to options(width=x). This works great for me for numeric output, but not for character vectors that I have to print.
2011 Nov 10
2
Listing tables together from random samples from a generated population?
Hi there, I'd like to demonstrate how the chi-squared distribution works, so I've come up with a sample data frame of two categorical variables y<-data.frame(gender=sample(c('Male', 'Female'), size=100000, replace=TRUE, c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=100000, replace=TRUE, c(0.5, 0.5))) And I'd like to create a list of 100
2011 Jan 18
2
Counting dates in arbitrary ranges
Dear Colleagues, I have a data set that looks as below. I'd like to count the number of dates in a series of arbitrary ranges (breaks) i.e. not pre-defined breaks such as months, quarters or years. table(format()) produces ideally formatted output, but table() does not appear to accept arbitrary ranges. I also tried converting the dates to numeric and using histogram to try to get the data,
2011 Jan 17
1
Importing multiple text files with lapply.
Hello, I'm trying to read in 50 text files with dates as content to create a list of tables. a is the list of filenames that need to be read in. The following command returns the following error: mylist<-lapply(a, read.table(header=TRUE, sep="\n")) Error in read.table(header = TRUE, sep = "\n") : element 1 is empty; the part of the args list of
2011 Jul 11
1
grep lines before or after pattern matched?
Dear colleagues, I have a series of newspaper articles in a text file, downloaded from a text file. They look as follows: Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date I have a series of grep scripts that can extract the date and convert it to a date object, but I can't figure out how to grep the newspaper name. There is no field ID attached to those lines. The best I can come
2011 Jan 25
1
subsetting based on joint values of criteria
Dear colleagues, I have a dataset that looks as below. I would like to make a new dataset that excludes the cases which are joint conjunctions of particular state names and years, so Connecticut and 2010, Maryland and 2010 and Vermont and 2010. I'm trying the following subset code: newdata<- subset(bpa, (!State=="Connecticut" & year<"2010")) It appears that
2011 Mar 31
1
error in recode.default ... object '.data' not found
Dear colleagues, working with the data frame below, trying to reverse two variables, I get the error message below. I searched through the help list but could not find any postings which could help me solve the situation. I tried attaching and detaching the data frame to no avail. Yours, Simon Kiss *DATA FRAME 'data.frame': 1569 obs. of 9 variables: $ equal : num 3 4 3 2 3 4 2 3 2 2 ...