I am trying to get stock metadata from Yahoo finance (or maybe there is
a better source?)
here is what I did so far:

yahoo.url <- "http://finance.yahoo.com/d/quotes.csv?f=j1jka2&s=";
stocks <- c("IBM","NOIZ","MSFT","LNN","C","BODY","F"); # just some samples
socket <- url(paste(yahoo.url,sep="",paste(stocks,collapse="+")),open="r");
data <- read.csv(socket, header = FALSE);
close(socket);

data is now:

       V1     V2     V3        V4
1  200.5B 116.00 166.25   4965150
2   19.1M   3.75   5.47      8521
3  226.6B  22.73  31.58  57127000
4  886.4M  30.80  74.54    226690
5  142.4B   3.21   5.15 541804992
6  276.4M  11.98  21.30    149656
7 55.823B   9.75  18.97  89369000

now I need to do this:

--> convert 55.823B to 55e9 and 19.1M to 19e6

parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
data[1]<-lapply(data[1],parse.num);

seems like awfully inefficient (two regexp substitutions),
is there a better way?

--> iterate over stocks & data at the same time and put the results into
a hash table:

for (i in 1:length(stocks)) cache[[stocks[i]]] <- data[i,];

I do get the right results,
but I am wondering if I am doing it "the right R way".
E.g., the hash table value is a data frame.
A structure(record?) seems more appropriate.

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final)
http://pmw.org.il http://ffii.org http://camera.org http://honestreporting.com
http://iris.org.il http://mideasttruth.com http://thereligionofpeace.com
I haven't lost my mind -- it's backed up on tape somewhere.
----------------------------------------
> To: r-help at stat.math.ethz.ch
> From: sds at gnu.org
> Date: Tue, 15 Feb 2011 17:20:11 -0500
> Subject: [R] string parsing
>
> I am trying to get stock metadata from Yahoo finance (or maybe there is
> a better source?)

search this for "yahoo",

http://cran.r-project.org/web/packages/quantmod/quantmod.pdf

as a perennial page scraper, I was amazed this existed :)

> here is what I did so far:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
try this:

> x <- c('15.5B', '13.6M')
> x <- sub("B", 'e9', x)
> x <- sub("M", 'e6', x)
> as.numeric(x)
[1] 15500000000    13600000

On Tue, Feb 15, 2011 at 5:20 PM, Sam Steingold <sds at gnu.org> wrote:
> [...]
> now I need to do this:
>
> --> convert 55.823B to 55e9 and 19.1M to 19e6
>
> parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
> data[1]<-lapply(data[1],parse.num);
>
> seems like awfully inefficient (two regexp substitutions),
> is there a better way?
> [...]
-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
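As a follow-up to the substitution approach above, here is a minimal sketch of an alternative that avoids chaining one substitution per suffix: the last character of each string is looked up in a table of multipliers. The function name parse_suffixed and the multiplier table (including the "K" entry) are illustrative assumptions, not something from this thread.

```r
# Hypothetical helper: convert strings such as "55.823B" or "19.1M" to numbers.
parse_suffixed <- function(s) {
  s <- trimws(as.character(s))            # as.character() also copes with factors
  mult <- c(K = 1e3, M = 1e6, B = 1e9)    # assumed suffix table, extend as needed
  last <- substring(s, nchar(s))          # final character of each string
  has_suffix <- last %in% names(mult)
  # Strip the suffix where there is one, then scale by the matching multiplier.
  number <- as.numeric(ifelse(has_suffix, substring(s, 1, nchar(s) - 1), s))
  scale <- ifelse(has_suffix, mult[last], 1)
  unname(number * scale)
}

parse_suffixed(c("200.5B", "19.1M", "4965150"))
```

Because this multiplies two doubles instead of parsing a string like "19.1e6" directly, the result can differ from the sub() version in the last bit, so compare with all.equal() rather than ==. Whether it is actually faster than two vectorized sub() calls would need measuring.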
On Feb 15, 2011, at 5:20 PM, Sam Steingold wrote:

> [...]
> now I need to do this:
>
> --> convert 55.823B to 55e9 and 19.1M to 19e6
>
> parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
>
> seems like awfully inefficient (two regexp substitutions),
> is there a better way?

I haven't come up with a better approach, at least for a two-substitution
task, having considered using strapply from pkg gsubfn but deciding it
would be just as much, if not more, code.

But why are you using lapply on a single vector? Why not:

data[1] <- parse.num( data[[1]] )  # as.numeric and gsub are vectorized

>
> --> iterate over stocks & data at the same time and put the results into
> a hash table:
> for (i in 1:length(stocks)) cache[[stocks[i]]] <- data[i,];
>
> I do get the right results,
> but I am wondering if I am doing it "the right R way".
> E.g., the hash table value is a data frame.
> A structure(record?) seems more appropriate.

David Winsemius, MD
West Hartford, CT
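On the second question (a hash table whose values are one-row data frames), one common base-R idiom is to keep the data frame and index it by ticker through its row names; if per-ticker records are really wanted, split() builds a named list without an explicit loop. A minimal sketch, using a hand-typed subset of the values quoted above in place of the Yahoo download (an assumption, so that it runs offline):

```r
# Three of the rows quoted in the thread, typed in by hand.
stocks <- c("IBM", "NOIZ", "MSFT")
data <- data.frame(
  V1 = c("200.5B", "19.1M", "226.6B"),
  V2 = c(116.00, 3.75, 22.73),
  V3 = c(166.25, 5.47, 31.58),
  V4 = c(4965150, 8521, 57127000),
  stringsAsFactors = FALSE
)

# parse.num as posted; gsub() and as.numeric() are vectorized.
parse.num <- function(s) as.numeric(gsub("M$", "e6", gsub("B$", "e9", s)))
data$V1 <- parse.num(data$V1)

# Option 1: use the tickers as row names and index the data frame directly.
rownames(data) <- stocks
data["MSFT", "V1"]

# Option 2: a named list of plain records (lists), one per ticker.
cache <- lapply(split(data, rownames(data)), as.list)
cache[["IBM"]]$V2
```

Note that split() orders the list by sorted ticker name rather than by the original row order, which does not matter for lookup by name.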
On Tue, Feb 15, 2011 at 5:20 PM, Sam Steingold <sds at gnu.org> wrote:
> [...]
> now I need to do this:
>
> --> convert 55.823B to 55e9 and 19.1M to 19e6
>
> parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
> data[1]<-lapply(data[1],parse.num);
>
> seems like awfully inefficient (two regexp substitutions),
> is there a better way?
> [...]

Check the example at the end of section 2 of the gsubfn vignette:

http://cran.r-project.org/web/packages/gsubfn/vignettes/gsubfn.pdf

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
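The vignette's own example is not reproduced here. As a rough sketch of the kind of lookup-based replacement gsubfn supports: when the replacement argument is a list, each match is looked up by name in that list, so both suffixes can be handled in one call. The sample vector is made up from values in this thread, and the requireNamespace() guard is only there so the snippet does nothing when the package is not installed.

```r
x <- c("200.5B", "19.1M", "4965150")
converted <- NULL

if (requireNamespace("gsubfn", quietly = TRUE)) {
  # A trailing "M" or "B" is replaced by the list entry of the same name.
  converted <- as.numeric(gsubfn::gsubfn("[MB]$", list(M = "e6", B = "e9"), x))
}

converted
```

This is one call instead of two, but it adds a package dependency; for only two suffixes the nested gsub() from the original post is arguably just as clear.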