thr3ads.net - R help - [R] string parsing [Feb 2011]


Sam Steingold

2011-Feb-15 22:20 UTC

[R] string parsing

I am trying to get stock metadata from Yahoo finance (or maybe there is
a better source?)
here is what I did so far:

yahoo.url <- "http://finance.yahoo.com/d/quotes.csv?f=j1jka2&s=";
stocks <- c("IBM","NOIZ","MSFT","LNN","C","BODY","F");  # just some samples
socket <- url(paste(yahoo.url,sep="",paste(stocks,collapse="+")),open="r");
data <- read.csv(socket, header = FALSE);
close(socket);
data is now:
       V1     V2     V3        V4
1  200.5B 116.00 166.25   4965150
2   19.1M   3.75   5.47      8521
3  226.6B  22.73  31.58  57127000
4  886.4M  30.80  74.54    226690
5  142.4B   3.21   5.15 541804992
6  276.4M  11.98  21.30    149656
7 55.823B   9.75  18.97  89369000

now I need to do this:

--> convert 55.823B to 55.823e9 and 19.1M to 19.1e6

parse.num <- function (s) {
as.numeric(gsub("M$","e6",gsub("B$","e9",s)));
}
data[1]<-lapply(data[1],parse.num);

This seems awfully inefficient (two regexp substitutions);
is there a better way?

--> iterate over stocks & data at the same time and put the results into
a hash table:
cache <- list();
for (i in seq_along(stocks)) cache[[stocks[i]]] <- data[i,];

I do get the right results,
but I am wondering if I am doing it "the right R way".
E.g., the hash table value is a data frame.
A structure (record?) seems more appropriate.

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final)
http://pmw.org.il http://ffii.org http://camera.org http://honestreporting.com
http://iris.org.il http://mideasttruth.com http://thereligionofpeace.com
I haven't lost my mind -- it's backed up on tape somewhere.

Mike Marchywka

2011-Feb-16 02:00 UTC

> To: r-help at stat.math.ethz.ch
> From: sds at gnu.org
> Date: Tue, 15 Feb 2011 17:20:11 -0500
> Subject: [R] string parsing
>
> I am trying to get stock metadata from Yahoo finance (or maybe there is
> a better source?)
Search this document for "yahoo":

http://cran.r-project.org/web/packages/quantmod/quantmod.pdf

As a perennial page scraper, I was amazed this existed :)

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

jim holtman

2011-Feb-16 17:49 UTC

try this:
> x <- c('15.5B', '13.6M')
> x <- sub("B", 'e9', x)
> x <- sub("M", 'e6', x)
> as.numeric(x)
[1] 15500000000    13600000



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

David Winsemius

2011-Feb-16 18:33 UTC

On Feb 15, 2011, at 5:20 PM, Sam Steingold wrote:
> parse.num <- function (s) {
>   as.numeric(gsub("M$","e6",gsub("B$","e9",s)));
> }
>
> This seems awfully inefficient (two regexp substitutions);
> is there a better way?
I haven't come up with a better approach, at least for a
two-substitution task. I considered using strapply from the gsubfn
package but decided it would be just as much code, if not more. But why
are you using lapply on a single vector? Why not:

data[1] <- parse.num( data[[1]] )  # as.numeric and gsub are vectorized
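
To make the vectorization point concrete, here is a minimal sketch. It assumes a hand-built data frame named `df` (a hypothetical stand-in for the downloaded data, with values typed in from the table in the original post) and reuses parse.num unchanged. Nothing is fetched from Yahoo.

```r
# Hand-built stand-in for the first two columns of the data shown in the
# original post; `df` is a hypothetical name used only for this sketch.
df <- data.frame(V1 = c("200.5B", "19.1M", "226.6B", "886.4M",
                        "142.4B", "276.4M", "55.823B"),
                 V2 = c(116.00, 3.75, 22.73, 30.80, 3.21, 11.98, 9.75),
                 stringsAsFactors = FALSE)

# Same function as in the original post.
parse.num <- function(s) {
  as.numeric(gsub("M$", "e6", gsub("B$", "e9", s)))
}

# gsub() and as.numeric() both operate on whole vectors, so the column
# can be converted in a single call without lapply().
df$V1 <- parse.num(df$V1)
str(df)
```

If the column had been read in as a factor, gsub() would coerce it to character first, so the same call should still apply.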
>
> --> iterate over stocks & data at the same time and put the results
> into
> a hash table:
> for (i in 1:length(stocks)) cache[[stocks[i]]] <- data[i,];
>
> I do get the right results,
> but I am wondering if I am doing it "the right R way".
> E.g., the hash table value is a data frame.
> A structure(record?) seems more appropriate.
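
For the hash-table question quoted above, a few common base R options can be sketched as follows. This is only an illustration under stated assumptions: the three-row `df` and its hand-typed values stand in for the downloaded data, and the names `records` and `cache` are hypothetical.

```r
# Small stand-in for the downloaded data (values copied from the thread).
stocks <- c("IBM", "NOIZ", "MSFT")
df <- data.frame(V1 = c(200.5e9, 19.1e6, 226.6e9),
                 V2 = c(116.00, 3.75, 22.73),
                 V3 = c(166.25, 5.47, 31.58),
                 V4 = c(4965150, 8521, 57127000))

# Option 1: keep the data frame and look rows up by ticker symbol.
rownames(df) <- stocks
df["MSFT", "V2"]

# Option 2: a named list of plain lists ("records"), one per ticker.
records <- setNames(lapply(seq_along(stocks),
                           function(i) as.list(df[i, ])),
                    stocks)
records[["NOIZ"]]$V4

# Option 3: an environment, R's built-in hashed key/value store.
cache <- new.env(hash = TRUE)
for (i in seq_along(stocks)) {
  assign(stocks[i], as.list(df[i, ]), envir = cache)
}
get("IBM", envir = cache)$V1
```

Option 1 avoids the loop entirely. Options 2 and 3 store each row as a plain list rather than a one-row data frame, which is probably closer to the "record" idea in the question.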

David Winsemius, MD
West Hartford, CT

Gabor Grothendieck

2011-Feb-16 20:25 UTC

Check the example at the end of section 2 of the gsubfn vignette:

http://cran.r-project.org/web/packages/gsubfn/vignettes/gsubfn.pdf
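
For readers without gsubfn installed, the chained substitutions can also be replaced by a suffix lookup table in base R. The sketch below is not taken from the vignette. The function name `parse.suffixed`, the table `mult`, and the extra `K` entry are assumptions added for illustration.

```r
# Multipliers keyed by suffix. K is an assumption for illustration;
# the thread itself only mentions M and B.
mult <- c(K = 1e3, M = 1e6, B = 1e9)

parse.suffixed <- function(s) {
  s <- as.character(s)                    # also accepts factor columns
  n <- nchar(s)
  last <- substr(s, n, n)                 # final character of each string
  has.suffix <- last %in% names(mult)
  # Strip the suffix where present, then scale by the matching multiplier.
  digits <- ifelse(has.suffix, substr(s, 1, n - 1), s)
  scale <- ifelse(has.suffix, mult[last], 1)
  as.numeric(digits) * scale
}

parse.suffixed(c("55.823B", "19.1M", "8521"))
```

This uses no regular expressions at all, but whether it is actually faster than two gsub calls would need to be measured. Adding a new suffix only requires a new entry in `mult`.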


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
