thr3ads.net - R help - [R] Aggregating subsets of data in larger vector with sapply [Jan 2011]


rivercode

2011-Jan-10 00:10 UTC

[R] Aggregating subsets of data in larger vector with sapply

I have 40,000 rows of buy/sell trade data and am trying to add up the buys for
each second. The code works, but it is very slow. Any suggestions on how to
improve the sapply function?

library(xts)
# secEP: index of the last row in each second of an xts time series
# that has multiple entries per second (the vector starts with 0)
secEP <- endpoints(xSym$Direction, "secs")
d <- xSym$Direction
s <- xSym$Size
buySize <- sapply(1:(length(secEP) - 1), function(y) {
  i <- (secEP[y] + 1):secEP[y + 1]      # indices of the rows between consecutive endpoints
  sum(as.numeric(s[i][d[i] == "buy"]))  # total buy size within that second
})

Object details:

secEP: numeric vector of one-second endpoints in xSym$Direction.

> head(xSym$Direction)
                    Direction
2011-01-05 09:30:00 "unkn"
2011-01-05 09:30:02 "sell"
2011-01-05 09:30:02 "buy"
2011-01-05 09:30:04 "buy"
2011-01-05 09:30:04 "buy"
2011-01-05 09:30:04 "buy"

> head(xSym$Size)
                    Size
2011-01-05 09:30:00 " 865"
2011-01-05 09:30:02 " 100"
2011-01-05 09:30:02 " 100"
2011-01-05 09:30:04 " 100"
2011-01-05 09:30:04 " 100"
2011-01-05 09:30:04 "  41"
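For readers without the original xSym object, here is a minimal, self-contained sketch that rebuilds the six sample rows above as a plain data frame and runs the same endpoint-based loop in base R. The names `trades` and `sec` are illustrative only, and the endpoint vector is computed by hand on the assumption that, for time-sorted data, xts::endpoints(x, "secs") returns 0 followed by the row index of the last entry in each second.

```r
# Sample trades from the post, rebuilt as a plain data frame (base R only)
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"), tz = "UTC"),
  direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  size = c(" 865", " 100", " 100", " 100", " 100", "  41"),
  stringsAsFactors = FALSE
)

# Whole-second key per row; rows are assumed to be sorted by time
sec <- as.numeric(trades$time) %/% 1
# Endpoint-style indices: 0, then the last row of each second
# (assumed to mirror xts::endpoints(x, "secs") for sorted data)
secEP <- c(0, which(diff(sec) != 0), nrow(trades))

d <- trades$direction
s <- as.numeric(trades$size)  # as.numeric() ignores the leading blanks

buySize <- sapply(seq_len(length(secEP) - 1), function(y) {
  i <- (secEP[y] + 1):secEP[y + 1]  # rows belonging to second y
  sum(s[i][d[i] == "buy"])          # 0 when that second has no buys
})
buySize  # expected: 0 100 241
```

Here secEP comes out as 0, 1, 3, 6, and the 09:30:00 second, which holds only an "unkn" trade, contributes a buy total of 0.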

Thanks,
Chris


-- 
View this message in context:
http://r.789695.n4.nabble.com/Aggragating-subsets-of-data-in-larger-vector-with-sapply-tp3206445p3206445.html
Sent from the R help mailing list archive at Nabble.com.

Jim Holtman

2011-Jan-10 01:50 UTC


Split the data by truncating the time to the second, then process each group;
this will save the subsetting you are doing. Also, merge the direction and size
data into the same frame. It looks like you can subset to the "buy" rows to
begin with.
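Jim's suggestion can be sketched in base R roughly as follows, again on the six sample rows from the original post. This is an illustrative sketch, not code from the thread: it assumes the times are stored as POSIXct, keeps only the buys first, truncates each timestamp to the whole second, and replaces the per-second loop with a single tapply() call. The names `trades`, `buys`, and `key` are hypothetical.

```r
# Direction and size kept together in one data frame
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"), tz = "UTC"),
  direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  size = c(" 865", " 100", " 100", " 100", " 100", "  41"),
  stringsAsFactors = FALSE
)

# 1. Subset to the buys up front
buys <- trades[trades$direction == "buy", ]

# 2. Truncate each timestamp to the whole second to form the group key
key <- format(trunc(buys$time, units = "secs"), "%Y-%m-%d %H:%M:%S")

# 3. Sum the sizes within each second in one vectorized call
buySize <- tapply(as.numeric(buys$size), key, sum)
buySize
```

Because the sell and unknown rows are dropped first, seconds with no buys (09:30:00 here) do not appear in the result at all, rather than showing up as 0.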


Joshua Ulrich

2011-Jan-12 04:35 UTC


Hi Chris,

This seems to work on the sample data you provided.

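library(xts)

FUN <- function(x) {
  x <- xts(as.numeric(x), index(x))           # convert the character sizes to a numeric xts
  period.apply(x, endpoints(x, "secs"), sum)  # sum the sizes within each second
}
# split the sizes by direction ("buy", "sell", "unkn"), then sum each group per second
lapply(split.default(xSym$Size, xSym$Direction), FUN)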
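A related base R alternative, not part of Joshua's reply and shown only as a sketch on the sample rows, is to cross-tabulate the sizes by second and direction in one tapply() call; the "buy" column then holds the same per-second totals as the original loop. The names `trades`, `sec`, and `totals` are hypothetical, and empty second/direction combinations, which tapply() returns as NA, are set to 0 here.

```r
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"), tz = "UTC"),
  direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  size = c(" 865", " 100", " 100", " 100", " 100", "  41"),
  stringsAsFactors = FALSE
)

# Group key: each timestamp truncated to the whole second
sec <- format(trunc(trades$time, units = "secs"), "%Y-%m-%d %H:%M:%S")

# One pass: a second-by-direction matrix of summed sizes
totals <- tapply(as.numeric(trades$size), list(sec, trades$direction), sum)
totals[is.na(totals)] <- 0  # combinations with no trades

totals[, "buy"]  # per-second buy totals
```

Unlike the split-then-apply approach, this keeps every second in every column, so the buy, sell, and unknown totals line up row by row.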

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com



