Folks

Is there any way to get the row index into apply as a variable?

I want a function to do some sums on a small subset of some very long vectors, rolling through the whole vectors.

    apply(X,1,function {do something}, other arguments)

seems to be the way to do it.

The subset I want is the most recent set of measurements only - perhaps a couple of hundred out of millions - but I can't see how to index each value. The ultimate output should be a matrix of results the length of the input vector. But to do the sum I need to access the current row number.

It is easy in a loop but that will take ages. Is there any vectorised apply-like solution to this?

Or does apply etc only operate on each row at a time, independently of other rows?

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
> It is easy in a loop but that will take ages. Is there any vectorised
> apply-like solution to this?

If you showed the loop that takes ages, along with small inputs for it (and an indication of how to expand those small inputs to big ones), someone might be able to show you some code that does the same thing in less time.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 8, 2016 at 9:41 AM, John Logsdon <j.logsdon at quantex-research.com> wrote:
> Is there any way to get the row index into apply as a variable?
> [...]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
John:

1. Please read and follow the posting guide. In particular, provide a small reproducible example so that we know what your data and looping code look like.

2. apply-type commands are *not* vectorized; they are disguised loops that may or may not offer any speedup over explicit loops.

3. A guess at a possible strategy is to convert character date-time data to POSIXct dates using as.POSIXct and then just choose those rows with the maximum value, e.g.

    x[x==max(x)]

These operations *are* vectorized. However, this guess might be completely useless with your unspecified data, so beware.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
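As a rough illustration of Bert's third suggestion, here is a minimal sketch. It assumes the timestamps arrive as character strings in "YYYY-MM-DD HH:MM:SS" form; the names `stamps`, `values`, and `when` are made up for the example and the data are invented, since the original post did not show any.

```r
# Hypothetical example data: timestamps stored as character strings,
# with one measurement per timestamp.
stamps <- c("2016-06-07 12:00:00", "2016-06-08 09:41:00",
            "2016-06-08 09:41:00", "2016-06-06 08:30:00")
values <- c(1.5, 2.5, 3.5, 4.5)

# Convert the whole character vector in one call (no loop needed).
when <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Vectorised comparison: keep only the measurements taken at the
# most recent timestamp.
latest <- values[when == max(when)]
print(latest)
```

Whether "most recent" means a single timestamp or a trailing window depends on the real data, so the comparison may need to become something like `when >= cutoff` for a chosen cutoff.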
Hopefully Bert and William won't be offended if I more or less summarize:

Are you assuming a loop will take ages, or have you actually tested it? I wouldn't assume a loop will take ages, or that it will take much longer than apply().

What's wrong with

    apply( X[ {logical expression} , ] , 1, function {do something} )

where the logical expression identifies (by row index or any other method) which rows you need to work on?

I would expect it to be faster to subset the rows first, rather than test for inclusion at every iteration within a loop.

Also, if the data is acquired in such a way that you can know that the most recent set of measurements is the last n rows, then tail(X,n) might be good. For example,

    > foo <- matrix(1:20, ncol=2)
    > foo
          [,1] [,2]
     [1,]    1   11
     [2,]    2   12
     [3,]    3   13
     [4,]    4   14
     [5,]    5   15
     [6,]    6   16
     [7,]    7   17
     [8,]    8   18
     [9,]    9   19
    [10,]   10   20
    > tail(foo,4)
          [,1] [,2]
     [7,]    7   17
     [8,]    8   18
     [9,]    9   19
    [10,]   10   20

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
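The subset-first idea can be sketched as follows. The matrix `X`, the window size `n`, and the use of `sum` as the per-row function are all assumptions for illustration; the original post did not say what the real function does.

```r
# Hypothetical data: 10 rows of 5 measurements each.
set.seed(1)
X <- matrix(rnorm(50), ncol = 5)
n <- 4                                   # assumed number of "recent" rows

# Subset first, then apply the function to the small piece only.
recent <- tail(X, n)
by_apply <- apply(recent, 1, sum)

# For simple sums, rowSums() does the same job without apply().
by_rowsums <- rowSums(recent)

# The same rows selected with a logical expression on the row index.
keep <- seq_len(nrow(X)) > nrow(X) - n
by_logical <- apply(X[keep, , drop = FALSE], 1, sum)
```

All three results hold the same numbers. Depending on the R version, the two results computed from `tail()` may carry row labels that the logically indexed result does not, so compare them with `unname()` if that matters.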
Hi John,

With due respect to the other respondents, here is something that might help:

    # get a vector of values
    foo<-rnorm(100)
    # get a vector of increasing indices (aka your "recent" values)
    bar<-sort(sample(1:100,40))

    # write a function to "clump" the adjacent index values
    clump_adj_int<-function(x) {
     index_list<-list(x[1])
     list_index<-1
     # seq_along(x)[-1] is empty when x has one element,
     # whereas 2:length(x) would count down from 2 to 1
     for(i in seq_along(x)[-1]) {
      if(x[i]==x[i-1]+1)
       # consecutive index: extend the current clump
       index_list[[list_index]]<-c(index_list[[list_index]],x[i])
      else {
       # gap in the indices: start a new clump
       list_index<-list_index+1
       index_list[[list_index]]<-x[i]
      }
     }
     return(index_list)
    }
    index_clumps<-clump_adj_int(bar)

    # write another function to sum the values
    sum_subsets<-function(indices,vector)
     return(sum(vector[indices],na.rm=TRUE))

    # now "apply" the function to the list of indices
    lapply(index_clumps,sum_subsets,foo)

Jim
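Jim's code hands indices, not values, to lapply, and the same trick gives the function access to the current position that the original question asked about. Below is a minimal sketch for a trailing-window sum, assuming a plain numeric vector `x` and a window length `w` (both invented for the example, with `length(x) >= w`). It also shows a cumsum-based version that avoids the per-element function call.

```r
# Hypothetical data and window length.
set.seed(42)
x <- rnorm(1000)
w <- 200

# Index-based: iterate over positions rather than values, so the
# function knows where it is and can look back over the last w values.
roll_sapply <- sapply(seq_along(x), function(i) {
  sum(x[max(1, i - w + 1):i])
})

# Vectorised alternative: a trailing-window sum is the difference of
# two cumulative sums, w positions apart (zero before the window fills).
cs <- cumsum(x)
lagged <- c(rep(0, w), cs[seq_len(length(x) - w)])
roll_cumsum <- cs - lagged

# The two versions agree up to floating-point rounding.
print(all.equal(roll_sapply, roll_cumsum))
```

Both results have the same length as the input, with the first `w - 1` entries summing over a shorter, partially filled window. For very long vectors the cumsum version can accumulate rounding error, so it is worth checking against the index-based version on a sample before relying on it.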