Thanks Jim and others (and sorry Jim - an early version of this slipped
into your inbox :))
Apologies for not giving some concrete code - I was trying to explain in
words.
What I need to do is to fit a simple linear model to successive sections
of a long matrix.
So far, the best solution I have come up with uses apply twice:
Generate some data in a 100000*3 matrix:
N = 100000
Z = cbind(1:N,cumsum(rnorm(N,1,0.01)),rnorm(N,1.2,0.1))
where the first column is an index, the second a monotonic increasing
value representing time and the third just the measurements I want to
process.
Then write a function dVals1:
dVals1 = function(Y,DD,dT){which.min((Y[2] - dT) > DD[,2])}
which will identify the first row where the time is not less than the
current time - dT.
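To make the behaviour of which.min on a logical vector concrete, here is a minimal sketch with made-up times (the names times, flags and firstRow are only for illustration and are not part of my real code). FALSE counts as 0 and TRUE as 1, so which.min returns the position of the first FALSE:

```r
# Illustrative times; the current time is 6 and the look-back dT is 3
times <- c(1, 2, 3, 4, 5, 6, 7, 8)

# TRUE while a row is still older than (current time - dT)
flags <- (6 - 3) > times

# which.min returns the position of the first minimum, i.e. the first FALSE
firstRow <- which.min(flags)
firstRow   # 3, the first row whose time is at least 6 - 3
```

If every comparison is FALSE, as happens for the earliest rows, the result is 1.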
So to identify the start of the data (say) 10 units before each row, we
use apply and prepend the result as a column to the array for later use:
ZZ = cbind(apply(Z,1,dVals1,Z,10),Z)
There may be some cases, particularly at the start, where later values are
extracted because the minimum returned by which.min is 1.
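Because the time column is increasing, the same start pointers could probably be obtained without the first apply. The following is only a sketch under that assumption: it swaps in findInterval from base R (not something I used above), and the names startApply and startFI are made up for the comparison. It also uses a smaller N so that the check runs quickly.

```r
set.seed(42)
N <- 500
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))

# Start pointers as in the text: one scan of the time column per row
dVals1 <- function(Y, DD, dT) which.min((Y[2] - dT) > DD[, 2])
startApply <- apply(Z, 1, dVals1, Z, 10)

# Assumed alternative: with left.open = TRUE, findInterval counts the times
# strictly below each threshold, so adding 1 gives the first row at or
# after (current time - dT)
startFI <- findInterval(Z[, 2] - 10, Z[, 2], left.open = TRUE) + 1L

all(startApply == startFI)
```

This relies on column 2 being sorted, which holds here because the increments are always positive.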
I now have start and finish pointers for each position so can proceed to
fit a simple linear model with the following function:
dVals2=function(D2,DD){
  if((D2[2]-D2[1])<10){return(rep(0,2))} # reject short examples
  DX=DD[D2[1]:D2[2],]
  Res=as.vector(lm(DX[,3]~DX[,2])$coefficients)
  return(Res)
}
which returns two zeros if the start and finish rows are fewer than 10
apart; otherwise it returns the intercept and slope calculated over the
specified range.
Applying this to the whole data (passing Z rather than ZZ as DD, so that
columns 2 and 3 are still the time and the measurement) by:
t(apply(ZZ,1,dVals2,DD=Z))
does the job, I think, returning the results as an N * 2 matrix.
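On my original question of getting the row index into the function, one possible sketch is to loop over the row numbers with sapply instead of over the rows with apply, so the index arrives as an argument. The function name fitWindow and its arguments are hypothetical, and I use a smaller N and a look-back of 15 here just so that most windows pass the length check:

```r
set.seed(1)
N <- 300
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))

# i is the current row number; DD is the data; dT is the look-back in time
fitWindow <- function(i, DD, dT, minRows = 10) {
  start <- which.min((DD[i, 2] - dT) > DD[, 2])  # first row inside the window
  if ((i - start) < minRows) return(c(0, 0))     # reject short examples
  rows <- start:i
  as.vector(coef(lm(DD[rows, 3] ~ DD[rows, 2]))) # intercept and slope
}

# sapply returns a 2 x N matrix, so transpose to get N x 2
Res <- t(sapply(seq_len(nrow(Z)), fitWindow, DD = Z, dT = 15))
dim(Res)
```

This still scans the whole time column for every row, so it is not any faster than the two-apply version; it only removes the need to carry the pointers around in extra columns.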
> Hi John,
> With due respect to the other respondents, here is something that might
> help:
>
> # get a vector of values
> foo<-rnorm(100)
> # get a vector of increasing indices (aka your "recent" values)
> bar<-sort(sample(1:100,40))
> # write a function to "clump" the adjacent index values
> clump_adj_int<-function(x) {
> index_list<-list(x[1])
> list_index<-1
> for(i in 2:length(x)) {
> if(x[i]==x[i-1]+1)
> index_list[[list_index]]<-c(index_list[[list_index]],x[i])
> else {
> list_index<-list_index+1
> index_list[[list_index]]<-x[i]
> }
> }
> return(index_list)
> }
> index_clumps<-clump_adj_int(bar)
> # write another function to sum the values
> sum_subsets<-function(indices,vector)
> return(sum(vector[indices],na.rm=TRUE))
> # now "apply" the function to the list of indices
> lapply(index_clumps,sum_subsets,foo)
>
> Jim
>
>
> On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
> <j.logsdon at quantex-research.com> wrote:
>> Folks
>>
>> Is there any way to get the row index into apply as a variable?
>>
>> I want a function to do some sums on a small subset of some very long
>> vectors, rolling through the whole vectors.
>>
>> apply(X,1,function {do something}, other arguments)
>>
>> seems to be the way to do it.
>>
>> The subset I want is the most recent set of measurements only - perhaps a
>> couple of hundred out of millions - but I can't see how to index each
>> value. The ultimate output should be a matrix of results the length of
>> the input vector. But to do the sum I need to access the current row
>> number.
>>
>> It is easy in a loop but that will take ages. Is there any vectorised
>> apply-like solution to this?
>>
>> Or does apply etc only operate on each row at a time, independently of
>> other rows?
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675
>>
>
Best wishes
John
John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675