For sequential analysis of sequences of events, I want to calculate a
series of lagged versions of a (numeric or character) variable. The
simple function below does this, but I can't see how to generalize it
to the case where there is also a factor variable (by) and I want to
calculate the lags separately for each level of that factor.
Can anyone help?

# produce k lagged versions of a numeric or character variable
lags <- function(x, k=1, prefix='lag', by) {
  if(missing(by)) {
    n <- length(x)
    res <- data.frame(lag0=x)
    # seq_len(k) is empty for k=0, unlike 1:k which would count down
    for (i in seq_len(k)) {
      # pad with i NAs, then truncate to n; also safe when i >= n
      res <- cbind(res, c(rep(NA, i), x)[seq_len(n)])
    }
    colnames(res) <- paste0(prefix, 0:k)
    return(res)
  }
  else {
    stop('by not yet implemented')
  }
}

# tests
> events <- sample(letters[1:4], 10, replace=TRUE)
> lags(events)
   lag0 lag1
1     c <NA>
2     a    c
3     b    a
4     d    b
5     d    d
6     c    d
7     d    c
8     c    d
9     c    c
10    d    c
> lags(events, 3)
   lag0 lag1 lag2 lag3
1     c <NA> <NA> <NA>
2     a    c <NA> <NA>
3     b    a    c <NA>
4     d    b    a    c
5     d    d    b    a
6     c    d    d    b
7     d    c    d    d
8     c    d    c    d
9     c    c    d    c
10    d    c    c    d
>
# similar, with by=sub variable
> events2 <- data.frame(sub=rep(1:2, each=5),
+ event=sample(letters[1:4], 10, replace=TRUE),
+ stringsAsFactors=FALSE)
> events2
   sub event
1    1     b
2    1     d
3    1     d
4    1     c
5    1     b
6    2     b
7    2     b
8    2     b
9    2     d
10   2     a
> # do it separately for each sub ...
> (lg <- lapply(split(events2$event, events2$sub), lags, 2))
$`1`
  lag0 lag1 lag2
1    b <NA> <NA>
2    d    b <NA>
3    d    d    b
4    c    d    d
5    b    c    d

$`2`
  lag0 lag1 lag2
1    b <NA> <NA>
2    b    b <NA>
3    b    b    b
4    d    b    b
5    a    d    b

This gives sort of what I want, but I need to have the 'sub' variable
explicit in the result. Binding the pieces together only encodes it in
the row names:

> do.call(rbind, lg)
    lag0 lag1 lag2
1.1    b <NA> <NA>
1.2    d    b <NA>
1.3    d    d    b
1.4    c    d    d
1.5    b    c    d
2.1    b <NA> <NA>
2.2    b    b <NA>
2.3    b    b    b
2.4    d    b    b
2.5    a    d    b
>
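
To make the goal concrete, here is a rough sketch of one way the by-group
case might be handled: split the row positions by the grouping variable,
lag within each group, then put the rows back in their original order with
the grouping variable as an explicit first column. The names lags_by and
by_name are made up for illustration, the lags() here is a stripped-down
version of the function above without the by argument, and the sketch
assumes by has no missing values. It may well not be the best approach.

```r
# Stripped-down lagging helper (no 'by' argument), for illustration only.
lags <- function(x, k = 1, prefix = "lag") {
  n <- length(x)
  # Column i is x shifted down by i positions, padded with NA at the top.
  cols <- lapply(0:k, function(i) c(rep(NA, i), x)[seq_len(n)])
  names(cols) <- paste0(prefix, 0:k)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Hypothetical wrapper: lag x within each level of 'by' and keep the
# grouping variable as an explicit column. Assumes 'by' contains no NAs.
lags_by <- function(x, by, k = 1, prefix = "lag", by_name = "by") {
  idx <- split(seq_along(x), by)           # row positions for each group
  parts <- lapply(idx, function(i) lags(x[i], k, prefix))
  res <- do.call(rbind, parts)             # rows are now in group order
  res <- res[order(unlist(idx)), , drop = FALSE]  # back to original order
  rownames(res) <- NULL
  out <- data.frame(by, res, stringsAsFactors = FALSE)
  names(out)[1] <- by_name
  out
}

# Same values as the events2 example above, typed in rather than sampled.
events2 <- data.frame(sub = rep(1:2, each = 5),
                      event = c("b", "d", "d", "c", "b",
                                "b", "b", "b", "d", "a"),
                      stringsAsFactors = FALSE)
lg <- lags_by(events2$event, events2$sub, k = 2, by_name = "sub")
print(lg)
```

Working on row positions rather than on the values themselves means the
groups do not have to be contiguous; interleaved levels of the factor
should still end up on the right rows.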
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA