I don't know whether this is any faster, but it has no explicit 'for'
loop. It can be tuned further if it is still too slow. Try it on your
data:
> x <- data.frame(id=c("001","001","001","001","002","002","002","002","002"),
+                 year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
+                 variable=c(0,0,1,0,0,0,1,0,0))
> # will assume that the years are contiguous; exercise to the reader if not
> # partition by 'id', find where 'variable' is 1, and flag that row
> # plus the next four rows (a five-year window)
> x.new <- lapply(split(x, x$id), function(person){
+     change <- which(person$variable == 1)
+     mark <- unique(unlist(lapply(change, seq, length=5)))
+     # drop indices that run past the last row
+     mark <- mark[mark <= nrow(person)]
+     person$v2 <- 0  # initialize to zero
+     person$v2[mark] <- 1  # set to 1 for the event year and the next four
+     person  # return new data
+ })
> do.call('rbind', x.new)
       id year variable v2
001.1 001 2000        0  0
001.2 001 2001        0  0
001.3 001 2002        1  1
001.4 001 2003        0  1
002.5 002 1996        0  0
002.6 002 1997        0  0
002.7 002 1998        1  1
002.8 002 1999        0  1
002.9 002 2000        0  1
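
If the years are not contiguous, counting rows no longer gives a
five-year window. Below is a rough sketch of one way to handle that by
comparing the year values themselves. The helper name 'flag_window' and
its 'width' argument are my own inventions for illustration, and I
have only checked it against the sample data, so treat it as a starting
point rather than a finished solution.

```r
x <- data.frame(id = c("001","001","001","001","002","002","002","002","002"),
                year = c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
                variable = c(0,0,1,0,0,0,1,0,0))

# Hypothetical helper: flag every row whose year falls in
# [event year, event year + width - 1] for any event of that person.
flag_window <- function(person, width = 5) {
    event_years <- person$year[person$variable == 1]
    person$v2 <- vapply(person$year, function(y) {
        gap <- y - event_years            # years elapsed since each event
        as.numeric(any(gap >= 0 & gap < width))
    }, numeric(1))
    person
}

# Same split/apply/combine pattern as before, one data frame per 'id'
x.new <- do.call(rbind, lapply(split(x, x$id), flag_window))
```

Because the comparison is on 'year' rather than on row position, a
person with missing years is handled the same way as one with a
complete record, and a person with no event at all simply gets zeros.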

On 10/16/07, Julien Barnier <jbarnier at ens-lsh.fr> wrote:
> Hi all,
>
> I currently work on a survey which contains biographical data stored
> in a chronological way, i.e. something like:
>
> id  year variable
> 001 2000 0
> 001 2001 0
> 001 2002 1
> 001 2003 0
> 002 1996 0
> 002 1997 0
> 002 1998 1
> 002 1999 0
> 002 2000 0
>
> where id is a person identifier, year the year of observation and
> variable the variable value at given year. In this case, the variable
> says if a particular event happened during the given year or not.
>
> What I want to do is generate a new variable which would say if the
> event happened at least one time during the five years preceding the
> current one. So if I call this new variable v2, I'd like to obtain :
>
> id  year variable v2
> 001 2000 0        0
> 001 2001 0        0
> 001 2002 1        1
> 001 2003 0        1
> 002 1996 0        0
> 002 1997 0        0
> 002 1998 1        1
> 002 1999 0        1
> 002 2000 0        1
>
> Currently I manage to achieve this with two nested for loops, but it
> is *very* slow and inefficient. So I wondered if there is a better way
> to do this.
>
> Thanks in advance for any help.
>
> PS: here is the code to reproduce the first sample data:
>
> data.frame(id=c("001","001","001","001","002","002","002","002","002"),
>            year=c(2000,2001,2002,2003,1996,1997,1998,1999,2000),
>            variable=c(0,0,1,0,0,0,1,0,0))
>
> --
> Julien Barnier
> Groupe de recherche sur la socialisation
> ENS-LSH - Lyon, France
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?