I have a dataframe with many firm-year observations and many variables. Not all firms have information for all the years. I want another dataframe with only those firms that have information for all years. That is, I want a balanced panel, but with the maximum number of years. In my reproducible example I want to keep firms 1, 2 and 3 (period 2000 to 2004).

I need your help to write code for this.

Thank you very much,

Cecília Carmo
(Universidade de Aveiro)

# My reproducible example:
firm <- sort(rep(1:3, 5), decreasing = FALSE)
year <- rep(2000:2004, 3)
X <- rnorm(15)
data1 <- data.frame(firm, year, X)
data1

firm <- sort(rep(4:6, 3), decreasing = FALSE)
year <- rep(2001:2003, 3)
X <- rnorm(9)
data2 <- data.frame(firm, year, X)
data2

finaldata <- rbind(data1, data2)
finaldata
# If you know how many years are needed you could do this
makenewtable <- function(x, years) {
  # split the data into one data frame per firm
  xlist <- split(x, x$firm)
  # keep a firm only if it has exactly `years` distinct years, otherwise NULL
  dat <- lapply(xlist, function(z) if (length(unique(z$year)) == years) z)
  # rbind ignores the NULL entries and stacks the remaining firms
  do.call(rbind, dat)
}
makenewtable(finaldata, 5)

Scott

On Thursday, May 19, 2011 at 6:24 AM, Cecilia Carmo wrote:
> I want another dataframe with only those firms that have information
> for all years. That is, I want a balanced panel, but with the maximum
> number of years.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
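If the number of years is not known in advance, one possible variation is to derive it from the data. The sketch below is not from the thread: `balance_panel` and its `id` and `time` arguments are hypothetical names, and it assumes that "maximum number of years" means every distinct year that appears anywhere in the data. If no firm covers all of those years, it returns an empty data frame.

```r
# Hypothetical helper (not from the thread): keep only firms that are
# observed in every distinct year appearing anywhere in the data.
balance_panel <- function(x, id = "firm", time = "year") {
  all_years <- unique(x[[time]])
  # number of distinct years observed for each firm
  n_years <- tapply(x[[time]], x[[id]], function(y) length(unique(y)))
  # firms whose year count matches the full set of years
  complete <- names(n_years)[n_years == length(all_years)]
  x[as.character(x[[id]]) %in% complete, , drop = FALSE]
}

# Same layout as the reproducible example in the question
firm <- c(rep(1:3, each = 5), rep(4:6, each = 3))
year <- c(rep(2000:2004, 3), rep(2001:2003, 3))
finaldata <- data.frame(firm, year, X = rnorm(24))

balanced <- balance_panel(finaldata)
```

Because each firm's years are a subset of all years in the data, matching the count of distinct years is enough to conclude that the firm covers every year.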
It works! Thank you.

Cecília

From: Scott Chamberlain [mailto:scttchamberlain4@gmail.com]
Sent: Thursday, 19 May 2011 13:40
To: Cecilia Carmo
Cc: r-help@r-project.org
Subject: Re: [R] balanced panel data

> # If you know how many years are needed you could do this
> makenewtable(finaldata, 5)