Liviu Andronic
2010-May-17 15:16 UTC
[R] plm(..., model="within", effect="twoways") is very slow on unablanaced data (was: Re: Regressions with fixed-effect in R)
Hello Giovanni I made a minor modification to your function, which now allows to compute the within R-sq in Twoways Within models (see below). However I ran into an issue that I have already encountered before: whenever I try to fit Twoways Within models on my unbalanced data, the process is strangely slow and I usually terminate it either after ~15min or when my CPU hits 100C. This is similar to what I mentioned on the list some time ago [1]. [1] http://www.mail-archive.com/r-help at r-project.org/msg89421.html Unfortunately I get the same slow behaviour when using pmodel.response(x, model="within", effect="twoways") to compute teh within R-sq. Below is an example that approximates the structure of my data. ### define fun to compute the within R-sq pmodel.response<-plm:::pmodel.response.plm plmr2 <- function(x, adj=TRUE, effect="individual") { ## fetch response and residuals y <- pmodel.response(x, model="within", effect=effect) myres <- resid(x) n <- length(myres) if(adj) { adjssr <- x$df.residual adjtss <- n-1 } else { adjssr <- 1 adjtss <- 1 } ssr <- sum(myres^2)/adjssr tss <- sum(y^2)/adjtss return(1-ssr/tss) } ## prepare the data set.seed(1) n <- 2000 T <- 15 x=rnorm(T*n) x1=runif(T*n) fe=rep(rnorm(1*n),each=T) id=rep(1:(1*n),each=T) ti=rep(1:T,1*n) e=rnorm(T*n) y=x+x1+fe+e subs <- as.logical(rbinom(T*n, 1, .6)); sum(subs) data <- data.frame(y,x,x1,id,ti) dim(data) #[1] 30000 5 pdata <- plm.data(data[subs , ], index = c("id", "ti")) dim(pdata) #[1] 18018 5 paste("n=", length(unique(pdata$id))); paste("T=", length(unique(pdata$ti))); paste("N=", length((pdata$ti))) #[1] "n= 2000" #[1] "T= 15" #[1] "N= 18018" ## individual within model: no issues, 5sec affair pd_fe1 <- plm(y ~ x + x1, data = pdata, model = "within", effect="individual") summary(pd_fe1) plmr2(pd_fe1, F, "individual") ## twoways within model: never finishes in reasonable time (prepare to terminate the process) pd_fe1 <- plm(y ~ x + x1, data = pdata, model = "within", effect="twoways") summary(pd_fe1) plmr2(pd_fe1, F, "twoways") ##neither this on pd_fe1 fitted previously ## individual within with manual time effs: ~10sec pd_fe2 <- plm(y ~ x + x1 + ti, data = pdata, model = "within", effect="individual") summary(pd_fe2) plmr2(pd_fe2, F, "individual") #[1] 0.52888 ## extract within R-sq from the above fails (prepare to terminate the process) plmr2(pd_fe2, F, "twoways") I would have went ahead and computed the within R-sq manually, from pd_fe2, however the below computes the "individual" within R-sq RSS2 <- sum(residuals(pd_fe2)^2); RSS2 TSS2 <- plm:::tss.plm(pd_fe2); TSS2 1 - RSS2/TSS2 #[1] 0.52888 while the below yields zero SSR2 <- sum(fitted(pd_fe2)^2); SSR2 SSR2/TSS2 #[1] 0 because fitted(pd_fe2) #NULL I could also report that plm(..., model="within", effect="twoways") is quick when the number of individuals is kept low; re-run the same example with n <- 200. Any ideas on how to obtain the within R-sq from a Twoways Within fit on unbalanced panel data with n>2000? Thank you Liviu On 5/12/10, Millo Giovanni <Giovanni_Millo at generali.com> wrote:> Dear Liviu, > > we're still working on measures of fit for panels. If I get you right, > what you mean is the R^2 of the demeaned, or "within", regression. A > quick and dirty function to do this is: > > pmodel.response<-plm:::pmodel.response.plm # needs this to make the > method accessible > > r2<-function(x, adj=TRUE) { > ## fetch response and residuals > y <- pmodel.response(x, model="within") > myres <- resid(x) > n <- length(myres) > > if(adj) { > adjssr <- x$df.residual > adjtss <- n-1 > } else { > adjssr <- 1 > adjtss <- 1 > } > > ssr <- sum(myres^2)/adjssr > tss <- sum(y^2)/adjtss > return(1-ssr/tss) > } > > and then > > > r2(yourmodel) > > Hope this helps > Giovanni > > ------------------------------