lara harrup (IAH-P)
2009-May-20 13:16 UTC
[R] Error with regsubset in leaps package - vcov and all.best option (plus calculating VIFs for subsets)
Hi all I am hoping this is just a minor problem, I am trying to implement a best subsets regression procedure on some ecological datasets using the regsubsets function in the leaps package. The dataset contains 43 predictor variables plus the response (logcount) all in a dataframe called environment. I am implementing it as follows: library(leaps) subsets<-regsubsets(logcount~.,data=environment,nvmax=10,nbest=2,really.big=FALSE,method="exhaustive") ###the subset regression runs fine when i run it as above and i can get all the usual summaries ###The problem comes when i try and get it to output the variance convariance matric by adding the option vcov=TRUE ##When I do that i get the following: subsets<-regsubsets(logcount~.,data=environment,nvmax=10,nbest=2,really.big=FALSE,method="exhaustive",vcov=TRUE) Error in model.frame.default(data = environment, vcov = TRUE, formula = logcount~ : variable lengths differ (found for '(vcov)')> traceback()6: model.frame.default(data = environment, vcov = TRUE, formula = logcount~ .) 5: model.frame(data = environment, vcov = TRUE, formula = logcount~ .) 4: eval(expr, envir, enclos) 3: eval(mm, sys.frame(sys.parent())) 2: regsubsets.formula(logcount~ ., data = environment, nvmax = 10, really.big = TRUE, method = "exhaustive", nbest = 5, vcov = TRUE) 1: regsubsets(obs ~ ., data = environment, nvmax = 10, really.big = TRUE, method = "exhaustive", nbest = 5, vcov = TRUE) I get the same error when i try and add the all.best=TRUE option (as ideally i would like it to report the fits of all the subsets). All the predictor variables and the response are the same length (143) so not sure if I am misinterpreting the error or have misspecified the regsubsets? I was wanting to get the variance - covariance matrix as I believe I need it to calculate the Variance Inflation Factors (VIFs) for each of the models reported by regsubsets. As I want to exclude any models that exhibit multicollinerarity from later analysis, I am hoping to select say the 'best' 10 models and bootstrap them to find out more about how they perform. Or am I going about this all the wrong way? is there away to calulate vifs from regsubsets or pass it directly to something that calculates them e.g. VIF in the car package? Any help will be most appreciated, many thanks in advance Lara lara.harrup at bbsrc.ac.uk
Thomas Lumley
2009-May-20 14:28 UTC
[R] Error with regsubset in leaps package - vcov and all.best option (plus calculating VIFs for subsets)
On Wed, 20 May 2009, lara harrup (IAH-P) wrote:> > > Hi all > > > I am hoping this is just a minor problem, I am trying to implement a best subsets regression procedure on some ecological datasets using the regsubsets function in the leaps package. The dataset contains 43 predictor variables plus the response (logcount) all in a dataframe called environment. I am implementing it as follows: > > library(leaps) > > > subsets<-regsubsets(logcount~.,data=environment,nvmax=10,nbest=2,really.big=FALSE,method="exhaustive") > > ###the subset regression runs fine when i run it as above and i can get all >the usual summaries > > ###The problem comes when i try and get it to output the variance convariance >matric by adding the option vcov=TRUEYes, that would be because there is no option vcov=TRUE for regsubsets. There is a vcov= option for the coef() method, which may be what is confusing you. <snip>> I was wanting to get the variance - covariance matrix as I believe I need it >to calculate the Variance Inflation Factors (VIFs) for each of the models >reported by regsubsets. As I want to exclude any models that exhibit >multicollinerarity from later analysis, I am hoping to select say the 'best' >10 models and bootstrap them to find out more about how they perform.As in the example on the help page, once you have run regsubsets() you can use coef() and vcov() on the object it returns to get coefficient estimates and variance-covariance matrices for any of the best models. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle