Hi,

I need to create a model from 250+ variables with high collinearity, and only 17 data points (p = 250, n = 750). I would prefer to use Cp, AIC, and/or BIC to narrow down the number of variables, and then use VIF to choose a model without collinearity (if possible). I realize that having a huge p and small n is going to give me extreme linear dependency problems, but I *think* these model selection criteria should still be useful?

I have currently been running regsubsets for over a week with no results. I have no idea if R is still working, or if the computer is hung. I ran regsubsets on a smaller portion of the data, also with linear dependency problems, and got results. However, the hourglass continues its endless spiraling with the full dataset.

I am running the following on Windows 7:

library(leaps)
m_250 <- regsubsets(Y ~ ., data = model2, nbest = 1, really.big = TRUE)

(NOTE: The ~ is a tilde, not a dash, in the regression statement above: Y ~ .)

Does anyone have any opinions on:
1) Is R likely to still be running, even after a week, or should I just shut it down?
2) Am I doing something wrong with regsubsets?
3) Is there a better option than regsubsets that will still allow me to narrow down parameters so I have explanatory power? (I.e., I could develop a model using PLS and keep all the variables, but then I would also keep all the collinearity issues, and have good prediction but not explanatory power.)
4) Any other ideas?

I am pretty new to R, so any newbie detail would be much appreciated! Thanks in advance for any help!

--
View this message in context: http://r.789695.n4.nabble.com/regsubsets-Leaps-tp4632083.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

I would take a look at the forward.sel function in the packfor package:
http://r-forge.r-project.org/R/?group_id=195

Good luck,
Phil
Thanks Phil, I will give it a try!
-Kim
Frank -- where are you?!

(To the OP: Your post leaves me simply breathless. You are embarked on a fool's errand. Filoche's "help" will continue you down that path. IMHO only, of course.

Bottom line: You CANNOT do what you wish to do. Or, to quote John Tukey: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.")

-- Bert

---------- Forwarded message ----------
From: farmedgirl <ksteinmann at cdpr.ca.gov>
Date: Fri, Jun 1, 2012 at 8:19 AM
Subject: [R] regsubsets (Leaps)
To: r-help at r-project.org

[original post snipped]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
On Sat, Jun 2, 2012 at 3:19 AM, farmedgirl <ksteinmann at cdpr.ca.gov> wrote:

> [original post snipped]
>
> 1) is R likely to still be running, even after a week, or should i just
> shut it down?

It's likely to be running for years. 2^250 is a large number, even with the branch-and-bound algorithm to cut it down.

> 2) am i doing something wrong with regsubsets?

Yes. At the very least, set nvmax to something reasonable. You certainly don't want to find a model with 243 variables, so don't waste time looking for one.

> 3) is there a better option than regsubsets?

Almost certainly. regsubsets() is pretty much useless as a way of selecting a single model, except perhaps when p is very small. It was produced as a way of viewing a large collection of best models, as in the example for the plot() method, by setting nbest fairly large.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
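[Archive editor's note: to make Thomas's nvmax advice concrete, here is a minimal sketch on simulated data -- not the OP's dataset, and deliberately scaled down from p = 250 so it finishes in seconds. All object names (X, dat, fit) are made up for illustration.]

```r
library(leaps)

# Simulated data: n = 50 observations, p = 20 candidate predictors,
# of which only x1 and x2 actually matter.
set.seed(1)
n <- 50; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- X[, 1] - X[, 2] + rnorm(n)
dat <- data.frame(Y = y, X)

# Cap the largest model size with nvmax: the search then only considers
# subsets of size <= 5 instead of all 2^20 subsets.
fit <- regsubsets(Y ~ ., data = dat, nbest = 1, nvmax = 5)
s <- summary(fit)
s$bic           # BIC of the best model at each size 1..5
which.min(s$bic)  # size of the BIC-preferred model
```

With the OP's actual p = 250 even a capped exhaustive search may be slow, so adding method = "forward" (stepwise rather than all-subsets) is the usual fallback -- with all of Bert's caveats about what such a selection can legitimately claim.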