Hi,

I need to create a model from 250+ variables with high collinearity, and only 17 data points (p = 250, n = 750). I would prefer to use Cp, AIC, and/or BIC to narrow down the number of variables, and then use VIF to choose a model without collinearity (if possible). I realize that having a huge p and small n is going to give me extreme linear dependency problems, but I *think* these model selection criteria should still be useful?

I have currently been running regsubsets for over a week with no results. I have no idea if R is still working, or if the computer is hung. I ran regsubsets on a smaller portion of the data, also with linear dependency problems, and got results. However, the hourglass continues its endless spiraling with the full dataset.

I am running the following on Windows 7:

library(leaps)
m_250 <- regsubsets(Y ~ ., data = model2, nbest = 1, really.big = TRUE)

(NOTE: The ~ is a tilde, not a dash, in the regression statement above: Y ~ .)

Does anyone have any opinions on:
1) Is R likely to still be running, even after a week, or should I just shut it down?
2) Am I doing something wrong with regsubsets?
3) Is there a better option than regsubsets that will still allow me to narrow down parameters so I have explanatory power? (I.e., I could develop a model using PLS and keep all the variables, but then I would also keep all the collinearity issues, and have good prediction but not explanatory power.)
4) Any other ideas?

I am pretty new to R, so any newbie detail would be much appreciated! Thanks in advance for any help!

--
View this message in context: http://r.789695.n4.nabble.com/regsubsets-Leaps-tp4632083.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

I would take a look at the forward.sel function in the packfor package:
http://r-forge.r-project.org/R/?group_id=195

Good luck,
Phil
Thanks Phil, I will give it a try!
-Kim
Frank -- where are you?!

(To the OP: Your post leaves me simply breathless. You are embarked on a fool's errand. Filoche's "help" will continue you down that path. IMHO only, of course.

Bottom line: You CANNOT do what you wish to do. Or, to quote John Tukey: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.")

-- Bert

---------- Forwarded message ----------
From: farmedgirl <ksteinmann at cdpr.ca.gov>
Date: Fri, Jun 1, 2012 at 8:19 AM
Subject: [R] regsubsets (Leaps)
To: r-help at r-project.org

[original post snipped]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
On Sat, Jun 2, 2012 at 3:19 AM, farmedgirl <ksteinmann at cdpr.ca.gov> wrote:

> [original post snipped]
>
> 1) is R likely to still be running, even after a week, or should i just
> shut it down?

It's likely to be running for years. 2^250 is a large number, even with the branch-and-bound algorithm to cut it down.

> 2) am i doing something wrong with regsubsets?

Yes. At the very least, set nvmax to something reasonable. You certainly don't want to find a model with 243 variables, so don't waste time looking for one.

> 3) is there a better option than regsubsets?

Almost certainly. regsubsets() is pretty much useless as a way of selecting a single model, except perhaps when p is very small. It was produced as a way of viewing a large collection of best models, as in the example for the plot() method, by setting nbest fairly large.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
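[Archive editor's note: to make Thomas's nvmax advice concrete, here is a minimal sketch on simulated data -- not the OP's dataset, and deliberately scaled down from p = 250 so it finishes in seconds. All object names (X, dat, fit) are made up for illustration.]

```r
library(leaps)

# Simulated data: n = 50 observations, p = 20 candidate predictors,
# of which only x1 and x2 actually matter.
set.seed(1)
n <- 50; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- X[, 1] - X[, 2] + rnorm(n)
dat <- data.frame(Y = y, X)

# Cap the largest model size with nvmax: the search then only considers
# subsets of size <= 5 instead of all 2^20 subsets.
fit <- regsubsets(Y ~ ., data = dat, nbest = 1, nvmax = 5)
s <- summary(fit)
s$bic           # BIC of the best model at each size 1..5
which.min(s$bic)  # size of the BIC-preferred model
```

With the OP's actual p = 250 even a capped exhaustive search may be slow, so adding method = "forward" (stepwise rather than all-subsets) is the usual fallback -- with all of Bert's caveats about what such a selection can legitimately claim.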