R help - [R] Regression w/ interactions [Apr 2010]


Michael Dykes

2010-Apr-15 04:52 UTC

[R] Regression w/ interactions

I have a project due in my Linear Regression class re: regression on a data
set & my professor gave us a hint that there were *exactly* 2 sig
interactions. The data set is attached. We have to find which predictors are
significant, & which 2 interactions are sig. Also, I need some guidance for
this & selecting the best model. I tried the `full' model, that being:
z=lm(y~x1+x2+x3+x4+x1*x2+x2*x3...+x3*x4). I then ran an anova(z), &
summary(z). My R^2 & R^2_a were *really* low. I am not sure how to do PRESS,
AIC & Cp in R yet though. Any help would be appreciated.
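In R, the model with all four main effects and every two-way interaction can be written compactly as `y ~ (x1 + x2 + x3 + x4)^2`, and PRESS, AIC and Mallows' Cp can all be computed from a fitted `lm` object. The sketch below is a minimal illustration under stated assumptions: the column names `x1` to `x4` and `y` (treating the last column of the attachment as the response), the file name in the commented-out `read.table()` line, and the simulated stand-in data are all hypothetical, and `press()` and `mallows_cp()` are helpers written for this example, not base R functions.

```r
# Hypothetical stand-in data with the same layout as the attachment:
# x1 numeric, x2 in {a,b,c}, x3 in {a,b,c,d}, x4 in {a,b}, y numeric.
# With the real file, something like this could replace it
# (file name and column order are assumptions):
# dat <- read.table("data.txt", col.names = c("x1", "x2", "x3", "x4", "y"),
#                   stringsAsFactors = TRUE)
set.seed(1)
n <- 99
dat <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x3 = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  x4 = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- 2 * dat$x1 + rnorm(n, sd = 8)   # hypothetical response

# (x1 + x2 + x3 + x4)^2 expands to all main effects plus all two-way interactions
full <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = dat)
anova(full)                      # sequential (Type I) F tests; order-dependent
s <- summary(full)
c(R2 = s$r.squared, adjR2 = s$adj.r.squared)

# PRESS: sum of squared leave-one-out prediction errors,
# obtained from the ordinary residuals and the leverages (hat values)
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Mallows' Cp for a candidate model, using the full model's residual
# mean square as the estimate of sigma^2; p = number of estimated
# coefficients, including the intercept
mallows_cp <- function(fit, full) {
  sigma2 <- deviance(full) / df.residual(full)
  deviance(fit) / sigma2 - nobs(fit) + 2 * fit$rank
}

# Score one candidate (main effects only) on all three criteria
main <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
c(PRESS = press(main), AIC = AIC(main), Cp = mallows_cp(main, full))
```

With this definition a well-fitting candidate has Cp close to its own p, which is p+1 if p counts only the predictors and not the intercept. `AIC()` and `extractAIC()` (the latter is what `step()` uses) differ by a constant for a given n, so only differences between models fitted to the same data are meaningful.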
-- 
In Christ,
Michael D
-------------- next part --------------
-0.181277	b	b	a	-4.854498
-1.373577	b	d	b	2.907360450142
0.084283	b	a	a	5.808826503644
-1.127579	a	d	a	20.376580203723
-1.505394	c	c	a	15.980271285708
0.805714	a	b	a	-7.816702950204
-1.758552	a	a	a	-10.166765410112
-2.142039	a	b	b	-10.375365155042
-1.186767	a	d	b	9.011867
0.626815	c	d	a	11.397132867325
0.583209	b	b	b	4.020803737681
0.883034	a	c	b	-14.856109180624
0.799648	b	b	b	7.527047923904
-1.387795	a	c	b	10.31442492405
0.664249	c	a	a	-4.76701
-0.468531	c	c	a	-5.222927893883
0.811834	c	b	a	-11.628574887112
0.366410	a	d	b	7.1486201357
-1.007479	a	c	b	-9.151556677205
-0.389301	a	a	b	-2.358260462798
-0.181228	a	d	a	1.429120648064
0.347862	a	d	a	-2.758801913132
-0.526441	a	c	b	-0.997357126481
0.573127	a	a	b	-3.129691558129
1.828985	b	a	b	-1.338757130225
1.712653	c	a	a	3.740484403182
0.323601	a	d	b	10.367141
-0.766638	c	d	b	-3.824669823044
-1.483897	b	b	b	-4.718086226436
-0.409424	c	c	b	-1.266755011776
0.688012	a	c	a	-4.187009024288
-0.223448	b	c	a	-6.474360017408
1.650190	b	b	b	-5.3548669639
0.509900	a	c	a	0.90449399
0.648275	b	a	a	12.589634
0.958924	c	c	a	-8.750357
1.236345	b	b	a	-6.271058959025
0.879718	b	c	b	-2.102226
0.847110	c	b	b	5.8888077042
0.057705	c	a	b	-0.15064
1.466836	a	d	b	-13.994296850896
0.252834	b	d	b	-5.343887
-0.646959	a	d	a	2.380481209276
-0.641361	a	b	a	5.281187203037
1.436205	a	a	a	7.230062197975
-0.914302	a	c	b	7.954952
-1.374686	b	a	a	4.024969598596
-1.405034	a	d	b	18.696831623468
-1.761170	b	d	b	-29.3249180756
-2.037577	c	a	b	-14.720158123716
0.315597	b	a	b	-15.567926533591
0.445058	c	a	a	7.356515753272
-0.391830	a	a	b	-0.1212394978
0.111954	c	d	a	-8.60269549058
1.070064	b	d	a	5.950436964096
0.744335	a	d	b	1.122728592225
0.744789	c	a	a	2.916147345479
-0.280928	c	c	a	-1.560806623552
0.074920	a	a	b	1.8566239936
-3.303051	b	c	b	-1.043654908601
-0.752622	a	a	a	-0.532701874884
0.213995	c	a	a	-5.100755580075
-0.288164	b	a	a	-5.433570963584
1.240903	b	b	b	-2.345784
0.028406	c	a	b	-14.952644900836
-1.778258	b	d	b	13.077621
0.362794	c	c	a	-11.816527
-1.183487	b	a	b	2.732006
-0.026913	b	c	a	-1.099999380862
-0.345788	c	c	b	5.098742340944
0.658509	c	a	a	-10.405751309243
0.583429	b	a	b	1.544372
-2.044017	c	a	b	-24.284095481445
-0.032043	a	d	a	-18.783719753849
-0.019330	a	b	b	7.5757614044
0.231343	c	c	a	-6.814815918245
0.453047	a	b	a	3.901269415791
1.669509	b	a	b	-13.543330903243
0.118164	a	c	b	2.424542538208
0.212080	c	a	b	-11.348567
1.775061	c	d	b	4.836750892558
0.164970	a	d	a	-0.3874122018
-1.473051	b	b	a	5.721379751399
0.984253	b	c	a	-7.528274968009
1.614237	a	c	a	-8.166492184338
1.328719	b	c	b	12.637026638078
0.657254	a	a	a	-10.174004820516
0.292654	a	b	a	2.993034636284
0.289567	b	d	b	-2.005221237445
-0.238941	a	a	b	-7.987413198519
-1.260164	c	a	a	17.403358920688
-0.800166	a	a	b	3.310341117332
-1.249576	b	c	b	-11.723231179776
0.688448	a	b	a	-13.002202648704
-0.319227	a	d	a	-4.562722
-1.174488	a	b	a	2.744689
-0.132163	a	b	a	8.887791765724
0.281622	c	c	b	-6.577071803536
-0.568164	b	c	a	-6.065338330896
0.417275	b	d	a	-13.401327276875

Frank E Harrell Jr

2010-Apr-15 12:26 UTC

[R] Regression w/ interactions

Michael, this is not really the place for help on homework, other than
perhaps on technical roadblocks.  Note that the strategy you are being
told to follow is one whose statistical properties have been severely
criticized in the statistical literature.  Only with a very high signal
to noise ratio (e.g., a high true R^2) can torturing data lead to a
confession to something other than what the analyst wants to hear.  I
suppose that in simulated data there is a "true" model out there waiting
to be found, but beware of using this approach with real data that have
low signal to noise ratios.
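One way to make "signal to noise ratio" concrete, assuming the usual additive model in which the noise is independent of the predictors: the signal is the variance of the part of y the predictors determine, the noise is the error variance, and the true (population) R^2 is an increasing function of their ratio.

```latex
\begin{aligned}
y_i &= f(\mathbf{x}_i) + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2,\\
\mathrm{SNR} &= \frac{\operatorname{Var}\{f(\mathbf{x})\}}{\sigma^2},\\
R^2_{\text{true}} &= \frac{\operatorname{Var}\{f(\mathbf{x})\}}{\operatorname{Var}\{f(\mathbf{x})\} + \sigma^2}
  = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}.
\end{aligned}
```

So a low signal to noise ratio and a low true R^2 describe the same situation; for example, SNR = 0.25 corresponds to a true R^2 of 0.2.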

Frank


-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University

Frank E Harrell Jr

2010-Apr-15 16:40 UTC

[R] Regression w/ interactions

Michael Dykes wrote:
> So, am I wrong to /assume/ that the reason my professor is asking us to
> find a high R^2 & adjusted R^2, a low Cp (near p+1, if I remember
> correctly), and a low PRESS & AIC is b/c the data is randomly generated (b/c
> he has stated that all of the data for *all* of these hw assignments are
> randomly generated)?  And I am not /exactly/ sure what you are
> referring to when you say 'low signal to noise ratios'. Do you mean /low/
> R^2 relative to the epsilon_i's, or /low/ predictors relative to the
> epsilon_i's? Please excuse my ignorance in these matters, but I am asking
> these questions not only for hw's sake but for my future, as I hope to
> study for the actuarial exams & take the Probability Test sometime either
> next Spring or Summer [after taking this professor's calculus-based Prob &
> Stat sequence in the coming Fall & Spring semesters].
> 
> Thanks again for your help, Professor.
Let me just add that a valid test of whether any of the variables or 
interactions is associated with Y is to formulate a model with all the 
parameters in it and to use the global F test.
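In R, the global F test compares the model containing all the parameters with an intercept-only model, and `summary()` reports the same test on its last line. A minimal sketch, again on hypothetical simulated data that borrows the column names from the question:

```r
set.seed(1)
n <- 99
dat <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x3 = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  x4 = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- 2 * dat$x1 + rnorm(n, sd = 8)   # hypothetical response

null <- lm(y ~ 1, data = dat)                       # intercept only
full <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = dat)   # all main effects + two-way interactions

# Global F test: H0 is that every coefficient except the intercept is zero
global <- anova(null, full)
global

# The same statistic as reported by summary(), and its p-value
fs <- summary(full)$fstatistic
pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
```

Because this is a single pre-specified test of all the parameters at once, its error rate is not distorted by the choice of which terms to look at.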

Stepwise techniques such as the one you are being asked to use are not
scientific.  If the true R^2 (which you do not know) is not high, the
low signal:noise ratio makes the data incapable of telling you the
"right" variables to include with any reliability.  Unfortunately, most
teachers of statistics do not understand this point, so you might be
graded off for providing the right answer.
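The claim about low signal:noise can be explored by simulation. The sketch below assumes a hypothetical "true" model with exactly two interactions (`x1:x4` and `x2:x3`; the coefficients, the two noise levels and the helpers `make_data()`, `signal()`, `picks_truth()` and `true_r2()` are all made up for illustration). It runs R's `step()`, which with no `scope` argument does backward elimination by AIC starting from the model with all two-way interactions, on many simulated data sets and records how often exactly the two true interactions survive.

```r
set.seed(2010)

# Hypothetical generator: every x2/x3/x4 combination appears at least
# floor(n / 24) times, so no interaction cell is empty
make_data <- function(n) {
  cells <- expand.grid(x2 = c("a", "b", "c"), x3 = c("a", "b", "c", "d"),
                       x4 = c("a", "b"))
  d <- cells[rep_len(seq_len(nrow(cells)), n), ]
  rownames(d) <- NULL
  d$x1 <- rnorm(n)
  d
}

# Hypothetical true mean: main effects plus exactly two interactions
signal <- function(d) {
  2 * d$x1 + 3 * (d$x2 == "b") - 2 * (d$x3 == "d") + (d$x4 == "b") +
    4 * d$x1 * (d$x4 == "b") + 5 * (d$x2 == "c") * (d$x3 == "a")
}

# One replicate: TRUE if step() keeps exactly the two true interactions
picks_truth <- function(sigma, n = 99) {
  d <- make_data(n)
  d$y <- signal(d) + rnorm(n, sd = sigma)
  full <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = d)
  chosen <- step(full, trace = 0)   # backward elimination by AIC
  kept <- grep(":", attr(terms(chosen), "term.labels"), value = TRUE)
  setequal(kept, c("x1:x4", "x2:x3"))
}

# Approximate true R^2 for a given noise SD, from one large sample
true_r2 <- function(sigma) {
  v <- var(signal(make_data(1e5)))
  v / (v + sigma^2)
}

res <- sapply(c(1, 8), function(sigma) {
  c(sigma = sigma,
    true_R2 = true_r2(sigma),
    hit_rate = mean(replicate(40, picks_truth(sigma))))
})
res
```

If the argument above holds, the proportion of data sets in which the correct pair of interactions is recovered should be noticeably lower in the low-R^2 column; the exact figures depend on the made-up coefficients and on the seed.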

Frank
