thr3ads.net - R help - [R] Stepwise model selection question [Jun 1999]


John Thaden

1999-Jun-18 21:21 UTC

[R] Stepwise model selection question

I use the step() function occasionally, and I think I understand its
objective, proper use, and limitations.  Now I see stepwise model selection
being used in what seems to be an unusual way, and I wonder if it is right
or wrong.  May I describe?

     Genetic mapping tries to find where in an animal's genome lie the genetic
elements that influence a particular physical trait.  Say there are 100
individuals derived from a cross between two parental strains, and for each
individual, we know which parent contributed each of 40 different locations
on the genome (mapping markers), and we have measured the trait.  The
markers are then binary predictors, the trait is the outcome.
     First, we could do 'single-marker analysis'.  We'd make simple F-tests
for the trait and each marker, separately, and rank markers by F-values.  A
high value suggests the mapping marker lies near a genetic element
affecting the trait. 
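
A minimal sketch of the single-marker analysis described above, in R.  The
data are simulated, and every name here (markers, trait, f_values, and the
choice of m5 and m20 as the markers with real effects) is an assumption made
for illustration, not part of the original analysis.

```r
# Simulated stand-in for the mapping data: 100 individuals, 40 binary markers.
set.seed(1)
n <- 100
p <- 40
markers <- matrix(rbinom(n * p, 1, 0.5), nrow = n, ncol = p,
                  dimnames = list(NULL, paste0("m", 1:p)))

# Hypothetical trait: two markers carry real effects, the rest are noise.
trait <- 2 * markers[, "m5"] + markers[, "m20"] + rnorm(n)

# One simple regression per marker; keep the F statistic for that marker.
f_values <- apply(markers, 2, function(x) {
  anova(lm(trait ~ x))["x", "F value"]
})

# Rank markers from largest to smallest F value.
ranked <- sort(f_values, decreasing = TRUE)
head(ranked)
```

Each F value here comes from a model containing one marker only, so the
ranking ignores any correlation between neighbouring markers.
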
     As a second analysis, we could use a method called interval mapping.
It uses the same data to "scan" the genome, including regions between
mapping markers, and produces plots with likelihood ratios or LOD scores on
the y-axis, and position along the genome on the x.  (It relies on mixture
regression models, since the parental contribution is unknown between
markers).
     As a third analysis, we could use a refinement called composite
interval mapping.  It includes a few of the 40 mapping markers as
additional cofactors in mixture regression formulae as one scans.  The idea
is to have cofactors to handle genetic elements with large effects while
scanning elsewhere in the genome.  The markers to include as cofactors are
selected before the scanning phase, by stepwise multiple-regression model
selection (occasionally I am able to compare all possible models
exhaustively, but usually it is done by a forward-backward algorithm).
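
The cofactor-selection step could be sketched in R roughly as follows,
assuming an ordinary linear model and step() run in both directions from an
intercept-only start.  The data are simulated and the names (dat, null_fit,
upper, cofactors) are hypothetical; real composite interval mapping software
may select cofactors differently.

```r
# Simulated stand-in data: 100 individuals, 40 binary markers, one trait.
set.seed(1)
n <- 100
p <- 40
dat <- as.data.frame(matrix(rbinom(n * p, 1, 0.5), nrow = n, ncol = p))
names(dat) <- paste0("m", 1:p)
dat$trait <- 2 * dat$m5 + dat$m20 + rnorm(n)   # hypothetical true effects

# Start from the intercept-only model and allow any marker to enter or leave.
null_fit <- lm(trait ~ 1, data = dat)
upper <- reformulate(paste0("m", 1:p))          # one-sided: ~ m1 + ... + m40

sel <- step(null_fit,
            scope = list(lower = ~ 1, upper = upper),
            direction = "both",
            trace = 0)                          # suppress per-step printing

# Markers retained in the selected model: the candidate cofactors.
cofactors <- attr(terms(sel), "term.labels")
cofactors
```

The selected set depends on the starting model, the search direction and
the penalty (the k argument of step()), so a different configuration can
return a different set of cofactors from the same data.
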
     I'm OK with this so far.  The use of step() seems fairly standard.
But now here's where I think it gets weird:  There is a compulsion among
geneticists to then treat the results of the stepwise model selection as
yet a fourth analytical tool by which to rank all the mapping markers,
i.e., as further evidence that a marker must be near a genetic element
affecting the trait.  "If it's included in the model, it must be close to a
genetic effector of the trait".  How does this sound to you?  If a stepwise
algorithm ranks possible cofactors--perhaps even assigns them an F
value--can you use that ranking to make any comparisons among possible
cofactors?  What do the F values mean?
************************************************************
John J. Thaden, Ph.D., Instructor        jjthaden at life.uams.edu
Department of Geriatrics                     (501) 257-5583
University of Arkansas for Medical Sciences  FAX: (501) 257-4822
      mail & ship to:	J. L. McClellan V.A. Medical Center
		Research-151 (Room GB103 or GC124)
		4300 West 7th Street
		Little Rock AR 72205 USA
***********************************************************
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian D Ripley

1999-Jun-20 07:46 UTC


[R] Stepwise model selection question

On Fri, 18 Jun 1999, John Thaden wrote:
>      I use the step() function occasionally, and I think I understand its
> objective, proper use, and limitations.  Now I see stepwise model selection
> being used in what seems to be an unusual way, and I wonder if it is right
> or wrong.  May I describe?
[Description snipped.]

step() does not do `stepwise model selection' by F-tests as you describe.
The title of its help page is

Choose a model by AIC in a Stepwise Algorithm

>      I'm OK with this so far.  The use of step() seems fairly standard.
> But now here's where I think it gets weird:  There is a compulsion among
> geneticists to then treat the results of the stepwise model selection as
> yet a fourth analytical tool by which to rank all the mapping markers,
> i.e., as further evidence that a marker must be near a genetic element
> affecting the trait.  "If it's included in the model, it must be close to a
> genetic effector of the trait".  How does this sound to you?  If a stepwise
> algorithm ranks possible cofactors--perhaps even assigns them an F
> value--can you use that ranking to make any comparisons among possible
> cofactors?  What do the F values mean?

Being in the model is never an indication of causality, but it is an 
indication of association. The step from association to `close' is a 
genetic hypothesis.  As step() does not give F-values, I will not
comment on the rest.
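
A small sketch of the point about AIC, using a current version of R (the
1999 release may have differed in detail).  The data are simulated and the
names (dat, full, sel, f_tab) are hypothetical.  The path that step()
records is reported in terms of deviance and AIC; F values can be requested
separately with drop1(), but that is a different calculation from the one
step() uses to choose the model.

```r
# Simulated stand-in data: 100 individuals, 10 binary markers, one trait.
set.seed(1)
n <- 100
p <- 10
dat <- as.data.frame(matrix(rbinom(n * p, 1, 0.5), nrow = n, ncol = p))
names(dat) <- paste0("m", 1:p)
dat$trait <- 2 * dat$m5 + dat$m2 + rnorm(n)    # hypothetical true effects

full <- lm(trait ~ ., data = dat)
sel <- step(full, trace = 0)                    # selection is by AIC

# The recorded selection path: AIC is reported, no F statistic.
names(sel$anova)

# F tests for dropping each term from the selected model, computed
# after the fact and independently of how step() made its choices.
f_tab <- drop1(sel, test = "F")
f_tab
```

Because these F values are computed on a model that was itself chosen from
the data, they do not have the usual F distribution under the null, and
should be read as descriptive rather than as formal tests.
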


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

