On 2013-05-17 12:45, Jesse Gervais wrote:
> Hi there,
>
>
>
> I want to run several bivariate linear regressions and then fit a
> multivariate linear regression that includes only the variables significantly
> associated *(p < 0.15)* with y in the bivariate analyses, without having to
> check those p-values by hand.
>
>
>
> So, here is what I have so far.
>
>
>
> First, I use this data set:
>
>
>
> tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
>
>
>
> Second, I defined this function, which lets me extract p-values later:
>
>
>
> lmp <- function (modelobject) {
>
> if (class(modelobject) != "lm") stop("Not an object of class 'lm' ")
>
> f <- summary(modelobject)$fstatistic
>
> p <- pf(f[1],f[2],f[3],lower.tail=F)
>
> attributes(p) <- NULL
>
> return(p)}
>
>
>
> Third, I did my bivariate linear regressions:
>
>
>
> fit = lm(exposure~tol11, data = tolerance)
>
> fit_2 = lm(exposure~tol12, data= tolerance)
>
> fit_3 = lm(exposure~tol13, data= tolerance)
>
> fit_4 = lm(exposure~tol14, data= tolerance)
>
> fit_5 = lm(exposure~tol15, data= tolerance)
>
>
>
> Fourth, I extracted p-values:
>
>
>
> lmp(fit)
>
> lmp(fit_2)
>
> lmp(fit_3)
>
> lmp(fit_4)
>
> lmp(fit_5)
>
>
>
> Fifth, I confirmed that the p-values were OK (just to be sure, since it's the
> first time I have used the above procedure):
>
>
>
> summary (fit)
>
> summary (fit_2)
>
> summary (fit_3)
>
> summary (fit_4)
>
> summary (fit_5)
>
>
>
> And now I don't know what to do.
>
>
>
> The multivariate linear regression (if all variables were included) is:
>
>
>
> fit_multi = lm (exposure ~ tol11 + tol12 + tol13 + tol14 + tol15, data = tolerance)
>
>
>
> I would like to be able to do something like:
>
>
> fit_multi = lm (exposure ~ tol11 [include only if lmp(fit) < 0.15] +
> tol12 [include only if lmp(fit_2) < 0.15] + tol13 [include only if
> lmp(fit_3) < 0.15] + tol14 [include only if lmp(fit_4) < 0.15] +
> tol15 [include only if lmp(fit_5) < 0.15], data= tolerance)
>
>
>
> Any idea?
>
(Thanks for providing reproducible code!)
It seems to me that you're just missing two things:
1. a way to determine the names of the variables to be included
in the multiple (not 'multivariate' to be nitpicky) regression;
2. a way to build the formula for the multiple regression once
you know which predictors to include.
To get the variables:
varnames <- names(tolerance)[2:6]
pvec <- c(lmp(fit), lmp(fit_2), lmp(fit_3), lmp(fit_4), lmp(fit_5))
use <- varnames[pvec < 0.15]
use
#[1] "tol14" "tol15"
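If there are many predictors, the separate fits and lmp() calls can be folded into one sapply() over the variable names. A minimal sketch, assuming simulated data in place of the tolerance file: the data frame dat and its columns y, x1, x2, x3 are made up for illustration, and lmp() is restated with inherits() so the block runs on its own.

```r
# Simulated stand-in for the tolerance data (hypothetical values).
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)   # only x1 is truly related to y

# Overall F-test p-value of a fitted lm, same idea as lmp() in the question.
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

varnames <- c("x1", "x2", "x3")

# One bivariate regression per predictor; the result is a vector of
# p-values named after the predictors.
pvec <- sapply(varnames, function(v) {
  lmp(lm(reformulate(v, response = "y"), data = dat))
})

# Keep the predictors that pass the 0.15 screening threshold.
use <- varnames[pvec < 0.15]
```

With the real data, varnames would be names(tolerance)[2:6] and the response would be "exposure".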
To construct the formula:
rhs <- paste(use, collapse = " + ")
form <- paste("exposure ~", rhs)
And then use it:
fit_multi <- lm(formula = form, data = tolerance)
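One case the paste() approach does not cover is when no predictor passes the cut-off: the string then ends in "~" and is not a valid formula. A possible safeguard, sketched with a hypothetical helper build_formula (my own name, not a base R function), uses reformulate() and falls back to an intercept-only model:

```r
# Hypothetical helper: build response ~ a + b from the selected names,
# falling back to an intercept-only model when nothing is selected.
build_formula <- function(use, response) {
  if (length(use) == 0) use <- "1"
  reformulate(use, response = response)
}

f1 <- build_formula(c("tol14", "tol15"), "exposure")
f0 <- build_formula(character(0), "exposure")

print(f1)
print(f0)
```

The result is a formula object, so it can be passed straight to lm(), e.g. lm(build_formula(use, "exposure"), data = tolerance).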
Peter Ehlers