thr3ads.net - R help - [R] Questions on factors in regression analysis [Aug 2009]

guox at ucalgary.ca

2009-Aug-20 17:46 UTC

[R] Questions on factors in regression analysis

I have two questions on factors in regression:

Q1.
In a table, there are a few categorical (factor) variables, a few numerical
variables, and a numeric response variable. Some factors are important
but others are not.
How can I determine which categorical variables are significant for the
response variable?

Q2.
As we know, lm can deal with categorical variables.
I thought that, when there is a categorical predictor, we may use lm directly
without quantifying the factor, and that assigning different numeric values to
its levels would not change the fit, as shown:

x <- 1:20                                          ## numeric predictor
yes.no <- c("yes","no")
factors <- gl(2,10,20,yes.no)                      ## factor predictor
factors.quant <- rep(c(18.8,29.9),c(10,10))        ## first quantification of the factor
factors.quant.1 <- rep(c(16.9,38.9),c(10,10))      ## second quantification of the factor
response <- 0.8*x + 18 + factors.quant + rnorm(20) ## response
lm.quant <- lm(response ~ x + factors.quant)       ## lm with the first quantification
lm.fact <- lm(response ~ x + factors)              ## lm with the factor

lm.quant.1 <- lm(response ~ x + factors.quant.1)   ## lm with the second quantification
lm.fact.1 <- lm(response ~ x + factors)            ## lm with the factor (same model as lm.fact)

par(mfrow=c(2,2)) ## comparisons of two fittings
plot(x, response)
lines(x,fitted(lm.quant),col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact),col = "red")
grid()
plot(x, response)
lines(x,fitted(lm.quant.1),lty =2,col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact.1),lty =2,col = "red")
grid()
par(mfrow = c(1,1))

So, is it right that we can assign any numeric values to factors,
for example, c(yes, no) = c(18.8,29.9) or (16.9,38.9) in the above,
before doing lm, glm, aov, even nls?


Please drop a few lines and/or direct me to some references. Thanks,

-james

David Winsemius

2009-Aug-20 18:13 UTC

[R] Questions on factors in regression analysis

On Aug 20, 2009, at 1:46 PM, guox at ucalgary.ca wrote:
> I got two questions on factors in regression:
>
> Q1.
> In a table, there a few categorical/factor variables, a few numerical
> variables and the response variable is numeric. Some factors are  
> important
> but others not.
> How to determine which categorical variables are significant to the
> response variable?
Seems that you should engage the services of a consulting statistician  
for that sort of question. Or post in a venue where statistical  
consulting is supposed to occur, such as one of the sci.stat.*  
newsgroups.
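
For readers who only want the mechanics in R, here is a minimal sketch of one common approach: fit the full model and let drop1() run an F test for each term, so that a multi-level factor is tested as a whole rather than one dummy coefficient at a time. The data frame dat and the names f1, f2, z and y are simulated and hypothetical, not from the poster's table, and this is no substitute for the statistical advice recommended above.

```r
set.seed(3)                                   # arbitrary seed, for reproducibility only

# Hypothetical data: two factors, one numeric predictor, numeric response
dat <- data.frame(
  f1 = gl(3, 20, labels = c("a", "b", "c")),  # three-level factor, built to matter
  f2 = gl(2, 1, 60, labels = c("u", "v")),    # two-level factor, built to be irrelevant
  z  = rnorm(60)
)
dat$y <- 2 * dat$z + c(0, 3, 6)[as.integer(dat$f1)] + rnorm(60)

fit <- lm(y ~ f1 + f2 + z, data = dat)

# One F test per term; each factor is dropped as a block of dummy columns
tab <- drop1(fit, test = "F")
tab
```

Each row of tab compares the full model with the model lacking that term. Whether a small p-value there counts as "significant" for the question at hand still depends on the design and on how many terms are being screened.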
>
> Q2.
> As we knew, lm can deal with categorical variables.
> I thought, when there is a categorical predictor, we may use lm  
> directly
> without quantifying these factors and assigning different values to  
> factors
> would not change the fittings as shown:
The "numbers" that you are attempting to assign are really just labels
for the factor levels. The regression functions in R will not use them  
for any calculations. They should not be thought of as having  
"values". Even if the factor is an ordered factor, the labels may not
be interpretable as having the same numerical order as the string  
values might suggest.
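
A small sketch of that point, using simulated data and hypothetical names (grp, grp.num, fit.a, fit.b): relabelling the levels of a factor with digit strings changes the coefficient names but not the estimates or the fitted values.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
x   <- 1:20
grp <- gl(2, 10, labels = c("yes", "no"))     # two-level factor
y   <- 0.8 * x + 18 + ifelse(grp == "yes", 18.8, 29.9) + rnorm(20)

# Same factor, but with digit strings as the level labels
grp.num <- factor(grp, levels = c("yes", "no"), labels = c("18.8", "29.9"))

fit.a <- lm(y ~ x + grp)
fit.b <- lm(y ~ x + grp.num)

names(coef(fit.a))                            # dummy named after the label "no"
names(coef(fit.b))                            # dummy named after the label "29.9"

# The estimates and fitted values agree; only the names differ
all.equal(unname(coef(fit.a)), unname(coef(fit.b)))
all.equal(fitted(fit.a), fitted(fit.b))
```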
>
> x <- 1:20 ## numeric predictor
> yes.no <- c("yes","no")
> factors <- gl(2,10,20,yes.no) ##factor predictor
> factors.quant <-  rep(c(18.8,29.9),c(10,10)) ##quantificatio of  
> factors
Not sure what that is supposed to mean. It is not a factor object even
though you may be misleading yourself into believing it should be.
It's a numeric vector.
 > str(factors.quant)
  num [1:20] 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 ...
> factors.quant.1 <-  rep(c(16.9,38.9),c(10,10))
>   ##second quantificatio of factors
> response <- 0.8*x + 18 + factors.quant + rnorm(20) ##response
> lm.quant <- lm(response ~ x + factors.quant) ##lm with quantifications
> lm.fact <- lm(response ~ x + factors) ##lm with factors
 > lm.quant

Call:
lm(formula = response ~ x + factors.quant)

Coefficients:
   (Intercept)              x  factors.quant
       14.9098         0.5385         1.2350

 > lm.fact

Call:
lm(formula = response ~ x + factors)

Coefficients:
(Intercept)            x    factorsno
     38.1286       0.5385      13.7090
>
> lm.quant.1 <- lm(response ~ x + factors.quant.1) ##lm with  
> quantifications
 > lm.quant.1

Call:
lm(formula = response ~ x + factors.quant.1)

Coefficients:
     (Intercept)                x  factors.quant.1
         27.5976           0.5385           0.6231
> lm.fact.1 <- lm(response ~ x + factors) ##lm with factors
>
> par(mfrow=c(2,2)) ## comparisons of two fittings
> plot(x, response)
> lines(x,fitted(lm.quant),col="blue")
> grid()
> plot(x,response)
> lines(x,fitted(lm.fact),col = "red")
> grid()
> plot(x, response)
> lines(x,fitted(lm.quant.1),lty =2,col="blue")
> grid()
> plot(x,response)
> lines(x,fitted(lm.fact.1),lty =2,col = "red")
> grid()
> par(mfrow = c(1,1))
>
> So, is it right that we can assign any numeric values to factors,
> for example, c(yes, no) = c(18.8,29.9) or (16.9,38.9) in the above,
> before doing lm, glm, aov, even nls?
You can give factor levels any name you like, including any sequence
of digit characters. Unlike "ordinary" R, where unquoted numbers cannot
start variable names, factor functions will coerce numeric vectors to
character vectors when assigning level names. But you seem to be
conflating factors with numeric vectors that have many ties. R's
regression functions handle those two entities differently.
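
The difference in handling can be sketched as follows, with simulated data and hypothetical names (g2, code2, g3, code3). With only two levels, a numeric vector such as c(18.8, 29.9) is an affine recoding of the 0/1 dummy (18.8 + 11.1 * dummy), so the fitted values agree and only the coefficients rescale; in the output quoted earlier, 13.7090 / 11.1 gives the 1.2350 reported for factors.quant. With three or more levels the two models generally differ, because the factor gets one dummy column per non-reference level while the numeric vector gets a single slope.

```r
set.seed(2)                                  # arbitrary seed, for reproducibility only

# Two levels: a numeric recoding is an affine function of the 0/1 dummy
g2    <- gl(2, 10, labels = c("yes", "no"))
code2 <- c(18.8, 29.9)[as.integer(g2)]       # numeric vector with ties
y2    <- 0.8 * (1:20) + 18 + code2 + rnorm(20)
same2 <- all.equal(fitted(lm(y2 ~ g2)), fitted(lm(y2 ~ code2)))

# Three levels: the factor gets two dummy columns, the numeric vector one slope
g3    <- gl(3, 10, labels = c("low", "mid", "high"))
code3 <- c(1, 2, 10)[as.integer(g3)]         # hypothetical numeric codes
y3    <- c(5, 20, 8)[as.integer(g3)] + rnorm(30)
fit.f <- lm(y3 ~ g3)
fit.n <- lm(y3 ~ code3)

ncol(model.matrix(fit.f))                    # intercept + 2 dummies
ncol(model.matrix(fit.n))                    # intercept + 1 slope
same3 <- isTRUE(all.equal(fitted(fit.f), fitted(fit.n)))
```

Here same2 should be TRUE and same3 FALSE: the numeric coding forces the three group means onto a straight line in code3, which the simulated means do not follow.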

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
