Hi,
I'm trying to fit a regression model, but something seems to be wrong with it.
The dataset contains 85 observations, one for each of 85 students. The
observations are counts of several actions, and the dependent variable is the
final score. More precisely, I have 5 IVs and one DV. I'm trying to build a
regression model to check whether those variables can predict the final score.
I'm attaching the output of several steps. This is the procedure I followed:
- build a model with only two of the variables (IA and IC)
- the summary shows that neither of them is a significant predictor of the
final outcome.
- a test for multicollinearity revealed a tolerance below 0.2 (a potential
problem)
- build two new models, each with only one of those variables as a predictor
- both models show that the variable used in the model is a significant
predictor. Separately they are significant, together they are not. Probably
a multicollinearity problem, but...
- as I keep adding other variables to one or the other model, Multiple
R-squared increases slightly.
- I tried to compare the different models using anova, but none of them seems
to be better.
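The steps above can be sketched roughly as follows. This is only a minimal sketch on simulated stand-in data, not the real social_presence_data; the object names sim, m_both, m_IA, m_IC, tol and cmp are made up for illustration, and the tolerance is computed by hand as 1 - R^2 (the reciprocal of the VIF) so that only base R is needed.

```r
# Simulated stand-in data: NOT the real social_presence_data.
set.seed(1)
n    <- 85
IC   <- rpois(n, lambda = 20)                  # count of one action
IA   <- round(0.5 * IC + rnorm(n, sd = 1))     # deliberately collinear with IC
mark <- 2.8 + 0.03 * IC + rnorm(n)             # hypothetical final score
sim  <- data.frame(mark, IA, IC)

# Step 1: both predictors in one model
m_both <- lm(mark ~ IA + IC, data = sim)

# Step 2: tolerance = 1 - R^2 from regressing one predictor on the other
tol <- 1 - summary(lm(IA ~ IC, data = sim))$r.squared

# Step 3: one predictor at a time
m_IA <- lm(mark ~ IA, data = sim)
m_IC <- lm(mark ~ IC, data = sim)

# Step 4: F test for a nested pair (mark ~ IC is a special case of
# mark ~ IA + IC); the same test is not defined for mark ~ IA vs mark ~ IC
cmp <- anova(m_IC, m_both)

print(tol)
print(cmp)
```

With two strongly correlated predictors, each coefficient in the joint model gets a large standard error, so it is possible for each variable to look significant alone and for neither to look significant together.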
How can I determine which model is better?
Thanks
-------------- next part --------------
> lm.all.1 <- lm(mark~IA+IC, data=social_presence_data)
> summary(lm.all.1)
Call:
lm(formula = mark ~ IA + IC, data = social_presence_data)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5969 -0.2573  0.2599  0.5819  1.2955
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.78938    0.24599  11.339   <2e-16 ***
IA           0.02844    0.04503   0.632    0.530
IC           0.01979    0.02601   0.761    0.449
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.031 on 79 degrees of freedom
Multiple R-squared: 0.12, Adjusted R-squared: 0.09774
F-statistic: 5.387 on 2 and 79 DF, p-value: 0.006407
> 1/vif(lm.all.1)
       IA        IC
0.1719037 0.1719037
> dwt(lm.all.1)
 lag Autocorrelation D-W Statistic p-value
   1      0.09176706      1.815883   0.372
 Alternative hypothesis: rho != 0
> lm.all.2 <- lm(mark~IA, data=social_presence_data)
> lm.all.3 <- lm(mark~IC, data=social_presence_data)
> anova(lm.all.2, lm.all.3)
Analysis of Variance Table
Model 1: mark ~ IA
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq F Pr(>F)
1     80 84.604
2     80 84.413  0   0.19141
> anova(lm.all.1, lm.all.3)
Analysis of Variance Table
Model 1: mark ~ IA + IC
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989
2     80 84.413 -1  -0.42402 0.3988 0.5295
> anova(lm.all.1, lm.all.2)
Analysis of Variance Table
Model 1: mark ~ IA + IC
Model 2: mark ~ IA
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989
2     80 84.604 -1  -0.61543 0.5789  0.449
> summary(lm.all.2)
Call:
lm(formula = mark ~ IA, data = social_presence_data)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5409 -0.2539  0.2283  0.5793  1.2956
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.88517    0.21078  13.688  < 2e-16 ***
IA           0.05961    0.01862   3.202  0.00196 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.028 on 80 degrees of freedom
Multiple R-squared: 0.1136, Adjusted R-squared: 0.1025
F-statistic: 10.25 on 1 and 80 DF, p-value: 0.001962
> summary(lm.all.3)
Call:
lm(formula = mark ~ IC, data = social_presence_data)
Residuals:
    Min      1Q  Median      3Q     Max
-3.6320 -0.2562  0.2590  0.5764  1.2585
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.76364    0.24168  11.435  < 2e-16 ***
IC           0.03473    0.01074   3.233  0.00178 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.027 on 80 degrees of freedom
Multiple R-squared: 0.1156, Adjusted R-squared: 0.1045
F-statistic: 10.45 on 1 and 80 DF, p-value: 0.001779
> lm.all.3.1 <- lm(mark~IC+AU, data=social_presence_data)
> summary(lm.all.3.1)
Call:
lm(formula = mark ~ IC + AU, data = social_presence_data)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5951 -0.2618  0.2378  0.5907  1.2619
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.77600    0.24499  11.331  < 2e-16 ***
IC           0.03276    0.01191   2.752  0.00735 **
AU           0.04994    0.12697   0.393  0.69514
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.033 on 79 degrees of freedom
Multiple R-squared: 0.1173, Adjusted R-squared: 0.09496
F-statistic: 5.249 on 2 and 79 DF, p-value: 0.007236
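
Since anova() printed no F test for mark ~ IA against mark ~ IC (the two models are not nested), one possible way to put all the candidate models side by side might be an information criterion such as AIC, together with adjusted R-squared. The following is only a sketch on simulated stand-in data, not the real social_presence_data; the names sim, fits, aic and adj are hypothetical.

```r
# Simulated stand-in data: NOT the real social_presence_data.
set.seed(2)
n    <- 85
IC   <- rpois(n, lambda = 20)
IA   <- round(0.5 * IC + rnorm(n, sd = 1))   # collinear with IC by construction
mark <- 2.8 + 0.03 * IC + rnorm(n)
sim  <- data.frame(mark, IA, IC)

# Candidate models, all fitted to the same rows of the same data
fits <- list(
  IA    = lm(mark ~ IA,      data = sim),
  IC    = lm(mark ~ IC,      data = sim),
  IA_IC = lm(mark ~ IA + IC, data = sim)
)

# Lower AIC is preferred; adjusted R-squared penalises extra predictors,
# unlike Multiple R-squared, which cannot decrease when a predictor is added
aic <- sapply(fits, AIC)
adj <- sapply(fits, function(m) summary(m)$adj.r.squared)

print(cbind(AIC = aic, adj.r.squared = adj))
```

AIC values are only comparable when every model is fitted to exactly the same observations, so rows with missing values in any candidate predictor would need to be dropped before fitting.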