Hello,
I have a conceptual question about the R² reported by lm() and by
postResample() (from the caret package).
I have a multiple linear regression model (let's call it mlr) with an R²
value of 67.52%.
I then use this model to make predictions with predict(), using the same
data as input; that is, I use the fitted model to predict the values for the
very data it was fitted on.
Next, if I apply postResample() to the observed and predicted values, why do
I get an R² value of 33%? Shouldn't it be at least 67%, as in the original
model, since both use the same data?
Here is the code (the data is at the end of the email):
# read input data
input <- read.table("input.csv", header = TRUE)
# multiple linear regression (no intercept)
mlr <- lm(input$TOTAL ~ -1 + input$A + input$B + input$C + input$D)
# inspect the model
summary(mlr)
Call:
lm(formula = input$TOTAL ~ -1 + input$A + input$B + input$C + input$D)

Residuals:
    Min      1Q  Median      3Q     Max
-25.753  -7.455   2.396  12.615  55.316

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
input$A  10.5985     3.9782   2.664   0.0121 *
input$B   0.3471    17.7731   0.020   0.9845
input$C   0.9468     1.9442   0.487   0.6297
input$D  12.1056     4.7262   2.561   0.0155 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.08 on 31 degrees of freedom
Multiple R-Squared: 0.6752, Adjusted R-squared: 0.6333
F-statistic: 16.11 on 4 and 31 DF, p-value: 3.090e-07
# as shown above, the R² value is 67.52%
# next, let's predict the results with the same input data
prediction <- predict(mlr, input)
# now let's evaluate the predictions, looking at the R² and RMSE values that
# postResample() returns (its arguments are pred, obs)
postResample(prediction, input$TOTAL)
      RMSE   Rsquared
16.0718506  0.3300378
So here is my question: why do I get an R² of 67.52% when fitting the model
(that is, the model explains 67.52% of the variation in the data), while
postResample() returns a very different R² when I apply the same model to
the same input data?
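
For anyone comparing the two numbers, here is a minimal sketch of three ways
an R² can be computed by hand from observed and predicted values. It uses
made-up toy data (not input.csv), and the names toy, fit, r2_uncentered,
r2_centered and r2_cor are only illustrative. As far as I understand,
summary.lm() measures the total sum of squares around zero when the formula
has no intercept (the -1 above), and postResample() is assumed here to report
the squared correlation between predictions and observations; I have not
confirmed the latter for every caret version.

```r
# Toy data, purely illustrative (hypothetical, NOT the data from input.csv).
set.seed(1)
toy <- data.frame(A = rpois(30, 1), D = rpois(30, 0.5))
toy$TOTAL <- 10 + 8 * toy$A + 12 * toy$D + rnorm(30, sd = 15)

# Same kind of model as in the email: no intercept (-1).
fit  <- lm(TOTAL ~ -1 + A + D, data = toy)
obs  <- toy$TOTAL
pred <- predict(fit, toy)
rss  <- sum((obs - pred)^2)                          # residual sum of squares

r2_uncentered <- 1 - rss / sum(obs^2)                # total SS measured around zero
r2_centered   <- 1 - rss / sum((obs - mean(obs))^2)  # total SS measured around the mean
r2_cor        <- cor(obs, pred)^2                    # squared correlation (assumed caret-style)

# Compare the hand-computed values with what summary() reports.
print(c(summary_lm = summary(fit)$r.squared,
        uncentered = r2_uncentered,
        centered   = r2_centered,
        cor_sq     = r2_cor))
```

If the same three quantities are computed on the real data, the uncentered
value should match the 0.6752 from summary(mlr), which would suggest the two
functions are simply not using the same definition of R².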
Best regards,
Giovane
# input.csv file used as input
"A" "B" "C" "D" "TOTAL"
1 0 1 0 3.8
1 0 1 0 21.67
1 0 0 0 2.92
2 0 6 0 42.84
0 0 0 0 5.28
2 0 0 3 44.86
1 0 0 0 8.22
1 0 0 0 28.24
1 0 3 0 29.69
1 0 0 1 78.02
3 0 7 0 51.29
2 0 0 0 37.55
2 0 2 0 10.82
1 0 3 0 17.67
0 0 0 0 6.62
2 1 3 1 36.49
0 0 0 0 37.52
1 0 2 0 5.26
1 0 2 0 7.32
1 0 0 0 2.2
2 0 6 0 39.24
0 0 0 0 2.83
2 0 0 3 50.93
1 0 0 0 4.15
1 0 0 0 29.72
1 0 3 0 4.26
1 0 0 1 25.1
3 0 7 0 12.67
2 0 0 0 7.99
2 0 2 0 17.55
1 0 3 0 3.66
0 0 0 0 7.22
0 0 0 0 3.82
0 0 0 0 28.05
3 0 7 0 34.67