jz7 at duke.edu
2006-Aug-02 17:12 UTC
[R] question about correlation coefficient and root mean square (with code used)
Dear all,

I am using two multiple regression models, OLS and principal component regression (PCR), to make predictions for my test set. Both models are fitted on the same training set, except that OLS uses fewer variables or descriptors (columns of X) than PCR. I use the squared correlation coefficient (r^2) and the root mean square (RMS) error to compare my predictions with the experimental measurements of the test set.

Here is the problem: the r^2 of the PCR prediction is higher than the r^2 of the OLS prediction (0.8 vs. 0.7). However, the RMS of the PCR prediction is also higher than that of OLS (0.55 vs. 0.48). I would expect r^2 and RMS to show a consistent trend (r^2 increases and RMS decreases, or the opposite). Why am I getting opposite results? Is it because PCR is a biased method? Which one (r^2 or RMS) is more reliable for evaluating the model?

Here is the simple code I used for calculating r^2 and RMS in R (test set size is 40):

    # squared correlation between measured and predicted values
    r2 <- cor(test$p50, test.pred$fit)^2
    # root mean square error of the predictions (mean over the 40 test cases)
    rms <- sqrt(mean((test.pred$fit - test$p50)^2))

I really appreciate your kind help!

Sincerely,
Jeny
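One way to see how the two measures can disagree is a small sketch on made-up numbers. It is only an illustration, not a diagnosis of the models in the post. The vectors `obs`, `pred_a` and `pred_b` and the helper functions `r2` and `rms` are hypothetical and are not taken from the original data. The sketch relies on the fact that the correlation between two vectors is unchanged when a constant is added to one of them, whereas the RMS error counts that constant offset in full.

```r
# Hypothetical data: 40 made-up "measurements", not the poster's test set
obs <- (1:40) / 10

# Prediction A: no systematic offset, but alternating errors of +/- 0.4
pred_a <- obs + rep(c(0.4, -0.4), 20)

# Prediction B: follows obs perfectly except for a constant offset of 0.5
pred_b <- obs + 0.5

# Same two measures as in the post, wrapped as helper functions
r2  <- function(y, yhat) cor(y, yhat)^2
rms <- function(y, yhat) sqrt(mean((yhat - y)^2))

# B has the higher r^2 (1, up to rounding) and also the higher RMS (0.5 vs 0.4)
cat("A: r2 =", r2(obs, pred_a), " rms =", rms(obs, pred_a), "\n")
cat("B: r2 =", r2(obs, pred_b), " rms =", rms(obs, pred_b), "\n")

# The mean squared error splits into squared mean error plus error spread
err_b <- pred_b - obs
mse_b <- mean(err_b^2)
split_b <- mean(err_b)^2 + mean((err_b - mean(err_b))^2)
```

In this toy case prediction B ranks better by r^2 and worse by RMS, purely because of its constant offset. Whether something similar explains the PCR and OLS figures above would need to be checked on the actual predictions, for example by looking at the mean of `test.pred$fit - test$p50`.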