Dear R-helpers,

I have performed a PLS regression with the mvr function from the pls.pcr package and I have two questions:

1- Do you know whether mvr automatically centers the data? It seems to me that it does.

2- Why, in the situation below, does the output say that the optimal number of latent variables is 4? In my humble opinion it is 2, because the cross-validated RMS increases and the R2 decreases when 3 LVs are considered:

> summary(maturityCondor.raw.mvr)
Data:   X dimension: 8 1050
        Y dimension: 8 1
Method: SIMPLS
Number of latent variables considered: 1-7

TRAINING:
RMS table:
            [,1]
1 LV's  1.23e+01
2 LV's  6.79e+00
3 LV's  5.00e+00
4 LV's  2.17e+00
5 LV's  1.93e+00
6 LV's  7.79e-01
7 LV's  1.01e-09

Cumulative fraction of variance explained:
            X     Y
1 LV's  0.848 0.499
2 LV's  0.930 0.846
3 LV's  0.979 0.917
4 LV's  0.992 0.984
5 LV's  0.999 0.988
6 LV's  1.000 0.998
7 LV's  1.000 1.000

VALIDATION
Optimal number of latent variables: 4
RMS table (10-fold crossvalidation):
         [,1]
1 LV's  16.21
2 LV's  12.15
3 LV's  13.81
4 LV's   6.68
5 LV's   6.38
6 LV's   5.91
7 LV's  13.38

Coefficient of multiple determination (R2):
        [,1]
1 LV's  0.20
2 LV's  0.51
3 LV's  0.41
4 LV's  0.88
5 LV's  0.87
6 LV's  0.90
7 LV's  0.77

Thanks for your help,

Arnaud

*************************
Arnaud DOWKIW
Department of Primary Industries
J. Bjelke-Petersen Research Station
KINGAROY, QLD 4610
Australia
T : + 61 7 41 600 700
T : + 61 7 41 600 728 (direct)
F : + 61 7 41 600 760
**************************
On Thursday 24 July 2003 05:02, Dowkiw, Arnaud wrote:
> Dear R-helpers,
>
> I have performed a PLS regression with the mvr function from the pls.pcr
> package and I have 2 questions : 1- do you know if mvr automatically centers
> the data ? It seems to me that it does so...

Yup, it does... common practice.

> 2- why in the situation below
> does the output say that the optimal number of latent variables is 4 ? In
> my humble opinion, it is 2 because the RMS increases and the R2 decreases
> when 3 LVs are considered :

Many criteria exist; for some data sets they agree, for most they do not. The criterion applied here checks whether the decrease in cross-validated error is significant; Hastie et al. use it in their book "The Elements of Statistical Learning". It is described in the man page and, like all criteria, it is not guaranteed to satisfy all users. If you feel better using 2 LVs, you can do that.

Ron
--
Ron Wehrens
Dept. of Chemometrics
University of Nijmegen     Email: rwehrens at sci.kun.nl
Toernooiveld 1             http://www-cac.sci.kun.nl/cac/
6525 ED Nijmegen           Tel: +31 24 365 2053
The Netherlands            Fax: +31 24 365 2653
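
To make the disagreement between criteria concrete, here is a minimal, language-neutral sketch in Python applied to the cross-validated RMS values from the summary above. It compares three rules: stop at the first local minimum (which gives the 2 LVs Arnaud expected), take the global minimum, and a "within one standard error of the minimum" rule of the kind Hastie et al. describe. The function names are hypothetical, and the sketch does not claim to reproduce what mvr does internally; the man page is the authority on that. The summary output does not report standard errors of the cross-validated RMS, so the value 1.0 used below is a placeholder chosen purely for illustration.

```python
# Cross-validated RMS for 1..7 latent variables, copied from the summary output.
cv_rms = [16.21, 12.15, 13.81, 6.68, 6.38, 5.91, 13.38]


def first_local_minimum(errors):
    """Number of LVs at which the CV error first stops decreasing."""
    for i in range(len(errors) - 1):
        if errors[i + 1] >= errors[i]:
            return i + 1  # convert 0-based index to a count of LVs
    return len(errors)


def global_minimum(errors):
    """Number of LVs with the lowest CV error overall."""
    return errors.index(min(errors)) + 1


def one_standard_error(errors, std_errors):
    """Smallest number of LVs whose CV error lies within one standard
    error of the global minimum (a rule in the spirit of Hastie et al.)."""
    best = errors.index(min(errors))
    threshold = errors[best] + std_errors[best]
    for i, err in enumerate(errors):
        if err <= threshold:
            return i + 1
    return best + 1  # unreachable in practice: the minimum always qualifies


# ASSUMPTION: hypothetical standard errors, NOT taken from the thread.
assumed_se = [1.0] * len(cv_rms)

print("first local minimum:", first_local_minimum(cv_rms))
print("global minimum:     ", global_minimum(cv_rms))
print("one-SE rule:        ", one_standard_error(cv_rms, assumed_se))
```

The three rules give three different answers on the same table, which is the point Ron makes: the first local minimum stops at 2 LVs, the global minimum sits at 6 LVs, and a significance-style rule lands in between, with the exact answer depending on the standard errors.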