zhenjiang zech xu
2014-Feb-28 06:13 UTC
[R] how is the model resample performance calculated by caret?
Dear all,

I ran 5 repeats of 10-fold cross-validation with the partial least squares regression model provided by the caret package. Can anyone tell me how the values in plsTune$resample are calculated? Are they obtained by predicting each hold-out set with a model trained on the remaining data, using the parameter value selected by the cross-validation? In other words, in the example below: first, the 5 repeats of 10-fold cross-validation pick ncomp = 2 as the best value; then, for each resample, a model with ncomp = 2 is fit on the training portion and used to predict the hold-out portion, which gives one RMSE and one Rsquared per fold. Is that understanding correct?

> plsTune
524 samples
615 predictors

Pre-processing: centered, scaled
Resampling: Cross-Validation (10 fold, repeated 5 times)

Summary of sample sizes: 472, 472, 471, 471, 471, 471, ...

Resampling results across tuning parameters:

  ncomp  RMSE  Rsquared  RMSE SD  Rsquared SD
   1     16.8  0.434      1.47    0.0616
   2     14.3  0.612      2.21    0.0768
   3     13.5  0.704      6.33    0.145
   4     14.6  0.706      9.29    0.163
   5     15.2  0.703     10.9     0.172
   6     16.5  0.69      13.4     0.181
   7     18.4  0.672     17.8     0.194
   8     20    0.651     20.4     0.199
   9     20.9  0.634     20.9     0.199
  10     22.1  0.613     22.1     0.197
  11     23.3  0.599     23.8     0.198
  12     24    0.588     24.7     0.198
  13     24.9  0.572     25.2     0.197
  14     25.8  0.557     26.2     0.194
  15     26.2  0.544     25.8     0.191
  16     26.6  0.532     25.5     0.187

RMSE was used to select the optimal model using the one SE rule.
The final value used for the model was ncomp = 2.

> plsTune$resample
   ncomp     RMSE  Rsquared    Resample
1      2 13.61569 0.6349700 Fold06.Rep4
2      2 16.02091 0.5808985 Fold05.Rep1
3      2 12.59985 0.6008357 Fold03.Rep5
4      2 13.20069 0.6296245 Fold02.Rep3
5      2 12.43419 0.6560434 Fold04.Rep2
6      2 15.36510 0.5954177 Fold04.Rep5
7      2 12.70028 0.6894489 Fold03.Rep2
8      2 13.34882 0.6468300 Fold09.Rep3
9      2 14.80217 0.5575010 Fold08.Rep3
10     2 19.03705 0.4907630 Fold05.Rep4
11     2 14.26704 0.6579390 Fold10.Rep2
12     2 13.79060 0.5806663 Fold05.Rep3
13     2 14.83641 0.5918039 Fold05.Rep2
14     2 12.48721 0.7011439 Fold01.Rep3
15     2 14.98765 0.5866102 Fold07.Rep4
16     2 10.88100 0.7597167 Fold06.Rep1
17     2 13.60705 0.6321377 Fold08.Rep5
18     2 13.42618 0.6136031 Fold08.Rep4
19     2 13.26066 0.6784586 Fold07.Rep1
20     2 13.20623 0.6812341 Fold03.Rep3
21     2 18.54275 0.4404729 Fold08.Rep2
22     2 11.80312 0.7177681 Fold05.Rep5
23     2 18.56271 0.4661072 Fold03.Rep1
24     2 13.54879 0.5850439 Fold10.Rep3
25     2 14.10859 0.5994811 Fold06.Rep5
26     2 13.68329 0.6701091 Fold01.Rep5
27     2 16.12123 0.5401200 Fold10.Rep1
28     2 12.92250 0.6917220 Fold06.Rep3
29     2 12.94366 0.6400066 Fold06.Rep2
30     2 12.39889 0.6790578 Fold01.Rep2
31     2 13.48499 0.6759649 Fold01.Rep1
32     2 12.52938 0.6728476 Fold03.Rep4
33     2 16.43352 0.5795160 Fold09.Rep5
34     2 12.53991 0.6550694 Fold09.Rep4
35     2 12.78708 0.6304606 Fold08.Rep1
36     2 13.97559 0.6655688 Fold04.Rep3
37     2 15.31642 0.5124997 Fold09.Rep2
38     2 15.24194 0.5324943 Fold09.Rep1
39     2 12.90107 0.6318960 Fold04.Rep1
40     2 13.59574 0.6277869 Fold01.Rep4
41     2 19.73633 0.4154821 Fold07.Rep5
42     2 12.03759 0.6537381 Fold02.Rep5
43     2 15.47139 0.5597097 Fold02.Rep4
44     2 22.55060 0.3816672 Fold07.Rep3
45     2 14.57875 0.6269560 Fold07.Rep2
46     2 13.02385 0.6395148 Fold02.Rep2
47     2 13.81020 0.6116137 Fold02.Rep1
48     2 13.46100 0.6200828 Fold04.Rep4
49     2 13.95487 0.6709253 Fold10.Rep5
50     2 12.65981 0.6606435 Fold10.Rep4

Best,
Zhenjiang
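For reference, a call along the following lines produces a plsTune object like the one shown above. This is only a sketch: the objects 'predictors' (a 524 x 615 data frame) and 'y' (the numeric outcome) are placeholders, since the original data and call are not included in the post.

library(caret)

set.seed(1)
ctrl <- trainControl(method = "repeatedcv",        # 10-fold cross-validation...
                     number = 10,
                     repeats = 5,                  # ...repeated 5 times
                     selectionFunction = "oneSE")  # the "one SE rule" named in the output

plsTune <- train(x = predictors, y = y,
                 method = "pls",
                 tuneLength = 16,                  # evaluates ncomp = 1, ..., 16
                 preProcess = c("center", "scale"),
                 metric = "RMSE",
                 trControl = ctrl)

plsTune$resample  # per-fold RMSE/Rsquared for the selected ncomp (50 rows here)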
Max Kuhn
2014-Feb-28 18:11 UTC
[R] how is the model resample performance calculated by caret?
On Fri, Feb 28, 2014 at 1:13 AM, zhenjiang zech xu
<zhenjiang.xu at gmail.com> wrote:

> Dear all,
>
> I ran 5 repeats of 10-fold cross-validation with the partial least
> squares regression model provided by the caret package. Can anyone tell
> me how the values in plsTune$resample are calculated? Are they obtained
> by predicting each hold-out set with a model trained on the remaining
> data, using the parameter value selected by the cross-validation?

Yes, those values are the performance estimates across each hold-out
using the final model. There is an option in trainControl() that will
have it return the resamples from all models too.

> In other words, in the example below: first, the 5 repeats of 10-fold
> cross-validation pick ncomp = 2 as the best value; then, for each
> resample, a model with ncomp = 2 is fit on the training portion and
> used to predict the hold-out portion, which gives one RMSE and one
> Rsquared per fold. Is that understanding correct?

It is.

Max
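The trainControl() option Max refers to is most likely returnResamp: setting it to "all" keeps the hold-out results for every candidate ncomp rather than only for the finally selected one. A sketch, again using the placeholder data objects 'predictors' and 'y':

library(caret)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                     selectionFunction = "oneSE",
                     returnResamp = "all")  # default is "final"

plsTuneAll <- train(x = predictors, y = y,
                    method = "pls",
                    tuneLength = 16,
                    preProcess = c("center", "scale"),
                    trControl = ctrl)

# resample now holds the 50 fold-level results for each of the 16 ncomp values,
# so the hold-out performance of the non-selected models can be inspected too
head(subset(plsTuneAll$resample, ncomp == 2))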