Hi,

I'm trying to fit a PLSR model in R with the 'pls' package, using 22 samples (16 for training, 6 for testing). I know that the usual basis for choosing the number of components is cross-validation (in my case leave-one-out, 'LOO'), and that I should then pick the number of components with the minimum RMSEP (or the first local minimum). The problem is that the RMSEP values increase as components are added instead of decreasing. Should I choose only 1 component?

I then tried to compute R2 on my test dataset (6 samples) and got nonsensical values (below 0, some with magnitude greater than 1). Do you have any idea what may cause this? Is it a problem with my fitting or with the datasets?

Below you can see my results:

> pH.spec <- plsr(pH ~ spec, data = soil.train, validation = "LOO")
> summary(pH.spec)
Data:   X dimension: 16 501
        Y dimension: 16 1
Fit method: kernelpls
Number of components considered: 14

VALIDATION: RMSEP
Cross-validated using 16 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps  11 comps
CV          0.5343   0.5435   0.5506    1.629    1.617    1.742    1.921    1.979    1.977    1.971     1.972     1.972
adjCV       0.5343   0.5419   0.5486    1.587    1.570    1.688    1.860    1.916    1.914    1.908     1.910     1.909
       12 comps  13 comps  14 comps
CV        1.972     1.972     1.972
adjCV     1.909     1.909     1.909

TRAINING: % variance explained
    1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps  9 comps  10 comps  11 comps  12 comps
X    96.410   99.655    99.87    99.90    99.93    99.94    99.95    99.96    99.96     99.97     99.98     99.99
pH    3.649    8.342    19.41    67.48    88.96    97.19    99.69    99.94    99.99    100.00    100.00    100.00
    13 comps  14 comps
X      99.99       100
pH    100.00       100

> R2(pH.spec, newdata = soil.test)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
   -1.65763     -0.60849     -0.05253     -0.72870     -2.84718     -2.34102
    6 comps      7 comps      8 comps      9 comps     10 comps     11 comps
   -3.28201     -3.68611     -3.69817     -3.77271     -3.74585     -3.76342
   12 comps     13 comps     14 comps
   -3.76074     -3.76110     -3.76115

Thank you in advance for your help.
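For reference, a test-set R2 of the form 1 - SSE/SST (the pls documentation describes its R2 this way; we assume the test-set version centers SST on the test responses) has no lower bound: SSE comes from the model's predictions, SST from the spread of the test responses around their own mean, so any model that predicts worse than that mean gets a negative value, and the value can never exceed 1. A minimal base-R sketch; the function name test_r2 and the pH values are hypothetical:

```r
# Test-set R^2 in the "1 - SSE/SST" form.
# SSE: squared prediction errors on the test samples.
# SST: spread of the test responses around their own mean.
test_r2 <- function(y_test, y_pred) {
  sse <- sum((y_test - y_pred)^2)
  sst <- sum((y_test - mean(y_test))^2)
  1 - sse / sst
}

# Hypothetical pH values for a 6-sample test set (illustration only).
y_test <- c(6.1, 6.4, 6.8, 7.0, 7.3, 7.6)

test_r2(y_test, y_test)                             # perfect predictions: 1
test_r2(y_test, rep(mean(y_test), length(y_test)))  # predicting the test mean: 0
# An intercept-only model predicts the training mean for every sample;
# 6.2 stands in for a made-up training mean that differs from the test mean.
test_r2(y_test, rep(6.2, length(y_test)))           # about -1.72
```

If pls does use this definition, the (Intercept) value of -1.66 in the output would mean that simply predicting the training-set mean pH already gives SSE about 2.66 times SST, which points to the training and test samples differing noticeably in mean pH.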
I think this calls for a statistical discussion, which is off-topic (OT) here; stats.stackexchange.com would be a better place to post it. However, if I understand correctly, using pls or anything else to try to fit (some combination of) 501 variables to 16 data points, and then cross-validate with 6 data points, is utter nonsense. You just have a fancy random number generator!

As I said, I think it better to follow up, or complain about me, on stackexchange rather than here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Feb 7, 2017 at 4:49 PM, Ladislav Rozkošný <ladarozkosny at seznam.cz> wrote:
> I'm trying to fit PLSR model in R with 'pls' package with 22 samples (16
> train, 6 test).

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Bert Gunter <bgunter.4567 at gmail.com> writes:

> However, if I understand correctly, using pls or anything else to try
> to fit (some combination of) 501 variables to 16 data points -- and
> then crossvalidate with 6 data points -- is utter nonsense. You just
> have a fancy random number generator!

That is incorrect. PLSR and other dimension-reducing regression methods handle more predictor variables than samples perfectly well; many of them were created for exactly that purpose.

As for the original question: a cross-validated RMSEP that rises with the number of components, together with negative test-set R2, typically happens when there is no (or very little) correlation between the response and the predictor variables. (Or, as they tend to say in chemometrics: you don't have a model.)

> As I said, I think it better to follow up or complain about me on
> stackexchange rather than here.

Sorry, I read this too late. :)

--
Regards,
Bjørn-Helge Mevik
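To see both points in action, here is a minimal base-R sketch under stated assumptions: a hand-written PLS1 fit using the classical NIPALS algorithm (not the pls package's kernelpls code), with leave-one-out RMSEP, applied to simulated data in which the 16 x 501 "spectra" and the response are independent noise. The helper names pls1_fit, pls1_predict and loo_rmsep are hypothetical. Fitting more variables than samples runs without any trouble, but with no relationship between X and y one would typically expect the CV RMSEP to be lowest at or near the intercept-only model and to grow as components are added, although the exact numbers depend on the random seed.

```r
# Minimal PLS1 (one response) via the classical NIPALS algorithm, in base R.
# Illustration only: this is NOT the pls package's kernelpls code.
pls1_fit <- function(X, y, ncomp) {
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xd <- sweep(X, 2, x_mean)   # centered X, deflated after each component
  yd <- y - y_mean            # centered y, deflated after each component
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(Xd, yd)             # weights: X'y on the deflated data
    w <- w / sqrt(sum(w^2))
    t_a <- Xd %*% w                    # scores for component a
    tt <- sum(t_a^2)
    P[, a] <- crossprod(Xd, t_a) / tt  # X loadings
    q[a] <- sum(yd * t_a) / tt         # y loading
    W[, a] <- w
    Xd <- Xd - t_a %*% t(P[, a])       # remove component a from X
    yd <- yd - q[a] * t_a              # ... and from y
  }
  list(x_mean = x_mean, y_mean = y_mean, W = W, P = P, q = q)
}

# Predict with the first `ncomp` components (ncomp = 0: intercept only,
# i.e. the training mean of y).
pls1_predict <- function(fit, Xnew, ncomp) {
  if (ncomp == 0) return(rep(fit$y_mean, nrow(Xnew)))
  W <- fit$W[, seq_len(ncomp), drop = FALSE]
  P <- fit$P[, seq_len(ncomp), drop = FALSE]
  B <- W %*% solve(crossprod(P, W), fit$q[seq_len(ncomp)])  # coefficients
  drop(sweep(Xnew, 2, fit$x_mean) %*% B) + fit$y_mean
}

# Leave-one-out RMSEP for 0..max_comp components.
loo_rmsep <- function(X, y, max_comp) {
  n <- nrow(X)
  press <- numeric(max_comp + 1)  # press[1] is the intercept-only model
  for (i in seq_len(n)) {
    fit <- pls1_fit(X[-i, , drop = FALSE], y[-i], max_comp)
    for (a in 0:max_comp) {
      pred <- pls1_predict(fit, X[i, , drop = FALSE], a)
      press[a + 1] <- press[a + 1] + (y[i] - pred)^2
    }
  }
  setNames(sqrt(press / n), c("(Intercept)", paste(seq_len(max_comp), "comps")))
}

# Simulated data with the same shape as the training set in the thread:
# 16 samples x 501 "spectral" variables, and a response unrelated to X.
set.seed(1)
X <- matrix(rnorm(16 * 501), nrow = 16)
y <- rnorm(16)

cv <- loo_rmsep(X, y, max_comp = 8)
round(cv, 3)
which.min(cv) - 1   # components at the CV minimum (0 = intercept only)
```

Applied to the CV row in the original summary() output, the same which.min rule lands on the (Intercept) column (0.5343), i.e. zero components rather than one, which is consistent with Bjørn-Helge's "you don't have a model".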