On Tue, 14 May 2002, Kolling Alfons, F+E wrote:
>
> Hello experts,
>
> As a newcomer to PCA, I have a question concerning the princomp algorithm.
> With a dataset "r" containing 18 "input" parameters and 1 "output" parameter
> r[19], I got with the following fit
>
> ls <- lsfit(r[1:18],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> a prediction error of:
> [1] 8.879561
>
> which is quite reasonable. If I take only two significant inputs,
>
> ls <- lsfit(r[1:2],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get a prediction error of:
> [1] 20.18148
>
> which is not so bad for only two of 18 input parameters. If I do an lsfit
> on the scores of:
>
> p <- princomp(r[1:18],cor=TRUE)
> ls <- lsfit(p$scores[,1:18],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get the reasonable error of:
> [1] 8.879561
> (see above the first fit)
> But (and here comes the question) if I take the two most important principal
> components for the lsfit
>
> ls <- lsfit(p$scores[,1:2],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get a prediction error of:
> [1] 33.22741
>
> which is a good deal worse, compared to the 20.18148 from above. So what is
> wrong? I thought, that the first principle components are the "most
> important"?
Your understanding. The first two PCs explain most of the variance in X,
but they do not explain most of the variation in y.
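
To make the point concrete, here is a minimal sketch on simulated data. It
does not use the poster's dataset "r"; the variables z, x1, x2 and y are
invented for illustration. The two predictors are almost collinear, so the
first PC of the correlation matrix carries nearly all of the variance in X,
yet y is built from their difference, which is roughly what the second PC
measures.

```r
set.seed(1)
n  <- 200
z  <- rnorm(n)                         # shared component (hypothetical data)
x1 <- z + rnorm(n, sd = 0.1)           # two nearly collinear predictors
x2 <- z + rnorm(n, sd = 0.1)
X  <- cbind(x1, x2)
y  <- (x1 - x2) + rnorm(n, sd = 0.05)  # y depends only on the difference

p <- princomp(X, cor = TRUE)

# Share of the variance in X carried by each component
prop_var <- p$sdev^2 / sum(p$sdev^2)

# Residual standard deviation when regressing y on each component alone,
# computed the same way as in the question (lsfit + ls.diag)
err_pc1 <- ls.diag(lsfit(p$scores[, 1], y))$std.dev
err_pc2 <- ls.diag(lsfit(p$scores[, 2], y))$std.dev

print(round(prop_var, 3))
print(c(PC1 = err_pc1, PC2 = err_pc2))
```

In this construction the component with the small share of the variance in X
is the one that predicts y, so ranking components by variance alone says
nothing about their usefulness for regression.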
BTW, it is `principal' not `principle'.
Principal components regression is a big topic. Ridge regression is
almost always preferable (and there is code for it in package MASS).
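
As a rough sketch of the ridge alternative, the following uses lm.ridge()
from MASS on simulated data. The data frame d, its columns, and the grid of
lambda values are assumptions made up for this example, and picking lambda by
the smallest GCV value is just one possible choice.

```r
library(MASS)

set.seed(2)
n <- 100
z <- rnorm(n)
# Hypothetical data: x1 and x2 nearly collinear, x3 independent
d <- data.frame(x1 = z + rnorm(n, sd = 0.1),
                x2 = z + rnorm(n, sd = 0.1),
                x3 = rnorm(n))
d$y <- d$x1 - d$x2 + 0.5 * d$x3 + rnorm(n, sd = 0.2)

# Fit ridge regression over an arbitrary grid of penalties
lambdas <- seq(0, 10, by = 0.1)
fit <- lm.ridge(y ~ x1 + x2 + x3, data = d, lambda = lambdas)

# Take the penalty with the smallest generalized cross-validation score
best_idx    <- which.min(fit$GCV)
best_lambda <- lambdas[best_idx]
best_coef   <- coef(fit)[best_idx, ]   # intercept plus three slopes

print(best_lambda)
print(best_coef)
```

Unlike dropping principal components, the ridge penalty shrinks all
directions continuously, so a low-variance direction that matters for y is
damped rather than discarded outright.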
See
@Article{Frank.Friedman.93,
author = "I. E. Frank and J. H. Friedman",
title = "A statistical view of some chemometrics regression
tools (with discussion)",
journal = "Technometrics",
volume = "35",
pages = "109--148",
year = "1993",
}
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595