thr3ads.net - R help - [R] LDA with previous PCA for dimensionality reduction [Nov 2004]


Christoph Lehmann

2004-Nov-24 10:16 UTC

[R] LDA with previous PCA for dimensionality reduction

Dear all, this is not really an R question, but:

If I want to check the classification accuracy of an LDA preceded by a 
PCA for dimensionality reduction, using the LOOCV method:

Is it OK to do the PCA on the WHOLE dataset ONCE and then run the LDA 
with the CV option set to TRUE (which runs LOOCV)?

-- OR --

do I need to
- compute a PCA for each 'training bag' (the n-1 observations) 
(my.princomp.1),
- then run the LDA on the training-bag scores (-> my.lda.1),
- then compute the scores of the left-out observation using 
my.princomp.1 (-> my.scores.2),
- and only then use predict.lda(my.lda.1, my.scores.2) on the scores of 
the left-out observation?

I have read some articles where procedure 1 is used, but I am not sure 
whether this is really correct.
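A minimal R sketch of the two procedures follows, for illustration only (it is not code from the thread). It assumes the MASS package for `lda()`, uses `prcomp()` in place of `princomp()` (both have a `predict()` method; the choice is arbitrary here), standardises the variables with `scale. = TRUE` (an assumption), and runs on the built-in `iris` data with a hypothetical `k = 2` retained components. The function names `loocv_pca_once()` and `loocv_pca_refit()` are made up for this example.

```r
library(MASS)  # provides lda() and its predict() method

# Procedure 1 (as asked): PCA fitted ONCE on all n observations, then
# lda(CV = TRUE), which returns leave-one-out class predictions.
# Every left-out observation has already influenced the PC loadings.
loocv_pca_once <- function(X, y, k) {
  pc <- prcomp(X, scale. = TRUE)
  scores <- pc$x[, 1:k, drop = FALSE]
  fit <- lda(scores, grouping = y, CV = TRUE)
  mean(as.character(fit$class) != as.character(y))
}

# Procedure 2 (as asked): inside each LOOCV fold, fit the PCA on the n-1
# training rows only, fit the LDA on their scores, then project the
# left-out row with the training PCA before predicting its class.
loocv_pca_refit <- function(X, y, k) {
  n <- nrow(X)
  pred <- character(n)
  for (i in seq_len(n)) {
    pc <- prcomp(X[-i, , drop = FALSE], scale. = TRUE)
    fit <- lda(pc$x[, 1:k, drop = FALSE], grouping = y[-i])
    # predict.prcomp() reuses the training centring, scaling and rotation
    new_scores <- predict(pc, newdata = X[i, , drop = FALSE])[, 1:k, drop = FALSE]
    pred[i] <- as.character(predict(fit, newdata = new_scores)$class)
  }
  mean(pred != as.character(y))
}

X <- as.matrix(iris[, 1:4])
y <- iris$Species
err_once  <- loocv_pca_once(X, y, k = 2)
err_refit <- loocv_pca_refit(X, y, k = 2)
print(c(pca_once = err_once, pca_refit = err_refit))
```

In the first function the left-out observation has already helped determine the PC loadings; in the second it is only ever projected onto loadings estimated without it, which is what will happen to a genuinely new observation.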

Many thanks for any hint,

Christoph

Ramon Diaz-Uriarte

2004-Nov-24 13:01 UTC


Dear Christoph,

I guess you want to assess the error rate of an LDA that has been fitted 
to a set of currently existing training data, knowing that in the future 
you will get some new observation(s) for which you want to make a 
prediction. Then, I'd say that you want to use the second approach. The 
first (PCA) step may turn out to be crucial: after all, your whole 
subsequent LDA is contingent on the PC scores you obtain in that step, so 
the PCA has to be part of what is cross-validated rather than something 
done once beforehand on all the data. Somewhat similar issues have been 
discussed in the microarray literature. Two references are:


@ARTICLE{ambroise-02,
  author = {Ambroise, C. and McLachlan, G. J.},
  title = {Selection bias in gene extraction on the basis of microarray gene-expression data},
  journal = {Proc Natl Acad Sci USA},
  year = {2002},
  volume = {99},
  pages = {6562--6566},
  number = {10},
}


@ARTICLE{simon-03,
  author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
  title = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification},
  journal = {Journal of the National Cancer Institute},
  year = {2003},
  volume = {95},
  pages = {14--18},
  number = {1},
}


I am not sure, though, why you use PCA followed by LDA. But that's another 
story.

Best,

R.

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +34-91-224-6972
Phone: +34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)

David Enot

2004-Nov-24 15:18 UTC



As far as I understand your problem (assessing the predictive ability of 
your model), the second solution is the one to use: the test set should 
never be seen during training. If you run your PCA on the whole set, 
your test bag is taken into account while forming your training data. 
Keep in mind that your classifier is made up of two components: PCA 
followed by LDA. This is fine if you build your model with a fixed 
number of PCs; to find an optimal number of PCs, the procedure would be 
similar to the one above, but applied within the (n-1) training examples 
of each fold. A proper validation of the model can quickly become 
tricky: it requires a bit of computing skill and may take a long time 
(especially with LOO)!
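Choosing the number of PCs inside each fold amounts to nested cross-validation. The following is a minimal R sketch under stated assumptions, not code from the thread: it assumes the MASS package, `prcomp()` with standardised variables, and the built-in `iris` data; the helper names `pca_lda_error()` and `nested_loocv()` are hypothetical; and, to keep the run time reasonable, the inner loop uses 5-fold CV on the n-1 training observations instead of a second LOOCV.

```r
library(MASS)  # lda() and its predict() method

# Error rate of PCA(k) + LDA trained on (Xtr, ytr), evaluated on (Xte, yte).
# The PCA is always fitted on the training rows only.
pca_lda_error <- function(Xtr, ytr, Xte, yte, k) {
  pc <- prcomp(Xtr, scale. = TRUE)
  fit <- lda(pc$x[, 1:k, drop = FALSE], grouping = ytr)
  te <- predict(pc, newdata = Xte)[, 1:k, drop = FALSE]
  mean(as.character(predict(fit, newdata = te)$class) != as.character(yte))
}

# Outer LOOCV; inside each outer fold, k is chosen by an inner V-fold CV
# that only ever sees the n-1 training observations of that fold.
nested_loocv <- function(X, y, k_max, V = 5, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  outer_err <- numeric(n)
  chosen_k <- integer(n)
  for (i in seq_len(n)) {
    Xtr <- X[-i, , drop = FALSE]
    ytr <- y[-i]
    folds <- sample(rep(seq_len(V), length.out = nrow(Xtr)))
    # Inner CV error for each candidate k, averaged over the V folds
    inner_err <- sapply(seq_len(k_max), function(k) {
      mean(sapply(seq_len(V), function(v) {
        in_fold <- folds == v
        pca_lda_error(Xtr[!in_fold, , drop = FALSE], ytr[!in_fold],
                      Xtr[in_fold, , drop = FALSE], ytr[in_fold], k)
      }))
    })
    chosen_k[i] <- which.min(inner_err)
    # Refit on all n-1 rows with the chosen k and score the held-out row
    outer_err[i] <- pca_lda_error(Xtr, ytr, X[i, , drop = FALSE], y[i],
                                  chosen_k[i])
  }
  list(error = mean(outer_err), chosen_k = chosen_k)
}

res <- nested_loocv(as.matrix(iris[, 1:4]), iris$Species, k_max = 3)
print(res$error)
print(table(res$chosen_k))
```

The outer estimate never uses the held-out observation, neither for the PCA, nor for the LDA, nor for choosing k, and `chosen_k` records the k picked in each outer fold. The cost grows roughly as n x V x k_max model fits, which is the run-time concern mentioned above.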

  Hope it helps

   David

