Hi all,

I have a question about R. Suppose I run a principal component analysis on a training data set and obtain the rotation matrix via:

> pca=prcomp(training_data, center=TRUE, scale=FALSE, retx=TRUE);

Then I want to rotate the test data set using either of the following:

> d1=scale(test_data, center=TRUE, scale=FALSE) %*% pca$rotation;
> d2=predict(pca, test_data, center=TRUE, scale=FALSE);

These two results are different:

> min(d2-d1)
[1] -1.976152
> max(d2-d1)
[1] 1.535222

However, if I do the same on the training data:

> d1=scale(training_data, center=TRUE, scale=FALSE) %*% pca$rotation;
> d2=predict(pca, training_data, center=TRUE, scale=FALSE);
> d3=pca$x;

then d1, d2 and d3 are all the same.

So now I am confused: why do the two approaches give different rotated matrices for the test data?

Thanks a lot!
Bjørn-Helge Mevik
2006-Feb-28 08:52 UTC
[R] question about Principal Component Analysis in R?
Michael wrote:

>> pca=prcomp(training_data, center=TRUE, scale=FALSE, retx=TRUE);
>
> Then I want to rotate the test data set using the
>
>> d1=scale(test_data, center=TRUE, scale=FALSE) %*% pca$rotation;
>> d2=predict(pca, test_data, center=TRUE, scale=FALSE);
>
> these two values are different
>
>> min(d2-d1)
> [1] -1.976152
>> max(d2-d1)
> [1] 1.535222

This is because you have subtracted a different means vector: scale(test_data, center=TRUE, scale=FALSE) centers the test data on its own column means, whereas predict subtracts the training means stored in pca$center. You should use the column means of the training data (as predict does; see the last line of stats:::predict.prcomp):

d1=scale(test_data, center=pca$center, scale=FALSE) %*% pca$rotation;

-- 
Bjørn-Helge Mevik
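
The point above can be checked with a small self-contained sketch. The simulated matrices, their sizes, and the variable names d_own, d_train, d_pred, same_as_predict and max_gap_own are assumptions made for illustration only; they do not come from the original poster's data.

```r
# Simulated stand-ins for the poster's data (illustrative assumption).
set.seed(1)
training_data <- matrix(rnorm(100 * 3), ncol = 3)
# The test set is given a shifted mean so that its column means differ
# clearly from those of the training set.
test_data <- matrix(rnorm(20 * 3, mean = 1), ncol = 3)

pca <- prcomp(training_data, center = TRUE, scale. = FALSE, retx = TRUE)

# Centering the test data on its OWN column means (the original d1).
d_own <- scale(test_data, center = TRUE, scale = FALSE) %*% pca$rotation

# Centering the test data on the TRAINING means stored in the prcomp object.
d_train <- scale(test_data, center = pca$center, scale = FALSE) %*% pca$rotation

# predict() applies the training centering internally.
d_pred <- predict(pca, newdata = test_data)

# TRUE when manual projection with pca$center matches predict().
same_as_predict <- isTRUE(all.equal(d_train, d_pred, check.attributes = FALSE))

# Largest absolute discrepancy when the test data's own means are used.
max_gap_own <- max(abs(d_own - d_pred))
```

On the training data the two centerings coincide, because the column means of training_data are exactly pca$center, which is why d1, d2 and d3 agreed there.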