olsen
2016-Apr-18 16:20 UTC
[R] project test data into principal components of training dataset
Hi there,

I have a training dataset and a test dataset. My aim is to visually locate the test data within the calibrated space spanned by the PCs of the training dataset, while keeping the training dataset coordinates fixed, so that they can serve as a ruler for measuring additional test datasets as they come up.

Please find a minimal working example using the wine dataset below. Ideally I would like to use ggbiplot, as it comes with elegant features, but it only accepts objects of class prcomp, princomp, PCA, or lda, a requirement the predicted test data do not fulfil.

I'm still slightly wet behind my R ears, and the only solution I can think of is to plot the calibrated space in ggbiplot and the test data in ggplot and then join them, in the worst case by exporting them as SVG and importing them into Inkscape. That is slightly complicated, plus the scaling is different.

Any indication of how this can be accomplished is very welcome!

Thanks and greets
Olsen

I started a thread on Stack Overflow on this issue, but there have been no relevant indications so far.
http://stackoverflow.com/questions/36603268/how-to-plot-training-and-test-validation-data-in-r-using-ggbiplot

##MWE
library(ggbiplot)
data(wine)

##PCA on the wine dataset, used as training data
wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)

wine$class <- wine.class

##simulate test data by generating three new wine classes
wine.new.1 <- wine[sample(nrow(wine), 25), ]
wine.new.2 <- wine[sample(nrow(wine), 43), ]
wine.new.3 <- wine[sample(nrow(wine), 36), ]

##predict PCs for the new classes by transforming
#them using the predict.prcomp function
pred.new.1 <- predict(wine.pca, newdata = wine.new.1)
pred.new.2 <- predict(wine.pca, newdata = wine.new.2)
pred.new.3 <- predict(wine.pca, newdata = wine.new.3)

#simulate the classes for the new sorts
wine.new.1$class <- rep("new.wine.1", nrow(wine.new.1))
wine.new.2$class <- rep("new.wine.2", nrow(wine.new.2))
wine.new.3$class <- rep("new.wine.3", nrow(wine.new.3))
wine.new.bind <- rbind(wine.new.1, wine.new.2, wine.new.3)
#stack the predicted scores in the same order as the class labels
pred.new.bind <- rbind(pred.new.1, pred.new.2, pred.new.3)

##compose the plot by joining the PCA ggbiplot training data with the
##testing data from ggplot
#plot the calibrated space resulting from the training data
g.train <- ggbiplot(wine.pca, obs.scale = 1, var.scale = 1,
                    groups = wine$class, ellipse = TRUE, circle = TRUE)
g.train

#plot the test data resulting from the prediction
#(the predicted scores, not the raw measurements)
df.pred <- data.frame(PC1 = pred.new.bind[, 1], PC2 = pred.new.bind[, 2],
                      PC3 = pred.new.bind[, 3], PC4 = pred.new.bind[, 4],
                      classes = wine.new.bind$class)
g.test <- ggplot(df.pred, aes(PC1, PC2, color = classes, shape = classes)) +
  geom_point() + stat_ellipse()
g.test

--
Our solar system is the cream of the crop
http://hasa-labs.org
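A note on what the predict() calls in the example do: for a prcomp object, predict() centers and scales the new rows with the parameters stored from the training fit and then multiplies by the training rotation matrix. The sketch below checks this by hand. It assumes the built-in iris data as a stand-in for the wine data (which ships with ggbiplot), and the names train, test, by_predict and by_hand are made up for illustration.

```r
# Stand-in data (assumption): iris measurements instead of the wine data.
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# Fit the PCA on the training rows only.
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Projection via predict() ...
by_predict <- predict(pca, newdata = test)

# ... and the same projection written out: standardise the test rows with
# the TRAINING center and scale, then rotate with the training loadings.
by_hand <- scale(test, center = pca$center, scale = pca$scale) %*% pca$rotation

# Both routes give the same scores.
stopifnot(isTRUE(all.equal(unname(by_predict), unname(by_hand))))
```

Since the test rows never enter the fit, the training scores in pca$x stay exactly where they are, however many test sets are projected afterwards.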
olsen
2016-Apr-20 17:33 UTC
[R] project test data into principal components of training dataset
For the record, a slightly hacky answer, which modifies the ggbiplot function, is now provided here:
http://stackoverflow.com/questions/36603268/how-to-plot-training-and-test-validation-data-in-r-using-ggbiplot
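One way to get both sets into a single figure without touching ggbiplot is to collect the training scores (pca$x) and the projected test scores in one data frame and plot that with ggplot2 alone. This is only a sketch and not the Stack Overflow answer linked above: it uses iris as stand-in data, the column names set and group are invented for illustration, and it leaves out the variable arrows and the circle that ggbiplot draws.

```r
# Stand-in data (assumption): iris measurements instead of the wine data.
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# Fit the PCA on the training rows only.
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Training scores come from pca$x, test scores from predict();
# both are expressed in the same (training) coordinate system.
scores <- rbind(
  data.frame(pca$x[, 1:2], set = "train", group = iris$Species[idx]),
  data.frame(predict(pca, newdata = test)[, 1:2],
             set = "test", group = iris$Species[-idx])
)

# Plot only if ggplot2 is installed; colour marks the class,
# shape separates training from test points.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(PC1, PC2, colour = group, shape = set)) +
    geom_point() +
    coord_equal()
  print(p)
}
```

Because the training scores are taken straight from pca$x and never refitted, further test sets can be appended with predict() and rbind() without moving the training points.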