Dear r-help

I am trying to run LDA on a training data set, and test it on another data set with the same variables. I found examples using crossvalidation, and using training and testing data sets set up with sample, but not when they are preassigned.

Here is what I tried

# FIRST SET UP A DATAFRAME WITH ALL THE DATA AND CREATE NEW VARIABLES
traintest1 <- arnaudnognod1[arnaudnognod1$DISC_USE1 == 1.01|arnaudnognod1$DISC_USE1 == 1.03|arnaudnognod1$DISC_USE1 == 1.04 |arnaudnognod1$DISC_USE1 == 1.02|arnaudnognod1$DISC_USE1 == 1.05|arnaudnognod1$DISC_USE1 == 1.06,]
traintest1$normal <- traintest1$DISC_USE1 == 1.01|traintest1$DISC_USE1 == 1.03|traintest1$DISC_USE1 == 1.04
traintest1$mafelev <- apply(traintest1[,1:40], 1, FUN = mean)
traintest1$mafscatter <- apply(traintest1[,1:40], 1, FUN = sd)

# NEXT CREATE TRAINING AND TESTING DATAFRAMES
train <- traintest1[traintest1$DISC_USE1 == 1.01|traintest1$DISC_USE1 == 1.02,]
test <- traintest1[traintest1$DISC_USE1 > 1.02,]

# NOW, TRAIN HAS 400 ROWS, TEST HAS 396 ROWS, AND TRAINTEST1 HAS 796 ROWS, EACH HAS 615 COLUMNS, AS EXPECTED

# RUN DISCRIM ON TRAINING DATA
mafdisc <- lda(normal~mafelev + mafscatter, data = train)

# mafdisc$counts IS 210 AND 190, AS EXPECTED

# FINALLY, TEST IT ON THE TEST DATA
mafdiscpred <- predict(mafdisc, data = test)

# BUT mafdiscpred$class HAS LENGTH = 400, NOT 396, AS EXPECTED.

any help appreciated

thanks

Peter

Peter L. Flom, PhD
Brainscope, Inc.
Michael Conklin
2008-Jun-25 16:37 UTC
[R] LDA on pre-assigned training and testing data sets
I think this line

mafdiscpred <- predict(mafdisc, data = test)

needs to be

mafdiscpred <- predict(mafdisc, newdata = test)

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
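
The difference between `data =` and `newdata =` can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the `make_data` helper and the values it generates are hypothetical stand-ins for the poster's `train` and `test` data frames, and only the row counts (400 and 396) are taken from the thread. As far as I can tell, `predict()` for `lda` objects in MASS has a `newdata` argument but no `data` argument, so `data = test` is silently absorbed and the predictions fall back to the rows used for fitting.

```r
library(MASS)  # provides lda() and its predict() method

set.seed(1)

# Hypothetical stand-in for the poster's data: a two-level class label
# and the two predictors named in the original formula.
make_data <- function(n) {
  is_normal <- rep(c(TRUE, FALSE), length.out = n)
  data.frame(
    normal     = factor(is_normal),
    mafelev    = rnorm(n, mean = ifelse(is_normal, 0, 1)),
    mafscatter = rnorm(n, mean = ifelse(is_normal, 1, 2))
  )
}

train <- make_data(400)
test  <- make_data(396)

# Fit on the training rows only
mafdisc <- lda(normal ~ mafelev + mafscatter, data = train)

# 'data' is not a recognised argument here, so the training rows are reused
pred_wrong <- predict(mafdisc, data = test)
length(pred_wrong$class)   # 400

# 'newdata' supplies the rows to classify
pred_right <- predict(mafdisc, newdata = test)
length(pred_right$class)   # 396

# Cross-tabulate predicted against actual class on the test rows
table(predicted = pred_right$class, actual = test$normal)
```

The final `table()` call is one simple way to judge how well the rule trained on `train` classifies the pre-assigned `test` rows; the counts it prints here reflect the simulated data only.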