BostonR
2009-Oct-20 14:31 UTC
[R] LDA Predict - Seems to be predicting on the Training Data
When I import a simple dataset, run LDA, and then try to use the model to forecast out-of-sample data, I get a forecast for the training set, not the out-of-sample set. Others have posted this question, but I do not see any answers to their posts.

Here is some sample data:

Date       Names   v1           v2           v3           c1
1/31/2009  Name1   0.714472361  0.902552278  0.783353694  a
1/31/2009  Name2   0.512158919  0.770451596  0.111853346  a
1/31/2009  Name3   0.470693282  0.129200065  0.800973877  a
1/31/2009  Name4   0.24236898   0.472219638  0.486599763  b
1/31/2009  Name5   0.785619735  0.628511593  0.106868172  b
1/31/2009  Name6   0.718718387  0.697257275  0.690326648  b
1/31/2009  Name7   0.327331186  0.01715109   0.861421706  c
1/31/2009  Name8   0.632011743  0.599040196  0.320741634  c
1/31/2009  Name9   0.302804404  0.475166304  0.907143632  c
1/31/2009  Name10  0.545284813  0.967196462  0.945163717  a
1/31/2009  Name11  0.563720418  0.024862018  0.970685281  a
1/31/2009  Name12  0.357614427  0.417490445  0.415162276  a
1/31/2009  Name13  0.154971203  0.425227967  0.856866993  b
1/31/2009  Name14  0.935080173  0.488659307  0.194967973  a
1/31/2009  Name15  0.363069339  0.334206603  0.639795596  b
1/31/2009  Name16  0.862889297  0.821752532  0.549552875  a

Here is the code:

library(MASS)  # lda() is in the MASS package
myDat <- read.csv(file="f:\\Systematiq\\data\\TestData.csv", header=TRUE, sep=",")
myData <- data.frame(myDat)

length(myDat[,1])

train <- myDat[1:10,]
outOfSample <- myDat[11:16,]
outOfSample <- (cbind(outOfSample$v1, outOfSample$v2, outOfSample$v3))
outOfSample <- data.frame(outOfSample)

length(train[,1])
length(outOfSample[,1])

fit <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)

forecast <- predict(fit, outOfSample)$class

length(forecast)  ##### I am expecting this to be the same as length(outOfSample[,1]), which is 6

Output:

> length(forecast)
[1] 10

--
View this message in context: http://www.nabble.com/LDA-Precdict---Seems-to-be-predicting-on-the-Training-Data-tp25976178p25976178.html
Sent from the R help mailing list archive at Nabble.com.
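A minimal, self-contained sketch of what appears to be happening in the code above. It assumes the MASS package is installed and inlines the sample data with read.table(text = ...) instead of the CSV file (keeping only the v1, v2, v3 and c1 columns). Two things go wrong: data.frame(cbind(...)) renames the out-of-sample columns to X1, X2, X3, and the model terms are the expressions train$v1, train$v2, train$v3, which predict.lda() cannot find in newdata and therefore evaluates in the formula's environment, where train has 10 rows. Depending on the R version, model.frame() may also warn that 'newdata' had 6 rows but the variables found have 10 rows.

```r
# Sketch only: reproduces the behaviour from the post with the sample data
# inlined (assumption: only the v1, v2, v3 and c1 columns are needed).
library(MASS)  # provides lda() and its predict() method

myDat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
v1 v2 v3 c1
0.714472361 0.902552278 0.783353694 a
0.512158919 0.770451596 0.111853346 a
0.470693282 0.129200065 0.800973877 a
0.24236898 0.472219638 0.486599763 b
0.785619735 0.628511593 0.106868172 b
0.718718387 0.697257275 0.690326648 b
0.327331186 0.01715109 0.861421706 c
0.632011743 0.599040196 0.320741634 c
0.302804404 0.475166304 0.907143632 c
0.545284813 0.967196462 0.945163717 a
0.563720418 0.024862018 0.970685281 a
0.357614427 0.417490445 0.415162276 a
0.154971203 0.425227967 0.856866993 b
0.935080173 0.488659307 0.194967973 a
0.363069339 0.334206603 0.639795596 b
0.862889297 0.821752532 0.549552875 a
")

train <- myDat[1:10, ]
outOfSample <- myDat[11:16, ]

# As in the post: cbind() of unnamed vectors gives a matrix without column
# names, so data.frame() makes up X1, X2, X3 and the names v1..v3 are lost.
outOfSample <- data.frame(cbind(outOfSample$v1, outOfSample$v2, outOfSample$v3))
print(names(outOfSample))  # [1] "X1" "X2" "X3"

# The model terms are the expressions train$v1, train$v2, train$v3,
# not column names that newdata could supply.
fit <- lda(train$c1 ~ train$v1 + train$v2 + train$v3)

# predict() rebuilds the model frame from those terms. 'train' is not a
# column of outOfSample, so it is looked up in the formula's environment
# and the 10 training rows are used (R may warn about the row mismatch).
forecast <- predict(fit, outOfSample)$class
length(forecast)  # 10, not 6
```

So the 10 predictions are simply the fitted classes for the training rows; fixing only the column names would not help while the formula still says train$....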
Tony Plate
2009-Oct-20 15:23 UTC
[R] LDA Predict - Seems to be predicting on the Training Data
Maybe you're getting strange results because you're not supplying a data object to lda() when you build your fit. When I do it the "standard" way, predict.lda() uses the new data and produces a result of length 6, as expected:

> myDat <- read.csv("clipboard", sep="\t")
> fit <- lda(c1 ~ v1 + v2 + v3, data=myDat[1:10,])
> predict(fit, myDat[11:16,])$class
[1] c c c b c a
Levels: a b c
...

-- Tony Plate
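A self-contained version of the fit above for readers without the clipboard data. This sketch assumes MASS is installed and inlines the sample data (v1, v2, v3 and c1 columns only) in place of read.csv("clipboard", sep="\t"). Because the formula names plain columns and the data frame is passed through data=, predict.lda() looks up v1, v2 and v3 in newdata.

```r
# Sketch: the formula-plus-data= fit with the sample data inlined
# (assumption: only the v1, v2, v3 and c1 columns are needed).
library(MASS)  # provides lda() and its predict() method

myDat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
v1 v2 v3 c1
0.714472361 0.902552278 0.783353694 a
0.512158919 0.770451596 0.111853346 a
0.470693282 0.129200065 0.800973877 a
0.24236898 0.472219638 0.486599763 b
0.785619735 0.628511593 0.106868172 b
0.718718387 0.697257275 0.690326648 b
0.327331186 0.01715109 0.861421706 c
0.632011743 0.599040196 0.320741634 c
0.302804404 0.475166304 0.907143632 c
0.545284813 0.967196462 0.945163717 a
0.563720418 0.024862018 0.970685281 a
0.357614427 0.417490445 0.415162276 a
0.154971203 0.425227967 0.856866993 b
0.935080173 0.488659307 0.194967973 a
0.363069339 0.334206603 0.639795596 b
0.862889297 0.821752532 0.549552875 a
")

train <- myDat[1:10, ]
outOfSample <- myDat[11:16, ]  # keeps the column names v1, v2, v3

# The formula names plain columns; data= tells lda() where to find them.
fit <- lda(c1 ~ v1 + v2 + v3, data = train)

# predict() now evaluates v1, v2, v3 inside newdata, which has 6 rows.
forecast <- predict(fit, newdata = outOfSample)$class
length(forecast)  # 6
print(forecast)
```

Keeping the out-of-sample rows as a slice of the original data frame, rather than rebuilding them with cbind(), is what keeps the column names aligned with the model terms.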
Gabriela Cendoya
2009-Oct-20 15:32 UTC
[R] LDA Predict - Seems to be predicting on the Training Data
This is not an explanation, but it gives you a solution. Instead of calling lda() with a formula, pass the predictor variables and the classification factor as separate arguments. Based on your example and data:

outOfSample <- myDat[11:16,]
train <- myDat[1:10,]
outOfSample <- outOfSample[,3:5]
train2 <- train[,3:5]
fit <- lda(train2, train$c1)
forecast <- predict(fit, outOfSample)$class
length(forecast)
[1] 6

It seems that the problem arises when predict.lda() works on an lda fit built from a formula object.

Hope this helps,
Gabriela.

______________________________
Lic. María Gabriela Cendoya
Magíster in Biometry
Assistant Professor (Profesor Adjunto), Chair of Statistics and Design
Faculty of Agricultural Sciences
Universidad Nacional de Mar del Plata
______________________________
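A sketch comparing the x/grouping interface from this reply with the formula-plus-data= fit from the previous reply. It assumes MASS is installed and inlines the sample data (v1, v2, v3 and c1 only), so columns are selected by name through an illustrative vector called predictors rather than by position 3:5. With the same training rows and predictors, both interfaces should fit the same model and give the same predicted classes, which suggests the trouble in the original post comes from the train$... terms rather than from the formula interface as such.

```r
# Sketch: x/grouping interface vs. formula + data= on the same split
# (assumptions: inlined sample data; 'predictors' is an illustrative name).
library(MASS)  # provides lda() and its predict() method

myDat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
v1 v2 v3 c1
0.714472361 0.902552278 0.783353694 a
0.512158919 0.770451596 0.111853346 a
0.470693282 0.129200065 0.800973877 a
0.24236898 0.472219638 0.486599763 b
0.785619735 0.628511593 0.106868172 b
0.718718387 0.697257275 0.690326648 b
0.327331186 0.01715109 0.861421706 c
0.632011743 0.599040196 0.320741634 c
0.302804404 0.475166304 0.907143632 c
0.545284813 0.967196462 0.945163717 a
0.563720418 0.024862018 0.970685281 a
0.357614427 0.417490445 0.415162276 a
0.154971203 0.425227967 0.856866993 b
0.935080173 0.488659307 0.194967973 a
0.363069339 0.334206603 0.639795596 b
0.862889297 0.821752532 0.549552875 a
")

predictors <- c("v1", "v2", "v3")  # select by name, not by position
train <- myDat[1:10, ]
outOfSample <- myDat[11:16, ]

# x/grouping interface: a data frame of predictors plus the class factor.
fit_xy <- lda(train[, predictors], train$c1)
pred_xy <- predict(fit_xy, outOfSample[, predictors])$class

# Formula interface with data=, for comparison.
fit_f <- lda(c1 ~ v1 + v2 + v3, data = train)
pred_f <- predict(fit_f, newdata = outOfSample)$class

length(pred_xy)  # 6
# Compare the two sets of out-of-sample predictions element by element.
print(data.frame(xy = pred_xy, formula = pred_f))
```

Selecting by name also means predict.lda() sees newdata columns called v1, v2, v3, matching the names stored in the fit, so it has no reason to complain about mismatched variable names.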