Setting:
200 input variables, 1 binary target variable.
Run a principal component analysis (PCA) on the data,
then
use the output of the PCA (the generated factors)
as input into a neural network, but first partition the PCA output
into training and testing sets so that a neural network model can be trained
on the first partition and tested on the second.
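The pipeline described above can be sketched as follows. This is a minimal Python/NumPy sketch built on assumptions that are not in the original post. The data are synthetic (200 correlated inputs generated from 5 hidden factors), PCA is computed with `numpy.linalg.svd`, 10 components are kept, the split is 700/300, and the neural network is a small hand-written one-hidden-layer network. `pca_scores`, `train_mlp` and `predict` are hypothetical helpers and are not functions from any R package (in R the PCA step would typically use `prcomp`). In this sketch the PCA sees only the 200 inputs. The variant discussed below would instead pass `np.column_stack([X, y])` to `pca_scores`.

```python
import numpy as np

def pca_scores(X, n_components):
    """Centre the columns of X and project each record onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # one score per component per record

def train_mlp(X, y, hidden=16, lr=0.5, epochs=2000, seed=0):
    """Hypothetical helper: one tanh hidden layer, sigmoid output, full-batch gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1 / (1 + np.exp(-(H @ W2 + b2)))
        g = (p - y) / len(y)                 # gradient of mean cross-entropy w.r.t. the logit
        gH = np.outer(g, W2) * (1 - H ** 2)  # backpropagate through tanh (uses W2 before its update)
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2 > 0).astype(int)

# Synthetic stand-in for the real data: 200 correlated inputs driven by 5 hidden factors.
rng = np.random.default_rng(42)
n, p = 1000, 200
Z = rng.normal(size=(n, 5))
X = Z @ rng.normal(size=(5, p)) + rng.normal(size=(n, p))
y = (Z[:, 0] - Z[:, 1] > 0).astype(int)      # binary target

# 1. PCA on the 200 inputs (the target is not passed in here).
scores = pca_scores(X, n_components=10)

# 2. Partition the component scores into training and testing sets.
idx = rng.permutation(n)
train, test = idx[:700], idx[700:]

# 3. Standardise with training-partition statistics, train on train, evaluate on test.
mu, sd = scores[train].mean(axis=0), scores[train].std(axis=0)
params = train_mlp((scores[train] - mu) / sd, y[train])
accuracy = (predict(params, (scores[test] - mu) / sd) == y[test]).mean()
print(f"test accuracy: {accuracy:.3f}")
```

Standardising with training-partition statistics only keeps the testing partition out of every fitted step after the PCA. The PCA itself is still fitted on all records here, as in the post.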
I was told that it was not logically sound to include the target variable as
an input into the principal component algorithm.
Normally that sounds correct. You never want to include the target variable
as an input variable in your model.
However, I argued that it is OK here because I am only using the target
variable to build the principal components model. Each record then has a
value for each of the principal components. I then take only the training
partition to build the neural network, and test the neural network on the
testing partition.
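To see concretely what including the target in the PCA does to each record, here is a minimal Python/NumPy sketch on synthetic data. Everything in it is an assumption rather than the poster's setup: pure-noise inputs, a random binary target appended as column 201, 10 components kept, and the last 100 records treated as a hypothetical testing partition. With the loadings held fixed, each record's component scores are its centred row times the loadings, so flipping only the test records' targets shifts their scores by exactly the change in the target times the target's loading on each component.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 300, 200, 10
X = rng.normal(size=(n, p))                          # 200 inputs (synthetic noise)
y = rng.integers(0, 2, size=n).astype(float)         # binary target (synthetic)

M = np.column_stack([X, y])                          # target included as column 201
mu = M.mean(axis=0)
_, _, Vt = np.linalg.svd(M - mu, full_matrices=False)
V = Vt[:k].T                                         # (201, k) loadings of the top k components
scores = (M - mu) @ V                                # one score per component per record

test = np.arange(200, n)                             # hypothetical testing partition
M_flip = M.copy()
M_flip[test, -1] = 1 - M_flip[test, -1]              # change only the test records' targets
scores_flip = (M_flip - mu) @ V                      # same loadings, same centring

delta = scores_flip[test] - scores[test]
target_loadings = V[-1]                              # weight of the target in each component
matches = np.allclose(delta, np.outer(M_flip[test, -1] - M[test, -1], target_loadings))
print(np.abs(target_loadings).round(3))
print(matches)
```

So whenever the target's loadings are non-zero, the network's inputs for a test record contain a term that depends on that record's own target value.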
Is this wrong?
(Sorry to post this twice. The first time I was not properly logged in and I
don't think it registered correctly. It won't happen again, as I have now
registered and logged in correctly.)
--
View this message in context:
http://www.nabble.com/logic-question-tp24372208p24372208.html
Sent from the R help mailing list archive at Nabble.com.