I have a data set with 10 variables and about 8000 instances (or
objects/rows/samples). In addition I have one more ('class') variable
whose value I know for only about 10 instances, and for which I wish
to impute values.
I am a little confused about how to go about doing this, mostly because
I'm not well-versed in it. Do I train the SOM with a data object that
contains just the first 10 variables (excluding the 'class' variable),
then predict using an object that has all of the variables (including
the class variable)?
(I am using the kohonen package, and in general I am using the SOM
technique as a comparison to some other methods).
I don't know if providing some or all of the data would be useful; please
let me know if you think it is.
# load the packages used below
library(zoo)      # na.approx
library(kohonen)  # som, somgrid
# get the data
bw <- read.csv("bw.csv")
# some missing values in data; fill them by linear interpolation,
# extending the nearest observed value at the ends (rule=2)
bwm <- data.frame(na.approx(bw, na.rm=FALSE, rule=2))
bw10 <- bwm[, 1:10]
bw10.sc <- scale(bw10)
# playing with different grid sizes
bw.som <- som(data=bw10.sc, grid=somgrid(25,20,'hexagonal'))
# the different plots of the som at this point show some interesting
# features to me, but are quite difficult to interpret.
# there's much work needed here to understand it, but for now I want to see
# if it's possible to impute values for another variable...
# here's where I lose it, missing values, trainY, don't get it.
bw.predict <- predict(bw.som, newdata=scale(bw), trainX=???, trainY=???)
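
To make the question concrete, here is how I understand the general idea of predicting a class from an unsupervised map: map every row to its nearest codebook vector, give each unit the majority class of the few labelled rows that land on it, and let units with no labelled rows borrow the class of the nearest labelled unit. The sketch below is only an illustration in base R on made-up toy data, not the kohonen implementation. The k-means centres are a stand-in for trained SOM codebook vectors, and `bmu`, `cls`, `unit.class` and `imputed` are hypothetical names.

```r
# Toy stand-in data (made up): 200 rows, 3 variables, two separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
           matrix(rnorm(300, mean = 6), ncol = 3))
x.sc <- scale(x)

# The class is known for only a handful of rows; NA everywhere else.
cls <- rep(NA_character_, nrow(x))
cls[1:3] <- "a"
cls[101:103] <- "b"

# Stand-in codebook: k-means centres play the role of SOM codebook vectors.
codes <- kmeans(x.sc, centers = 4, nstart = 5)$centers

# Index of the nearest codebook vector (best matching unit) for each row.
bmu <- function(data, codes) {
  apply(data, 1, function(r) which.min(colSums((t(codes) - r)^2)))
}

unit.all <- bmu(x.sc, codes)
lab <- !is.na(cls)

# Majority class per unit, using only the labelled rows mapped to that unit.
unit.class <- sapply(split(cls[lab], unit.all[lab]),
                     function(v) names(which.max(table(v))))

# Units without any labelled row borrow from the nearest labelled unit.
lab.units <- as.integer(names(unit.class))
nearest.lab <- lab.units[bmu(codes, codes[lab.units, , drop = FALSE])]

# Imputed class for every row, via the class assigned to its unit.
imputed <- unname(unit.class[as.character(nearest.lab)])[unit.all]
table(imputed, useNA = "ifany")
```

With only about 10 labelled instances on a 25 x 20 grid, most units would inherit their class from some other unit, so I assume any values imputed this way would need checking against held-out labels.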
Ben.