Ben Harrison
2013-Jul-24 09:05 UTC
[R] Help to improve prediction from supervised mapping using kohonen package
I would really appreciate any advice on how I can improve (or fix?) the following analysis. I hope I have provided completely runnable code; it doesn't produce any errors for me. The plot at the end shows a pretty poor correlation (judging visually) with the test set. How can I improve the performance of the mapping and prediction?

Here are some of the data (continuous, numerical):

> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020

The complete data set is at the following link if you fancy it: https://gist.github.com/ottadini/6068259

The first variable, MEAS_TC, is the dependent variable. I wish to train a SOM on these data and then predict MEAS_TC for a new set of data in which MEAS_TC is missing. Below I'm simply splitting somdata into a training and a testing set for evaluation purposes.

# ===== #
library(kohonen)
somdata <- read.csv("somdata.csv")

# Create test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- somdata[inTrain, ]
testing <- somdata[-inTrain, ]

# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens, 2007, 21(5), J Stat Soft.
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]]  # Needs to return a vector

# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data=somX, Y=somY, xweight=0.5,
              grid=somgrid(6, 6, "hexagonal"), contin=TRUE)

# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata = scale(testing[-1]))

# Basic plot:
x <- seq(nrow(testing))
plot(x, testing[, "MEAS_TC"], type="l", col="black", ylim=c(0, 3.5))
par(new=TRUE)
plot(x, tc.xyf.prediction$prediction, type="l", col="red", ylim=c(0, 3.5))
# Wow, that's terrible. Do I have something wrong?
# ===== #
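The fit above is judged by eye from two overlaid line plots. A minimal base-R sketch of a numeric check follows. prediction_summary() is a hypothetical helper, not a kohonen function, and obs and pred are synthetic stand-ins for testing$MEAS_TC and tc.xyf.prediction$prediction, not values from the real data.

```r
# prediction_summary() is a hypothetical helper, not part of kohonen:
# it reports RMSE and Pearson correlation between observed and predicted values.
prediction_summary <- function(observed, predicted) {
  predicted <- as.numeric(predicted)        # predict() output may be a one-column matrix
  ok <- complete.cases(observed, predicted) # drop pairs with missing values
  c(rmse = sqrt(mean((observed[ok] - predicted[ok])^2)),
    r    = cor(observed[ok], predicted[ok]))
}

# Synthetic stand-ins for testing$MEAS_TC and tc.xyf.prediction$prediction
set.seed(1)
obs  <- runif(30, 1, 3.5)
pred <- obs + rnorm(30, sd = 0.3)
print(prediction_summary(obs, pred))

# Observed vs predicted: points on the dashed 1:1 line are perfect predictions
plot(obs, pred, xlab = "observed MEAS_TC", ylab = "predicted MEAS_TC",
     xlim = c(0, 3.5), ylim = c(0, 3.5))
abline(0, 1, lty = 2)
```

A scatter of observed against predicted values avoids relying on row order, which carries no meaning after a random split.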
ONKELINX, Thierry
2013-Jul-24 09:25 UTC
Try rescaling your data before splitting it into a training and a test set. Otherwise you end up with two different scalings: scale() centres and scales each subset by its own column means and standard deviations, so the test set is not on the same scale as the data the map was trained on.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
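The scaling point can be sketched in base R on synthetic data (random values standing in for somdata, which is not reproduced here). Option 1 follows the suggestion of scaling before the split. Option 2, reusing the training set's centring and scaling via the "scaled:center" and "scaled:scale" attributes that scale() attaches, is an alternative that was not proposed in the thread.

```r
# Synthetic stand-in for somdata: MEAS_TC plus five predictors (random values)
set.seed(42)
somdata <- data.frame(MEAS_TC = runif(90, 1, 3),
                      SP = rnorm(90, 40, 15), LN = rlnorm(90, 3.5, 0.8),
                      SN = rlnorm(90, 3, 0.8), GR = rnorm(90, 60, 10),
                      NEUT = rnorm(90, 400, 80))
inTrain  <- sample(nrow(somdata), floor(nrow(somdata) * 2/3))
training <- somdata[inTrain, ]
testing  <- somdata[-inTrain, ]

# Option 1 (the suggestion above): scale all predictors once, then split,
# so training and test rows share the same centring and scaling.
allX    <- scale(somdata[-1])
trainX1 <- allX[inTrain, ]
testX1  <- allX[-inTrain, ]

# Option 2 (alternative, not from the thread): scale with the training
# means/SDs only and reuse them for the test set (and later for new data).
trainX2 <- scale(training[-1])
testX2  <- scale(testing[-1],
                 center = attr(trainX2, "scaled:center"),
                 scale  = attr(trainX2, "scaled:scale"))

# Scaling the test set on its own (as in the original code) uses different
# centres, so the same raw value is mapped to a different scaled value.
testX_own <- scale(testing[-1])
print(round(attr(trainX2, "scaled:center") - attr(testX_own, "scaled:center"), 2))

# trainX1/testX1 or trainX2/testX2 would then be passed to xyf() and predict()
# as in the original code, e.g. predict(tc.xyf, newdata = testX2).
```

Option 2 may suit the stated goal of predicting MEAS_TC for new data, because new rows can be transformed with the stored training parameters without rescaling everything; whether either option improves this particular map is not tested here.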