Hi all,

I want to run the following cross-validation technique on my data set.

Variables: X, Y (each having 50 observations)

I want to run a least-squares regression of Y on X. I need to divide the entire data set into 10 groups of size 5 each, keep one of them out as the 'test set', and build the model on the remaining 9 groups (the 'training set'). Then I calculate the MSE of the fitted model by applying it to the 'test set'.

So, for each choice of the 'test set', I get an MSE value. I select as final the model that corresponds to the minimum of these 10 MSE values.

How do I get the coefficients for this model?

I am a beginner in R, and the help page on CROSSVAL appeared a bit confusing to me.

Thanks for any help.

Regards,
Preetam

--
Preetam Pal
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.
Below is some code that may be helpful for you. There may also be a finished package that includes mlr with cross-validation; you can look through http://cran.r-project.org/web/packages/available_packages_by_date.html, or check the ChemometricsWithR package <http://cran.r-project.org/web/packages/ChemometricsWithR/index.html>.

# produce cross-validation indices: a list with one index vector per fold
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N/kfold)
  incomplete <- kfold*len.seg - N   # number of NA pads needed to fill the matrix
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  cvi <- lapply(as.data.frame(ind), function(x) c(na.omit(x)))  # a list
  return(cvi)
}

# x and y are numeric vectors of equal length, as in the question
N <- length(y)
kfold <- 10
cvi <- crossvalind(N, kfold)
mse <- numeric(kfold)   # one test-set MSE per fold
for (i in 1:length(cvi)) {
  xc <- x[unlist(cvi[-i])]   # x in training set
  yc <- y[unlist(cvi[-i])]   # y in training set
  xt <- x[cvi[[i]]]          # x in test set
  yt <- y[cvi[[i]]]          # y in test set
  lm.mod <- lm(yc ~ xc)
  # predict() matches newdata columns by name, so the test x must be called xc
  yt.pred <- predict(lm.mod, newdata = data.frame(xc = xt))
  mse[i] <- sum((yt - yt.pred)^2)/length(yt)
}
plot(mse)

You can see the minimum MSE in the plot and pick it out, e.g. with which.min(mse). Maybe this is what you want.

Best,
Kevin

On Wed, May 15, 2013 at 5:51 AM, Preetam Pal wrote:
> I want to run a least-squares regression of Y on X.
> [...]
> How do i get the coefficients for this model.
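As a quick check of what crossvalind() returns, here is a minimal sketch. It repeats the function from the message above with `byrow = TRUE` written out, and calls it for N = 50 (the case in the question) and for N = 53, a made-up size chosen only to show what happens when N is not a multiple of kfold. The names folds50 and folds53 are illustrative, not from the thread.

```r
# crossvalind() as posted in the thread, with `byrow = TRUE` written out
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N / kfold)
  incomplete <- kfold * len.seg - N          # NA pads needed to fill the matrix
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)),
                nrow = len.seg, byrow = TRUE)
  lapply(as.data.frame(ind), function(x) c(na.omit(x)))  # one vector per fold
}

set.seed(123)                       # only to make the shuffle repeatable
folds50 <- crossvalind(50, 10)      # the case in the question
folds53 <- crossvalind(53, 10)      # hypothetical uneven case

lengths(folds50)                    # size of each fold for N = 50
lengths(folds53)                    # fold sizes differ by at most one here

# every observation should appear in exactly one fold
all(sort(unlist(folds53, use.names = FALSE)) == 1:53)
```

Each list element holds the row indices of one test set, so `cvi[[i]]` gives the test indices for fold i and `unlist(cvi[-i])` gives the training indices.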
Maybe the following code is helpful for you. You can also check the ChemometricsWithR package <http://cran.r-project.org/web/packages/ChemometricsWithR/index.html>, listed at http://cran.r-project.org/web/packages/available_packages_by_date.html.

# produce cross-validation indices: a list with one index vector per fold
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N/kfold)
  incomplete <- kfold*len.seg - N   # number of NA pads needed to fill the matrix
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  cvi <- lapply(as.data.frame(ind), function(x) c(na.omit(x)))  # a list
  return(cvi)
}

# x and y are numeric vectors of equal length, as in the question
N <- length(y)
kfold <- 10
cvi <- crossvalind(N, kfold)
mse <- numeric(kfold)   # one test-set MSE per fold
for (i in 1:length(cvi)) {
  idx.tr <- unlist(cvi[-i])   # indices of the training set
  idx.te <- unlist(cvi[i])    # indices of the test set
  xc <- x[idx.tr]
  yc <- y[idx.tr]
  xt <- x[idx.te]
  yt <- y[idx.te]
  lm.mod <- lm(yc ~ xc)
  # predict() matches newdata columns by name, so the test x must be called xc
  yt.pred <- predict(lm.mod, newdata = data.frame(xc = xt))
  mse[i] <- sum((yt - yt.pred)^2)/length(yt)
}
plot(mse)

You can then pick out the fold with the minimum MSE.

Best,
Kevin
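The question of how to get the coefficients can be handled by storing coef() for each fold's fit. The following is only a sketch under stated assumptions: the data are simulated (the poster's X and Y are not available), the names dat, fold, coefs and best are made up for illustration, and a simpler fold labelling via sample() is used in place of crossvalind().

```r
set.seed(1)
# hypothetical data standing in for the poster's X and Y (50 observations)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
dat <- data.frame(x = x, y = y)

kfold <- 10
# random fold label (1..10) for every row; 5 rows per fold when N = 50
fold <- sample(rep(1:kfold, length.out = nrow(dat)))

mse   <- numeric(kfold)
coefs <- matrix(NA_real_, nrow = kfold, ncol = 2,
                dimnames = list(NULL, c("(Intercept)", "x")))

for (i in 1:kfold) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])         # fit on the other 9 folds
  pred <- predict(fit, newdata = dat[fold == i, ])   # predict the held-out fold
  mse[i]     <- mean((dat$y[fold == i] - pred)^2)    # test-set MSE for fold i
  coefs[i, ] <- coef(fit)                            # keep this fit's coefficients
}

best <- which.min(mse)        # fold whose test set gave the smallest MSE
coefs[best, ]                 # coefficients of that fit
coef(lm(y ~ x, data = dat))   # for comparison: fit on all 50 observations
mean(mse)                     # average test MSE over the 10 folds
```

All 10 fits use the same model formula and differ only in which 45 observations they were trained on, so the minimum-MSE fit mostly reflects which 5 points happened to be held out. The usual use of cross-validation is to report the average test MSE as an estimate of prediction error and take the final coefficients from a fit on all the data, which is why both are printed at the end.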