Hi all,

I want to run the following cross-validation technique on my data set:

Variables: X, Y (each having 50 observations)
I want to run a least-squares regression of Y on X.
I need to divide the entire data set into 10 groups of size 5 each, keep one of them out as the 'test' set and build the model on the basis of the remaining 9 groups (the 'training set'). Then I calculate the MSE of the fitted model by applying it to the 'test set'.

So, for each choice of the 'test set', I get an MSE value. I select as final the model that corresponds to the minimum of these 10 MSE values.

How do I get the coefficients for this model?

I am a beginner in R and the help page on CROSSVAL appeared a bit confusing to me.

Thanks for any help.
Regards,
Preetam
--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.
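In symbols, with group $k$ held out as the test set $T_k$ ($n_k = 5$ observations) and $\hat\beta_0^{(-k)}, \hat\beta_1^{(-k)}$ the least-squares coefficients fitted on the other 9 groups, the MSE for each choice of test set, and the usual 10-fold cross-validation summary (their average rather than their minimum), are:

```latex
\mathrm{MSE}_k = \frac{1}{n_k} \sum_{j \in T_k} \left( y_j - \hat\beta_0^{(-k)} - \hat\beta_1^{(-k)} x_j \right)^2,
\qquad k = 1, \dots, 10,
\qquad
\mathrm{CV}_{(10)} = \frac{1}{10} \sum_{k=1}^{10} \mathrm{MSE}_k .
```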
Below is some code that may be helpful. There may already be a finished
package that includes MLR (multiple linear regression) with cross-validation;
you can check
http://cran.r-project.org/web/packages/available_packages_by_date.html,
for example the ChemometricsWithR package
(http://cran.r-project.org/web/packages/ChemometricsWithR/index.html).
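Another ready-made option is the boot package (a recommended package distributed with R), whose cv.glm() runs K-fold cross-validation on a fitted glm. A gaussian glm() is an ordinary least-squares fit, so a minimal sketch, assuming a data frame with columns x and y (simulated here purely for illustration, not the poster's data), could look like this:

```r
# Hypothetical example data: 50 simulated (x, y) pairs, for illustration only
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- 2 + 3 * dat$x + rnorm(50)

library(boot)  # provides cv.glm()

# The default gaussian family makes this an ordinary least-squares fit of y on x
fit <- glm(y ~ x, data = dat)

# 10-fold cross-validation; the default cost is the mean squared error
cv <- cv.glm(dat, fit, K = 10)
cv$delta[1]  # raw 10-fold CV estimate of the prediction MSE
coef(fit)    # coefficients of the model fitted to all 50 observations
```

Note that delta[1] is a (size-weighted) average of the fold errors, not their minimum, and the coefficients come from the model fitted to all observations; this is the usual way k-fold cross-validation is summarized.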
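# produce cross-validation indices: a list with one vector of row numbers per fold
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N/kfold)       # observations per fold
  incomplete <- kfold*len.seg - N   # number of folds that are one observation short
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  cvi <- lapply(as.data.frame(ind), function(x) c(na.omit(x)))  # a list; NA padding dropped
  return(cvi)
}
# assumes x (predictor vector or matrix) and y (response) already exist
dat <- data.frame(x = x, y = y)
N <- nrow(dat)
kfold <- 10
cvi <- crossvalind(N, kfold)
mse <- numeric(length(cvi))         # must exist before mse[i] is assigned
for (i in seq_along(cvi)) {
  train <- dat[unlist(cvi[-i]), ]   # training set: all folds except fold i
  test  <- dat[cvi[[i]], ]          # test set: fold i
  lm.mod <- lm(y ~ ., data = train)
  # newdata must be a data frame with the same column names as the training data
  yt.pred <- predict(lm.mod, newdata = test)
  mse[i] <- mean((test$y - yt.pred)^2)
}
plot(mse)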
From the plot you can see which fold gives the minimum MSE and pick it out
(which.min(mse) returns its index). Maybe this is what you want.
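The loop above does not keep the fitted models, so to get the coefficients the question asks about, one option is to store coef() for every fold and index them with which.min(mse). A self-contained sketch, assuming simulated data in place of the real X and Y, and assigning folds with sample(rep(...)) instead of crossvalind():

```r
# Hypothetical example data: 50 simulated (x, y) pairs, for illustration only
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- 2 + 3 * dat$x + rnorm(50)

kfold <- 10
# Randomly assign the 50 rows to 10 groups of 5
fold <- sample(rep(1:kfold, length.out = nrow(dat)))

mse <- numeric(kfold)
coefs <- matrix(NA_real_, nrow = kfold, ncol = 2,
                dimnames = list(NULL, c("(Intercept)", "x")))
for (i in 1:kfold) {
  train <- dat[fold != i, ]   # the other 9 groups
  test  <- dat[fold == i, ]   # group i, held out
  fit <- lm(y ~ x, data = train)
  coefs[i, ] <- coef(fit)     # keep this fold's coefficients
  mse[i] <- mean((test$y - predict(fit, newdata = test))^2)
}

best <- which.min(mse)  # fold whose test set gave the smallest MSE
coefs[best, ]           # coefficients of that fold's model
```

Keep in mind that the minimum-MSE fold mostly reflects which five points happened to be held out; the more common practice is to report the average of the 10 MSE values as the prediction-error estimate and take the coefficients from lm() fitted to all 50 observations.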
Best,
Kevin
On Wed, May 15, 2013 at 5:51 AM, Preetam Pal <lordpreetam@gmail.com> wrote:
> Hi all,
> I want to run the following Cross-validation technique on my data set:
>
>
> Variables: X, Y (each having 50 observations)
> I want to run a least-squares regression of Y on X.
> I need to divide the entire data set into 10 groups of size 5 each, keep
> one of them out as 'test' set and build the model on the basis of the
> remaining 9 groups (the 'training set').Then i calculate the MSE of the
> fitted model by applying it on the 'test set'.
>
> So, for each choice of the 'test set', I get a MSE value.
> I select that model as final which corresponds to the minimum of these 10
> MSE values.
>
> How do i get the coefficients for this model.
>
> I am a beginner in R and the help page on CROSSVAL appeared a bit confusing
> to me.
>
> Thanks for any help.
> Regards,
> Preetam
>
> --
> Preetam Pal
> (+91)-9432212774
> M-Stat 2nd Year, Room No. N-114
> Statistics Division, C.V.Raman Hall
> Indian Statistical Institute, B.H.O.S.
> Kolkata.
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Maybe the following code is helpful for you. You can also check the
ChemometricsWithR package
(http://cran.r-project.org/web/packages/ChemometricsWithR/index.html),
listed at http://cran.r-project.org/web/packages/available_packages_by_date.html.
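# produce cross-validation indices: a list with one vector of row numbers per fold
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N/kfold)       # observations per fold
  incomplete <- kfold*len.seg - N   # number of folds that are one observation short
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  cvi <- lapply(as.data.frame(ind), function(x) c(na.omit(x)))  # a list; NA padding dropped
  return(cvi)
}
# assumes x (predictor vector or matrix) and y (response) already exist
dat <- data.frame(x = x, y = y)
N <- nrow(dat)
kfold <- 10
cvi <- crossvalind(N, kfold)
mse <- numeric(length(cvi))    # must exist before mse[i] is assigned
for (i in seq_along(cvi)) {
  idx.tr <- unlist(cvi[-i])    # training rows: all folds except fold i
  idx.te <- unlist(cvi[i])     # test rows: fold i
  train <- dat[idx.tr, ]
  test <- dat[idx.te, ]
  lm.mod <- lm(y ~ ., data = train)
  # newdata must be a data frame with the same column names as the training data
  yt.pred <- predict(lm.mod, newdata = test)
  mse[i] <- mean((test$y - yt.pred)^2)
}
plot(mse)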
You can then pick out the fold with the minimum MSE, e.g. with which.min(mse).
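To see what crossvalind() returns, a quick check can help. The function is repeated here (with byrow = TRUE) so the block runs on its own, and N = 53 is an arbitrary value chosen only to show what happens when N is not a multiple of kfold:

```r
# crossvalind() as in the message above
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N/kfold)
  incomplete <- kfold*len.seg - N
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  lapply(as.data.frame(ind), function(x) c(na.omit(x)))
}

set.seed(1)
cvi <- crossvalind(50, 10)
lengths(cvi)                     # ten folds of 5
all(sort(unlist(cvi)) == 1:50)   # every row appears in exactly one fold

cvi2 <- crossvalind(53, 10)
lengths(cvi2)                    # three folds of 6 and seven folds of 5
```

Because the matrix is filled row by row, the NA padding sits at the end of the last row, so the folds that come up short are always the last ones.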
Best,
Kevin