Khulood Aljehani
2014-Nov-01 12:03 UTC
[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression
Hello,

I hope that you can help me with a problem with the Nadaraya-Watson (NW) kernel regression estimator. I simulated data and wrote a loop to calculate the NW estimator for the regression model

Y = 1 - X + exp(-200*(X-0.5)^2) + E

where
Y: the response variable,
X: the explanatory variable, from uniform(0,1),
E: the error term, i.i.d. from normal(0, 0.1).

Then I calculate the MSE. But the MSE increases with increasing sample size. This is the program that I wrote:

n1=25
set.seed(4455)
E<-rnorm(n1,mean=0,sd=0.1)
X<-runif(n1, min = 0, max = 1)
mx=1-X+exp(-200*(X-0.5)^2)
Y <- mx+E
nrep <- 1000

#----------------------------------------Fixed NW
mse_rep1<-c()
for(i in 1:1500){
set.seed(i+236)
E<-rnorm(n1,mean=0,sd=0.1)
X<-runif(n1, min = 0, max = 1)
mx=1-X+exp(-200*(X-0.5)^2)
Y <- mx+E
hmax <- 2 * sqrt(var(X)) * n1^(-1/5)
lower = 0.01 * hmax
h<- bw.ucv(X,nb = 1000, lower=lower, upper=hmax, tol=0.1*lower)
est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
mse1<-(n1^-1)*sum((Y - est1)^2)

mse_rep1 <- cbind(mse_rep1,mse1)

dimnames(mse_rep1)<-list(c("MSE1"),paste("rep",1:i))

}
library(functional)
MSE_rep1<-mse_rep1[,apply(mse_rep1, 2, Compose(is.finite, any))]

MSE_fixedNW<- apply(MSE_rep1[1:1000], 1, mean) #calculate the average of the 1000 MSE

At first I got NA values, so I ran 1500 replications and then chose 1000 of them without NA values.

When I change the sample size to 50 or 100 the MSE decreases, but for sample sizes above 100 the MSE increases. This is the main problem.

I hope I was able to explain the problem clearly.

Regards
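A side note on the program above, offered as a sketch rather than a full diagnosis: per the ksmooth() documentation, when x.points is not supplied the fit is evaluated at n.points = max(100, length(x)) equally spaced points over range.x, and the returned x values are always in increasing order. If that applies here, est1 would not be aligned with Y, and for n1 = 25 the sum in mse1 would run over 100 recycled terms while being divided by 25. The snippet below checks the lengths and shows one way to evaluate the fit at the observed X. The fixed bandwidth 0.2 and the seed are arbitrary assumptions made only for illustration.

```r
# Check what ksmooth() returns by default, and evaluate at the data points.
set.seed(1)
n <- 25
X <- runif(n, min = 0, max = 1)
Y <- 1 - X + exp(-200 * (X - 0.5)^2) + rnorm(n, mean = 0, sd = 0.1)

# Default call: the fit lives on an equally spaced grid, not at X.
fit_default <- ksmooth(X, Y, kernel = "normal", bandwidth = 0.2)
length(fit_default$y)            # number of grid points, not n

# Evaluate at the observed X; the output is ordered by increasing x,
# so Y has to be put in the same order before taking residuals.
fit_at_x <- ksmooth(X, Y, kernel = "normal", bandwidth = 0.2, x.points = X)
ord <- order(X)
resid_mse <- mean((Y[ord] - fit_at_x$y)^2)
resid_mse
```

With the estimates aligned like this, Y - est compares each response with the fitted value at its own X, which is what the MSE formula in the post appears to intend.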
Bert Gunter
2014-Nov-01 18:28 UTC
[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression
1. I am unfamiliar with the functional package.

2. I think the proper question is: why do you expect the MSE to decrease with increasing sample size? Example: the standard error of an average (as an estimator of the population mean) gets smaller as sample size increases, i.e. its precision improves, but the MSE, as an estimator of the population variance, stays essentially constant.

Note: for nonparametric smoothers, the MSE is also related to the bandwidth choice. The default bandwidth may change with sample size.

3. In future, please post in plain text, not HTML, as the posting guide requests.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
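The example in point 2 can be checked with a small simulation. This is a minimal sketch, not part of the original reply: it draws normal errors with sd = 0.1 (the value from the original post), and for each sample size compares the squared error of the sample mean as an estimator of the true mean (zero) with the mean squared residual around the sample mean. The sample sizes, the 2000 replications, and the names one_rep and results are assumptions chosen for illustration.

```r
# Squared estimation error of the mean shrinks with n;
# the mean squared residual stays near the error variance sigma^2.
set.seed(42)
sigma <- 0.1
sizes <- c(25, 100, 400, 1600)

one_rep <- function(n) {
  e <- rnorm(n, mean = 0, sd = sigma)
  c(est_err   = mean(e)^2,               # (estimate - true mean)^2, true mean is 0
    resid_mse = mean((e - mean(e))^2))   # average squared residual
}

# Average each quantity over 2000 replications per sample size.
results <- sapply(sizes, function(n) rowMeans(replicate(2000, one_rep(n))))
colnames(results) <- paste0("n=", sizes)
print(signif(results, 3))
```

The est_err row should fall roughly like sigma^2 / n, while the resid_mse row should stay close to sigma^2 = 0.01. A quantity like mean((Y - est)^2) computed against the noisy responses behaves like the second row, so it should not be expected to go to zero as n grows.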
peter dalgaard
2014-Nov-01 18:35 UTC
[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression
You seem to be using bw.ucv to set the bandwidth for ksmooth. However, bw.ucv selects the bandwidth for estimating the _density_ of x. I see no reason to believe that the same bandwidth selection should be optimal or even consistent for a kernel smoother like ksmooth.

Check out the KernSmooth package, in particular the dpik() and dpill() functions, and the book that the package supports.

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
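As a rough sketch of the KernSmooth suggestion (not code from the thread): dpill() selects a plug-in bandwidth for local linear regression with a Gaussian kernel, and locpoly() computes the fit on a grid. Note that this swaps the Nadaraya-Watson estimator for a local linear one, since that is what dpill() targets. Because the data are simulated, the fit can be compared with the true regression function m(x) on the grid instead of with the noisy Y. The sample sizes, the 50 replications, and the helper names m and sim_once are assumptions for illustration.

```r
library(KernSmooth)

# True regression function from the original post.
m <- function(x) 1 - x + exp(-200 * (x - 0.5)^2)

# One simulated data set of size n: returns the average squared
# distance between the local linear fit and the true curve on the grid.
sim_once <- function(n) {
  X <- runif(n, min = 0, max = 1)
  Y <- m(X) + rnorm(n, mean = 0, sd = 0.1)
  h <- dpill(X, Y)                       # plug-in bandwidth for regression
  if (!is.finite(h)) return(NA_real_)    # guard: skip a failed selection
  fit <- locpoly(X, Y, degree = 1, bandwidth = h)
  mean((fit$y - m(fit$x))^2, na.rm = TRUE)
}

set.seed(2014)
sizes <- c(200, 800)
avg_err <- sapply(sizes, function(n) mean(replicate(50, sim_once(n)), na.rm = TRUE))
names(avg_err) <- paste0("n=", sizes)
print(avg_err)
```

Measuring the error against m(x) rather than against Y separates the estimation error of the smoother from the irreducible noise variance, which is the quantity one would expect to shrink as the sample size grows.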