thr3ads.net - R help - [R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression [Nov 2014]

Khulood Aljehani

2014-Nov-01 12:03 UTC

[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression

Hello,

I hope you can help me with a problem I have with the Nadaraya-Watson (NW)
kernel regression estimator. I simulated data and wrote a loop to calculate
the NW estimator for the regression model

Y = 1 - X + exp(-200*(X-0.5)^2) + E

where
Y: the response variable
X: the explanatory variable, from uniform(0,1)
E: the error term, i.i.d. from normal(0,0.1) (sd = 0.1 in the code)

Then I calculate the MSE, but the MSE increases as the sample size increases.
This is the program I wrote:

n1 <- 25
set.seed(4455)
E <- rnorm(n1, mean = 0, sd = 0.1)
X <- runif(n1, min = 0, max = 1)
mx <- 1 - X + exp(-200 * (X - 0.5)^2)
Y <- mx + E
nrep <- 1000

#---------------------------------------- Fixed NW
mse_rep1 <- c()
for (i in 1:1500) {
  set.seed(i + 236)
  E <- rnorm(n1, mean = 0, sd = 0.1)
  X <- runif(n1, min = 0, max = 1)
  mx <- 1 - X + exp(-200 * (X - 0.5)^2)
  Y <- mx + E
  hmax <- 2 * sqrt(var(X)) * n1^(-1/5)
  lower <- 0.01 * hmax
  h <- bw.ucv(X, nb = 1000, lower = lower, upper = hmax, tol = 0.1 * lower)
  est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
  mse1 <- (n1^-1) * sum((Y - est1)^2)

  mse_rep1 <- cbind(mse_rep1, mse1)
  dimnames(mse_rep1) <- list(c("MSE1"), paste("rep", 1:i))
}
library(functional)
# keep only the replications with a finite MSE (the result is a plain vector)
MSE_rep1 <- mse_rep1[, apply(mse_rep1, 2, Compose(is.finite, any))]

# calculate the average of the first 1000 finite MSE values
MSE_fixedNW <- mean(MSE_rep1[1:1000])

I got NA values at first, so I made 1500 replications and then chose 1000
of them without NA values.
When I change the sample size to 50 or 100 the MSE decreases, but for more
than 100 the MSE increases. This is the main problem.
I hope I was able to clarify the problem well.
Regards

Bert Gunter

2014-Nov-01 18:28 UTC

1. I am unfamiliar with the functional package.

2. I think the proper question is: Why do you expect the mse to
decrease with increasing sample size?
Example: the precision of an average (as an estimator of the
population mean) improves (its standard error gets smaller) as sample
size increases, but the mse is essentially constant as an estimator of
the population variance.
Note: for nonparametric smoothers, mse is also related to bandwidth
choice. This might change by default with different sample sizes.

3. In future, please post in plain text, not html, as the posting
guide requests.
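
The example in point 2 can be checked with a small simulation. What follows
is only a sketch: the sample sizes, the number of replications and the
object names (sizes, nrep, res) are assumptions made for illustration and
do not come from the original post. It draws normal errors with sd = 0.1 and
compares how the variance of the average and the average sample variance
behave as n grows.

```r
# Illustrative simulation; all names and settings here are hypothetical.
set.seed(1)
sigma <- 0.1                 # same error sd as in the original post
sizes <- c(25, 100, 400)     # assumed sample sizes
nrep  <- 2000                # assumed number of replications

res <- sapply(sizes, function(n) {
  sims <- replicate(nrep, {
    e <- rnorm(n, mean = 0, sd = sigma)
    c(avg = mean(e), s2 = var(e))
  })
  c(n = n,
    var_of_mean = var(sims["avg", ]),   # shrinks roughly like sigma^2 / n
    mean_s2     = mean(sims["s2", ]))   # stays near sigma^2 whatever n is
})
print(t(res))
```

The var_of_mean column should fall as n grows, while mean_s2 should stay
close to sigma^2 = 0.01. A residual mean square such as mean((Y - fit)^2)
behaves like the second quantity, not the first.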

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll





peter dalgaard

2014-Nov-01 18:35 UTC

You seem to be using bw.ucv to set the bandwidth for ksmooth. However, bw.ucv
selects the bandwidth for estimating the _density_ of x. I see no reason to
believe that the same bandwidth selection should be optimal or even consistent
for a kernel smoother like ksmooth.

Check out the KernSmooth package, in particular the dpik() and dpill() functions,
and the book that the package supports (Wand and Jones, 1995, "Kernel Smoothing").
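
To illustrate the difference, here is a rough sketch of a bandwidth chosen
by a regression criterion instead of a density one. It does not use
KernSmooth: it substitutes plain leave-one-out cross-validation over a grid
of bandwidths, written by hand in base R. The helper names (nw_fit, loo_cv,
sim_once), the bandwidth grid and the number of replications are all
assumptions made for this illustration. The Gaussian kernel here uses h
directly as its standard deviation, which is not the same scaling as the
bandwidth argument of ksmooth(). The fit is evaluated at the observed X;
ksmooth() does that only when x.points is supplied, and otherwise returns
values on an equally spaced grid of max(100, length(x)) points, so its $y is
not aligned with Y.

```r
# Hand-rolled Nadaraya-Watson fit with a Gaussian kernel (sd = h).
# nw_fit, loo_cv and sim_once are hypothetical helper names for this sketch.
nw_fit <- function(x0, x, y, h) {
  w <- dnorm(outer(x0, x, "-") / h)   # kernel weights, one row per x0
  drop(w %*% y) / rowSums(w)
}

# Leave-one-out CV score for the regression fit (not for the density of x).
loo_cv <- function(h, x, y) {
  w <- dnorm(outer(x, x, "-") / h)
  diag(w) <- 0                        # drop each point from its own fit
  mean((y - drop(w %*% y) / rowSums(w))^2)
}

# One simulated data set from the model in the original post.
sim_once <- function(n, h_grid) {
  x  <- runif(n)
  mx <- 1 - x + exp(-200 * (x - 0.5)^2)
  y  <- mx + rnorm(n, sd = 0.1)
  cv <- sapply(h_grid, loo_cv, x = x, y = y)
  h  <- h_grid[which.min(cv)]         # which.min skips NaN scores
  fit <- nw_fit(x, x, y, h)           # evaluate at the observed X
  c(h = h,
    err_true  = mean((mx - fit)^2),   # error against the true curve
    err_resid = mean((y - fit)^2))    # residual MSE, as in the post
}

set.seed(4455)
h_grid <- exp(seq(log(0.005), log(0.3), length.out = 25))  # assumed grid
sizes  <- c(25, 100, 400)                                  # assumed sizes
out <- sapply(sizes, function(n) rowMeans(replicate(10, sim_once(n, h_grid))))
colnames(out) <- paste0("n=", sizes)
print(round(out, 4))
```

The row err_true measures the distance to the true regression function and
is the quantity one would expect to shrink as n grows. The row err_resid is
the residual mean square computed against the noisy Y, which also contains
the error variance and so need not go to zero.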

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
