I have a dataset of pregnancy values for multiple years (and ages, not
included) with missing years. I would like to use local polynomial
regression to smooth the values and estimate for the missing years. I
would also like to use GCV to justify the smoothing parameter selection.
When using locfit() with lp() I found that the gcvplot function does not
work as it is looking for an alpha value to replace so I used gcv():
########################################################################
#
library(locfit)
### approx. my data
Yearc(1954,1965,1966,1967,1968,1969,1970,1978,1979,1980,1981,1982,1985,1987,
1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2002,20
03,2004)
Num.sampledc(3,44,9,19,19,25,13,38,15,2,4,5,3,8,6,9,7,11,11,17,16,13,6,4,10,7,9,4,3
,4)
Num.pregc(1,5,1,4,6,4,3,23,8,1,3,2,1,3,1,1,1,2,3,2,2,6,0,0,3,0,3,1,2,1)
x = Year
y = Num.preg/Num.sampled
gcv.fit = c(0)
b = seq(from = 0.4, to = 1, by = 0.01)
for(i in 1:length(b))
{
fit = locfit(y~lp(x, nn = b[i]))
gcv.fit[i] = gcv(fit)[4]
}
gcv.fit = round(gcv.fit,4)
b1 = b[min(which.min(gcv.fit))]
fit = locfit(y~lp(x, nn = b1))
## plots and predicting missing years
xg = 1954:2004
plot(x,y,xlab="Years",ylab="Proportion
Pregnant",pch=16,cex=0.5)
lines(xg,predict(fit,newdata=xg),col=4)
x11()
plot(b,gcv.fit)
########################################################################
#
This seems to work (correct me if I am wrong), however, now I am looking
for a way to penalise pregnancy rates that are derived from low sample
sizes. There isn't much documentation on the arguments to lp()
The locfit function has the argument "weights", I tried this function
outside of lp() and it had no effect. It is not listed as an argument
to lp()...however I tried using it as an argument and it changed the
output:
for(i in 1:length(b))
{
fit = locfit(y~lp(x, nn = b[i], weights = n))
gcv.fit[i] = gcv(fit)[4]
}
gcv.fit = round(gcv.fit,3)
b1 = b[min(which.min(gcv.fit))]
fit = locfit(y~lp(x, nn = b1, weights = n))
However, this does not give "pretty" results, so I have a feeling that
it really isn't doing what I think it is doing. (I also tried log(n)).
Can weights be a vector? Should I be altering the sample size is some
way? There isn't much documentation on the weights argument, can it even
be used within the lp()?
Thank you
Amanda
[[alternative HTML version deleted]]