Dr. D. P. Kreil (Boku)
2011-Jun-16 15:45 UTC
[Rd] Update: Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect?
Dear R developers! Considering I got no response or comments in the general r-help forum so far, perhaps my question is actually better suited for this list? I have added some more hopefully relevant technical details to my original post (edited below). Any comments gratefully received! Best regards, David Kreil. ---------- Dear R experts, I have a problem that is a related to the question raised in this earlier post ??? https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html My situation is different in that I have only 2 predictors (coordinates x,y) for local regression but a number of global ("parametric") offsets that I need to consider. Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z: z_i = s(x_i,y_i) + v_{g_i} These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by (global, not local) regression. We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets. The below code for toy data (spatial trend in one dimension x) works for two or three offset groups. Unfortunately, the loess call fails for four or more offset groups with the error message "Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,? : ? only 1-4 predictors are allowed" See comment #1 I tried overriding the restriction (see lower section of code example) and got "k>d2MAX in ehg136.? Need to recompile with increased dimensions." How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f ????? if(k .gt. 15)?? call ehg182(105) Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here? Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower. Any comments would be greatly appreciated! Best regards, David Kreil. ### # # loess with parametric offsets - toy data demo # x<-seq(0,9,.1); x.N<-length(x); # Comment #1: # These are the offsets ("unknown", to be revovered by the regression): o<-c(0.4,-0.8,1.2); # three parametric variables --> total of four, works o<-c(0.4,-0.8,1.2,-0.2); # four par.variables --> total of 5 > 4 --> error: # "... only 1-4 predictors are allowed" o.N<-length(o); f<-sapply(seq(o.N), ????????? function(n){ ??????????? ifelse((seq(x.N)<= n?? *x.N/(o.N+1) & ??????????????????? seq(x.N)> (n-1)*x.N/(o.N+1)), ??????????????????? 1,0); ????????? }); f<-f[sample(NROW(f)),]; y<-sin(x)+rnorm(length(x),0,.1)+f%*%o; s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')}); s<-paste(c('y~x',s.fs),collapse='+'); d<-data.frame(x,y,f) names(d)<-c('x','y',s.fs); l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d, ???????? span=0.4); yp<-predict(l,newdata=d); plot(x,y,pch='+',ylim=c(-3,3),col='red');? # input data points(x,yp,pch='o',col='blue');?????????? # fit of that d0<-d; d0$f1<-d0$f2<-d0$f3<-0; yp0<-predict(l,newdata=d0); points(x,y-f%*%o);???? # spatial distortion lines(x,yp0,pch='+');? # estimate of that op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]}); cat("Demo offsets:",o,"\n"); cat("Estimated offsets:",format(op,digits=1),"\n"); # # to test if the underlying implementation has a limit of 4 (it does): # # copied from loess.R, and modified as indicated by ###***### simpleLoess <- function(y, x, weights, span = 0.75, degree = 2, parametric = FALSE, drop.square = FALSE, normalize = TRUE, statistics = "approximate", surface = "interpolate", cell = 0.2, iterations = 1, trace.hat = "exact") { ## loess_ translated to R. D <- NCOL(x) ###***### if(D > 4) stop("only 1-4 predictors are allowed") N <- NROW(x) if(!N || !D) stop("invalid 'x'") if(!length(y)) stop("invalid 'y'") x <- as.matrix(x) max.kd <- max(N, 200) robust <- rep(1, N) divisor<- rep(1, D) if(normalize && D > 1L) { trim <- ceiling(0.1 * N) divisor <- sqrt(apply(apply(x, 2L, sort)[seq(trim+1, N-trim), , drop = FALSE], 2L, var)) x <- x/rep(divisor, rep(N, D)) } sum.drop.sqr <- sum(drop.square) sum.parametric <- sum(parametric) nonparametric <- sum(!parametric) order.parametric <- order(parametric) x <- x[, order.parametric] order.drop.sqr <- (2 - drop.square)[order.parametric] if(degree==1 && sum.drop.sqr) stop("specified the square of a factor predictor to be dropped when degree = 1") if(D == 1 && sum.drop.sqr) stop("specified the square of a predictor to be dropped with only one numeric predictor") if(sum.parametric == D) stop("specified parametric for all predictors") if(iterations) for(j in 1L:iterations) { robust <- weights * robust if(j > 1) statistics <- "none" else if(surface == "interpolate" && statistics == "approximate") statistics <- if(trace.hat == "exact") "1.approx" else "2.approx" # trace.hat == "approximate" surf.stat <- paste(surface, statistics, sep="/") z <- .C(R_loess_raw, # ../src/loessc.c as.double(y), as.double(x), as.double(weights), as.double(robust), as.integer(D), as.integer(N), as.double(span), as.integer(degree), as.integer(nonparametric), as.integer(order.drop.sqr), as.integer(sum.drop.sqr), as.double(span*cell), as.character(surf.stat), fitted.values = double(N), parameter = integer(7), a = integer(max.kd), xi = double(max.kd), vert = double(2*D), vval = double((D+1)*max.kd), diagonal = double(N), trL = double(1), delta1 = double(1), delta2 = double(1), as.integer(surf.stat == "interpolate/exact")) if(j==1) { trace.hat.out <- z$trL one.delta <- z$delta1 two.delta <- z$delta2 } fitted.residuals <- y - z$fitted.values if(j < iterations) robust <- .Fortran(R_lowesw, as.double(fitted.residuals), as.integer(N), robust = double(N), integer(N))$robust } if(surface == "interpolate") { pars <- z$parameter names(pars) <- c("d", "n", "vc", "nc", "nv", "liv", "lv") enough <- (D + 1) * pars["nv"] fit.kd <- list(parameter=pars, a=z$a[1L:pars[4L]], xi=z$xi[1L:pars[4L]], vert=z$vert, vval=z$vval[1L:enough]) } if(iterations > 1) { pseudovalues <- .Fortran(R_lowesp, as.integer(N), as.double(y), as.double(z$fitted.values), as.double(weights), as.double(robust), integer(N), pseudovalues = double(N))$pseudovalues zz <- .C(R_loess_raw, as.double(pseudovalues), as.double(x), as.double(weights), as.double(weights), as.integer(D), as.integer(N), as.double(span), as.integer(degree), as.integer(nonparametric), as.integer(order.drop.sqr), as.integer(sum.drop.sqr), as.double(span*cell), as.character(surf.stat), temp = double(N), parameter = integer(7), a = integer(max.kd), xi = double(max.kd), vert = double(2*D), vval = double((D+1)*max.kd), diagonal = double(N), trL = double(1), delta1 = double(1), delta2 = double(1), as.integer(0L)) pseudo.resid <- pseudovalues - zz$temp } sum.squares <- if(iterations <= 1) sum(weights * fitted.residuals^2) else sum(weights * pseudo.resid^2) enp <- one.delta + 2*trace.hat.out - N s <- sqrt(sum.squares/one.delta) pars <- list(robust=robust, span=span, degree=degree, normalize=normalize, parametric=parametric, drop.square=drop.square, surface=surface, cell=cell, family if(iterations <= 1) "gaussian" else "symmetric", iterations=iterations) fit <- list(n=N, fitted=z$fitted.values, residuals=fitted.residuals, enp=enp, s=s, one.delta=one.delta, two.delta=two.delta, trace.hat=trace.hat.out, divisor=divisor) fit$pars <- pars if(surface == "interpolate") fit$kd <- fit.kd class(fit) <- "loess" fit } environment(simpleLoess)<-environment(loess); # to make it find private funcs environment(loess)<-.GlobalEnv; # to make it use our simpleLoess! environment(loess.smooth)<-.GlobalEnv; # to make it use our simpleLoess! # now run the demo with four parametric variables --> total 5.> 4 triggers error: # k>d2MAX in ehg136. Need to recompile with increased dimensions. # but there is no obvious d2MAX to configure that I can find...
Possibly Parallel Threads
- R crash with ATLAS precompiled Rblas.dll on Windows XP Core2 Duo
- Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect? [re-posting as plain text to pass char-set filter]
- Is there an implementation loess with more than 4 parametric predictors or a trick to similar effect?
- dynamic argument names and values as variables inside a loop
- loess: choose span to minimize AIC?