Janne Huttunen
2008-Jun-12 20:35 UTC
[R] Problems with mars in R in the case of nonlinear functions
Hi,

I'm trying to use the mars function in R to interpolate nonlinear multivariate functions. However, it seems that mars gives me a fit which uses only a few basis functions, and it underfits very badly.

For example, I have tried the following code to test mars:

    require("mda")

    f <- function(x,y) { x^2-y^2 };
    #f <- function(x,y) { x+2*y };

    # Grid
    x <- seq(-1,1,length=10);
    x <- outer(x*0,x,FUN="+"); y <- t(x);
    X <- cbind(as.vector(x),as.vector(y));

    # Data
    z <- f(x,y);

    fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);

    # Plotting
    par(mfrow=c(1,2),pty="s")
    lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
    persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
          xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
    persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
          col='lightblue',
          xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
          phi=25,theta=55,zlim=lims)

(the code is also here if someone wants to try it: http://venda.uku.fi/~jmhuttun/R/marstest.R)

The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The fitted model contains only 5 terms, which is not enough in this case. Adjusting parameters like nk, thresh, penalty and degree seems to have only a minor effect or no effect at all. It's also strange that when I increase the number of points in the grid, the results are even worse: see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid. However, mars seems to work well with linear functions (e.g. with the function which is commented out in the above code).

Does anyone know what is wrong in this case? Am I missing something, or is there something wrong in my code?

This does not seem to be a problem with the MARS method in general. For example, Friedman's MARS implementation (run from Matlab) gives a rather good fit: see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
Thank you

Janne

--
Janne Huttunen
University of California
Department of Statistics
367 Evans Hall
Berkeley, CA 94720-3860
email: jmhuttun at stat.berkeley.edu
phone: +1-510-502-5205
office room: 449 Evans Hall
Stephen Milborrow
2008-Jun-13 12:34 UTC
[R] Problems with mars in R in the case of nonlinear functions
| I'm trying to use mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis function and it underfits very badly.

Try the "earth" package, which extends the mars function in the mda package. Your example becomes

    library(earth) # was mda
    f <- function(x,y) { x^2-y^2 }
    x <- seq(-1,1,length=10)
    x <- outer(x*0,x,FUN="+")
    y <- t(x)
    X <- cbind(as.vector(x),as.vector(y))
    z <- f(x,y)
    fit <- earth(X, as.vector(z))
    summary(fit)
    plotmo(fit) # note better fit than before
    # your original plotting code could be used too

For this kind of data, you could possibly use the minspan parameter. MARS by default does not allow every observation to be used as a knot in the generated basis functions. This strategy increases resistance to runs of correlated noise in the data. For non-noisy data, you can set minspan=1 to allow MARS to consider every observation as a potential knot. If your data were noisy, then minspan=1 could overfit the data. With earth, you can use trace=2 to see the calculated minspan value.

If you run the above example with the earth parameter trace=1, you will see that the stopping condition for the forward pass is:

    Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)

To make the forward pass continue further, change the "delta RSq threshold" by using the thresh parameter:

    fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model "looks" better when plotted, but note that using thresh here makes almost no change to the GRSq. That is, with the lower threshold the model is more complicated (has more terms) but does not have greater predictive power.

The threshold is just one of the reasons that the forward pass can terminate (reaching the maximum number of terms nk is another). AFAIK Friedman's code (that you ran from Matlab) does not use the threshold but instead just continues forward stepping until nk is reached.
In this case the Matlab model is arguably more complicated than it need be. I believe the forward threshold for MARS was an innovation of Hastie and Tibshirani, but I could be wrong.

To reduce mailing list traffic, let's continue this discussion off-line, i.e. by direct mail to each other, and if necessary I will summarize the results of our discussions in the earth documentation.

Regards
Steve
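
For anyone wanting to check the thresh and minspan suggestions from the reply side by side, the following is a minimal sketch, not a definitive recipe. It assumes the earth package is installed. The list name fits and the data frame comparison are illustrative names that do not appear in the thread, and expand.grid is used here as a shorter way to build the same 10x10 grid (the rows come out in a different order than in the original code, which does not affect the fit).

```r
library(earth)  # assumption: the earth package is installed

f <- function(x, y) x^2 - y^2

# Same 10 x 10 grid of points as in the thread, built with expand.grid
g <- seq(-1, 1, length.out = 10)
grid <- expand.grid(x = g, y = g)
z <- f(grid$x, grid$y)

# Three fits: earth defaults, a lower forward-pass threshold, and a lower
# threshold combined with minspan = 1 (every observation may be a knot)
fits <- list(
  default  = earth(grid, z),
  thresh   = earth(grid, z, thresh = 1e-6),
  minspan1 = earth(grid, z, thresh = 1e-6, minspan = 1)
)

# Number of selected terms, RSq and GRSq for each fit
comparison <- data.frame(
  nterms = sapply(fits, function(m) length(m$selected.terms)),
  rsq    = sapply(fits, function(m) m$rsq),
  grsq   = sapply(fits, function(m) m$grsq)
)
print(comparison)
```

Comparing the nterms and grsq columns shows whether the extra terms obtained with a lower thresh buy any improvement in GRSq, which is the point made in the reply. Since the data here are noise-free, minspan=1 is reasonable; with noisy data the reply's warning about overfitting applies.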