Hi,

I have a data frame with 6563 observations. I can run a regression with loess using four explanatory variables. If I add a fifth, R crashes. There are no missing values in the data, and if I run a regression with any four of the five explanatory variables, it works. It's only when I go from four to five that it crashes.

This leads me to believe that it is not an obvious problem with the data, but rather a resource problem. But I've set the max memory option for R to 512 M, so I wouldn't think that I'd be running out of RAM.

Is there some other memory option I need to set? Is there some limit on the size of problems that loess can handle regardless of system resources?

I'm running Windows 2000 and R 1.5.1. I've also run this with R 1.3. I've tried it on a Pentium IV and an Athlon.

If anybody knows of a solution to this problem, I would greatly appreciate it.

Thanks,

John

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Hmm... if I reduce the number of observations to just 500, I still get the error.

I don't think it's an issue of collinearity, because I've tried several different combinations of variables, all of which work just fine in an OLS or logistic regression.

I'm probably doing something stupid, but I'm not seeing it...

At 02:00 PM 9/15/2002, John Deke wrote:
>I have a data frame with 6563 observations. I can run a regression with
>loess using four explanatory variables. If I add a fifth, R crashes.

Thanks for looking into that, Peter. Maybe I'll try writing something to do local linear regressions...

John

-----Original Message-----
From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
Sent: Monday, September 16, 2002 9:08 AM
To: John Deke
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] loess crash

John Deke <jdeke2 at comcast.net> writes:

> Here's a simple example that yields the crash:
>
> library(modreg)
> data1 <- array(runif(500*5),c(500,5))
> colnames(data1) <- c("x1","x2","x3","x4","x5")
> y <- 3+2*data1[,"x1"]+15*data1[,"x2"]+13*data1[,"x3"]-8*data1[,"x4"]+14*data1[,"x5"]+rnorm(500)
> data2 <- cbind(y,data1)
> data2 <- as.data.frame(data2)
> result1 <- loess(y~x1+x2+x3+x4,data2)
>
> To get the crash, I just add x5--
>
> result1 <- loess(y~x1+x2+x3+x4+x5,data2)
>
> And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I
> mean really crashes -- the program is terminated, I get the little
> Windows dialogue saying that a log file is being generated -- the
> whole dramatic death scene.
>
> I know it's a computationally intensive thing, but the one that doesn't
> crash (with four explanatory variables) runs almost instantly. It's
> hard to see how adding a fifth could be so catastrophic. But I am
> somewhat new to this particular methodology....

Ok, this is easily reproducible on Linux and with the debugger. It looks pretty squarely like a memory overrun condition, specifically at (in /src/library/modreg/src/loessf.f)

380           call dsvdc(u,15,k,k,sigma,g,u,15,e,15,work,21,info)

which is getting called with k=21, where the 15 is supposed to be the leading dimension of u, and (k,k) are the actual dimensions. So dsvdc starts writing into places where it shouldn't and things basically go downhill from there, ending with a null pointer reference and a corrupted stack.

So basically, you're not meant to do that... R could do with a safety check there, though.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
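
The numbers in Peter's diagnosis line up with a simple count, sketched here in R. A full polynomial of a given degree in d predictors has choose(d + degree, degree) coefficients, so a local quadratic needs 15 terms with four predictors and 21 with five. Reading k as that term count, and the hard-coded 15 as sized for the four-predictor case, is an assumption based on the matching numbers, not something confirmed from the Fortran source. The helper name n_local_terms is made up for this illustration.

```r
# Number of coefficients in a full polynomial of the given degree in d
# predictors (intercept, linear, squared and cross-product terms).
# n_local_terms is a hypothetical helper, not part of loess.
n_local_terms <- function(d, degree = 2) choose(d + degree, degree)

n_local_terms(4)              # 15: matches the leading dimension in the dsvdc call
n_local_terms(5)              # 21: matches k = 21 reported from the debugger
n_local_terms(5, degree = 1)  # 6: a locally linear fit needs far fewer terms
```

Under that reading, a five-predictor quadratic fit asks for a 21 x 21 problem in storage laid out for 15, while a locally linear fit with five predictors stays well inside it.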

FWIW... Running the example (with x5 added) on Linux and R-1.5.1 gave a segmentation fault. Running it on a copy of R-1.5.0 compiled w/o optimization gives:

> result1 <- loess(y~x1+x2+x3+x4+x5,data2)
Warning messages:
1: k>d2MAX in ehg136.  Need to recompile with increased dimensions.
2: pseudoinverse used at 1.0256 1.017 1.0161 1.0371 1.036
3: neighborhood radius 0
4: reciprocal condition number  0
5: There are other near singularities as well. 0
> result1
Call:
loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2)

Number of Observations: 500
Equivalent Number of Parameters: NaN
Residual Standard Error: NaN

Andy

> -----Original Message-----
> From: John Deke [mailto:jdeke2 at comcast.net]
> Sent: Monday, September 16, 2002 7:36 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] loess crash
>
> At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
> >Hmm... I wouldn't try loess with more than one or two descriptors. I
> >mean, it's a smoothing method and representing a smooth function of
> >many variables can be computationally demanding.
> >
> >The Fortran source code for loess is one of the more obfuscated pieces
> >of R, but I can see that some structures inside of it are of fixed
> >size, which might explain it (BTW: Does R really crash, or just say
> >memory exhausted?).
> >
> >Do you have a simple example that reproduces the crash (using random
> >numbers, e.g.)?

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it.
=============================================================================

Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm quite happy to use local linear regression.

To answer your question -- perhaps I'm off base, but my reason for wanting to do this is that I have a set of explanatory variables that most likely influence my dependent variable in ways that are difficult to model parametrically. That is, I suspect that there are all sorts of complementary relationships between these variables, and it's not at all clear that there's a satisfying theoretical model that would suggest a clear-cut parametric relationship. So, rather than using parametric regression, I'd like to try something non-parametric.

My plan for summarizing the results is to find the average marginal effect of each explanatory variable of interest, holding all else constant. Also, I would calculate predicted outcomes for combinations of the explanatory variables that are most likely to occur in "the real world".

John

-----Original Message-----
From: John Fox [mailto:jfox at mcmaster.ca]
Sent: Monday, September 16, 2002 9:31 AM
To: John Deke
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] loess crash

Dear John,

For curiosity, I tried your example under R 1.5.1 on an 800 MHz PC with 512 Mb of memory running Windows 2000. The results were just as you described: The four-predictor problem ran essentially instantly, and the five-predictor problem crashed R, again instantly.

I also tried making the problem less computationally demanding by specifying locally linear, rather than quadratic, fits; this appears to work:

> loess(y~x1+x2+x3+x4+x5, data2, degree=1)
Call:
loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)

Number of Observations: 500
Equivalent Number of Parameters: 13.5
Residual Standard Error: 1.012

Although something is obviously wrong here, I wonder whether it makes sense to fit a local regression with so many predictors (unless the object is to compare the general nonparametric fit with some more constrained model): how would you describe the five-dimensional surface that's produced?

John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
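
The degree=1 workaround can be tried on the thread's simulated data with something like the sketch below. It assumes a recent R, where loess is available without library(modreg); the fixed seed and the object names fit_quad, fit_lin and fit5 are additions for illustration. The five-predictor call is wrapped in tryCatch because whether it is accepted may depend on the R version, so the sketch records the outcome rather than assuming one.

```r
set.seed(1)  # assumption: fixed seed for repeatability, not in the original example

# Same simulated data as in the thread's example
data1 <- matrix(runif(500 * 5), 500, 5,
                dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 + 2 * data1[, "x1"] + 15 * data1[, "x2"] + 13 * data1[, "x3"] -
  8 * data1[, "x4"] + 14 * data1[, "x5"] + rnorm(500)
data2 <- data.frame(y = y, data1)

# Four predictors: the case reported to work, with both local degrees
fit_quad <- loess(y ~ x1 + x2 + x3 + x4, data2)              # default degree = 2
fit_lin  <- loess(y ~ x1 + x2 + x3 + x4, data2, degree = 1)  # locally linear

# Five predictors, locally linear: keep either the fit or the error message
fit5 <- tryCatch(
  loess(y ~ x1 + x2 + x3 + x4 + x5, data2, degree = 1),
  error = function(e) conditionMessage(e)
)

fit_lin
```

Comparing fit_quad and fit_lin on the four-predictor model gives a rough idea of how much is lost by dropping the quadratic terms before relying on the locally linear fit.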

Thanks for the suggestion. I've only used splines for density estimation before -- I've never used them for regression (although I'm aware that people do). I'll look into it...

-----Original Message-----
From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
Sent: Monday, September 16, 2002 10:17 AM
To: jdeke2 at comcast.net
Cc: 'r-help at stat.math.ethz.ch'
Subject: RE: [R] loess crash

i would suggest looking at the package mgcv. you can fit generalized additive models which are useful for what you describe below.

On Mon, 16 Sep 2002, John Deke wrote:

> To answer your question -- perhaps I'm off base, but my reason for wanting
> to do this is that I have a set of explanatory variables that most likely
> influence my dependent variable in ways that are difficult to model
> parametrically. That is, I suspect that there are all sorts of complementary
> relationships between these variables, and it's not at all clear that there's
> a satisfying theoretical model that would suggest a clear-cut parametric
> relationship. So, rather than using parametric regression, I'd like to try
> something non-parametric.
>
> My plan for summarizing the results is to find the average marginal effect
> of each explanatory variable of interest, holding all else constant. Also, I
> would calculate predicted outcomes for combinations of the explanatory
> variables that are most likely to occur in "the real world".
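
For reference, an additive fit along the lines Rafael suggests might look like the sketch below, using gam() and s() from mgcv on the thread's simulated data. The seed, the object names and the choice of one default smooth per predictor are assumptions for illustration; nobody in the thread posted this call.

```r
library(mgcv)

set.seed(1)  # assumption: fixed seed for repeatability

# Same simulated data as in the thread's example
data1 <- matrix(runif(500 * 5), 500, 5,
                dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 + 2 * data1[, "x1"] + 15 * data1[, "x2"] + 13 * data1[, "x3"] -
  8 * data1[, "x4"] + 14 * data1[, "x5"] + rnorm(500)
data2 <- data.frame(y = y, data1)

# Additive model: one smooth term per predictor, no interactions
fit_gam <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = data2)

# Estimated degrees of freedom and approximate significance of each smooth
summary(fit_gam)
```

Because each predictor gets its own smooth, the fitted terms can be inspected one at a time, which is close to the "marginal effect" summary described in the quoted message, provided the additive structure is a reasonable approximation.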

I agree with John mostly. For a model as complicated as you're trying to fit with loess, you might as well try things like ppr (in the `modreg' package), MARS (in the `mda' package) or neural nets (in the `nnet' package), or even randomForest... Actually MARS might offer a bit more interpretability than others, because of its hierarchical construction.

If you do care about `marginal effects' of the predictors, then aren't you sort of assuming additivity? In which case the additive model is more appropriate. If not, the `marginal effects' can be misleading.

In terms of comparing a loess with 5 terms with a less complicated model, I think it needs to be pointed out that (AFAIK) it can only be done on a more or less qualitative level, as the models are not nested.

Cheers,
Andy

> -----Original Message-----
> From: John Fox [mailto:jfox at mcmaster.ca]
> Sent: Monday, September 16, 2002 1:59 PM
> To: jdeke2 at comcast.net
> Cc: r-help at stat.math.ethz.ch
> Subject: RE: [R] loess crash
>
> Dear John,
>
> It's true that the gam function in mgcv fits with splines while loess uses
> local regression, but an even more fundamental difference is that gam fits
> additive models (though, with some care, you can include higher-dimensional
> terms). Given your description of what you plan to do with the fitted
> model, an additive model might be what you want.
>
> More generally, a model that fits five-way interactions may be useful as a
> point of comparison for simpler models, but I doubt that it will provide a
> digestible description of the data.
>
> I hope that this helps,
>  John
>
> At 10:45 AM 9/16/2002 -0400, you wrote:
> >Thanks for the suggestion. I've only used splines for density estimation
> >before -- I've never used them for regression (although I'm aware that
> >people do). I'll look into it...
>
> ____________________________
> John Fox
> Department of Sociology
> McMaster University
> email: jfox at mcmaster.ca
> web: http://www.socsci.mcmaster.ca/jfox
> ____________________________
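
Of the alternatives Andy lists, ppr is the easiest to try without extra packages, since in recent versions of R it is available without loading modreg. The sketch below fits a projection pursuit regression to the thread's simulated data. The seed, the object names and the nterms and max.terms values are arbitrary choices for illustration, not recommendations from the thread.

```r
set.seed(1)  # assumption: fixed seed for repeatability

# Same simulated data as in the thread's example
data1 <- matrix(runif(500 * 5), 500, 5,
                dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 + 2 * data1[, "x1"] + 15 * data1[, "x2"] + 13 * data1[, "x3"] -
  8 * data1[, "x4"] + 14 * data1[, "x5"] + rnorm(500)
data2 <- data.frame(y = y, data1)

# Projection pursuit regression: a sum of smooth functions of linear
# combinations of the predictors. nterms = 2 and max.terms = 3 are
# illustrative values only.
fit_ppr <- ppr(y ~ x1 + x2 + x3 + x4 + x5, data = data2,
               nterms = 2, max.terms = 3)

# Projection directions and goodness of fit for each number of terms
summary(fit_ppr)
```

The data here are generated from a purely linear model, so a single ridge term should already capture most of the structure; with real data the number of terms would need more care.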
Actually, I forgot there's the `locfit' package: library(locfit)> fit1 <- locfit(y~x1*x2*x3*x4*x5, data=data2) > fit1Call: locfit(formula = y ~ x1 * x2 * x3 * x4 * x5, data = data2) Number of observations: 500 Family: Gaussian Fitted Degrees of freedom: 32.179 Residual scale: 0.954> summary(fit1)Estimation type: Local Regression Call: locfit(formula = y ~ x1 * x2 * x3 * x4 * x5, data = data2) Number of data points: 500 Independent variables: x1 x2 x3 x4 x5 Evaluation structure: Rectangular Tree Number of evaluation points: 32 Degree of fit: 2 Fitted Degrees of Freedom: 32.179 The default settings might be different from loess, though. Andy> -----Original Message----- > From: Liaw, Andy [mailto:andy_liaw at merck.com] > Sent: Monday, September 16, 2002 4:17 PM > To: 'John Fox'; jdeke2 at comcast.net > Cc: r-help at stat.math.ethz.ch > Subject: RE: [R] loess crash > > > I agree with John mostly. For a model as complicated as > you're trying to > fit with loess, you might as well try things like ppr (in > the `modreg' > package), MARS (in the 'mda' package) or neural nets (in the 'nnet' > package), or even randomForest... Actually MARS might offer > a bit more > interpretability than others, because of its hierarchical > construction. > > If you do care about `marginal effects' of the predictors, > then aren't you > sort of assuming additivity? In which case the additive model is more > appropriate. If not, the `marginal effects' can be misleading. > > In terms of comparing a loess with 5 terms with a less > complicated model, I > think it needs to be pointed out that (AFAIK) it can only be > done on a more > or less qualitative level, as the models are not nested. 
>
> Cheers,
> Andy
>
> > -----Original Message-----
> > From: John Fox [mailto:jfox at mcmaster.ca]
> > Sent: Monday, September 16, 2002 1:59 PM
> > To: jdeke2 at comcast.net
> > Cc: r-help at stat.math.ethz.ch
> > Subject: RE: [R] loess crash
> >
> > Dear John,
> >
> > It's true that the gam function in mgcv fits with splines while loess
> > uses local regression, but an even more fundamental difference is that
> > gam fits additive models (though, with some care, you can include
> > higher-dimensional terms). Given your description of what you plan to do
> > with the fitted model, an additive model might be what you want.
> >
> > More generally, a model that fits five-way interactions may be useful as
> > a point of comparison for simpler models, but I doubt that it will
> > provide a digestible description of the data.
> >
> > I hope that this helps,
> >  John
> >
> > At 10:45 AM 9/16/2002 -0400, you wrote:
> > >Thanks for the suggestion. I've only used splines for density estimation
> > >before -- I've never used them for regression (although I'm aware that
> > >people do). I'll look into it...
> > >
> > >-----Original Message-----
> > >From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
> > >Sent: Monday, September 16, 2002 10:17 AM
> > >To: jdeke2 at comcast.net
> > >Cc: 'r-help at stat.math.ethz.ch'
> > >Subject: RE: [R] loess crash
> > >
> > >i would suggest looking at the package mgcv.
> > >you can fit generalized additive models which are useful for what
> > >you describe below.
> > >
> > >On Mon, 16 Sep 2002, John Deke wrote:
> > >
> > > > Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm
> > > > quite happy to use local linear regression.
> > > >
> > > > To answer your question -- perhaps I'm off base, but my reason for
> > > > wanting to do this is that I have a set of explanatory variables that
> > > > most likely influence my dependent variable in ways that are difficult
> > > > to model parametrically. That is, I suspect that there are all sorts
> > > > of complementary relationships between these variables, and it's not
> > > > at all clear that there's a satisfying theoretical model that would
> > > > suggest a clear-cut parametric relationship. So, rather than using
> > > > parametric regression, I'd like to try something non-parametric.
> > > >
> > > > My plan for summarizing the results is to find the average marginal
> > > > effect of each explanatory variable of interest, holding all else
> > > > constant. Also, I would calculate predicted outcomes for combinations
> > > > of the explanatory variables that are most likely to occur in "the
> > > > real world".
> > > >
> > > > John
> > > >
> > > > -----Original Message-----
> > > > From: John Fox [mailto:jfox at mcmaster.ca]
> > > > Sent: Monday, September 16, 2002 9:31 AM
> > > > To: John Deke
> > > > Cc: r-help at stat.math.ethz.ch
> > > > Subject: Re: [R] loess crash
> > > >
> > > > Dear John,
> > > >
> > > > For curiosity, I tried your example under R 1.5.1 on an 800 MHz PC
> > > > with 512 Mb of memory running Windows 2000. The results were just as
> > > > you described: The four-predictor problem ran essentially instantly,
> > > > and the five-predictor problem crashed R, again instantly.
> > > >
> > > > I also tried making the problem less computationally demanding by
> > > > specifying locally linear, rather than quadratic, fits; this appears
> > > > to work:
> > > >
> > > > > loess(y~x1+x2+x3+x4+x5, data2, degree=1)
> > > > Call:
> > > > loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)
> > > >
> > > > Number of Observations: 500
> > > > Equivalent Number of Parameters: 13.5
> > > > Residual Standard Error: 1.012
> > > >
> > > > Although something is obviously wrong here, I wonder whether it makes
> > > > sense to fit a local regression with so many predictors (unless the
> > > > object is to compare the general nonparametric fit with some more
> > > > constrained model): how would you describe the five-dimensional
> > > > surface that's produced?
> > > >
> > > > John
> > > >
> > > > At 07:36 AM 9/16/2002 -0400, John Deke wrote:
> > > > >Here's a simple example that yields the crash:
> > > > >
> > > > >library(modreg)
> > > > >data1 <- array(runif(500*5),c(500,5))
> > > > >colnames(data1) <- c("x1","x2","x3","x4","x5")
> > > > >y <- 3+2*data1[,"x1"]+15*data1[,"x2"]+13*data1[,"x3"]-8*data1[,"x4"]+14*data1[,"x5"]+rnorm(500)
> > > > >data2 <- cbind(y,data1)
> > > > >data2 <- as.data.frame(data2)
> > > > >result1 <- loess(y~x1+x2+x3+x4,data2)
> > > > >
> > > > >To get the crash, I just add x5--
> > > > >
> > > > >result1 <- loess(y~x1+x2+x3+x4+x5,data2)
> > > > >
> > > > >And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I
> > > > >mean really crashes -- the program is terminated, I get the little
> > > > >Windows dialogue saying that a log file is being generated -- the
> > > > >whole dramatic death scene.
> > > > >
> > > > >I know it's a computationally intensive thing, but the one that doesn't
> > > > >crash (with four explanatory variables) runs almost instantly.
> > > > >It's hard to see how adding a fifth could be so catastrophic. But I am
> > > > >somewhat new to this particular methodology....
> > > > >
> > > > >John
> > > > >
> > > > >At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
> > > > >>John Deke <jdeke2 at comcast.net> writes:
> > > > >>
> > > > >> > Hmm... if I reduce the number of observations to just 500, I
> > > > >> > still get the error.
> > > > >> >
> > > > >> > I don't think it's an issue of colinearity, because I've tried
> > > > >> > several different combinations of variables, all of which work
> > > > >> > just fine in an OLS or logistic regression.
> > > > >> >
> > > > >> > I'm probably doing something stupid, but I'm not seeing it...
> > > > >> >
> > > > >> > At 02:00 PM 9/15/2002, John Deke wrote:
> > > > >> > >Hi,
> > > > >> > >
> > > > >> > > I have a data frame with 6563 observations. I can run a
> > > > >> > > regression with loess using four explanatory variables. If I
> > > > >> > > add a fifth, R crashes. There are no missings in the data, and
> > > > >> > > if I run a regression with any four of the five explanatory
> > > > >> > > variables, it works. It's only when I go from four to five that
> > > > >> > > it crashes.
> > > > >>
> > > > >>Hmm... I wouldn't try loess with more than one or two descriptors. I
> > > > >>mean, it's a smoothing method and representing a smooth function of
> > > > >>many variables can be computationally demanding.
> > > > >>
> > > > >>The Fortran source code for loess is one of the more obfuscated pieces
> > > > >>of R, but I can see that some structures inside of it are of fixed
> > > > >>size, which might explain it (BTW: Does R really crash, or just say
> > > > >>memory exhausted?).
> > > > >>
> > > > >>Do you have a simple example that reproduces the crash (using random
> > > > >>numbers, e.g.)?
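For anyone following up on the additive-model suggestion made in this thread, here is a minimal sketch of fitting one with gam from mgcv on simulated data in the style of the example posted above, and then summarizing a "marginal effect" of one predictor. The seed, the object names (dat, fit_gam, grid, avg_slope_x1) and the choice to hold the other predictors at their sample means are assumptions made for illustration only; nobody in the thread posted this code, and it is not a substitute for the five-way loess or locfit fits discussed.

```r
# Sketch only: simulated data modelled on the thread's example.
# The seed and all object names here are illustrative assumptions.
library(mgcv)   # recommended package shipped with standard R distributions

set.seed(1)
n <- 500
dat <- as.data.frame(matrix(runif(n * 5), n, 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$y <- with(dat, 3 + 2*x1 + 15*x2 + 13*x3 - 8*x4 + 14*x5) + rnorm(n)

# Additive model: one smooth term per predictor, no interactions
fit_gam <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = dat)

# "Marginal effect" of x1: vary x1 over a grid, hold the others at their means
grid <- data.frame(x1 = seq(0.05, 0.95, length.out = 50),
                   x2 = mean(dat$x2), x3 = mean(dat$x3),
                   x4 = mean(dat$x4), x5 = mean(dat$x5))
pred <- predict(fit_gam, newdata = grid)

# Average slope of the fitted curve in x1 (the simulated coefficient is 2)
avg_slope_x1 <- unname((pred[length(pred)] - pred[1]) / (0.95 - 0.05))
print(avg_slope_x1)
```

Because the model is additive, the fitted curve in x1 has the same shape whatever values the other predictors are held at, which is the sense in which a per-variable "marginal effect" is well defined here. With a model that includes interactions, the same summary would depend on where the other predictors are fixed.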