I am trying to model a dependent variable as a threshold function of my independent variable. What I mean is that I want to fit different intercepts to y following 2 breakpoints, but with fixed slopes. I am trying to do this with using ifelse statements in the nls function. Perhaps, this is not an appropriate approach. I have created a very simple example to illustrate what I am trying to do. #Creating my independent variable x <- seq(0, 1000) #Creating my dependent variable, where all values below threshold #1 are 0, all values above threshold #2 are 0 and all values within the two thresholds are 1 y <- ifelse(x < 300, 0, ifelse(x>700, 0, 1)) #Adding noise to my dependent variable y <- y + rnorm(length(y), 0, 0.01) #I am having trouble clearly explaining the model I want to fit but perhaps the plot is self explanatory plot(x, y) #Now I am trying to adjust a nonlinear model to fit the two breakpoints, with known slopes between the breakpoints (here, respectively 0, 1 and 0) threshold.model <- nls(y~ifelse(x<b1, 0, ifelse(x>b2, 0, 1)), start=list(b1=300, b2=700), trace=T) I have played around with this function quite a bit and systematically get an error saying: singular gradient matrix at initial parameter estimates I admit that I don't fully understand what a singular gradient matrix implies. But it seems to me that both my model and starting values are sensible given the data, and that the break points are in fact estimable. Could someone point to what I am doing wrong? More generally, I am interested in knowing (1) whether I can use such ifelse statements in the function nls (1) whether I can even use nls for this model (3) whether I can model this with a function that would allow me to assume that the errors are binomial, rather than normally distributed (ultimately I want to use such a model on binomial data) I am using R version 2.15.1 on 64-bit Windows 7 Any guidance would be greatly appreciated. Veronique [[alternative HTML version deleted]]
V?ronique: I've cc'ed this to a true expert (Ravi Varadhan) who is one of those who can give you a definitive response, but I **believe** the problem is that threshhold type function fits have objective functions whose derivatives are discontinuous,and hence gradient -based methods can run into the sorts of problems that you see. **If** this is so, then you might do better using an explicit non-gradient optimizer = rss minimizer via one of the optim() suite of functions (maybe even the default Nelder-Mead); but this is definitely where the counsel of an expert would be valuable. Also check out the CRAN Optimization Task View for advice on optimization options. Cheers, Bert On Mon, Oct 15, 2012 at 9:43 AM, V?ronique Boucher Lalonde <veronique.boucher.lalonde at gmail.com> wrote:> I am trying to model a dependent variable as a threshold function of > my independent variable. > What I mean is that I want to fit different intercepts to y following 2 > breakpoints, but with fixed slopes. > I am trying to do this with using ifelse statements in the nls function. > Perhaps, this is not an appropriate approach. > > I have created a very simple example to illustrate what I am trying to do. > > #Creating my independent variable > x <- seq(0, 1000) > #Creating my dependent variable, where all values below threshold #1 are 0, > all values above threshold #2 are 0 and all values within the two > thresholds are 1 > y <- ifelse(x < 300, 0, ifelse(x>700, 0, 1)) > #Adding noise to my dependent variable > y <- y + rnorm(length(y), 0, 0.01) > #I am having trouble clearly explaining the model I want to fit but perhaps > the plot is self explanatory > plot(x, y) > #Now I am trying to adjust a nonlinear model to fit the two breakpoints, > with known slopes between the breakpoints (here, respectively 0, 1 and 0) > threshold.model <- nls(y~ifelse(x<b1, 0, ifelse(x>b2, 0, 1)), > start=list(b1=300, b2=700), trace=T) > > I have played around with this function quite a bit and systematically get > an error saying: singular gradient matrix at initial parameter estimates > I admit that I don't fully understand what a singular gradient matrix > implies. > But it seems to me that both my model and starting values are sensible > given the data, and that the break points are in fact estimable. > Could someone point to what I am doing wrong? > > More generally, I am interested in knowing > (1) whether I can use such ifelse statements in the function nls > (1) whether I can even use nls for this model > (3) whether I can model this with a function that would allow me to assume > that the errors are binomial, rather than normally distributed > (ultimately I want to use such a model on binomial data) > > I am using R version 2.15.1 on 64-bit Windows 7 > > Any guidance would be greatly appreciated. > Veronique > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Hi You indeed cannot use conventional optimisation algorithms as the indicator function is discontinuous, with no derivative. The search hence has to be done with a grid search over each potential threshold value, which is termed "conditional/concentrated" least square, or also "profile likelihood". Another approach is to make the indicator function smooth, replacing it with a logistic or cumulative normal, in which case you can then use conventional methods. This means that instead of having a clear jump from one regime to the other, you have a smooth transition. Package tsDyn implements both these models for time series auto-regressions, the first being called SETAR and the second star. You can have a look at the code and adapt it easily for your purpose (although there the y is continuous)! Regarding litterature, the paper that fits most what you look for is probably: -Samia, N. and Chan, K. (2011). Maximum likelihood estimation of a generalized threshold stochastic regression model. Biometrika, 98(2):433–448. Best Matthieu Message: 13 Date: Sat, 20 Oct 2012 09:44:00 -0400 From: V?ronique Boucher Lalonde <veronique.boucher.lalonde@gmail.com>To: Ravi Varadhan <ravi.varadhan@jhu.edu> Cc: "r-help@r-project.org" <r-help@r-project.org>, Bert Gunter <gunter.berton@gene.com> Subject: Re: [R] fit a "threshold" function with nls Message-ID: <CABkQxbzsPKFe6YQVe85pHkDp8aNDo7Ri6GCmzK8mAfrS2cbSuA@mail.gmail.com> Content-Type: text/plain Thank you all for your help. I think I now understand the issue. I tried to write a likelihood function for my binomial model. Please excuse my ignorance if I am not doing this right; I do not have any statistical background. #Example data x <- seq(0, 1000) y <- ifelse(x < 300, 0, ifelse(x>700, 0, 1)) #Function assuming binomial errors fn <- function(par) { y.pred = ifelse(x<par[1], 0, ifelse(x>par[2], 0, 1)) -sum(dbinom(1-abs(y-y.pred), prob = mean(y.pred == y), size = 1, log = T)) } optim(par=c(300,700), fn) Would the "Nelder-Mead" optimization method be appropriate? On Tue, Oct 16, 2012 at 11:28 AM, Ravi Varadhan <ravi.varadhan@jhu.edu>wrote:[[alternative HTML version deleted]]