Thanks Duncan, People working in my field are frequently (and rightly) accused of butchering statistical terminology. So I guess I'm guilty as charged ? I will look into the suggested path. One question though in your expression: loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) ) a + b*x/(c+x) ________________________________ From: Duncan Murdoch <murdoch.duncan at gmail.com> Sent: Wednesday, July 29, 2020 16:04 To: Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; J C Nash <profjcnash at gmail.com>; r-help at r-project.org <r-help at r-project.org> Subject: Re: [R] Nonlinear logistic regression fitting Just a quick note about jargon: you are using the word "likelihood" in a way that I (and maybe some others) find confusing. (In fact, I think you used it two different ways, but maybe I'm just confused.) I would say that likelihood is the probability of observing the entire data set, considered as a function of the parameters. You appear to be using it (at first) as the probability that a particular observation is equal to 1, and then as the argument to a logit function to give that probability. What you probably want to do is find the parameters that maximize the likelihood (in my sense). The usual practice is to maximize the log of the likelihood; it tends to be easier to work with. In your notation below, the log likelihood would be loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) ) When you have a linear logistic regression model, this simplifies a bit, and there are fast algorithms that are usually stable to optimize it. With a nonlinear model, you lose some of that, and I'd suggest directly optimizing it. Duncan Murdoch On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote:> Thank your, Pr. Nash, for your perspective on the issue. > > Here is an example of binary data/response (resp) that were simulated and re-estimated assuming a non linear effect of the predictor (x) on the likelihood of response. For re-estimation, I have used gnlm::bnlr for the logistic regression. The accuracy of the parameter estimates is so-so, probably due to the low number of data points (8*nx). For illustration, I have also include a glm call to an incorrect linear model of x. > > #install.packages(gnlm) > #require(gnlm) > set.seed(12345) > > nx <- 10 > x <- c( > rep(0, 3*nx), > rep(c(10, 30, 100, 500, 1000), each = nx) > ) > rnd <- runif(length(x)) > a <- log(0.2/(1-0.2)) > b <- log(0.7/(1-0.7)) - a > c <- 30 > likelihood <- a + b*x/(c+x) > p <- exp(likelihood) / (1 + exp(likelihood)) > resp <- ifelse(rnd <= p, 1, 0) > > df <- data.frame( > x = x, > resp = resp, > nresp = 1- resp > ) > > head(df) > > # glm can only assume linear effect of x, which is the wrong model > glm_mod <- glm( > resp~x, > data = df, > family = 'binomial' > ) > glm_mod > > # Using gnlm package, estimate a model model with just intercept, and a model with predictor effect > int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) ) > emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c) ) > > int_mod > emax_mod > > ________________________________ > From: J C Nash <profjcnash at gmail.com> > Sent: Tuesday, July 28, 2020 14:16 > To: Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; r-help at r-project.org <r-help at r-project.org> > Subject: Re: [R] Nonlinear logistic regression fitting > > There is a large literature on nonlinear logistic models and similar > curves. Some of it is referenced in my 2014 book Nonlinear Parameter > Optimization Using R Tools, which mentions nlxb(), now part of the > nlsr package. If useful, I could put the Bibtex refs for that somewhere. > > nls() is now getting long in the tooth. It has a lot of flexibility and > great functionality, but it did very poorly on the Hobbs problem that > rather forced me to develop the codes that are 3/5ths of optim() and > also led to nlsr etc. The Hobbs problem dated from 1974, and with only > 12 data points still defeats a majority of nonlinear fit programs. > nls() poops out because it has no LM stabilization and a rather weak > forward difference derivative approximation. nlsr tries to generate > analytic derivatives, which often help when things are very badly scaled. > > Another posting suggests an example problem i.e., some data and a > model, though you also need the loss function (e.g., Max likelihood, > weights, etc.). Do post some data and functions so we can provide more > focussed advice. > > JN > > On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote: >> Hi >> >> I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as: >> >> L = intercept + emax * x / (EC50+x) >> >> Which I guess could be expressed as the following R model >> >> ~ emax*x/(ec50+x) >> >> As far as I know (please, correct me if I am wrong), fitting such a model is to not doable with glm, since the function is not linear. >> >> A Stackoverflow post recommends the bnlr function from the gnlm (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function. >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Thanks Duncan, (Sorry for the repeated email) People working in my field are frequently (and rightly) accused of butchering statistical terminology. So I guess I'm guilty as charged ? I will look into the suggested path. One question though in your expression of loglik, p is "a + b*x/(c+x)". Correct? Thanks ________________________________ From: Duncan Murdoch <murdoch.duncan at gmail.com> Sent: Wednesday, July 29, 2020 16:04 To: Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; J C Nash <profjcnash at gmail.com>; r-help at r-project.org <r-help at r-project.org> Subject: Re: [R] Nonlinear logistic regression fitting Just a quick note about jargon: you are using the word "likelihood" in a way that I (and maybe some others) find confusing. (In fact, I think you used it two different ways, but maybe I'm just confused.) I would say that likelihood is the probability of observing the entire data set, considered as a function of the parameters. You appear to be using it (at first) as the probability that a particular observation is equal to 1, and then as the argument to a logit function to give that probability. What you probably want to do is find the parameters that maximize the likelihood (in my sense). The usual practice is to maximize the log of the likelihood; it tends to be easier to work with. In your notation below, the log likelihood would be loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) ) When you have a linear logistic regression model, this simplifies a bit, and there are fast algorithms that are usually stable to optimize it. With a nonlinear model, you lose some of that, and I'd suggest directly optimizing it. Duncan Murdoch On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote:> Thank your, Pr. Nash, for your perspective on the issue. > > Here is an example of binary data/response (resp) that were simulated and re-estimated assuming a non linear effect of the predictor (x) on the likelihood of response. For re-estimation, I have used gnlm::bnlr for the logistic regression. The accuracy of the parameter estimates is so-so, probably due to the low number of data points (8*nx). For illustration, I have also include a glm call to an incorrect linear model of x. > > #install.packages(gnlm) > #require(gnlm) > set.seed(12345) > > nx <- 10 > x <- c( > rep(0, 3*nx), > rep(c(10, 30, 100, 500, 1000), each = nx) > ) > rnd <- runif(length(x)) > a <- log(0.2/(1-0.2)) > b <- log(0.7/(1-0.7)) - a > c <- 30 > likelihood <- a + b*x/(c+x) > p <- exp(likelihood) / (1 + exp(likelihood)) > resp <- ifelse(rnd <= p, 1, 0) > > df <- data.frame( > x = x, > resp = resp, > nresp = 1- resp > ) > > head(df) > > # glm can only assume linear effect of x, which is the wrong model > glm_mod <- glm( > resp~x, > data = df, > family = 'binomial' > ) > glm_mod > > # Using gnlm package, estimate a model model with just intercept, and a model with predictor effect > int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) ) > emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c) ) > > int_mod > emax_mod > > ________________________________ > From: J C Nash <profjcnash at gmail.com> > Sent: Tuesday, July 28, 2020 14:16 > To: Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; r-help at r-project.org <r-help at r-project.org> > Subject: Re: [R] Nonlinear logistic regression fitting > > There is a large literature on nonlinear logistic models and similar > curves. Some of it is referenced in my 2014 book Nonlinear Parameter > Optimization Using R Tools, which mentions nlxb(), now part of the > nlsr package. If useful, I could put the Bibtex refs for that somewhere. > > nls() is now getting long in the tooth. It has a lot of flexibility and > great functionality, but it did very poorly on the Hobbs problem that > rather forced me to develop the codes that are 3/5ths of optim() and > also led to nlsr etc. The Hobbs problem dated from 1974, and with only > 12 data points still defeats a majority of nonlinear fit programs. > nls() poops out because it has no LM stabilization and a rather weak > forward difference derivative approximation. nlsr tries to generate > analytic derivatives, which often help when things are very badly scaled. > > Another posting suggests an example problem i.e., some data and a > model, though you also need the loss function (e.g., Max likelihood, > weights, etc.). Do post some data and functions so we can provide more > focussed advice. > > JN > > On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote: >> Hi >> >> I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as: >> >> L = intercept + emax * x / (EC50+x) >> >> Which I guess could be expressed as the following R model >> >> ~ emax*x/(ec50+x) >> >> As far as I know (please, correct me if I am wrong), fitting such a model is to not doable with glm, since the function is not linear. >> >> A Stackoverflow post recommends the bnlr function from the gnlm (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function. >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
On 29/07/2020 4:16 p.m., Sebastien Bihorel wrote:> Thanks Duncan, > > (Sorry for the repeated email) > > People working in my field are frequently (and rightly) accused of > butchering statistical terminology. So I guess I'm guilty as charged ? > > I will look into the suggested path. One question though in your > expression of loglik, p is "a + b*x/(c+x)". Correct?p should be the probability of observing 1, so I would guess it is what you calculated below, i.e. exp(likelihood)/(1 + exp(likelihood)) (in your notation, not mine). Duncan Murdoch> > Thanks > > ------------------------------------------------------------------------ > *From:* Duncan Murdoch <murdoch.duncan at gmail.com> > *Sent:* Wednesday, July 29, 2020 16:04 > *To:* Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; J C Nash > <profjcnash at gmail.com>; r-help at r-project.org <r-help at r-project.org> > *Subject:* Re: [R] Nonlinear logistic regression fitting > Just a quick note about jargon:? you are using the word "likelihood" in > a way that I (and maybe some others) find confusing. (In fact, I think > you used it two different ways, but maybe I'm just confused.) I would > say that likelihood is the probability of observing the entire data set, > considered as a function of the parameters.? You appear to be using it > (at first) as the probability that a particular observation is equal to > 1, and then as the argument to a logit function to give that probability. > > What you probably want to do is find the parameters that maximize the > likelihood (in my sense).? The usual practice is to maximize the log of > the likelihood; it tends to be easier to work with.? In your notation > below, the log likelihood would be > > ?? loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) ) > > When you have a linear logistic regression model, this simplifies a bit, > and there are fast algorithms that are usually stable to optimize it. > With a nonlinear model, you lose some of that, and I'd suggest directly > optimizing it. > > Duncan Murdoch > > On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote: >> Thank your, Pr. Nash, for your perspective on the issue. >> >> Here is an example of binary data/response (resp) that were simulated and re-estimated assuming a non linear effect of the predictor (x) on the likelihood of response. For re-estimation, I have used gnlm::bnlr for the logistic regression. The accuracy of the parameter estimates is so-so, probably due to the low number of > data points (8*nx). For illustration, I have also include a glm call to > an incorrect linear model of x. >> >> #install.packages(gnlm) >> #require(gnlm) >> set.seed(12345) >> >> nx <- 10 >> x <- c( >>??? rep(0, 3*nx), >>??? rep(c(10, 30, 100, 500, 1000), each = nx) >> ) >> rnd <- runif(length(x)) >> a <- log(0.2/(1-0.2)) >> b <- log(0.7/(1-0.7)) - a >> c <- 30 >> likelihood <- a + b*x/(c+x) >> p <- exp(likelihood) / (1 + exp(likelihood)) >> resp <- ifelse(rnd <= p, 1, 0) >> >> df <- data.frame( >>??? x = x, >>??? resp = resp, >>??? nresp = 1- resp >> ) >> >> head(df) >> >> # glm can only assume linear effect of x, which is the wrong model >> glm_mod <- glm( >>??? resp~x, >>??? data = df, >>??? family = 'binomial' >> ) >> glm_mod >> >> # Using gnlm package, estimate a model model with just intercept, and a model with predictor effect >> int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) ) >> emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit',? mu = ~ p_a + p_b*x/(p_c+x),? pmu = c(a, b, c) ) >> >> int_mod >> emax_mod >> >> ________________________________ >> From: J C Nash <profjcnash at gmail.com> >> Sent: Tuesday, July 28, 2020 14:16 >> To: Sebastien Bihorel <Sebastien.Bihorel at cognigencorp.com>; r-help at r-project.org <r-help at r-project.org> >> Subject: Re: [R] Nonlinear logistic regression fitting >> >> There is a large literature on nonlinear logistic models and similar >> curves. Some of it is referenced in my 2014 book Nonlinear Parameter >> Optimization Using R Tools, which mentions nlxb(), now part of the >> nlsr package. If useful, I could put the Bibtex refs for that somewhere. >> >> nls() is now getting long in the tooth. It has a lot of flexibility and >> great functionality, but it did very poorly on the Hobbs problem that >> rather forced me to develop the codes that are 3/5ths of optim() and >> also led to nlsr etc. The Hobbs problem dated from 1974, and with only >> 12 data points still defeats a majority of nonlinear fit programs. >> nls() poops out because it has no LM stabilization and a rather weak >> forward difference derivative approximation. nlsr tries to generate >> analytic derivatives, which often help when things are very badly scaled. >> >> Another posting suggests an example problem i.e., some data and a >> model, though you also need the loss function (e.g., Max likelihood, >> weights, etc.). Do post some data and functions so we can provide more >> focussed advice. >> >> JN >> >> On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote: >>> Hi >>> >>> I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as: >>> >>> L = intercept + emax * x / (EC50+x) >>> >>> Which I guess could be expressed as the following R model >>> >>> ~ emax*x/(ec50+x) >>> >>> As far as I know (please, correct me if I am wrong), fitting such a model is to not doable with glm, since the function is not linear. >>> >>> A Stackoverflow post recommends the bnlr function from the gnlm (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... > I would be grateful for any opinion on this package or for any > alternative recommendation of package/function. >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >>??????? [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >