Dear friends - I have a function for the charge in a fluid (water) 
buffered with HEPES and otherwise only containing Na and Cl so that [Na] 
- [Cl] = SID (strong ion difference) goes from -1 mM to 1 mM. With known 
SID and total HEPES concentration I can calculate accurately the pH if I 
know 3 pK values for HEPES by finding the single root with uniroot
Now, the problem is that there is some disagreement in the literature 
what is the correct value for the 3 pKs. I know the 3 pK values have the 
relationship pK1 < pK2 <- pK3 and for the most common formulation of 
HEPES I know the charge on fully protonated species is 2. Hence I can 
generate a huge number of pK values from uniform distribution by taking 
sort(runif(3,-1,9) and make sure there are no ties and then run all 
those triplets of pK values to find pH values and thereby find the 
lowest deviation vis a vis measured values. That works but requires many 
triplets and is not satisfying. Hence I wonder if I could somehow have 
non linear regression to find the 3 pK values. Below is HEPESFUNC which 
delivers charge in the fluid for known pKs, HEPTOT and SID. Is it 
possible to have root-finding in the formula with nls? (I know the 
precision asked for is extreme but it has worked well in really many 
applications).
All best wishes
Troels Ring, MD
Aalborg, Denmark
HEPESFUNC <-
 ? function(H,SID,HEPTOT,pK1,pK2,pK3) {
 ??? XX <- (H^3/(10^-pK3*10^-pK2*10^-pK1)+H^2/(10^-pK3*10^-pK2)+H/(10^-pK3))
 ??? IV <- HEPTOT/(1+XX)
 ??? I <- IV*H^3/(10^-pK3*10^-pK2*10^-pK1)
 ??? II <- IV*H^2/(10^-pK3*10^-pK2)
 ??? III <- IV*H/10^-pK3
 ??? H - kw/H + SID + I*2 + II - IV
 ? }
HEPTOT <- 0.050
SID <- c(-seq(10,1)*1e-4,0,seq(1,10)*1e-4)
pHobs? <- c(4.63,4.68,4.72,4.77,4.83,4.9,4.96,5.04,5.12,5.21,5.3,
 ??????????? 5.39,5.48,5.55,5.63,5.69,5.74,5.8,5.85,5.89,5.93)
pK1 <- -1; pK2 <- 3; pK3 <- 7.55 # literature values
pK3 <- 7.6; pK2 <- 2.96 # values eye-balled to be better
pH <- c()
for (i in 1:length(SID)) {
 ? pH[i] <- -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4,
 ?????????????????????? HEPTOT=HEPTOT,SID = SID[i],
 ?????????????????????? pK1=pK1,pK2=pK2,pK3=pK3)$root)
}
plot(SID,pH)
points(SID,pHobs,col="red")
? Mon, 6 Nov 2023 17:53:49 +0100 Troels Ring <tring at gvdnet.dk> ?????:> Hence I wonder if I could somehow have non linear regression to find > the 3 pK values. Below is HEPESFUNC which delivers charge in the > fluid for known pKs, HEPTOT and SID. Is it possible to have > root-finding in the formula with nls?Sure. Just reformulate the problem in terms of a function that takes a vector of predictors (your independent variable SID) and the desired parameters (pK1, pK2, pK3) as separate arguments and then returns predicted values of the dependent variable (to compare against pHobs): kw <- 1e-14 # I'm assuming pHm <- Vectorize(function(SID, pK1, pK2, pK3) -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID, pK1=pK1,pK2=pK2,pK3=pK3)$root)) (Yes, Vectorize() doesn't make the function any faster, but I don't see an easy way to rewrite this function to make it truly vectorised.) Unfortunately, nls() seems to take a step somewhere where crossprod() of the Jacobian of the model function cannot be inverted and fails with the message "Singular gradient". I wish that R could have a more reliable built-in nonlinear least squares solver. (I could also be holding it wrong.) Meanwhile, we have excellent CRAN packages nlsr and minpack.lm: minpack.lm::nlsLM( pHobs ~ pHm(SID, pK1, pK2, pK3), data.frame(pHobs = pHobs, SID = SID), start = c(pK1 = pK1, pK2 = pK2, pK3 = pK3), # the following is also needed to avoid MINPACK failing to fit lower = rep(-1, 3), upper = rep(9, 3) ) # Nonlinear regression model # model: pHobs ~ pHm(SID, pK1, pK2, pK3) # data: data.frame(pHobs = pHobs, SID = SID) # pK1 pK2 pK3 # -1.000 2.966 7.606 # residual sum-of-squares: 0.001195 # # Number of iterations to convergence: 15 # Achieved convergence tolerance: 1.49e-08 (Unfortunately, your code seemed to have a lot of non-breakable spaces, which confuse R's parser and make it harder to copy&paste it.) I think that your function can also be presented as a degree-5 polynomial in H, so it should also be possible to use polyroot() to obtain your solutions in a more exact manner. -- Best regards, Ivan
Your script is missing something (in particular kw).
I presume you are trying to estimate the pK values. You may have more success
with package nlsr than nls(). nlsr::nlxb() tries to get the Jacobian of the
model specified by a formula and do so by applying symbolic or automatic
differentiation. The multi expression function would probably not work.
(That is one advantage of nls(), but it has an unstabilized solver and unless
carefully called will use a simple forward derivative approximation.)
There is also nlsr::nlfb() that can use functions for the residuals and
jacobian. Your function is messy, but likely the jacobian can be developed
with a bit of work.
Some of the symbolic derivative features of R can help here. Alternatively,
there are several numerical approximations in nlsr and the vignette "Intro
to nlsr" explains how to use these. Note that simple forward and backward
approximations are generally not worth using, but central approximation is
quite good.
The solver in the nlsr package allows of bounds on the parameters. Since you
want them ordered, you may want to transform
    pK1 < pK2 <- pK3
to  pk1, deltapk2, deltapk3 so pK2 = pk1+deltapk2
and pk3 = pk2 + deltapk3 = pk1 + deltapk2 + deltapk3
and all values bounded below by 0 or possibly some small numbers to
keep the paramters apart.
I won't pretend any of this is trivial. It is VERY easy to make small errors
that still allow for output that is disastrously incorrect. If it is possible
to get a single "formula" as one expression even if spread over
multiple lines,
then nlxb() might be able to handle it.
J Nash (maintainer of nlsr and optimx)
On 2023-11-06 11:53, Troels Ring wrote:
....> HEPESFUNC <-
>  ? function(H,SID,HEPTOT,pK1,pK2,pK3) {
>  ??? XX <-
(H^3/(10^-pK3*10^-pK2*10^-pK1)+H^2/(10^-pK3*10^-pK2)+H/(10^-pK3))
>  ??? IV <- HEPTOT/(1+XX)
>  ??? I <- IV*H^3/(10^-pK3*10^-pK2*10^-pK1)
>  ??? II <- IV*H^2/(10^-pK3*10^-pK2)
>  ??? III <- IV*H/10^-pK3
>  ??? H - kw/H + SID + I*2 + II - IV
>  ? }
> 
> 
> HEPTOT <- 0.050
> SID <- c(-seq(10,1)*1e-4,0,seq(1,10)*1e-4)
> 
> pHobs? <- c(4.63,4.68,4.72,4.77,4.83,4.9,4.96,5.04,5.12,5.21,5.3,
>  ??????????? 5.39,5.48,5.55,5.63,5.69,5.74,5.8,5.85,5.89,5.93)
> 
> pK1 <- -1; pK2 <- 3; pK3 <- 7.55 # literature values
> pK3 <- 7.6; pK2 <- 2.96 # values eye-balled to be better
> 
> pH <- c()
> for (i in 1:length(SID)) {
>  ? pH[i] <- -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4,
>  ?????????????????????? HEPTOT=HEPTOT,SID = SID[i],
>  ?????????????????????? pK1=pK1,pK2=pK2,pK3=pK3)$root)
> }
> 
> plot(SID,pH)
> points(SID,pHobs,col="red")