Dear friends - I have a function for the charge in a fluid (water) buffered with HEPES and otherwise only containing Na and Cl so that [Na] - [Cl] = SID (strong ion difference) goes from -1 mM to 1 mM. With known SID and total HEPES concentration I can calculate accurately the pH if I know 3 pK values for HEPES by finding the single root with uniroot Now, the problem is that there is some disagreement in the literature what is the correct value for the 3 pKs. I know the 3 pK values have the relationship pK1 < pK2 <- pK3 and for the most common formulation of HEPES I know the charge on fully protonated species is 2. Hence I can generate a huge number of pK values from uniform distribution by taking sort(runif(3,-1,9) and make sure there are no ties and then run all those triplets of pK values to find pH values and thereby find the lowest deviation vis a vis measured values. That works but requires many triplets and is not satisfying. Hence I wonder if I could somehow have non linear regression to find the 3 pK values. Below is HEPESFUNC which delivers charge in the fluid for known pKs, HEPTOT and SID. Is it possible to have root-finding in the formula with nls? (I know the precision asked for is extreme but it has worked well in really many applications). All best wishes Troels Ring, MD Aalborg, Denmark HEPESFUNC <- ? function(H,SID,HEPTOT,pK1,pK2,pK3) { ??? XX <- (H^3/(10^-pK3*10^-pK2*10^-pK1)+H^2/(10^-pK3*10^-pK2)+H/(10^-pK3)) ??? IV <- HEPTOT/(1+XX) ??? I <- IV*H^3/(10^-pK3*10^-pK2*10^-pK1) ??? II <- IV*H^2/(10^-pK3*10^-pK2) ??? III <- IV*H/10^-pK3 ??? H - kw/H + SID + I*2 + II - IV ? } HEPTOT <- 0.050 SID <- c(-seq(10,1)*1e-4,0,seq(1,10)*1e-4) pHobs? <- c(4.63,4.68,4.72,4.77,4.83,4.9,4.96,5.04,5.12,5.21,5.3, ??????????? 5.39,5.48,5.55,5.63,5.69,5.74,5.8,5.85,5.89,5.93) pK1 <- -1; pK2 <- 3; pK3 <- 7.55 # literature values pK3 <- 7.6; pK2 <- 2.96 # values eye-balled to be better pH <- c() for (i in 1:length(SID)) { ? pH[i] <- -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, ?????????????????????? HEPTOT=HEPTOT,SID = SID[i], ?????????????????????? pK1=pK1,pK2=pK2,pK3=pK3)$root) } plot(SID,pH) points(SID,pHobs,col="red")
? Mon, 6 Nov 2023 17:53:49 +0100 Troels Ring <tring at gvdnet.dk> ?????:> Hence I wonder if I could somehow have non linear regression to find > the 3 pK values. Below is HEPESFUNC which delivers charge in the > fluid for known pKs, HEPTOT and SID. Is it possible to have > root-finding in the formula with nls?Sure. Just reformulate the problem in terms of a function that takes a vector of predictors (your independent variable SID) and the desired parameters (pK1, pK2, pK3) as separate arguments and then returns predicted values of the dependent variable (to compare against pHobs): kw <- 1e-14 # I'm assuming pHm <- Vectorize(function(SID, pK1, pK2, pK3) -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, HEPTOT=HEPTOT,SID = SID, pK1=pK1,pK2=pK2,pK3=pK3)$root)) (Yes, Vectorize() doesn't make the function any faster, but I don't see an easy way to rewrite this function to make it truly vectorised.) Unfortunately, nls() seems to take a step somewhere where crossprod() of the Jacobian of the model function cannot be inverted and fails with the message "Singular gradient". I wish that R could have a more reliable built-in nonlinear least squares solver. (I could also be holding it wrong.) Meanwhile, we have excellent CRAN packages nlsr and minpack.lm: minpack.lm::nlsLM( pHobs ~ pHm(SID, pK1, pK2, pK3), data.frame(pHobs = pHobs, SID = SID), start = c(pK1 = pK1, pK2 = pK2, pK3 = pK3), # the following is also needed to avoid MINPACK failing to fit lower = rep(-1, 3), upper = rep(9, 3) ) # Nonlinear regression model # model: pHobs ~ pHm(SID, pK1, pK2, pK3) # data: data.frame(pHobs = pHobs, SID = SID) # pK1 pK2 pK3 # -1.000 2.966 7.606 # residual sum-of-squares: 0.001195 # # Number of iterations to convergence: 15 # Achieved convergence tolerance: 1.49e-08 (Unfortunately, your code seemed to have a lot of non-breakable spaces, which confuse R's parser and make it harder to copy&paste it.) I think that your function can also be presented as a degree-5 polynomial in H, so it should also be possible to use polyroot() to obtain your solutions in a more exact manner. -- Best regards, Ivan
Your script is missing something (in particular kw). I presume you are trying to estimate the pK values. You may have more success with package nlsr than nls(). nlsr::nlxb() tries to get the Jacobian of the model specified by a formula and do so by applying symbolic or automatic differentiation. The multi expression function would probably not work. (That is one advantage of nls(), but it has an unstabilized solver and unless carefully called will use a simple forward derivative approximation.) There is also nlsr::nlfb() that can use functions for the residuals and jacobian. Your function is messy, but likely the jacobian can be developed with a bit of work. Some of the symbolic derivative features of R can help here. Alternatively, there are several numerical approximations in nlsr and the vignette "Intro to nlsr" explains how to use these. Note that simple forward and backward approximations are generally not worth using, but central approximation is quite good. The solver in the nlsr package allows of bounds on the parameters. Since you want them ordered, you may want to transform pK1 < pK2 <- pK3 to pk1, deltapk2, deltapk3 so pK2 = pk1+deltapk2 and pk3 = pk2 + deltapk3 = pk1 + deltapk2 + deltapk3 and all values bounded below by 0 or possibly some small numbers to keep the paramters apart. I won't pretend any of this is trivial. It is VERY easy to make small errors that still allow for output that is disastrously incorrect. If it is possible to get a single "formula" as one expression even if spread over multiple lines, then nlxb() might be able to handle it. J Nash (maintainer of nlsr and optimx) On 2023-11-06 11:53, Troels Ring wrote: ....> HEPESFUNC <- > ? function(H,SID,HEPTOT,pK1,pK2,pK3) { > ??? XX <- (H^3/(10^-pK3*10^-pK2*10^-pK1)+H^2/(10^-pK3*10^-pK2)+H/(10^-pK3)) > ??? IV <- HEPTOT/(1+XX) > ??? I <- IV*H^3/(10^-pK3*10^-pK2*10^-pK1) > ??? II <- IV*H^2/(10^-pK3*10^-pK2) > ??? III <- IV*H/10^-pK3 > ??? H - kw/H + SID + I*2 + II - IV > ? } > > > HEPTOT <- 0.050 > SID <- c(-seq(10,1)*1e-4,0,seq(1,10)*1e-4) > > pHobs? <- c(4.63,4.68,4.72,4.77,4.83,4.9,4.96,5.04,5.12,5.21,5.3, > ??????????? 5.39,5.48,5.55,5.63,5.69,5.74,5.8,5.85,5.89,5.93) > > pK1 <- -1; pK2 <- 3; pK3 <- 7.55 # literature values > pK3 <- 7.6; pK2 <- 2.96 # values eye-balled to be better > > pH <- c() > for (i in 1:length(SID)) { > ? pH[i] <- -log10(uniroot(HEPESFUNC,c(1e-20,1),tol=1e-20,maxiter=1e4, > ?????????????????????? HEPTOT=HEPTOT,SID = SID[i], > ?????????????????????? pK1=pK1,pK2=pK2,pK3=pK3)$root) > } > > plot(SID,pH) > points(SID,pHobs,col="red")