In mentoring and participating in a Google Summer of Code project "Improvements to nls()", I've not found examples of use of the "subset" argument in the call to nls(). Moreover, in searching through the source code for the various functions related to nls(), I can't seem to find where subset is used, but a simple example, included below, indicates it works. Three approaches all seem to give the same results. Can someone point to documentation or code so we can make sure we get our revised programs to work properly? The aim is to make them more maintainable and provide maintainer documentation, along with some improved functionality. We seem, for example, to already be able to offer analytic derivatives where they are feasible, and should be able to add Marquardt-Levenberg stabilization as an option. Note that this "subset" does not seem to be the "subset()" function of R. John Nash # CroucherSubset.R -- https://walkingrandomly.com/?p=5254 xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9) ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001) Cform <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata) Cstart<-list(p1=1,p2=0.2) Cdata<-data.frame(xdata, ydata) Csubset<-1:8 # just first 8 points # Original problem - no subset fit0 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2)) summary(fit0) # via subset argument fit1 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2), subset=Csubset) summary(fit1) # via explicit subsetting Csdata <- Cdata[Csubset, ] Csdata fit2 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Csdata, start=list(p1=1,p2=.2)) summary(fit2) # via weights -- seems to give correct observation count if zeros not recognized wts <- c(rep(1,8), rep(0,2)) fit3 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, weights=wts, start=list(p1=1,p2=.2)) summary(fit3)
Marc Schwartz
2021-Jul-14 01:14 UTC
[Rd] subset argument in nls() and possibly other functions
Hi John, In scanning some of the more popular model functions (e.g. lm(), glm(), lme(), coxph(), etc.), none seem to provide examples of the use of the 'subset' argument, even though it is documented for them. That being said, there is some old (2003) documentation by Prof Ripley here: https://developer.r-project.org/model-fitting-functions.html that may be helpful, and where the link to the lm() function source code on the above page should be: ? https://svn.r-project.org/R/trunk/src/library/stats/R/lm.R Within that source file, you might want to focus upon the model.frame.lm() function, the basic form which is used internally in many (most?, all?) of the typical model related functions in R to create the internal data frame from the specified formula, that is then used to create the model. There is a parallel model.frame.glm() function for glm() here: https://svn.r-project.org/R/trunk/src/library/stats/R/glm.R There is also a 2003 paper by Thomas Lumley on non-standard evaluation that may be helpful: https://developer.r-project.org/nonstandard-eval.pdf The help for the generic ?model.frame has the following text for the 'subset' argument: "a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector (see|[.data.frame <http://127.0.0.1:13384/library/stats/help/[.data.frame>|) for the rows of|data|or if that is not supplied, a data frame made up of the variables used in|formula|." I cannot recall off-hand, using the 'subset' argument myself in ~20 years of using R, but do seem to recall some old discussions on the e-mail lists, which I cannot seem to locate at present. A search via rseek.org may yield some benefit. Regards, Marc Schwartz J C Nash wrote on 7/13/21 7:21 PM:> In mentoring and participating in a Google Summer of Code project "Improvements to nls()", > I've not found examples of use of the "subset" argument in the call to nls(). Moreover, > in searching through the source code for the various functions related to nls(), I can't > seem to find where subset is used, but a simple example, included below, indicates it works. > Three approaches all seem to give the same results. > > Can someone point to documentation or code so we can make sure we get our revised programs > to work properly? The aim is to make them more maintainable and provide maintainer documentation, > along with some improved functionality. We seem, for example, to already be able to offer > analytic derivatives where they are feasible, and should be able to add Marquardt-Levenberg > stabilization as an option. > > Note that this "subset" does not seem to be the "subset()" function of R. > > John Nash > > # CroucherSubset.R -- https://walkingrandomly.com/?p=5254 > > xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9) > ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001) > Cform <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata) > Cstart<-list(p1=1,p2=0.2) > Cdata<-data.frame(xdata, ydata) > Csubset<-1:8 # just first 8 points > > # Original problem - no subset > fit0 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2)) > summary(fit0) > > # via subset argument > fit1 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2), subset=Csubset) > summary(fit1) > > # via explicit subsetting > Csdata <- Cdata[Csubset, ] > Csdata > fit2 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Csdata, start=list(p1=1,p2=.2)) > summary(fit2) > > # via weights -- seems to give correct observation count if zeros not recognized > wts <- c(rep(1,8), rep(0,2)) > fit3 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, weights=wts, start=list(p1=1,p2=.2)) > summary(fit3) > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
Balasubramanian Narasimhan
2021-Jul-14 22:16 UTC
[Rd] subset argument in nls() and possibly other functions
For the example provided below, the subsetting happens in evaluating the call to stats::model.formula in line 583 of nls.R (https://github.com/wch/r-source/blob/e91be22f6f37644e5a0ba74a3dfe504a3a29e9f7/src/library/stats/R/nls.R#L583) returning an appropriate (subsetted) data frame. -Naras On 7/13/21 4:21 PM, J C Nash wrote:> In mentoring and participating in a Google Summer of Code project "Improvements to nls()", > I've not found examples of use of the "subset" argument in the call to nls(). Moreover, > in searching through the source code for the various functions related to nls(), I can't > seem to find where subset is used, but a simple example, included below, indicates it works. > Three approaches all seem to give the same results. > > Can someone point to documentation or code so we can make sure we get our revised programs > to work properly? The aim is to make them more maintainable and provide maintainer documentation, > along with some improved functionality. We seem, for example, to already be able to offer > analytic derivatives where they are feasible, and should be able to add Marquardt-Levenberg > stabilization as an option. > > Note that this "subset" does not seem to be the "subset()" function of R. > > John Nash > > # CroucherSubset.R -- https://walkingrandomly.com/?p=5254 > > xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9) > ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001) > Cform <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata) > Cstart<-list(p1=1,p2=0.2) > Cdata<-data.frame(xdata, ydata) > Csubset<-1:8 # just first 8 points > > # Original problem - no subset > fit0 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2)) > summary(fit0) > > # via subset argument > fit1 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, start=list(p1=1,p2=.2), subset=Csubset) > summary(fit1) > > # via explicit subsetting > Csdata <- Cdata[Csubset, ] > Csdata > fit2 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Csdata, start=list(p1=1,p2=.2)) > summary(fit2) > > # via weights -- seems to give correct observation count if zeros not recognized > wts <- c(rep(1,8), rep(0,2)) > fit3 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), data=Cdata, weights=wts, start=list(p1=1,p2=.2)) > summary(fit3) > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel