thr3ads.net - R help - [R] Extracting specific arguments from "..." [Jan 2025]


Sorkin, John

2025-Jan-07 22:03 UTC

[R] Extracting specific arguments from "..."

Colleagues,

My interest is not in writing ad hoc functions (which I might use once to
analyze my data), but rather what I will call a system function that might be
part of a package. The lm function is a paradigm of what I call a system
function.

The lm function begins by processing the arguments passed to the function
(represented in the function as parameters; see code below). Much of this
processing is only peripherally related to running a regression, but the code is
necessary to determine exactly what the user of the system function wants the
function to do. It would be helpful if there were a document that described
best practices for writing system functions, with clear explanations of what
each step in a system function is designed to do and how each line accomplishes its
task. It would also be nice if the system function had documentation. I have
pushed my way through the lm function, and with the help of R help files, I have
come to understand how the function works, but this is not an efficient way to
learn the best practices that should be used when writing a system function.

Perhaps there is a document that does what I would like to see done, but I do
not know of one.

John

lm
function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action",
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame")
        return(mf)
    else if (method != "qr")
        warning(gettextf("method = '%s' is not supported. Using 'qr'",
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w))
        stop("'weights' must be a numeric vector")
    offset <- model.offset(mf)
    mlm <- is.matrix(y)
    ny <- if (mlm)
        nrow(y)
    else length(y)
    if (!is.null(offset)) {
        if (!mlm)
            offset <- as.vector(offset)
        if (NROW(offset) != ny)
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
                NROW(offset), ny), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = if (mlm) matrix(NA_real_, 0,
            ncol(y)) else numeric(), residuals = y, fitted.values = 0 *
            y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
            0) else ny)
        if (!is.null(offset)) {
            z$fitted.values <- offset
            z$residuals <- y - offset
        }
    }
    else {
        x <- model.matrix(mt, mf, contrasts)
        z <- if (is.null(w))
            lm.fit(x, y, offset = offset, singular.ok = singular.ok,
                ...)
        else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
            ...)
    }
    class(z) <- c(if (mlm) "mlm", "lm")
    z$na.action <- attr(mf, "na.action")
    z$offset <- offset
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- .getXlevels(mt, mf)
    z$call <- cl
    z$terms <- mt
    if (model)
        z$model <- mf
    if (ret.x)
        z$x <- x
    if (ret.y)
        z$y <- y
    if (!qr)
        z$qr <- NULL
    z
}
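The argument handling at the top of lm() follows a pattern common to model-fitting functions: capture the call, keep only the arguments that model.frame() understands, and evaluate the rewritten call in the caller's environment. A minimal sketch of just that step, assuming a hypothetical function toy_fit() that returns what it built instead of fitting anything (mtcars is R's built-in example data set):

```r
# Hypothetical toy_fit(): copies the argument-processing steps of lm()
# and returns what they produce instead of fitting a model.
toy_fit <- function(formula, data, subset, weights, na.action, ...) {
  cl <- match.call()                     # full matched call, kept for printing
  mf <- match.call(expand.dots = FALSE)  # same call, with '...' bundled as one element
  # keep the function name plus the arguments model.frame() understands
  m <- match(c("formula", "data", "subset", "weights", "na.action"),
             names(mf), 0L)
  mf <- mf[c(1L, m)]                     # 0L entries (unsupplied args) are dropped
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame)  # toy_fit(...) becomes stats::model.frame(...)
  mf <- eval(mf, parent.frame())         # evaluate in the caller's environment
  list(call  = cl,
       n     = nrow(mf),
       y     = model.response(mf, "numeric"),
       w     = model.weights(mf),        # NULL when no weights were given
       extra = names(list(...)))         # arguments left over in '...'
}

fit <- toy_fit(mpg ~ wt, data = mtcars, subset = cyl == 4, tol = 1e-8)
fit$n      # 11: only the 4-cylinder cars survive 'subset'
fit$extra  # "tol"

# lm() itself stops at the same point when method = "model.frame"
mf_only <- lm(mpg ~ wt, data = mtcars, method = "model.frame")
```

Because the rewritten call is evaluated with parent.frame(), an argument such as subset is resolved in data or in the user's workspace, not inside toy_fit(). Arguments that lm() does not name itself (tol here) stay in ... and, in lm(), are passed on to lm.fit() or lm.wfit().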



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI, Biostatistics and Informatics Core, University of Maryland School of Medicine
Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




________________________________________
From: Jorgen Harmse <JHarmse at roku.com>
Sent: Tuesday, January 7, 2025 1:47 PM
To: r-help at r-project.org; ikwsimmo at gmail.com; Bert Gunter; Sorkin, John;
jdnewmil at dcn.davis.ca.us
Subject: Re: Extracting specific arguments from "..."

Interesting discussion. A few things occurred to me.

Apologies to Iris Simmons: I mixed up his answer with Bert's question.

Bert raises questions about promises, and I think they are related to John
Sorkin's question. A big difference between R and most other languages is
that function arguments are computed lazily. match.call & substitute tell us
what expressions will be evaluated if function arguments are needed but not the
environments in which that will happen. The usual suspects are environment() and
parent.frame(), but parent.frame(k) & maybe even other environments are
possible. If you are really determined then I guess you can keep evaluating
match.call() in parent frames until you have accounted for all the inputs.
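The point that match.call() and substitute() report expressions but not environments can be checked directly. A sketch with a hypothetical show_arg(), called with the same text a + 1 from two different places:

```r
# Hypothetical helper: compares what substitute() and match.call() report with
# what the argument evaluates to in different environments.
show_arg <- function(x) {
  cl   <- match.call()        # e.g. show_arg(x = a + 1): an expression only
  expr <- substitute(x)       # a + 1, with no record of where it came from
  list(call      = cl,
       forced    = x,                           # the promise uses the caller's env
       in_caller = eval(expr, parent.frame()),  # explicit: caller's env again
       in_callee = eval(expr))                  # show_arg's own frame: finds global 'a'
}

a <- 1
wrapper <- function() {
  a <- 100                    # a different 'a' in wrapper's frame
  show_arg(a + 1)
}

r1 <- show_arg(a + 1)   # forced = 2,   in_caller = 2,   in_callee = 2
r2 <- wrapper()         # forced = 101, in_caller = 101, in_callee = 2
```

Both calls record the identical expression a + 1; only evaluating it in the right environment recovers the value the caller meant.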

It's not clear to what extent John Sorkin is concerned about writing
functions as opposed to using functions. Lazy computation has advantages but
leads to some issues.
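Exactly matching the function's default expression for an input is not
necessarily the same as omitting the input, because the evaluation environment
is different: a default is evaluated in the function's own environment, while a
supplied argument is evaluated in the caller's.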
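A small illustration, using a hypothetical f_default() whose default expression is simply y:

```r
# Hypothetical f_default(): its default for 'x' refers to a variable 'y'.
f_default <- function(x = y) {
  y <- "inside"   # a local 'y' exists by the time the default is forced
  x
}
y <- "outside"

f_default()        # "inside":  the default is evaluated in f_default's own frame
f_default(x = y)   # "outside": the same text, supplied by the caller, uses the caller's 'y'
```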
If the caller uses an expression with side effects then there is no guarantee
that the side effects will happen. If there are side effects from two or more
inputs then the order is uncertain. (If an argument is not supplied and the
default has side effects then they might not happen either. However, I don't
know why the function writer would specify any side effect except stop(), and
then he or she has probably arranged for it to happen exactly when it should.)
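Both side-effect issues can be seen with a hypothetical logging helper note(); pick() and b_then_a() are likewise made up for illustration:

```r
# Hypothetical helpers for illustration: note() records a side effect.
events <- new.env()
events$log <- character()
note <- function(tag) {
  events$log <- c(events$log, tag)  # side effect: append to the shared log
  tag
}

pick <- function(use_a, a, b) if (use_a) a else b  # forces only one of a, b
b_then_a <- function(a, b) { b; a }                # forces b before a

invisible(pick(TRUE, note("a"), note("b")))   # note("b") never runs
invisible(b_then_a(note("1"), note("2")))     # "2" is logged before "1"
events$log                                    # "a" "2" "1"
```

The caller wrote note("b") and listed "1" before "2", but the function body alone decides whether and in what order those expressions run.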
If a default value depends on another input and that input is modified inside
the function then order of evaluation of inputs becomes important. Even if you
know exactly what you're doing when you write the function, you should make
it clear to future maintainers. An explicit call to force() clarifies that the
input needs to be computed with the existing values of anything that is used in
the default, even if the code is refactored so that the value is not used
immediately. If you really want to modify another input before evaluating the
default then specify that in a comment.
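A sketch of the force() advice, with hypothetical mean_lazy() and mean_forced() that differ only in the force(n) line:

```r
# Hypothetical functions: the default for 'n' depends on 'x', and 'x' is
# modified inside the function before 'n' is used.
mean_lazy <- function(x, n = length(x)) {
  x <- x[!is.na(x)]   # x changes first ...
  sum(x) / n          # ... so n = length of the *modified* x
}
mean_forced <- function(x, n = length(x)) {
  force(n)            # evaluate the default now, while x is unchanged
  x <- x[!is.na(x)]
  sum(x) / n          # n = length of the original x
}

v <- c(2, 4, NA, 6)
mean_lazy(v)    # 4  (12 / 3)
mean_forced(v)  # 3  (12 / 4)
```

Moving the x <- x[!is.na(x)] line during a later refactor would silently change mean_lazy()'s result but not mean_forced()'s.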

Jeff Newmiller makes a good point. You can still change your mind about
inspecting a particular input without breaking old code that uses your function,
and you don't necessarily need default values.

Old definition: f <- function(...) {<code that passes ... to other functions
and does some other things>}

New definition:
f <- function(..., a = <default expression, possibly stop()>)
{ <pass ..., a=a to another function>
  <do something with the output>
}

OR

f <- function(..., a)
{ if (missing(a)) # OK, this becomes clunky if there are several such inputs
  { <pass ... to another function> }
  else
  { <inspect or modify a>
    # Pitfall: Changing the order of evaluation may break old code,
    # but then the design was probably too devious in the first place.
    <pass ..., a=a to another function>
  }
  <do something with the output>
}
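A runnable version of the two schematic definitions, assuming a hypothetical inner() that stands in for "another function" and accepts an argument a:

```r
# 'inner' is a hypothetical stand-in for "another function" that accepts 'a'.
inner <- function(..., a = 1) list(dots = list(...), a = a)

# Old definition: everything is passed through '...'
f_old <- function(...) inner(...)

# New definition 1: 'a' is a formal argument with a default
f_new1 <- function(..., a = 1) inner(..., a = a)

# New definition 2: no default; 'a' is inspected only when the caller supplies it
f_new2 <- function(..., a) {
  if (missing(a)) {
    out <- inner(...)                                # same behaviour as f_old
  } else {
    if (!is.numeric(a)) stop("'a' must be numeric")  # inspect a
    out <- inner(..., a = a)
  }
  out                                                # "do something with the output"
}

f_old(x = 1, a = 5)$a    # 5
f_new1(x = 1)$a          # 1
f_new2(x = 1, a = 5)$a   # 5
```

Definition 1 has to repeat inner()'s default for a, which can drift out of sync; definition 2 avoids that at the cost of the missing() branch.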

Regards,
Jorgen Harmse.

Ben Bolker

2025-Jan-07 22:06 UTC


[R] Extracting specific arguments from "..."

There's an ancient (2003) document on the CRAN "developers' page",
https://developer.r-project.org/model-fitting-functions.html, that is
sort of (but not exactly) what you're looking for ...


-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
E-mail is sent at my convenience; I don't expect replies outside of working hours.

Martin Maechler

2025-Jan-08 10:21 UTC


[R] Extracting specific arguments from "..."

>>>>> Sorkin, John 
>>>>>     on Tue, 7 Jan 2025 22:03:02 +0000 writes:

Note that the lm() listing you posted is *not* the source of the lm() function,
but a print-out to your console of what has become of the original source.
Notably, all comments *and* all of the original author's formatting have
been lost (as the "system functions" are *not* installed with
something like options(keep.source = TRUE)).

For a long time, many have strongly advised using the source
(e.g. https://en.wiktionary.org/wiki/UTSL   "Use The Source, Luke!")
instead. Here's the always-latest (development / R-devel) source
for lm() *and* related functions ... also with comments etc:
  --> https://svn.r-project.org/R/trunk/src/library/stats/R/lm.R

(or use one of its GitHub mirrors.)

Martin
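The point about lost comments can be reproduced with base R alone: parse()'s keep.source argument decides whether the source text is stored with a function. A sketch, with hypothetical names f_keep, f_drop and shows_comment():

```r
src <- "function(x) {\n  # add one to the input\n  x + 1\n}"
f_keep <- eval(parse(text = src, keep.source = TRUE))   # srcref retained
f_drop <- eval(parse(text = src, keep.source = FALSE))  # srcref discarded

# Hypothetical helper: does the printed function still show the comment?
shows_comment <- function(f)
  any(grepl("# add one", capture.output(print(f)), fixed = TRUE))

shows_comment(f_keep)  # TRUE:  printed from the stored source text
shows_comment(f_drop)  # FALSE: deparsed from the parsed code, comment lost
```

On a typical installation attr(stats::lm, "srcref") is NULL for the same reason, so printing lm shows deparsed code rather than the commented file.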
