thr3ads.net - R devel - [Rd] issue with model.frame() [May 2018]


Therneau, Terry M., Ph.D.

2018-May-01 13:11 UTC

[Rd] issue with model.frame()

A user sent me an example where coxph fails, and the root of the failure is a case where names(mf) is not equal to the term.labels attribute of the formula -- the latter has an extraneous newline. Here is an example that does not use the survival library.

# first create a data set with many long names
n <- 30  # number of rows for the dummy data set
vname <- vector("character", 26)
for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names

tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
names(tdata) <- c('y', vname)

# Use it in a formula
myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
mf <- model.frame(formula(myform), data=tdata)

match(attr(terms(mf), "term.labels"), names(mf))   # gives NA

----

In the user's case the function is ridge(x1, x2, ....) rather than cbind, but the effect is the same.
Any ideas for a workaround?
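Since the two strings in the example above appear to differ only in whitespace, one possible workaround is to normalize whitespace on both sides before calling match(). This is a sketch under that assumption, not anything taken from survival or base R: `squash_ws()` and `match_labels_ws()` are hypothetical helper names, and the approach only helps when the labels and the column names differ by whitespace alone.

```r
# Rebuild the example above (random data; only the column names matter).
n <- 30
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""), "")
tdata <- data.frame(y = 1:n, matrix(runif(n * 26), nrow = n))
names(tdata) <- c("y", vname)
myform <- paste("y ~ cbind(", paste(vname, collapse = ", "), ")")
mf <- model.frame(formula(myform), data = tdata)

# Hypothetical helper: replace every run of whitespace (spaces, tabs,
# newlines) with a single space, so line breaks added by deparsing are
# ignored when the strings are compared.
squash_ws <- function(x) gsub("[[:space:]]+", " ", x)

# Hypothetical helper: match(term.labels, names(mf)), but tolerant of
# whitespace-only differences between the two.
match_labels_ws <- function(mf) {
  labs <- attr(terms(mf), "term.labels")
  match(squash_ws(labs), squash_ws(names(mf)))
}

match_labels_ws(mf)  # column of mf holding the cbind(...) term
```

This does not help with labels that differ in other ways, for example the backquoted non-syntactic names that come up later in the thread.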

Aside: the ridge() function is very simple; it was added as an example to show how a user can add their own penalization to coxph. I never expected serious use of it. For this particular user the best answer is to use glmnet instead. He/she is trying to apply an L2 penalty to a large number of SNP * covariate interactions.

Terry T.

Berry, Charles

2018-May-01 16:43 UTC


[Rd] issue with model.frame()

> On May 1, 2018, at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org> wrote:
> Any ideas for a work around?
Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:

varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500), 
        collapse = " "))[-1L]
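To see what the quoted line does, it can be run by hand on the `variables` attribute of the terms object, which is the call that model.frame.default iterates over. This sketch rebuilds the data set from the first message and assumes a syntactic-name example like that one; it shows that the 26-name cbind() call still deparses to more than one string even at width.cutoff = 500, and that pasting the pieces with a space reproduces names(mf).

```r
# Rebuild the example data (only the variable names matter here).
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""), "")
tdata <- data.frame(y = 1:30, matrix(runif(30 * 26), nrow = 30))
names(tdata) <- c("y", vname)
form <- formula(paste("y ~ cbind(", paste(vname, collapse = ", "), ")"))
mf <- model.frame(form, data = tdata)

# 'variables' is an unevaluated call list(y, cbind(...)); sapply() walks
# over its elements, so [-1L] drops the leading 'list' symbol.
vars <- attr(terms(form), "variables")

# The line quoted above: deparse each variable (possibly into several
# strings) and glue the pieces back together with a single space.
varnames <- sapply(vars, function(x)
  paste(deparse(x, width.cutoff = 500), collapse = " "))[-1L]

# The cbind(...) call needs more than one deparsed line even at 500 ...
length(deparse(vars[[3]], width.cutoff = 500))
# ... and the pasted result agrees with the column names of mf.
identical(unname(varnames), names(mf))
```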


HTH,

Chuck

Therneau, Terry M., Ph.D.

2018-May-01 17:55 UTC


[Rd] [EXTERNAL] Re: issue with model.frame()

Great catch. I'm very reluctant to use my own model.frame, since that locks me into tracking all the base R changes, potentially breaking survival in a bad way if I miss one.

But this shows me clearly what the issue is and will allow me to think about it.

Another solution for the user is to use multiple ridge() calls to break it up; since he/she was using a fixed tuning parameter, the result is the same.
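Splitting one long call into several shorter ones can be automated. The sketch below uses cbind() as a stand-in for ridge() so that it runs without the survival package, and `chunk_by_width()` is a hypothetical helper that greedily packs names into groups whose comma-separated text stays within `max_chars`. It assumes, without checking against the C code behind terms(), that sufficiently short terms are not split across lines; the value 40 is an arbitrary choice.

```r
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""), "")

# Hypothetical helper: greedily pack 'vars' into groups whose text
# "a, b, c" is at most 'max_chars' characters.  A single name longer than
# 'max_chars' still gets a group of its own.
chunk_by_width <- function(vars, max_chars) {
  groups <- list()
  current <- character(0)
  for (v in vars) {
    candidate <- c(current, v)
    if (length(current) > 0L &&
        nchar(paste(candidate, collapse = ", ")) > max_chars) {
      groups[[length(groups) + 1L]] <- current  # close the current group
      current <- v
    } else {
      current <- candidate
    }
  }
  if (length(current) > 0L) groups[[length(groups) + 1L]] <- current
  groups
}

groups <- chunk_by_width(vname, max_chars = 40)
# One cbind() term per group (a ridge() term in the user's actual model).
rhs <- vapply(groups, function(g) paste0("cbind(", paste(g, collapse = ", "), ")"), "")
f <- formula(paste("y ~", paste(rhs, collapse = " + ")))
attr(terms(f), "term.labels")
```

With a fixed penalty the fitted model should not depend on how the names are grouped, which is what makes this split harmless for the user's case.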

Terry T.



William Dunlap

2018-May-01 18:38 UTC


[Rd] issue with model.frame()

You run into the same problem when using 'non-syntactical' names:
> mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`, data=data.frame(check.names=FALSE, y=1:10, `Temp(C)`=21:30, `Pres(mb)`=991:1000))
> match(attr(terms(mfB), "term.labels"), names(mfB))   # gives NA's
[1] NA NA
> attr(terms(mfB), "term.labels")
[1] "`Temp(C)`"  "`Pres(mb)`"
> names(mfB)
[1] "y"        "Temp(C)"  "Pres(mb)"

Note that names(mfB) does not give a hint as to whether they represent R expressions or not (in this case they do not). When they do represent R expressions, one could parse() them and compare them to as.list(attr(terms(mfB), "variables"))[-1].
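A variant of this idea compares parsed expressions instead of strings: each term label is parsed back into an R expression and located among the elements of the `variables` call, whose order matches the columns of the model frame. `match_labels_parsed()` is a hypothetical helper name, and the sketch assumes every term label is a single parseable R expression; it sidesteps the question of what names(mf) means by never looking at it.

```r
mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`,
                   data = data.frame(check.names = FALSE, y = 1:10,
                                     `Temp(C)` = 21:30, `Pres(mb)` = 991:1000))

# Hypothetical helper: for each term label, return the position of the
# identical expression in the 'variables' call, i.e. its column in mf.
match_labels_parsed <- function(mf) {
  tt <- terms(mf)
  # Parsing removes both backquotes and any line breaks from the labels.
  label_exprs <- lapply(attr(tt, "term.labels"),
                        function(s) parse(text = s, keep.source = FALSE)[[1L]])
  # 'variables' is the call list(y, ...); drop the leading 'list' symbol.
  var_exprs <- as.list(attr(tt, "variables"))[-1L]
  vapply(label_exprs, function(e)
    Position(function(v) identical(v, e), var_exprs, nomatch = NA_integer_),
    integer(1))
}

match_labels_parsed(mfB)  # positions of the two terms among the columns of mfB
```

The same helper should also handle the long cbind() label from the first message, since a newline inside a call does not change the parsed expression.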


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Martin Maechler

2018-May-01 20:15 UTC


[Rd] issue with model.frame()

>>>>> Berry, Charles <ccberry at ucsd.edu>
>>>>>     on Tue, 1 May 2018 16:43:18 +0000 writes:

    > Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:

    > varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
    > collapse = " "))[-1L]

What version of R is that ?  In current versions it is

    varnames <- vapply(vars, deparse2, " ")[-1L]

and deparse2() is a slightly enhanced version of the above
function, again with  'width.cutoff = 500'

*BUT* if you read  help(deparse)  you will learn that 500 is the
upper bound allowed currently.  (And yes, one could consider
increasing that, as it has been unchanged in R since the very
beginning; I have checked R version 0.49 from 1997.)

On the other hand, deparse2 (and your older code above) do paste
all the parts together  via  collapse = " "  so I don't see
quite yet ...
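That observation can be checked directly: because the deparsed pieces are pasted together with a space, names(mf) should never contain a newline, which points at the term labels produced by terms() as the side carrying it. The helper name `has_newline()` is hypothetical, and the value reported for the term labels may well depend on the R version in use.

```r
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""), "")
tdata <- data.frame(y = 1:30, matrix(runif(30 * 26), nrow = 30))
names(tdata) <- c("y", vname)
mf <- model.frame(formula(paste("y ~ cbind(", paste(vname, collapse = ", "), ")")),
                  data = tdata)

# Hypothetical helper: does any string in 'x' contain a newline?
has_newline <- function(x) any(grepl("\n", x, fixed = TRUE))

# names(mf) is built from deparse() pieces joined with " ", so it has none;
# the term labels come from terms() and are simply reported as they are.
c(term_labels = has_newline(attr(terms(mf), "term.labels")),
  frame_names = has_newline(names(mf)))
```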

Martin


