A user sent me an example where coxph fails, and the root of the failure is a case where names(mf) is not equal to the term.labels attribute of the formula -- the latter has an extraneous newline. Here is an example that does not use the survival library.

# first create a data set with many long names
n <- 30  # number of rows for the dummy data set
vname <- vector("character", 26)
for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names

tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
names(tdata) <- c('y', vname)

# Use it in a formula
myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
mf <- model.frame(formula(myform), data=tdata)

match(attr(terms(mf), "term.labels"), names(mf))  # gives NA

----

In the user's case the function is ridge(x1, x2, ....) rather than cbind, but the effect is the same.
Any ideas for a work around?

Aside: the ridge() function is very simple, it was added as an example to show how a user can add their own penalization to coxph. I never expected serious use of it. For this particular user the best answer is to use glmnet instead. He/she is trying to apply an L2 penalty to a large number of SNP * covariate interactions.

Terry T.
> On May 1, 2018, at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org> wrote:
>
> [...]
>
> In the user's case the function is ridge(x1, x2, ....) rather than cbind, but the effect is the same.
> Any ideas for a work around?

Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:

    varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
        collapse = " "))[-1L]

??

HTH,

Chuck
Therneau, Terry M., Ph.D.
2018-May-01 17:55 UTC
[Rd] [EXTERNAL] Re: issue with model.frame()
Great catch. I'm very reluctant to use my own model.frame, since that locks me into tracking all the base R changes, potentially breaking survival in a bad way if I miss one. But this shows me clearly what the issue is and will allow me to think about it.

Another solution for the user is to use multiple ridge() calls to break it up; since he/she was using a fixed tuning parameter the result is the same.

Terry T.

On 05/01/2018 11:43 AM, Berry, Charles wrote:
> Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:
>
> varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
>     collapse = " "))[-1L]
>
> ??
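The idea of breaking one long term into several shorter ones can be tried on the dummy data from the original post. This is only a sketch: cbind() stands in for ridge(), and the grouping into three blocks (`groups`) is a hypothetical choice, picked so that each term stays well under the 500-character deparse limit discussed in this thread.

```r
# Rebuild the dummy data from the original post
n <- 30
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""),
                character(1))
tdata <- data.frame(y = 1:n, matrix(runif(n * 26), nrow = n))
names(tdata) <- c("y", vname)

# Hypothetical grouping: any split works as long as each term stays short
groups <- list(vname[1:13], vname[14:20], vname[21:26])
term_txt <- vapply(groups,
                   function(v) paste0("cbind(", paste(v, collapse = ", "), ")"),
                   character(1))
stopifnot(all(nchar(term_txt) < 500))  # each term fits in one deparse() line

myform <- as.formula(paste("y ~", paste(term_txt, collapse = " + ")))
mf <- model.frame(myform, data = tdata)

# One column index per term; no NA is expected because no term is split
idx <- match(attr(terms(mf), "term.labels"), names(mf))
idx
```

Because no single term needs more than one deparsed line, the term labels and the model frame names are built from the same single-line text and should agree.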
You run into the same problem when using 'non-syntactical' names:

    > mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`, data=data.frame(check.names=FALSE, y=1:10, `Temp(C)`=21:30, `Pres(mb)`=991:1000))
    > match(attr(terms(mfB), "term.labels"), names(mfB)) # gives NA's
    [1] NA NA
    > attr(terms(mfB), "term.labels")
    [1] "`Temp(C)`"  "`Pres(mb)`"
    > names(mfB)
    [1] "y"        "Temp(C)"  "Pres(mb)"

Note that names(mfB) does not give a hint as to whether they represent R expressions or not (in this case they do not). When they do represent R expressions then one could parse() them and compare them to as.list(attr(terms(mfB), "variables"))[-1].

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, May 1, 2018 at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org> wrote:
> A user sent me an example where coxph fails, and the root of the failure
> is a case where names(mf) is not equal to the term.labels attribute of the
> formula -- the latter has an extraneous newline.
>
> [...]
>
> Any ideas for a work around?
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
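A different way to line terms up with model frame columns, not proposed in the thread, is to skip string comparison altogether and go by position. The sketch below assumes that the rows of the "factors" attribute follow the order of the "variables" attribute, and that the model frame columns follow that same order (with extras such as weights appended afterwards). The name `term_cols` is made up for illustration.

```r
mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`,
                   data = data.frame(check.names = FALSE, y = 1:10,
                                     `Temp(C)` = 21:30, `Pres(mb)` = 991:1000))
tt  <- terms(mfB)
fac <- attr(tt, "factors")   # rows follow attr(tt, "variables"), columns are terms

# For each term, the positions of the variables it uses; these positions
# are assumed to index the columns of the model frame directly.
term_cols <- lapply(seq_len(ncol(fac)),
                    function(j) unname(which(fac[, j] > 0)))
names(term_cols) <- colnames(fac)

# Column names of the model frame that belong to each term
lapply(term_cols, function(k) names(mfB)[k])
```

This avoids having to decide whether a name is an R expression or a non-syntactic symbol, at the cost of relying on the ordering assumption stated above.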
>>>>> Berry, Charles <ccberry at ucsd.edu>
>>>>>     on Tue, 1 May 2018 16:43:18 +0000 writes:

    >> On May 1, 2018, at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org> wrote:
    >>
    >> [...]
    >> Any ideas for a work around?

    > Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:

    > varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
    >     collapse = " "))[-1L]

What version of R is that ?

In current versions it is

    varnames <- vapply(vars, deparse2, " ")[-1L]

and deparse2() is a slightly enhanced version of the above function, again with 'width.cutoff = 500'

*BUT* if you read help(deparse) you will learn that 500 is the upper bound allowed currently. (And yes, one could consider increasing that, as it has been unchanged in R since the very beginning; I have checked R version 0.49 from 1997.)

On the other hand, deparse2 (and your older code above) do paste all the parts together via collapse = " " so I don't see quite yet ...

Martin
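To see the 500-character limit at work, one can deparse the long cbind() call from the original example directly. This is a small sketch rather than the code inside model.frame.default; `long_call`, `pieces`, and `one_line` are names chosen here for illustration.

```r
vname <- vapply(1:26, function(i) paste(rep(letters[1:i], 2), collapse = ""),
                character(1))
long_call <- parse(text = paste0("cbind(", paste(vname, collapse = ", "),
                                 ")"))[[1]]

# 500 is the documented maximum for width.cutoff
pieces <- deparse(long_call, width.cutoff = 500)
length(pieces)   # more than one string for a call this long
nchar(pieces)

# Joining the pieces with a single space, as described in the thread
one_line <- paste(pieces, collapse = " ")
```

Inspecting `pieces` shows where deparse() breaks the call, which is the point at which the model frame name and the term label can start to differ.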