Paul Johnson
2018-Aug-21 22:45 UTC
[R] looking for formula parser that allows coefficients
Can you point me at any packages that allow users to write a formula with coefficients?

I want to write a data simulator that has a matrix X with lots of columns, and then users can generate predictive models by entering a formula that uses some of the variables, allowing interactions, like

    y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2

Currently, in the rockchalk package, I have a function that simulates data (genCorrelatedData2), but my interface for entering the beta coefficients is poor. I assumed the user would always enter 0's as placeholders for the unused coefficients, and that the intercept is always first. The unnamed vector is too confusing. I have them specify:

    c(2, 1.1, 0, 3, 0, 0, 0.2, ...)

In the documentation I say (ridiculously) that it is easy to figure out from the examples, but it really isn't. The function prints out the equation it thinks you intended. That's minimum protection against user error, but still not very good:

    dat <- genCorrelatedData2(N = 10, rho = 0.0,
                              beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
                              means = c(0,0,0), sds = c(1,1,1), stde = 0)
    [1] "The equation that was calculated was"
    y = 1 + 2*x1 + 1*x2 + 1*x3
     + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
     + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
     + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
     + N(0,0) random error

But still, it is not very good.

As I look at this now, I realize I expect just the vech, not the whole vector of all interaction terms, so it is even more difficult than I thought to get the correct input. Hence, I'd like to let the user write a formula.

The alternative for the user interface is to have named coefficients. I can more or less easily allow a named vector for beta:

    beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)

I could build a formula from that. That's not too bad. But I still think it would be cool to allow formula input.

Have you ever seen it done?

pj
--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.
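The named-vector interface described in the post can be sketched in a few lines of base R. This is only an illustration, not rockchalk code: the helper name linpred and the small matrix X are made up, and treating a name such as "x2:x2" as a plain product of columns (that is, a squared term) is an assumption based on the equation the post prints out.

```r
# Hypothetical helper: evaluate a named coefficient vector against the
# columns of a numeric matrix X. Names are split on ":" and the named
# columns are multiplied together; "(Intercept)" stands for a column of 1s.
linpred <- function(beta, X) {
  stopifnot(!is.null(names(beta)), !is.null(colnames(X)))
  eta <- numeric(nrow(X))
  for (term in names(beta)) {
    if (term == "(Intercept)") {
      column <- rep(1, nrow(X))
    } else {
      vars <- strsplit(term, ":", fixed = TRUE)[[1]]
      stopifnot(all(vars %in% colnames(X)))
      # Row-wise product of the named columns (so "x2:x2" is x2 squared)
      column <- apply(X[, vars, drop = FALSE], 1, prod)
    }
    eta <- eta + beta[[term]] * column
  }
  eta
}

# Made-up data, two rows and three predictors
X <- cbind(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))
beta <- c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
linpred(beta, X)   # 11.3 15.8
```

Because terms are looked up by name, unused coefficients do not need zero placeholders and the order of the entries does not matter.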
Gabor Grothendieck
2018-Aug-22 07:33 UTC
[R] looking for formula parser that allows coefficients
Some string manipulation can convert the formula to a named vector such as the one shown at the end of your post.

    library(gsubfn)

    # input
    fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2

    # three optional capture groups: sign, numeric coefficient, term name
    pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"
    ch <- format(fo[[3]])                       # right-hand side as a string
    m <- matrix(strapplyc(ch, pat)[[1]], 3)     # one column per match
    m <- m[, colSums(m != "") > 0]              # drop empty matches
    m[2, m[2, ] == ""] <- 1                     # no coefficient given: use 1
    m[3, m[3, ] == ""] <- "(Intercept)"         # no name given: intercept
    co <- as.numeric(paste0(m[1, ], m[2, ]))    # attach the sign
    v <- m[3, ]
    setNames(co, v)
    ## (Intercept)          x1          x3       x1:x3       x2:x2
    ##         2.0        -1.1         1.0        -1.0         0.2

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
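To see what the three capture groups in the pattern pick out, it may help to apply the same pattern to a single term. The sketch below uses base R's regexec() and regmatches() rather than gsubfn, with perl = TRUE; the test string is made up for illustration.

```r
# Same pattern as in the reply above
pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"

# One term taken from the example formula, written as a string
term <- "- 1.1 * x1"

# regexec() returns the full match followed by each capture group
parts <- regmatches(term, regexec(pat, term, perl = TRUE))[[1]]
parts
# [1] "- 1.1 * x1" "-"          "1.1"        "x1"
```

The second, third and fourth elements are the sign, the coefficient and the term name, which is the layout of the three rows of the matrix m in the reply.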
Gabor Grothendieck
2018-Aug-25 02:06 UTC
[R] looking for formula parser that allows coefficients
Also here is a solution that uses formula processing rather than string processing. No packages are used.

    # Note: relies on a one-line helper, isChar(), given in the follow-up message.
    Parse <- function(e) {
      if (length(e) == 1) {
        # a number is a bare coefficient; a symbol is a term with coefficient 1
        if (is.numeric(e)) return(e)
        else setNames(1, as.character(e))
      } else {
        if (isChar(e[[1]], "*")) {
          # coefficient * term: multiply the values, keep the term name
          x1 <- Recall(e[[2]])
          x2 <- Recall(e[[3]])
          setNames(unname(x1 * x2), paste0(names(x1), names(x2)))
        } else if (isChar(e[[1]], "+")) c(Recall(e[[2]]), Recall(e[[3]]))
        else if (isChar(e[[1]], "-")) {
          # length 2 is unary minus, otherwise binary minus
          if (length(e) == 2) -1 * Recall(e[[2]])
          else c(Recall(e[[2]]), -Recall(e[[3]]))
        } else if (isChar(e[[1]], ":")) setNames(1, paste(e[-1], collapse = ":"))
      }
    }

    # test
    fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
    Parse(fo[[3]])

giving:

             x1    x3 x1:x3 x2:x2
      2.0  -1.1   1.0  -1.0   0.2

On Wed, Aug 22, 2018 at 11:50 AM Paul Johnson <pauljohn32 at gmail.com> wrote:
> Thanks as usual. I owe you more KU decorations soon.
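The recursive approach works because the right-hand side of a formula is a nested call object: the first element of each call is the operator and the remaining elements are its operands. A short base R illustration, using a shortened version of the example formula:

```r
fo <- y ~ 2 - 1.1 * x1 + x3
rhs <- fo[[3]]              # right-hand side of a two-sided formula

as.character(rhs[[1]])      # "+"  : the outermost operator
deparse(rhs[[2]])           # "2 - 1.1 * x1" : left operand, itself a call
deparse(rhs[[3]])           # "x3" : right operand, a symbol

is.numeric(rhs[[2]][[2]])   # TRUE : the constant 2 is stored as a number
length(quote(-x1))          # 2    : unary minus has one operand
length(quote(a - b))        # 3    : binary minus has two
```

The last two lines show why a parser has to check the length of a "-" call to tell unary from binary minus.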
Gabor Grothendieck
2018-Aug-25 02:24 UTC
[R] looking for formula parser that allows coefficients
The isChar function used in Parse is:

    isChar <- function(e, ch) identical(e, as.symbol(ch))
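For convenience, the two pieces from the messages above can be put in one script. Parse and isChar are as posted in the thread, only re-indented; the wrapper formula_to_beta is a hypothetical addition that labels the unnamed constant as "(Intercept)" so the result matches the named-vector layout from the original question. It is a sketch and handles only the operators Parse already covers.

```r
# Helper from the thread: is e the symbol named ch?
isChar <- function(e, ch) identical(e, as.symbol(ch))

# Recursive parser from the thread (logic unchanged)
Parse <- function(e) {
  if (length(e) == 1) {
    if (is.numeric(e)) return(e)
    else setNames(1, as.character(e))
  } else {
    if (isChar(e[[1]], "*")) {
      x1 <- Recall(e[[2]])
      x2 <- Recall(e[[3]])
      setNames(unname(x1 * x2), paste0(names(x1), names(x2)))
    } else if (isChar(e[[1]], "+")) {
      c(Recall(e[[2]]), Recall(e[[3]]))
    } else if (isChar(e[[1]], "-")) {
      if (length(e) == 2) -1 * Recall(e[[2]])
      else c(Recall(e[[2]]), -Recall(e[[3]]))
    } else if (isChar(e[[1]], ":")) {
      setNames(1, paste(e[-1], collapse = ":"))
    }
  }
}

# Hypothetical wrapper (not from the thread): parse the right-hand side
# and give any unnamed constant the name "(Intercept)".
formula_to_beta <- function(fo) {
  b <- Parse(fo[[length(fo)]])     # last element is the right-hand side
  nm <- names(b)
  if (is.null(nm)) nm <- rep("", length(b))
  nm[is.na(nm) | nm == ""] <- "(Intercept)"
  setNames(b, nm)
}

beta <- formula_to_beta(y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2)
beta
# (Intercept)          x1          x3       x1:x3       x2:x2
#         2.0        -1.1         1.0        -1.0         0.2
```

The result has the same names and values as the output of the string-based version earlier in the thread.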