thr3ads.net - R help - [R] looking for formula parser that allows coefficients [Aug 2018]

Paul Johnson

2018-Aug-21 22:45 UTC

[R] looking for formula parser that allows coefficients

Can you point me at any packages that allow users to write a
formula with coefficients?

I want to write a data simulator that has a matrix X with lots
of columns, and then users can generate predictive models
by entering a formula that uses some of the variables, allowing
interactions, like

y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2

Currently, in the rockchalk package, I have a function that simulates
data (genCorrelatedData2), but my interface to enter the beta
coefficients is poor.  I assumed the user would always enter 0's as
placeholders for the unused coefficients, and the intercept is
always first. The unnamed vector is too confusing.  I have them specify:

c(2, 1.1, 0, 3, 0, 0, 0.2, ...)

In the documentation I say (ridiculously) it is easy to figure out from
the examples, but it really isn't.
The function prints out the equation it thinks you intended; that's
minimum protection against user error, but still not very good:

dat <- genCorrelatedData2(N = 10, rho = 0.0,
          beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
          means = c(0,0,0), sds = c(1,1,1), stde = 0)
[1] "The equation that was calculated was"
y = 1 + 2*x1 + 1*x2 + 1*x3
 + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
 + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
 + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
 + N(0,0) random error

But still, it is not very good.

As I look at this now, I realize I expect just the vech, not the whole vector
of all interaction terms, so it is even more difficult than I thought to get the
correct input. Hence, I'd like to let the user write a formula.
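
For illustration, the gap between the full set of pairwise products and the vech (the lower triangle, diagonal included) can be sketched in base R for three predictors. The variable names and the column-major ordering are assumptions chosen to match the printed equation above; this is not a statement of what genCorrelatedData2 actually requires.

```r
# Assumed predictor names, for illustration only
vars <- c("x1", "x2", "x3")

# Full 3 x 3 grid of pairwise products: entry [i, j] is vars[i]:vars[j]
full <- outer(vars, vars, function(a, b) paste(a, b, sep = ":"))

# vech: keep the lower triangle including the diagonal (column-major order)
vech_terms <- full[lower.tri(full, diag = TRUE)]

length(full)        # 9 products in the full grid
length(vech_terms)  # 6 distinct products in the vech
vech_terms
```

With three predictors the user would have to know that only 6 of the 9 products are expected, and in which order, which is the kind of bookkeeping a formula interface would avoid.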

The alternative for the user interface is to have named coefficients.
I can more or less easily allow a named vector for beta

beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1,
"x3" = 1, "x2:x1" = 0.1)

I could build a formula from that.  That's not too bad. But I still think
it would be cool to allow formula input.
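
As a rough sketch of the named-vector route, the names can be turned into a formula with reformulate() and also used directly to compute the linear predictor. The function simulate_y and the small matrix X are hypothetical, not part of rockchalk. Splitting names on ":" and multiplying columns is an assumption made here so that a term such as "x2:x2" means a squared column, which model.matrix() would not give.

```r
beta <- c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)

# Build a formula from the non-intercept names (for display or later fitting)
fo <- reformulate(setdiff(names(beta), "(Intercept)"), response = "y")

# Hypothetical helper: compute y from named coefficients and a matrix X
simulate_y <- function(beta, X, stde = 0) {
  eta <- numeric(nrow(X))
  for (nm in names(beta)) {
    if (nm == "(Intercept)") {
      eta <- eta + beta[[nm]]
    } else {
      # "a:b" is treated as the elementwise product of columns a and b
      vars <- strsplit(nm, ":", fixed = TRUE)[[1]]
      stopifnot(all(vars %in% colnames(X)))
      term <- Reduce(`*`, lapply(vars, function(v) X[, v]))
      eta <- eta + beta[[nm]] * term
    }
  }
  eta + rnorm(nrow(X), sd = stde)
}

# Tiny made-up design matrix, just to exercise the helper
X <- cbind(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))
y <- simulate_y(beta, X)
```

Because terms are looked up by name, unused coefficients can simply be left out instead of being entered as zeros.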

Have you ever seen it done?
pj
-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

Gabor Grothendieck

2018-Aug-22 07:33 UTC

[R] looking for formula parser that allows coefficients

Some string manipulation can convert the formula to a named vector such as
the one shown at the end of your post.

library(gsubfn)

# input
fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2

pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"
ch <- format(fo[[3]])
m <- matrix(strapplyc(ch, pat)[[1]], 3)
m <- m[, colSums(m != "") > 0]
m[2, m[2, ] == ""] <- 1
m[3, m[3, ] == ""] <- "(Intercept)"
co <- as.numeric(paste0(m[1, ], m[2, ]))
v <- m[3, ]
setNames(co, v)
## (Intercept)          x1          x3       x1:x3       x2:x2
##         2.0        -1.1         1.0        -1.0         0.2


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Gabor Grothendieck

2018-Aug-25 02:06 UTC

[R] looking for formula parser that allows coefficients

Also here is a solution that uses formula processing rather than
string processing.
No packages are used.

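# Relies on the helper isChar(), defined in the follow-up message below.
Parse <- function(e) {
  if (length(e) == 1) {
    # Leaf: a number is returned as is; a variable name gets coefficient 1
    if (is.numeric(e)) return(e)
    else setNames(1, as.character(e))
  } else {
    if (isChar(e[[1]], "*")) {
       # Product: multiply the coefficients and join the names
       x1 <- Recall(e[[2]])
       x2 <- Recall(e[[3]])
       setNames(unname(x1 * x2), paste0(names(x1), names(x2)))
    } else if (isChar(e[[1]], "+")) c(Recall(e[[2]]), Recall(e[[3]]))
    else if (isChar(e[[1]], "-")) {
      # Unary minus negates one term; binary minus negates the right-hand side
      if (length(e) == 2) -1 * Recall(e[[2]])
      else c(Recall(e[[2]]), -Recall(e[[3]]))
    } else if (isChar(e[[1]], ":")) setNames(1, paste(e[-1], collapse = ":"))
  }
}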

# test
fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
Parse(fo[[3]])

giving:

         x1    x3 x1:x3 x2:x2
  2.0  -1.1   1.0  -1.0   0.2

On Wed, Aug 22, 2018 at 11:50 AM Paul Johnson <pauljohn32 at gmail.com> wrote:
> Thanks as usual.  I owe you more KU decorations soon.


Gabor Grothendieck

2018-Aug-25 02:24 UTC

[R] looking for formula parser that allows coefficients

The isChar function used in Parse is:

  isChar <- function(e, ch) identical(e, as.symbol(ch))
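
Putting the two messages together, one possible self-contained variant is sketched below. The name parse_coefs is hypothetical. It follows the same recursion as Parse but uses explicit self-calls instead of Recall, labels the unnamed constant as "(Intercept)", and stops on operators it does not handle; those three choices are assumptions added here, not part of the code above.

```r
# Hypothetical wrapper: formula in, named coefficient vector out
parse_coefs <- function(fo) {
  walk <- function(e) {
    if (is.numeric(e)) return(e)                          # bare number
    if (is.name(e)) return(setNames(1, as.character(e)))  # bare variable
    op <- if (is.name(e[[1]])) as.character(e[[1]]) else ""
    if (op == "*") {
      a <- walk(e[[2]])
      b <- walk(e[[3]])
      setNames(unname(a * b), paste0(names(a), names(b)))
    } else if (op == "+" && length(e) == 3) {
      c(walk(e[[2]]), walk(e[[3]]))
    } else if (op == "-") {
      if (length(e) == 2) -walk(e[[2]])
      else c(walk(e[[2]]), -walk(e[[3]]))
    } else if (op == ":") {
      setNames(1, paste(vapply(as.list(e)[-1], deparse, ""), collapse = ":"))
    } else {
      stop("unsupported operator in formula: ", op)
    }
  }
  co <- walk(fo[[3]])  # right-hand side of a two-sided formula
  nm <- names(co)
  if (is.null(nm)) nm <- rep("", length(co))
  nm[nm == ""] <- "(Intercept)"  # the unnamed constant is the intercept
  setNames(co, nm)
}

fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
co <- parse_coefs(fo)
co
```

The result has the same shape as the named beta vector proposed in the original post, so it could be passed to whatever code already accepts named coefficients.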

