Andrew Agrimson
2014-Aug-18 19:58 UTC
[R] Building and scoring multiple models with dplyr functions
Hello All,
I have a question regarding building multiple models and then scoring new
data with these models. I have been tasked with converting existing SAS
modeling code into equivalent R code, but unfortunately I rarely use R so
I'm in unfamiliar territory. I think I've found a good way to build the
models using dplyr functions; however, I'm having difficulty figuring out
how to use the models to score new data.
The SAS code I'm converting builds multiple binomial models using the "BY"
statement in the GLIMMIX procedure. The results of the model-fitting
process are stored using the "STORE" statement.
proc glimmix data = model_data;
  by group;
  model y/n = var1-var10/dist=bin;
  store model;
quit;
The next step is to score a new data set using the PLM procedure. The "New
Data" is also grouped by "group", and PLM is able to match and apply the
appropriate model with the appropriate "by" value.
proc plm restore=model;
  score data=new_data out=scored predicted=p/ilink;
run;
In R I've been able to reproduce the first model building step using dplyr
functions and it seems to work quite well. In fact it's much faster than my
SAS implementation.
by_group <- group_by(model_data, group)
models <- by_group %>%
  do(mod = glm(cbind(y,n) ~ var1 + var2 + var3 + var4 + var5 +
                 var6 + var7 + var8 + var9 + var10,
               family = binomial, data = .))
As stated above, I cannot figure out how to apply these models to new
data. I've scoured the internet and the documentation for an example, but
so far no luck. I want to extract the model objects out of the data frame
"models" and apply the "predict" function, but my novice knowledge of R and
dplyr specifically is making this very difficult.
Any help or advice would be greatly appreciated.
Thanks,
Andy
Ista Zahn
2014-Aug-19 01:09 UTC
[R] Building and scoring multiple models with dplyr functions
At the risk of being old-fashioned, I suggest doing this in a
for-loop. Why struggle to fit this into the dplyr framework when a
straightforward loop will do the trick?
This is untested in the absence of example data, but something along
the lines of
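models <- list()
predictions <- list()
for (g in unique(model_data$group)) {
  key <- as.character(g)  # index by name so numeric group ids work too
  # glm() expects cbind(successes, failures); SAS "y/n" is events/trials,
  # so if n holds the number of trials the response is cbind(y, n - y)
  models[[key]] <- glm(cbind(y, n - y) ~ var1 + var2 + var3 + var4 + var5 +
                         var6 + var7 + var8 + var9 + var10,
                       family = binomial,
                       data = subset(model_data, group == g))
  # type = "response" returns probabilities, like ILINK in PROC PLM
  predictions[[key]] <- predict(models[[key]],
                                newdata = subset(new_data, group == g),
                                type = "response")
}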
should do it.
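For anyone who wants to try the per-group idea without the original data,
here is a rough sketch on simulated data. Everything about the data is an
assumption made for illustration: the helper make_data(), the two groups
"a" and "b", three predictors instead of ten, and n being the number of
trials (hence the response cbind(y, n - y)). It fits one model per group
and writes probabilities back onto new_data.

```r
set.seed(1)

# Hypothetical stand-in data: two groups, three predictors instead of ten
make_data <- function(rows_per_group) {
  d <- data.frame(
    group = rep(c("a", "b"), each = rows_per_group),
    var1  = rnorm(2 * rows_per_group),
    var2  = rnorm(2 * rows_per_group),
    var3  = rnorm(2 * rows_per_group),
    n     = 20L                          # assumed: n is the number of trials
  )
  d$y <- rbinom(nrow(d), size = d$n,
                prob = plogis(0.5 * d$var1 - 0.3 * d$var2))
  d
}
model_data <- make_data(50)
new_data   <- make_data(10)

# One binomial GLM per group; the response is cbind(successes, failures)
models <- lapply(split(model_data, model_data$group), function(d) {
  glm(cbind(y, n - y) ~ var1 + var2 + var3, family = binomial, data = d)
})

# Score each slice of new_data with the model fitted to the same group;
# rows whose group has no fitted model keep p = NA
new_data$p <- NA_real_
for (g in names(models)) {
  rows <- new_data$group == g
  new_data$p[rows] <- predict(models[[g]], newdata = new_data[rows, ],
                              type = "response")
}

head(new_data)
```

Keeping the models in a list named by group makes the lookup explicit, which
is roughly what PLM does when it matches on the "by" value.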
Best,
Ista
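A note on the dplyr route asked about in the original post: the result of
do(mod = ...) has one row per group, with the fitted models stored in the
list-column "mod", so a model can be pulled out with [[ ]] and passed to
predict(). The sketch below is untested against the real data and uses
simulated data; the helper sim(), the two predictors, and n being the
number of trials are all assumptions. Note also that do() is marked as
superseded in dplyr 1.0 and later, although it is still available.

```r
library(dplyr)

set.seed(42)
# Hypothetical data: two groups, two predictors, n trials and y successes
sim <- function(k) {
  d <- data.frame(group = rep(c("a", "b"), each = k),
                  var1 = rnorm(2 * k), var2 = rnorm(2 * k), n = 25L)
  d$y <- rbinom(nrow(d), size = d$n, prob = plogis(0.4 * d$var1))
  d
}
model_data <- sim(40)
new_data   <- sim(5)

# One row per group; the fitted glm objects live in the list-column `mod`
models <- model_data %>%
  group_by(group) %>%
  do(mod = glm(cbind(y, n - y) ~ var1 + var2, family = binomial, data = .))

# Pull a single model out of the list-column
fit_a <- models$mod[[which(models$group == "a")]]

# Score new_data group by group, looking up the matching model each time
scored <- new_data %>%
  group_by(group) %>%
  do({
    fit <- models$mod[[match(.$group[1], models$group)]]
    data.frame(., p = predict(fit, newdata = ., type = "response"))
  }) %>%
  ungroup()
```

Here type = "response" puts the predictions on the probability scale, which
is what the ILINK option requests in the PLM score statement.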