Andrew Agrimson
2014-Aug-18 19:58 UTC
[R] Building and scoring multiple models with dplyr functions
Hello All,

I have a question regarding building multiple models and then scoring new data with these models. I have been tasked with converting existing SAS modeling code into equivalent R code, but unfortunately I rarely use R, so I'm in unfamiliar territory. I think I've found a good way to build the models using dplyr functions, but I'm having difficulty figuring out how to use the models to score new data.

The SAS code I'm converting builds multiple binomial models using the "BY" statement in the GLIMMIX procedure. The results of the model fitting process are stored using the "STORE" statement.

    proc glimmix data = model_data;
      by group;
      model y/n = var1-var10/dist=bin;
      store model;
    quit;

The next step is to score a new data set using the PLM procedure. The "New Data" is also grouped by "group", and PLM is able to match each "by" value with the appropriate model and apply it.

    proc plm restore=model;
      score data=new_data out=scored predicted=p/ilink;
    run;

In R I've been able to reproduce the first model building step using dplyr functions, and it seems to work quite well. In fact it's much faster than my SAS implementation.

    by_group <- group_by(model_data, group)
    models <- by_group %>%
      do(mod = glm(cbind(y,n) ~ var1 + var2 + var3 + var4 + var5 +
                     var6 + var7 + var8 + var9 + var10,
                   family = binomial, data = .))

As stated above, I cannot figure out how to apply these models to new data. I've scoured the internet and the documentation for an example, but so far no luck. I want to extract the model objects out of the data frame "models" and apply the "predict" function, but my novice knowledge of R, and of dplyr specifically, is making this very difficult.

Any help or advice would be greatly appreciated.

Thanks,
Andy
Ista Zahn
2014-Aug-19 01:09 UTC
[R] Building and scoring multiple models with dplyr functions
At the risk of being old-fashioned, I suggest doing this in a for-loop. Why struggle to fit this into the dplyr framework when a straightforward loop will do the trick? This is untested in the absence of example data, but something along the lines of

    models <- list()
    predictions <- list()
    for (g in unique(model_data$group)) {
      # Name the list entries by group so numeric group values are not
      # treated as list positions.
      key <- as.character(g)
      # glm() expects cbind(successes, failures). n is assumed to be the
      # number of trials, as in the SAS events/trials syntax y/n.
      models[[key]] <- glm(cbind(y, n - y) ~ var1 + var2 + var3 + var4 + var5 +
                             var6 + var7 + var8 + var9 + var10,
                           family = binomial,
                           data = subset(model_data, group == g))
      # type = "response" returns probabilities, like predicted=p/ilink in PLM;
      # the default would be the linear predictor on the logit scale.
      predictions[[key]] <- predict(models[[key]],
                                    newdata = subset(new_data, group == g),
                                    type = "response")
    }

should do it.

Best,
Ista
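For anyone who wants to try this without the original data, here is a minimal base R sketch of the same fit-per-group, score-per-group idea. The toy data, the two predictors, and the helper names `make_data`, `fit_one`, and `score_one` are all made up for illustration. It assumes `n` is the number of trials (as in the SAS `y/n` syntax), so the response is written as `cbind(y, n - y)`, and it uses `type = "response"` to get probabilities rather than values on the logit scale.

```r
# Toy stand-ins for model_data and new_data (hypothetical columns and sizes).
set.seed(1)
make_data <- function(rows) {
  d <- data.frame(group = rep(c("a", "b"), each = rows / 2),
                  var1 = rnorm(rows),
                  var2 = rnorm(rows),
                  n = 20L)
  # y = number of events out of n trials
  d$y <- rbinom(rows, size = d$n, prob = plogis(0.5 * d$var1 - 0.25 * d$var2))
  d
}
model_data <- make_data(200)
new_data <- make_data(40)

# Fit one binomial GLM per group; glm() wants cbind(successes, failures).
fit_one <- function(d) {
  glm(cbind(y, n - y) ~ var1 + var2, family = binomial, data = d)
}
models <- lapply(split(model_data, model_data$group), fit_one)

# Score each slice of new_data with the model stored under the same group name.
score_one <- function(d, key) {
  d$p <- predict(models[[key]], newdata = d, type = "response")
  d
}
new_by_group <- split(new_data, new_data$group)
scored <- do.call(rbind, Map(score_one, new_by_group, names(new_by_group)))
```

`scored` is `new_data` with an added column `p`, grouped in the order of the group names. A group that appears in `new_data` but has no fitted model makes `models[[key]]` fail with a subscript error, so that case needs handling separately. If the models were built with `do(mod = ...)` as in the original question, the fitted objects should sit in the list column `mod`, so `setNames(models$mod, models$group)` ought to give an equivalent named list; that part is untested here.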