Guazzelli, Alex
2009-Jul-08 22:38 UTC
[R] How to deploy statistical models built in R in real-time?
I am framing this as a question since I would like to know how folks are currently deploying the models they build in R. Say you want to use the results of your model inside another application in real time or on demand: how do you do it? How do you use the decisions you get back from your models?

As you may know, a PMML package is available for R that allows many mining models to be exported into the Predictive Model Markup Language. PMML is the standard way to represent models and can be exported from most statistical packages (including SPSS, SAS, KNIME, ...). Once your model is represented as a PMML file, it can easily be moved around. PMML allows for true interoperability.

We have recently published an article about PMML in The R Journal. It describes the PMML language and the package itself. If you are interested in finding out more about PMML and how to benefit from this standard, please check the link below.

http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf

We have also written a paper about open standards and cloud computing for the SIGKDD Explorations newsletter. In this paper, we describe the ADAPA Scoring Engine, which executes PMML models and is available as a service on the Amazon Cloud. ADAPA can be used to deploy R models in real time from anywhere in the world. I believe it represents a revolution in data mining, since it allows anyone who uses R to make effective use of predictive models at a cost of less than $1/hour.

http://www.zementis.com/docs/SIGKDD_ADAPA.pdf

Thanks!

Alex
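To illustrate what "a model represented as a PMML file" can look like to a consuming application, here is a minimal sketch in Python (chosen only so the example is self-contained). Everything in it is an assumption made for illustration: the PMML fragment is hand-written rather than exported by the R pmml package, the field names (tenure, calls, score) and coefficients are hypothetical, and the `score` function handles only a single linear RegressionTable. A real scoring engine such as ADAPA covers far more of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (NOT output of the R pmml package).
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="illustrative linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="calls" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="calls"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="tenure" exponent="1" coefficient="0.25"/>
      <NumericPredictor name="calls" exponent="1" coefficient="-2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace of the fragment above; other PMML versions use other URIs.
NS = {"p": "http://www.dmg.org/PMML-3_2"}


def score(pmml_text, record):
    """Evaluate the first RegressionTable of a PMML regression model.

    Computes intercept + sum(coefficient * value ** exponent) over the
    NumericPredictor elements; nothing else in the document is interpreted.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))
        value = record[pred.get("name")]
        total += float(pred.get("coefficient")) * value ** exponent
    return total


result = score(PMML_DOC, {"tenure": 10.0, "calls": 2.0})
print(result)
```

The point of the sketch is only that the consuming side needs an XML parser and the model file, not R itself.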
Allan Engelhardt
2009-Jul-15 08:12 UTC
[R] How to deploy statistical models built in R in real-time?
> I am framing this as a question since I would like to know how folks
> are currently deploying the models they build in R. Say, you want to
> use the results of your model inside another application in real-time
> or on-demand, how do you do it? How do you use the decisions you get
> back from your models?

Late answer, sorry. I love PMML (and have been advocating it since at least version 2.0) but I rarely see it deployed in commercial companies. What I see, in decreasing order of importance:

1. Pre-scoring. That is, pre-calculate the scores of your model for each customer and stuff them into a database that your operational system can access. Example: customer churn in mobile telco.

2. Convert the model to SQL. This is obviously easier for some model types (trees, k-nearest neighbour, ...) than others. This is surprisingly common. Example: a Big Famous Data Insights Company created a global customer segmentation model (really: 'cause all markets and cultures are the same....) for a multi-national company and distributed it as a Word document with pseudo-SQL fragments for each country to implement. That gets over the problem of different technologies in different countries.

3. Pre-scoring for multiple likely events. Example: for cross- and up-sell in a call centre (which is phenomenally effective) you really want to include the outcome of the original call as an input to the propensity model. A badly handled complaint call does not offer the same opportunities for flogging more products as a successful upgrade to a higher price plan (but might be an opportunity to extend an (expensive) retention offer). The Right Way to do this is to run the model in real time, which would usually mean PMML if you have created the model in R. At least one vendor recommended “just” pre-scoring the model for each possible (relevant) call outcome and storing that in the operational database. That vendor also sold databases :-)

4. Use PL/R to embed R within your (PostgreSQL) RDBMS. (Rare.)

5. Embed R within your operational application and run the model that way (I have done this exactly once).

Somewhere between 1 and 2 is an approach that doesn’t really fit with the way you framed the question (and is probably OT for this list). It is simply this: if you want to deploy models for real-time or fast on-demand usage, usually you don’t implement them in R (or SAS or SPSS). In Marketing, which is my main area, there are dedicated tools for real-time decisioning and marketing like Oracle RTD [1], Unica inbound marketing [2], Chordiant Recommendation Advisor, and others [3], though only the first of these can realistically be described as “modelling”.

Happy to discuss this more offline if you want. And I really like your approach - hope to actually use it some day.

Allan.

More at http://www.pcagroup.co.uk/ and http://www.cybaea.net/Blogs/Data/

[1] http://www.oracle.com/appserver/business-intelligence/real-time-decisions.html
[2] http://www.unica.com/products/Inbound_Marketing.htm [web site down at time of writing]
[3] E.piphany and SPSS Interaction Builder appear to be nearly dead in the market.
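As a rough illustration of options 1 and 2 in the reply above (pre-scoring and converting a model to SQL), here is a minimal sketch in Python using the standard sqlite3 module. All names are hypothetical and chosen for the example: the hand-written churn tree, the `customer` and `churn_score` tables, and the fields `months_left` and `complaints` do not come from the thread. The tree is rendered as a nested SQL CASE expression, which is then used to pre-compute a score per customer into a table that an operational system could read.

```python
import sqlite3

# Hypothetical churn tree, written by hand for illustration only.
# Each split sends rows with field < threshold to "below", others to "above".
TREE = {
    "field": "months_left", "threshold": 3,
    "below": {
        "field": "complaints", "threshold": 1,
        "below": {"score": 0.4},
        "above": {"score": 0.8},
    },
    "above": {"score": 0.1},
}


def tree_to_sql(node):
    """Render a binary decision tree as a nested SQL CASE expression.

    Field names are interpolated directly, so this is only suitable for
    trusted, model-generated input. A NULL field value falls through to
    the ELSE branch under SQL comparison semantics.
    """
    if "score" in node:
        return repr(node["score"])
    return "CASE WHEN {f} < {t} THEN {b} ELSE {a} END".format(
        f=node["field"],
        t=node["threshold"],
        b=tree_to_sql(node["below"]),
        a=tree_to_sql(node["above"]),
    )


def prescore(conn):
    """Option 1: materialise one score per customer in a lookup table."""
    conn.execute("DROP TABLE IF EXISTS churn_score")
    conn.execute(
        "CREATE TABLE churn_score AS "
        "SELECT id, " + tree_to_sql(TREE) + " AS score FROM customer"
    )


# In-memory database with three made-up customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER, months_left INTEGER, complaints INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, 2, 0), (2, 2, 3), (3, 12, 0)],
)
prescore(conn)
scores = dict(conn.execute("SELECT id, score FROM churn_score"))
print(tree_to_sql(TREE))
print(scores)
```

The same CASE expression could instead be evaluated at query time (option 2 on its own); materialising it is the pre-scoring variant, which trades freshness for a cheap lookup.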