Dear all,

I have a couple of short noob questions for whoever can take them. I'm from a very non-stats background so sorry for offending anybody with stupid questions! :-)

I have been using logistic regression care of glm to analyse a binary dependent variable against a couple of independent variables. All has gone well so far. In my work I have to compare the accuracy of analysis to a C4.5 machine learning approach. With the machine learning, a straight-forward measure of the quality of the classifier is simply the percentage of correctly classified instances. I can calculate this for the resultant model by comparing predictions to original values 'manually'. My question: is this not automatically - or easily - calculated in the produced model or the summary of that model?

I want to use my model in real time to produce results for new inputs. Basically this model is to be used as a classifier for a robot in real time. Can anyone suggest the best way that a produced model can be used directly in external code once the model has been developed in R? If my external code is in Java, then using jri is one option. A more efficient method would be to take the intercept and coefficients and actually code up the function in the appropriate programming language. Has anyone ever tried doing this?

Apologies again for the stupid questions, but the sooner I get some of these things straight, the better.

Claus
confusionMatrix in the caret package can be used to replace your manual procedure. You could try using RWeka, the R interface to the java Weka software. Once you have it working you could then directly interface your java program to Weka without involving R.

On Thu, Apr 22, 2010 at 9:29 PM, Claus O'Rourke <claus.orourke at gmail.com> wrote:
> [...]
Claus O'Rourke wrote:
> With the machine learning, a straight-forward measure of the quality of the classifier is simply the percentage of correctly classified instances. I can calculate this for the resultant model by comparing predictions to original values 'manually'. My question: is this not automatically - or easily - calculated in the produced model or the summary of that model?

The percent classified correctly is an improper scoring rule that will lead to a selection of a bogus model. You can easily find examples where adding a very important variable to a binary logistic model results in a decrease in the percent "correct".

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine, Department of Biostatistics
Vanderbilt University
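To make the point about scoring rules concrete, here is a minimal Java sketch that computes the percent correct alongside two scores that are proper: the Brier score and the logarithmic loss. Frank's message does not name an alternative, so the choice of these two is an assumption, as are the class name ScoringRules, the 0.5 threshold, and the made-up probabilities in main. The two hypothetical models get the same percent correct, yet the proper scores separate them.

```java
final class ScoringRules {

    // Percent of cases where thresholding the probability at 0.5
    // (an assumed cut-off) reproduces the observed 0/1 outcome.
    static double percentCorrect(double[] p, int[] y) {
        int correct = 0;
        for (int i = 0; i < p.length; i++) {
            int predicted = p[i] >= 0.5 ? 1 : 0;
            if (predicted == y[i]) {
                correct++;
            }
        }
        return 100.0 * correct / p.length;
    }

    // Brier score: mean squared difference between the predicted
    // probability and the 0/1 outcome. Lower is better.
    static double brierScore(double[] p, int[] y) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - y[i];
            sum += d * d;
        }
        return sum / p.length;
    }

    // Logarithmic loss: negative mean log-likelihood of the outcomes.
    // Probabilities are clamped away from 0 and 1 to avoid log(0).
    static double logLoss(double[] p, int[] y) {
        final double eps = 1e-15;
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double q = Math.min(1.0 - eps, Math.max(eps, p[i]));
            sum += y[i] == 1 ? Math.log(q) : Math.log(1.0 - q);
        }
        return -sum / p.length;
    }

    public static void main(String[] args) {
        int[] y = {1, 1, 0, 0};
        double[] confident = {0.9, 0.8, 0.2, 0.1};   // illustrative values only
        double[] hesitant = {0.6, 0.55, 0.45, 0.4};  // illustrative values only

        // Both models classify every case correctly at the 0.5 threshold,
        // but the confident model has the lower Brier score and log loss.
        System.out.println(percentCorrect(confident, y) + " " + percentCorrect(hesitant, y));
        System.out.println(brierScore(confident, y) + " " + brierScore(hesitant, y));
        System.out.println(logLoss(confident, y) + " " + logLoss(hesitant, y));
    }
}
```

The percent correct only looks at which side of the threshold each probability falls on, so it cannot reward a model for being better calibrated; the other two scores use the probabilities themselves.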
When you just want to calculate the probability that a new observation x_i belongs to class A or B, and do not need to do any new model estimation or other analyses, the easiest way is probably to write the estimated coefficients to a text file, read them into your Java/C/whatever program, and use them directly to calculate the probabilities. This is simply

p_i = 1/(1 + e^(-(b0 + b1 x1i + ... + bk xki)))

with b0 ... bk your parameters. Easy to implement. Having an interface to R for this seems overkill to me.

Regards,
Jan
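A minimal Java sketch of this approach might look as follows. The class name LogisticScorer and the file layout (one coefficient per line, intercept first) are assumptions made for illustration; on the R side such a file could be produced with something like write(coef(fit), "coef.txt", ncolumns = 1), where fit is a hypothetical name for the glm object. The inputs must be coded the same way as in the R model (for example, dummy variables for factors), which the sketch does not check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class LogisticScorer {

    // coef[0] is the intercept b0; coef[1..k] are b1..bk.
    private final double[] coef;

    LogisticScorer(double[] coef) {
        this.coef = coef.clone();
    }

    // Assumed file format: one number per line, intercept first.
    static LogisticScorer fromFile(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path);
        double[] c = lines.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToDouble(Double::parseDouble)
                .toArray();
        return new LogisticScorer(c);
    }

    // p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))
    double probability(double... x) {
        if (x.length != coef.length - 1) {
            throw new IllegalArgumentException(
                    "expected " + (coef.length - 1) + " inputs, got " + x.length);
        }
        double eta = coef[0];
        for (int j = 0; j < x.length; j++) {
            eta += coef[j + 1] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    public static void main(String[] args) {
        // Made-up coefficients, for illustration only.
        LogisticScorer scorer = new LogisticScorer(new double[] {-1.0, 2.0});
        System.out.println(scorer.probability(1.0));
    }
}
```

The result can be checked against predict(fit, newdata, type = "response") in R for a few inputs before the coefficients are trusted in the robot code.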