This news item in a data mining newsletter makes various claims for a technique called "Reduced Error Logistic Regression": http://www.kdnuggets.com/news/2007/n08/12i.html

In brief, are these (ambitious) claims justified, and if so, has this technique been implemented in R (or does anyone have any plans to do so)?

Tim C
I don't know about the claims, but I do know about this:

> Recent News: January 31, 2007. St. Louis, MO - Rice Analytics applied
> for a U.S. patent this week on a generalized form of Reduced Error
> Logistic Regression. This generalized form allows repeated measures,
> multilevel, and survival designs that include individual level
> estimates. None of these capabilities were possible with the
> previously disclosed formulation, which also had limited application
> because it could only be applied to models where all variables had no
> missing observations.

This is a very bad trend in science and statistics, IMHO.

-Roy M.

On Apr 25, 2007, at 7:29 PM, Tim Churches wrote:

> This news item in a data mining newsletter makes various claims for
> a technique called "Reduced Error Logistic Regression":
> http://www.kdnuggets.com/news/2007/n08/12i.html
>
> In brief, are these (ambitious) claims justified and if so, has
> this technique been implemented in R (or does anyone have any plans
> to do so)?
>
> Tim C
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
From what I've read (which isn't much), the idea is to estimate a utility (preference) function for discrete categories, using logistic regression, under the assumption that the residuals of the linear predictor of the utilities are ~ Type I Gumbel. This implies "independence of irrelevant alternatives" in economic jargon, i.e. the utility of choice a versus choice b is independent of the introduction of a third choice c. It also implies homoscedasticity of the errors. The model can be generalized in various ways. If you are willing to introduce extra parameters into the model, such as the parameters of the Gumbel distribution, you may get more precision in the estimates of the utility function. An alternative (without the independence of irrelevant alternatives assumption) is to model the errors as multivariate normal (i.e. use probit regression), which is computationally much more difficult.

Whether it makes substantive sense to use these models outside of "discrete choice" experiments is another question.

Patenting these methods is worrying. There have been a lot of people working on discrete choice experiments over the years. It's hard to believe that a single company could have ownership over an idea that is the result of a collaborative effort such as this.

Cheers,

Simon.

On Thu, 2007-04-26 at 12:29 +1000, Tim Churches wrote:

> This news item in a data mining newsletter makes various claims for a
> technique called "Reduced Error Logistic Regression":
> http://www.kdnuggets.com/news/2007/n08/12i.html
>
> In brief, are these (ambitious) claims justified and if so, has this
> technique been implemented in R (or does anyone have any plans to do
> so)?
> Tim C

--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia

Room 320, Goddard Building (8)
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.
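[Editorial aside: Simon's point about Type I Gumbel errors can be illustrated in a few lines of R. The difference of two independent Gumbel utility errors follows a logistic distribution, which is exactly why "choose the alternative with the higher latent utility" reproduces ordinary logistic regression. A minimal simulation sketch, using base R only; the variable names and the true coefficient value are illustrative, not taken from the patent claims:]

```r
set.seed(42)
n    <- 10000
x    <- rnorm(n)   # observed attribute favouring alternative A
beta <- 1.5        # true utility coefficient

# Draw iid Type I Gumbel errors by inverting the CDF F(z) = exp(-exp(-z))
rgumbel <- function(n) -log(-log(runif(n)))

# Latent utilities of the two alternatives
u_a <- beta * x + rgumbel(n)
u_b <- 0        + rgumbel(n)

# Each subject picks the alternative with the higher utility
y <- as.integer(u_a > u_b)

# The Gumbel difference is logistic, so a plain logit recovers beta
fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)   # slope estimate should be close to 1.5
```

Fitting a probit (`link = "probit"`) to the same data would misstate the error distribution, which is the substance of the logit-versus-probit choice Simon describes.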
paulandpen at optusnet.com.au
2007-Apr-26 06:35 UTC
[R] Reduced Error Logistic Regression, and R?
Further to Simon's points, here is what is confusing to me; I highlight the relevant section of the claims below:

"The key assumption concerns 'symmetrical error constraints'. These 'symmetrical error constraints' force a solution where the probabilities of positive and negative error are symmetrical across all cross product sums that are the basis of maximum likelihood logistic regression. As the number of independent variables increases, it becomes more and more likely that this symmetrical assumption is accurate. Because this error component can be reliably estimated and subtracted out with a large enough number of variables, the resulting model parameters are strikingly error-free and do not overfit the data."

Maybe this is a bit old school of me, but isn't the point of model development to generate the most parsimonious model, with the greatest explanatory power from the fewest variables? I can just imagine going to a client, standing in a 'bored' (grin) room for a presentation, and saying: "Hey client, here are the 200 variables that are driving choice behaviour."

I use latent class and Bayes-based approaches because they recover heterogeneity in utility allocation across the sample; that, to me, is the big battle in choice-based analytics. I believe that after a certain point, a heap of predictors becomes meaningless. I can see some of my colleagues adopting this because it is in SAS and makes up for poor design.

Anyway, from a technical point of view, I would have to read a little more about the error they are referring to. Good on them for developing a new technology. Like any algorithm, it will have its strengths and weaknesses and, depending on factors such as usability, will gain some level of acceptance.
Paul

> Simon Blomberg <s.blomberg1@uq.edu.au> wrote:
>
> From what I've read (which isn't much), the idea is to estimate a
> utility (preference) function for discrete categories, using logistic
> regression, under the assumption that the residuals of the linear
> predictor of the utilities are ~ Type I Gumbel. [...]
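[Editorial aside: Paul's worry about piling on predictors is easy to demonstrate in R. Adding pure-noise variables to a logistic regression always improves in-sample fit (deviance can only fall in nested models), yet a penalised criterion such as AIC correctly prefers the sparse model. A small sketch; the sample sizes and variable names are illustrative:]

```r
set.seed(1)
n <- 300
p <- 30
x <- matrix(rnorm(n * p), n, p)      # 30 predictors, only the first matters
y <- rbinom(n, 1, plogis(x[, 1]))    # outcome driven by X1 alone

dat   <- data.frame(y = y, x)        # columns auto-named X1, X2, ..., X30
small <- glm(y ~ X1, data = dat, family = binomial)
big   <- glm(y ~ .,  data = dat, family = binomial)

# In-sample deviance always improves as junk predictors are added...
c(small = deviance(small), big = deviance(big))

# ...but AIC, which penalises the 29 extra parameters, prefers parsimony
c(small = AIC(small), big = AIC(big))
```

This is the "most parsimonious model" point in miniature: the apparent gain from the noise predictors is exactly the kind of error that a claim of parameters which "do not overfit the data" would have to explain away.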