On 17-Sep-09 17:28:16, Noah Silverman wrote:
> Hi,
> I'm not sure of the correct nomenclature or function for what
> I'm trying to do.
>
> I'm interested in calculating a logistic regression on a binary
> dependent variable (True, False).
>
> There are a few ways to easily do this in R. Both SVM and GLM
> work easily.
>
> The part that I want to add is "group-wise" awareness, so that
> the algorithm computes the coefficients to maximize the likelihood
> of a "True" label per group.
>
> A toy example is probably best. I've been looking at horse
> racing models as a fun field to learn about statistics and R.
>
> So, for this example, let's assume the following:
> 100 horses in our stable
> 10 horses per race
> 75 races this season (some horses race more than once.)
>
> The independent variables are things about a horse (average speed,
> number of past wins, etc.)
> The dependent variable is (Win, Lose) represented by (1,0)
>
> As mentioned above, an SVM or GLM will quickly work to estimate
> coefficients and the probability of a Win. I'd like to take it further
> and estimate the probability of a win, but look at it per race.
>
> I'm NOT interested in the group label as a final part of the model.
> I don't want a separate set of coefficients for each group. I just
> want the iterative algorithm to work toward maximizing the likelihood
> PER GROUP as an average.
>
> I looked extensively through rseek.org for things like "grouped
> logistic" and "nested logistic", but I couldn't find anything that
> does this. I'm probably naming it wrong.
>
> I assume that a MANUAL iteration would be to:
> 1) Pick a coefficient
> 2) Calculate the resulting probability for each horse.
> 3) Measure the strength of the result for each race (sum them
> together or average them?)
> 4) Adjust coefficient and repeat
>
> Surely there must be some standard function in a library that will
> do this.
>
> Can any of the stat gurus here offer some suggestions?
>
> Thanks!
> --
> Noah
In the context of your "fun example", you have a fundamental problem
in that (if I've understood your statement of it correctly) you will
have more than one of your horses in the same race (apparently 10).
Therefore, one of them winning excludes any of the others winning in
that same race, so their results are not independent of each
other.
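To make the "one winner per race" point concrete, here is a minimal sketch (Python, my own illustration, not code from Noah's message or from any R package). It assumes a linear score x . beta per horse and normalizes those scores over the horses in each race (a within-race softmax, usually called a conditional logit model), so the per-race quantity in Noah's step 3 becomes the log-probability that the observed winner wins its race. Steps 1-4 then amount to plain gradient ascent on the average of that over races. The helper make_races, the two covariates and the "true" coefficients are hypothetical; the data are synthetic, and each race draws fresh horses rather than reusing a 100-horse stable.

```python
import math
import random

def race_probs(beta, race):
    """Win probabilities for the horses in one race: softmax of x . beta."""
    scores = [sum(b * x for b, x in zip(beta, horse)) for horse in race]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]

def avg_loglik(beta, races, winners):
    """Average over races of log P(observed winner wins its race)."""
    return sum(math.log(race_probs(beta, r)[k])
               for r, k in zip(races, winners)) / len(races)

def gradient(beta, races, winners):
    """Gradient of avg_loglik: winner's x minus the probability-weighted mean x."""
    g = [0.0] * len(beta)
    for race, k in zip(races, winners):
        p = race_probs(beta, race)
        for j in range(len(beta)):
            mean_j = sum(pi * horse[j] for pi, horse in zip(p, race))
            g[j] += race[k][j] - mean_j
    return [gj / len(races) for gj in g]

def fit(races, winners, n_features, lr=0.5, steps=500):
    """Steps 1-4: start from a coefficient, score every horse, measure the
    per-race likelihood, adjust the coefficients, repeat."""
    beta = [0.0] * n_features
    for _ in range(steps):
        g = gradient(beta, races, winners)
        beta = [b + lr * gj for b, gj in zip(beta, g)]
    return beta

def make_races(n_races=75, horses_per_race=10, true_beta=(1.5, -0.5), seed=1):
    """Hypothetical synthetic data: covariates uniform in [-1, 1], and the
    winner of each race drawn from the softmax probabilities under true_beta."""
    rng = random.Random(seed)
    races, winners = [], []
    for _ in range(n_races):
        race = [[rng.uniform(-1, 1) for _ in true_beta]
                for _ in range(horses_per_race)]
        p = race_probs(list(true_beta), race)
        winners.append(rng.choices(range(horses_per_race), weights=p)[0])
        races.append(race)
    return races, winners

races, winners = make_races()
beta_hat = fit(races, winners, n_features=2)
print("fitted beta:", beta_hat)
print("avg log-lik at beta = 0:", avg_loglik([0.0, 0.0], races, winners))
print("avg log-lik at fitted beta:", avg_loglik(beta_hat, races, winners))
```

Because the probabilities are normalized within each race, exactly one unit of win probability is shared among the horses in a race, which is the mutual exclusion described above; with two covariates bounded in [-1, 1], the step size 0.5 is small enough for each step not to decrease the (concave) objective.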
Also, at least in real life, the probability that a given horse will
win in a particular race depends not only on the "per horse" covariates
(such as your average speed, number of past wins, etc.), and indeed
on the condition of the race-course at the time, but also (and usually
strongly) on the characteristics of the other horses in the same race.
So a simple logistic model of the kind you seem to be proposing would
certainly not be realistic!
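A tiny illustration of the difference (a sketch only, with a hypothetical single covariate and a made-up three-horse race): under independent per-horse logistic probabilities the values in one race need not sum to 1, and a horse's probability ignores its rivals entirely. If instead the scores are normalized within the race, making a rival faster lowers the first horse's win probability. This captures only the crude dependence induced by that normalization; it says nothing about course conditions or interactions between particular horses, so the caveat above still stands.

```python
import math

def logistic_probs(beta, race):
    """Independent per-horse logistic probabilities (rivals are ignored)."""
    return [1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, horse))))
            for horse in race]

def race_probs(beta, race):
    """Within-race softmax of x . beta (exactly one winner per race)."""
    scores = [sum(b * x for b, x in zip(beta, horse)) for horse in race]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    return [wi / sum(w) for wi in w]

beta = [1.0]                            # hypothetical coefficient on "average speed"
race = [[0.5], [0.0], [-0.5]]           # hypothetical three-horse race
faster_rival = [[0.5], [1.5], [-0.5]]   # same race, but horse 1 is now much faster

print(sum(logistic_probs(beta, race)))  # about 1.5, not 1
print(logistic_probs(beta, race)[0], logistic_probs(beta, faster_rival)[0])  # identical
print(race_probs(beta, race)[0], race_probs(beta, faster_rival)[0])  # about 0.51, then about 0.24
```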
I would be happier thinking about your problem in the context of a
different kind of example ...
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Sep-09 Time: 19:06:27
------------------------------ XFMail ------------------------------