Hi,

I am trying to use various techniques (e.g. SVM, logistic regression, neural networks) to classify and predict the outcome of horse races. Most of my predictive features are categorical - horse, jockey, trainer - and I keep running out of memory owing to the size of the resulting vectors once these are expanded to indicators.

Does anyone know how to solve this problem?

I have coded the outcomes as win/lose or place/lose, with a view to training on x years of results and then testing on the subsequent years' results. Is there some alternative way of looking at the problem?

Does anyone have pointers to published work in this area?

Thanks,

Stephen
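One way to keep memory under control with high-cardinality categorical predictors is to build a sparse design matrix instead of a dense one. A minimal sketch using the Matrix package (the data frame and column names here are hypothetical stand-ins for the race results):

```r
library(Matrix)

## Toy data standing in for the race results (made-up names and sizes)
set.seed(1)
races <- data.frame(
  horse   = factor(sample(paste0("h", 1:5000), 10000, replace = TRUE)),
  jockey  = factor(sample(paste0("j", 1:500),  10000, replace = TRUE)),
  trainer = factor(sample(paste0("t", 1:800),  10000, replace = TRUE)),
  win     = rbinom(10000, 1, 0.1)
)

## sparse.model.matrix() expands the factors to indicator columns but
## stores only the non-zero entries, so memory grows roughly with the
## number of observations rather than observations x levels.
X <- sparse.model.matrix(~ horse + jockey + trainer, data = races)
print(object.size(X))  # compare with model.matrix() on the same formula
```

Packages such as glmnet accept a sparse `x` directly, so a penalised logistic regression can be fit without ever forming the dense matrix.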
Need more information: what is your operating system, how much memory do you have, how big is your data, and what operations are failing?

On 9/17/07, stephenc at ics.mq.edu.au <stephenc at ics.mq.edu.au> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Hi Stephen,

How many variables do you have, and how many of them are categorical? How many observations? Since I am not a racing expert: in how many races does a typical horse participate, and how many years does that usually span?

In the past I have had good experience with Random Forest; there is a randomForest package in R. If you run out of memory and do not mind spending some time, you can try the original Fortran code (after first trying the R package without saving the forest).

Regards,
Moshe.

--- stephenc at ics.mq.edu.au wrote:
> [...]
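The call Moshe describes might look like the following sketch; `keep.forest = FALSE` discards the fitted trees, which are the main memory cost, while still reporting the out-of-bag error (the `races` data frame and its column names are hypothetical):

```r
library(randomForest)

## Random forests take factor predictors directly -- no indicator
## expansion is needed, unlike the SVM/logistic design matrices.
## keep.forest = FALSE drops the trees once the out-of-bag error is
## computed, saving memory when later prediction is not required.
fit <- randomForest(factor(win) ~ horse + jockey + trainer,
                    data = races, ntree = 200, keep.forest = FALSE)
print(fit)  # OOB error estimate and confusion matrix
```

One caveat: randomForest of that era refused factors with more than 32 levels, so a very high-cardinality predictor such as the individual horse would need grouping or recoding first.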
Hi Stephen,

Not responding to the R memory question, but to the racing. I worked on this many years ago and found no way of overcoming the pari-mutuel take of roughly 19%.

That said, I suggest you take class into account (based on purse and type of race: maiden claiming, claiming $, NWxx allowance, etc.). Make sure you account for the size of the field; it is much easier to win a race of 6 horses than one of 12. A similar bias applies to the advantage of the inner post positions if you do not account for the number of entries.

Re validation, I would not build a model on X years of data and then validate on the rest. Patterns change, and a model needs to be adaptive. I would instead hold out one randomly chosen day per week and validate on that.

Good luck in a difficult task.

Gerard
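Gerard's scheme of holding out one randomly chosen day per week can be sketched as a split on the race date (again assuming a hypothetical `races` data frame with a `date` column of class Date):

```r
set.seed(1)
day <- format(races$date, "%Y-%m-%d")   # character day identifier
wk  <- format(races$date, "%Y-%U")      # year-week identifier

## Pick one racing day at random within each week to hold out.
holdout <- tapply(day, wk, function(d) sample(unique(d), 1))

test  <- races[day %in% holdout, ]      # validation rows
train <- races[!day %in% holdout, ]     # training rows
```

Because every week contributes one held-out day, the validation set tracks the same period the model is trained on, which is closer to the adaptive evaluation Gerard recommends than a single multi-year split.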