Hi,

I am trying to use various techniques (e.g. SVM, logistic regression, neural networks) to classify and predict the outcome of horse races. Most of my predictive features are categorical - horse, jockey, trainer - and I keep running out of memory owing to the size of the resulting vectors once these are expanded to indicators.

Does anyone know how to solve this problem?

I have coded the outcomes as win/lose or place/lose, with a view to training on x years of results and then testing on the subsequent years' results. Is there some alternative way of looking at the problem?

Does anyone have pointers to published work in this area?

Thanks,

Stephen
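One way to keep memory under control with high-cardinality categorical predictors is to build a sparse design matrix instead of a dense one. A minimal sketch using the Matrix package (the data frame and column names here are hypothetical stand-ins for the race results):

```r
library(Matrix)

## Toy data standing in for the race results (made-up names and sizes)
set.seed(1)
races <- data.frame(
  horse   = factor(sample(paste0("h", 1:5000), 10000, replace = TRUE)),
  jockey  = factor(sample(paste0("j", 1:500),  10000, replace = TRUE)),
  trainer = factor(sample(paste0("t", 1:800),  10000, replace = TRUE)),
  win     = rbinom(10000, 1, 0.1)
)

## sparse.model.matrix() expands the factors to indicator columns but
## stores only the non-zero entries, so memory grows roughly with the
## number of observations rather than observations x levels.
X <- sparse.model.matrix(~ horse + jockey + trainer, data = races)
print(object.size(X))  # compare with model.matrix() on the same formula
```

Packages such as glmnet accept a sparse `x` directly, so a penalised logistic regression can be fit without ever forming the dense matrix.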
Need more information: what is your operating system, how much memory do you have, how big is your data, and what operations are failing?

On 9/17/07, stephenc at ics.mq.edu.au <stephenc at ics.mq.edu.au> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Hi Stephen,

How many variables do you have, and how many of them are categorical? How many observations? Since I am not a racing expert: in how many races does a typical horse participate, and how many years does that usually span?

In the past I have had good experience with Random Forest; there is a randomForest package in R. If you run out of memory and do not mind spending some time, you can try the original Fortran code (after first trying the R package without saving the forest).

Regards,
Moshe.

--- stephenc at ics.mq.edu.au wrote:
> [...]
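The call Moshe describes might look like the following sketch; `keep.forest = FALSE` discards the fitted trees, which are the main memory cost, while still reporting the out-of-bag error (the `races` data frame and its column names are hypothetical):

```r
library(randomForest)

## Random forests take factor predictors directly -- no indicator
## expansion is needed, unlike the SVM/logistic design matrices.
## keep.forest = FALSE drops the trees once the out-of-bag error is
## computed, saving memory when later prediction is not required.
fit <- randomForest(factor(win) ~ horse + jockey + trainer,
                    data = races, ntree = 200, keep.forest = FALSE)
print(fit)  # OOB error estimate and confusion matrix
```

One caveat: randomForest of that era refused factors with more than 32 levels, so a very high-cardinality predictor such as the individual horse would need grouping or recoding first.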
Hi Stephen,

Not responding to the R memory question, but to the racing. I worked on this many years ago and found no way of overcoming the pari-mutuel take of roughly 19%.

That said, I suggest you take class into account (based on purse and type of race: maiden claiming, claiming $, NWxx allowance, etc.). Make sure you account for the size of the field; it is much easier to win a race of 6 horses than one of 12. A similar bias applies to the advantage of the inner post positions if you do not account for the number of entries.

Re validation, I would not build a model on X years of data and then validate on the rest. Patterns change, and a model needs to be adaptive. I would instead hold out one randomly chosen day per week and validate on that.

Good luck in a difficult task.

Gerard
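Gerard's scheme of holding out one randomly chosen day per week can be sketched as a split on the race date (again assuming a hypothetical `races` data frame with a `date` column of class Date):

```r
set.seed(1)
day <- format(races$date, "%Y-%m-%d")   # character day identifier
wk  <- format(races$date, "%Y-%U")      # year-week identifier

## Pick one racing day at random within each week to hold out.
holdout <- tapply(day, wk, function(d) sample(unique(d), 1))

test  <- races[day %in% holdout, ]      # validation rows
train <- races[!day %in% holdout, ]     # training rows
```

Because every week contributes one held-out day, the validation set tracks the same period the model is trained on, which is closer to the adaptive evaluation Gerard recommends than a single multi-year split.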