How can I split a dataset randomly into a training set and a testing set? I would like to be able to specify the size of the training set and use the remaining data as the testing set, for example a 90% training and 10% testing split. Is there a function that will accomplish this?

Thank you,

-Dhiren

Rutgers University
Graduate Student
Dhiren DSouza wrote:
> How can I split a dataset randomly into a training and testing set. I would
> like to have the ability to specify the size of the training set and use the
> remaining data as the testing set.
>
> For example 90% training data and 10% testing data split. Is there a
> function that will accomplish this?
>
> Thank you,
>
> -Dhiren
>
> Rutgers University
> Graduate Student
>

See ?sample.

# draw 90% of the row indices of x at random, without replacement
sub <- sample(nrow(x), floor(nrow(x) * 0.9))
# rows whose indices were drawn form the training set
training <- x[sub, ]
# negative indexing drops those rows, leaving the remaining 10% for testing
testing <- x[-sub, ]

HTH,

--sundar
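The three-line recipe above can be wrapped in a small helper so that the training fraction is a parameter. This is only a sketch: `split_train_test`, `train_frac` and `seed` are hypothetical names, not functions or arguments from base R, and the built-in `iris` data frame stands in for `x`. It selects rows with a logical mask rather than negative indices, since `x[-sub, ]` returns zero rows when `sub` is empty.

```r
# Hypothetical helper (not part of base R): wraps the sample() approach.
split_train_test <- function(x, train_frac = 0.9, seed = NULL) {
  stopifnot(is.data.frame(x) || is.matrix(x),
            train_frac > 0, train_frac < 1)
  if (!is.null(seed)) set.seed(seed)  # optional, for a reproducible split
  n <- nrow(x)
  n_train <- floor(n * train_frac)
  idx <- sample(n, n_train)           # random row indices, no replacement
  # A logical mask avoids the x[-integer(0), ] pitfall when n_train is 0
  in_train <- seq_len(n) %in% idx
  list(training = x[in_train, , drop = FALSE],
       testing  = x[!in_train, , drop = FALSE])
}

# Usage on the built-in iris data (150 rows)
parts <- split_train_test(iris, train_frac = 0.9, seed = 42)
nrow(parts$training)  # 135
nrow(parts$testing)   # 15
```

Because the mask keeps rows in their original order, the training set is a random subset but is not shuffled; use `x[idx, ]` instead if a random row order matters.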
On Fri, 11 Nov 2005, Dhiren DSouza wrote:
> How can I split a dataset randomly into a training and testing set. I would
> like to have the ability to specify the size of the training set and use the
> remaining data as the testing set.
>
> For example 90% training data and 10% testing data split. Is there a
> function that will accomplish this?

Yes, see ?sample: use it to sample indices. There are lots of examples around, e.g. in ?lda.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
1 South Parks Road
Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self)
     +44 1865 272866 (PA)
Fax: +44 1865 272595
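As an illustration of "sample indices", a vector of row numbers can be reused both to fit a model on the training rows and to pick out the held-out rows. The sketch below is not the example from ?lda; it assumes the built-in `iris` data, a 50/50 split, and a plain `lm()` fit chosen only so that it runs with base R. Many model-fitting functions accept the index vector through a `subset` argument in the same way.

```r
set.seed(1)  # assumption: fixed seed only so the example is repeatable

# Sample row indices rather than rows: 75 of the 150 rows of iris
train <- sample(nrow(iris), 75)

# Fit on the training rows only, via the subset argument
fit <- lm(Sepal.Length ~ Petal.Length, data = iris, subset = train)

# Predict on the rows that were not sampled
pred <- predict(fit, newdata = iris[-train, ])

# Root mean squared error on the held-out rows
held_out_rmse <- sqrt(mean((iris$Sepal.Length[-train] - pred)^2))
```

Keeping the indices, rather than two copies of the data, also makes it easy to record exactly which rows were used for training.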