bbslover
2009-Dec-21 14:09 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
I want to split my whole dataset into a training set and a test set, build a model on the training set, and validate it on the test set. How can I split my dataset reasonably? Please give me a hand; some R code would be best.

I have also seen approaches that use a SOM to project all of the independent variables onto two dimensions and then pick some samples as the training set and the rest as the test set, as in the picture below. I would like to do this too. My data are in the attached xls file: a 218*47 matrix, where the 47 columns are the independent variables. I want to project it to 2D and label each sample on the map, as in the picture.

Thank you!

http://n4.nabble.com/file/n976245/SOM%2Btraining%2Bset%2Band%2Btest%2Bset.jpg SOM+training+set+and+test+set.jpg
http://n4.nabble.com/file/n976245/matlab218x47.xls matlab218x47.xls
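A minimal sketch of the SOM-based split described above. It assumes the kohonen package is installed and uses simulated data in place of the attached 218*47 spreadsheet. The 6 x 6 hexagonal grid, the one-in-four test fraction, and the names X, unit and is_test are illustrative choices, not anything taken from the picture in the post.

```r
# Sketch only: assumes the 'kohonen' package is installed, and uses simulated
# data in place of the 218 x 47 spreadsheet from the post.
library(kohonen)

set.seed(1)
X <- matrix(rnorm(218 * 47), nrow = 218,
            dimnames = list(seq_len(218), paste0("x", seq_len(47))))

# Project the scaled variables onto a 2-D map (the grid size is arbitrary).
fit <- som(scale(X), grid = somgrid(xdim = 6, ydim = 6, topo = "hexagonal"))

# unit.classif holds the map unit that each sample was assigned to.
unit <- fit$unit.classif

# Within every occupied unit, send roughly one sample in four to the test
# set, so that both sets cover the same regions of the map.
is_test <- logical(nrow(X))
for (u in unique(unit)) {
  members <- which(unit == u)
  n_test  <- floor(length(members) / 4)
  if (n_test > 0) {
    is_test[members[sample.int(length(members), n_test)]] <- TRUE
  }
}

train <- X[!is_test, , drop = FALSE]
test  <- X[is_test, , drop = FALSE]

# Label each sample on the map, coloured by the set it belongs to.
plot(fit, type = "mapping", labels = rownames(X),
     col = ifelse(is_test, "red", "black"))
```

Sampling inside each map unit is only one way to use the projection; the point is that neighbouring samples on the map are similar, so drawing test samples unit by unit keeps the test set inside the region covered by the training set.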
milton ruser
2009-Dec-21 16:23 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
Not elegant, but...

# Example data frame with 10 rows
MyDF <- data.frame(cbind(x = runif(10), y = rnorm(10)))
TrainingSize <- 5
# Randomly choose the row numbers that go into the training set
TrainingSize_list <- sample(1:nrow(MyDF))[1:TrainingSize]
TrainingSize_list
MyDF.training <- MyDF[(1:nrow(MyDF) %in% TrainingSize_list), ]
MyDF.training
# Every row that was not chosen goes into the test set
MyDF.test <- MyDF[!(1:nrow(MyDF) %in% TrainingSize_list), ]
MyDF.test

bests

milton
Steve Lianoglou
2009-Dec-21 16:25 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
Hi,

I noticed Max already pointed you to the caret package.

Load the library and look at the help for the createFolds function, e.g.:

library(caret)
?createFolds

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
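For reference, a small sketch of how createFolds can be used once its help page has been read. It assumes caret is installed; the data frame dat and its response column y are simulated stand-ins for the poster's data, not the real spreadsheet.

```r
# Sketch only: assumes the 'caret' package is installed. 'dat' and 'y' are
# simulated placeholders for the real 218-sample data set.
library(caret)

set.seed(1)
dat <- data.frame(matrix(rnorm(218 * 47), nrow = 218))
dat$y <- rnorm(218)

# Ten folds of held-out row numbers, balanced on the distribution of y.
folds <- createFolds(dat$y, k = 10, list = TRUE, returnTrain = FALSE)

# Each fold in turn serves as the test set; the rest is the training set.
fold_sizes <- sapply(folds, function(test_rows) {
  train <- dat[-test_rows, ]
  test  <- dat[test_rows, ]
  c(train = nrow(train), test = nrow(test))
})
fold_sizes
```

Every row lands in exactly one fold, so each sample is used for testing once and for training in the other nine rounds.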
Max Kuhn
2009-Dec-21 17:31 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
> I noticed Max already pointed you to the caret package.
>
> Load the library and look at the help for the createFolds function, eg:
>
> library(caret)
> ?createFolds

I think that the createDataPartition function in caret might work better for you.

There are a number of other packages with similar functions.

Max
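A short sketch of a single split with createDataPartition, assuming caret is installed. The data frame dat, the response y and the 80/20 proportion are illustrative assumptions rather than anything stated in the thread.

```r
# Sketch only: assumes the 'caret' package is installed. 'dat' and 'y' are
# simulated placeholders, and p = 0.8 is an arbitrary choice.
library(caret)

set.seed(1)
dat <- data.frame(matrix(rnorm(218 * 47), nrow = 218))
dat$y <- rnorm(218)

# Row numbers for the training set, sampled within groups of y so that the
# training and test sets have similar response distributions.
in_train <- createDataPartition(dat$y, p = 0.8, list = FALSE)[, 1]

train <- dat[in_train, ]
test  <- dat[-in_train, ]

c(train = nrow(train), test = nrow(test))
```

Unlike createFolds, which returns k held-out sets, this produces one training/test division (or several independent ones if the times argument is raised).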
Frank E Harrell Jr
2009-Dec-21 21:15 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
It should be noted that a single split into training + test sets does not perform satisfactorily unless N > 10,000 in many cases.

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine, Department of Biostatistics
Vanderbilt University
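With only 218 samples, one common alternative to a single split is repeated k-fold cross-validation, where every sample is held out once per repeat and the error is averaged over repeats. The following is a base R sketch on simulated data; the linear model, the two predictors, and the choice of 10 folds and 20 repeats are assumptions for illustration and are not a recommendation taken from this thread.

```r
# Sketch only: base R, simulated data. The model and the settings
# (k = 10, repeats = 20) are illustrative assumptions.
set.seed(1)
n   <- 218
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

k       <- 10
repeats <- 20
rmse    <- numeric(repeats)

for (r in seq_len(repeats)) {
  # Assign every row to one of k folds at random.
  fold <- sample(rep(seq_len(k), length.out = n))
  pred <- numeric(n)
  for (f in seq_len(k)) {
    held_out <- fold == f
    fit <- lm(y ~ x1 + x2, data = dat[!held_out, ])
    pred[held_out] <- predict(fit, newdata = dat[held_out, ])
  }
  # Out-of-sample error for this repeat, using every row exactly once.
  rmse[r] <- sqrt(mean((dat$y - pred)^2))
}

mean(rmse)  # average cross-validated RMSE
sd(rmse)    # variation caused by the random fold assignment alone
```

The spread of rmse across repeats shows how much an error estimate can move purely because of how the rows happened to be divided, which is the instability a single split cannot reveal.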
bbslover
2009-Dec-22 00:00 UTC
[R] Help,Suggest me some methods to identify training set and test set!!!
Thank you for all the help. It is helpful to me.