Hi,

Does anyone know how to create a calibration set and a validation set from a particular dataset? I have a data frame with nearly 20,000 rows, and I would like to select a random subset of the original dataset (I found how to do that) to use as the calibration set. However, I don't know how to remove this "calibration" set from the original data frame in order to get my "validation" set. Any hint will be greatly appreciated.

TT
You could keep a vector of row indices, as in the following example:

> data(iris)
> indx <- sample(nrow(iris), 20, replace=FALSE)
> train <- iris[indx,]
> test <- iris[-indx,]

--Matt

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peyuco Porras Porras .
Sent: Saturday, August 14, 2004 17:15 PM
To: R-help at stat.math.ethz.ch
Subject: [R] calibration/validation sets
Importance: High

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
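The index-vector idea can be made reproducible and checked with a few extra lines. This is a minimal sketch, not part of Matt's reply: the seed value 42 is an arbitrary assumption, and the checks rely on the fact that subsetting a data frame keeps the original row names, so the two pieces can be compared by name.

```r
# Reproducible calibration/validation split of the built-in iris data.
# set.seed() makes sample() return the same indices on every run;
# the seed value 42 is an arbitrary choice.
data(iris)
set.seed(42)
indx <- sample(nrow(iris), 20, replace = FALSE)
calib <- iris[indx, ]   # the 20 randomly chosen rows
valid <- iris[-indx, ]  # every remaining row

# Sanity checks: the two sets share no rows and together cover all of iris.
stopifnot(length(intersect(rownames(calib), rownames(valid))) == 0)
stopifnot(nrow(calib) + nrow(valid) == nrow(iris))
```

The same lines apply to a 20,000-row data frame by replacing `iris` and the sample size of 20.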
Hi,

On Sat, 14 Aug 2004, Peyuco Porras Porras . wrote:
> However, I don't know how to remove this "calibration" set from the original dataframe in order to get my "validation" set.....Any hint will be greatly appreciated.

A really quick way, supposing you want 30% of your dataset as the validation set:

> iris.id = sample(nrow(iris), nrow(iris) * 0.3)
> iris.valid = iris[iris.id, ]
> iris.train = iris[-iris.id, ]
> nrow(iris.valid)
[1] 45
> nrow(iris.train)
[1] 105

The first line randomly samples 30% of the row numbers of the iris data (45 of its 150 rows). The second line subsets those rows: that is the validation set. The third line uses negative indices to drop them and keep what's left: the training set.

This is perhaps not efficient and the code can definitely be simplified... but it's Sunday morning and I haven't had my morning coffee yet :D

Cheers,

Kevin

--------------------------------
Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia

Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301
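Kevin's three lines can be wrapped into a small function that takes the fraction as a parameter. The name `split_by_fraction` is hypothetical (it is not part of base R), and the sketch swaps negative indexing for a logical mask. One pitfall that motivates this: if the index vector happens to be empty (say, a fraction that rounds to zero rows), `df[-id, ]` returns zero rows instead of all of them, because `-integer(0)` is still `integer(0)`. A logical mask avoids that.

```r
# Hypothetical helper (not part of base R): split a data frame at random
# into a validation part holding round(frac * nrow) rows and a
# training part holding the rest.
split_by_fraction <- function(df, frac = 0.3) {
  n <- nrow(df)
  n_valid <- round(n * frac)            # e.g. round(150 * 0.3) = 45
  id <- sample(n, n_valid)              # row numbers for validation
  keep <- seq_len(n) %in% id            # logical mask; safe even if id is empty
  list(valid = df[keep, , drop = FALSE],
       train = df[!keep, , drop = FALSE])
}

set.seed(1)
parts <- split_by_fraction(iris, 0.3)
sapply(parts, nrow)   # gives valid = 45, train = 105
```

Returning both pieces in one list keeps the calibration and validation sets tied to the same random draw.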
There are many ways to do this. Here is one example, supposing your data is in `myData':

## randomly pick 1/3 for validation:
valid.idx <- sample(nrow(myData), round(nrow(myData)/3), replace=FALSE)
## training set:
myData.tr <- myData[-valid.idx,]
## validation set:
myData.valid <- myData[valid.idx,]

HTH,
Andy
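Another of those "many ways" is to label every row first and let base R's split() build both sets in one call. This is a sketch under stated assumptions: `iris` stands in for the poster's `myData`, the seed 2004 is arbitrary, and the labels "train" and "valid" are illustrative names chosen here.

```r
# Alternative sketch: label every row, then let split() build both sets.
# `myData` is the poster's data frame; iris stands in for it here.
myData <- iris
n <- nrow(myData)
n_valid <- round(n / 3)                        # 1/3 for validation, as above

set.seed(2004)                                 # arbitrary seed, for reproducibility
# Build exactly n_valid "valid" labels and n - n_valid "train" labels,
# then shuffle them with sample() so each row gets a random label.
labels <- sample(rep(c("train", "valid"), c(n - n_valid, n_valid)))
sets <- split(myData, labels)                  # list with elements $train and $valid

myData.tr    <- sets$train
myData.valid <- sets$valid
```

Because every row receives exactly one label, the two data frames cannot overlap and together contain every row of `myData`.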