Hello dear R users,

I am working on a dataset of 928 enterprises, for each of which 12 different characteristics are observed. I need to randomly sample, without repetition, 70% of the enterprises to create a testing set, and let the other 30% of the enterprises be a validating set (holdout validation, I think that is). How do I do that? Of course, all the characteristics of each row must remain together.

Also, I am not very familiar with the base R language (it is the first time I have used it), so if you could also explain what every function and argument means, it would be a great help when I repeat the procedure later.

Thank you very much,

Sebastiano

--
View this message in context: http://www.nabble.com/Extracting-random-rows-from-a-dataset-tp21530539p21530539.html
Sent from the R help mailing list archive at Nabble.com.

Here is one way to do it:

> x <- matrix(1:100,10)
> x
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1   11   21   31   41   51   61   71   81    91
 [2,]    2   12   22   32   42   52   62   72   82    92
 [3,]    3   13   23   33   43   53   63   73   83    93
 [4,]    4   14   24   34   44   54   64   74   84    94
 [5,]    5   15   25   35   45   55   65   75   85    95
 [6,]    6   16   26   36   46   56   66   76   86    96
 [7,]    7   17   27   37   47   57   67   77   87    97
 [8,]    8   18   28   38   48   58   68   78   88    98
 [9,]    9   19   29   39   49   59   69   79   89    99
[10,]   10   20   30   40   50   60   70   80   90   100
> select <- sample(nrow(x), nrow(x) * .7)
> x[select,]  # select
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3   13   23   33   43   53   63   73   83    93
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    5   15   25   35   45   55   65   75   85    95
[4,]    9   19   29   39   49   59   69   79   89    99
[5,]    7   17   27   37   47   57   67   77   87    97
[6,]   10   20   30   40   50   60   70   80   90   100
[7,]    8   18   28   38   48   58   68   78   88    98
> x[-select,]  # testing
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    4   14   24   34   44   54   64   74   84    94
[3,]    6   16   26   36   46   56   66   76   86    96
>

On Sun, Jan 18, 2009 at 12:35 PM, S.Putoto <rebelshop615 at gmail.com> wrote:
>
> Hello dear R Users,
>
> I am working on a dataset of 928 Enterprises, of which are observed 12
> different characters. I need to randomly sample, without repetition, 70% of
> the entreprises, to create a testing set, and let the other 30% of the
> enterprises be a validating set (holdout validation, I think that is). How
> do I do that? Of course all the characters of each row must remain together.
> Also, I am not very familiar with the R-Base language (it is the first time
> I use it) so if You could also explain to me what every function and
> argument means, it would be great help to then reiterate the procedure.
>
> Thank You very much,
>
> Sebastiano
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
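
Since the original question asked what each function and argument does, the matrix example above can be restated for a data frame with a comment on every step. This is only a sketch under stated assumptions: `enterprises` is a randomly generated stand-in for the real 928 x 12 dataset, and the names `n_first`, `first_part`, `second_part` and the seed value 42 are made up for illustration.

```r
# ASSUMPTION: random numbers stand in for the real data (928 rows, 12 columns).
set.seed(42)   # fixes the random number generator so the same rows are drawn on every run
enterprises <- as.data.frame(matrix(rnorm(928 * 12), nrow = 928, ncol = 12))

n       <- nrow(enterprises)   # nrow() returns the number of rows
n_first <- floor(0.7 * n)      # 70% of the rows, rounded down to a whole number

# sample(n, size) draws `size` integers from 1..n.
# replace = FALSE (the default) means no row number is drawn twice.
select <- sample(n, size = n_first, replace = FALSE)

# data[rows, columns]: leaving the column slot empty keeps all 12 columns,
# so the values of each enterprise stay together.
first_part  <- enterprises[select, ]    # the rows whose numbers were drawn
second_part <- enterprises[-select, ]   # negative index: every row NOT in `select`

dim(first_part)    # rows and columns of the 70% part
dim(second_part)   # rows and columns of the remaining part
```

With 928 rows, floor(0.7 * 928) is 649, so the first part has 649 rows and the second has the remaining 279. Calling set.seed() is optional; it only makes the split repeatable.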

> df <- read.table(textConnection(gsub("\\(|\\)", "", var)))  # 'var' comes from a prior posting
> df
  V1 V2
1 p1 10
2 p1  3
3 p1  4
4 p2 20
5 p2 30
6 p2 40
7 p3  4
8 p3  1
9 p1  2
> ridxs <- sample(1:nrow(df), floor(0.7*nrow(df)))  # the 70% sample row IDs
> df[ridxs,]
  V1 V2
5 p2 30
6 p2 40
2 p1  3
7 p3  4
4 p2 20
8 p3  1
> df[-ridxs,]
  V1 V2
1 p1 10
3 p1  4
9 p1  2

On Jan 18, 2009, at 12:35 PM, S.Putoto wrote:

> Hello dear R Users,
>
> I am working on a dataset of 928 Enterprises, of which are observed 12
> different characters. I need to randomly sample, without repetition,
> 70% of the entreprises, to create a testing set, and let the other 30%
> of the enterprises be a validating set (holdout validation, I think
> that is). How do I do that? Of course all the characters of each row
> must remain together.
> Also, I am not very familiar with the R-Base language (it is the
> first time I use it) so if You could also explain to me what every
> function and argument means, it would be great help to then reiterate
> the procedure.

Really! Don't you think that is a bit much? There are many tutorials available online. The terms to pay particular attention to in the introductory material are row indexing, dataframe, and negative indexing of dataframes.

--
David Winsemius

> Thank You very much,
>
> Sebastiano
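
The row indexing and negative indexing mentioned above can be checked on the same small data frame. The sketch below rebuilds `df` from the printed output rather than from the `var` object of the prior posting, which is not shown here. The seed and the names `part_a`, `part_b`, `kept` and `dropped` are assumptions for illustration, so the rows drawn will generally differ from those printed above.

```r
# Data frame rebuilt from the printed output (9 rows, columns V1 and V2).
df <- data.frame(
  V1 = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3", "p1"),
  V2 = c(10, 3, 4, 20, 30, 40, 4, 1, 2)
)

set.seed(1)   # arbitrary seed, only so the example is repeatable
ridxs <- sample(1:nrow(df), floor(0.7 * nrow(df)))   # 6 of the 9 row numbers

part_a <- df[ridxs, ]    # positive indices: keep exactly these rows
part_b <- df[-ridxs, ]   # negative indices: drop these rows, keep the rest

# The original row numbers survive as row names, which makes checking easy.
kept    <- as.integer(rownames(part_a))
dropped <- as.integer(rownames(part_b))

length(intersect(kept, dropped)) == 0           # no row appears in both parts
setequal(c(kept, dropped), seq_len(nrow(df)))   # together they cover every row
```

Both comparisons at the end should print TRUE: sampling without replacement guarantees distinct row numbers, and the negative index returns exactly the rows that were not drawn.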