Hi,
I need to randomly sample my dataset 1000 times. Each sample needs to be 80% of the data. I know how to do that; my problem is that I need not only the 80% but also the corresponding 20% each time. Is there any way to do that?
Alternatively, I was thinking of something like the setdiff() function to compare my 80% sample with the original dataset and obtain the corresponding 20%. Unfortunately, setdiff() works only on vectors. Do you know a similar function for data frames?
Thanks
--
View this message in context: http://r.789695.n4.nabble.com/Sampling-problems-tp4453752p4453752.html
Sent from the R help mailing list archive at Nabble.com.
You could make a logical vector containing as many TRUE values as make up 80% of your data and as many FALSE values as make up the remaining 20%. Use sample() to shuffle it, then use it to split your dataset: the TRUE rows are the 80% sample and the FALSE rows are the corresponding 20%.

If you had provided a reproducible example, I could write you code.

Sarah

On Wed, Mar 7, 2012 at 11:41 AM, Oritteropus <lucasantini85 at hotmail.com> wrote:
> I need to sample randomly my dataset for 1000 times. The sample need to be
> the 80%. [...] my problem is that not only I need the 80%,
> but I also need the corresponding 20% each time.

--
Sarah Goslee
http://www.functionaldiversity.org
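The logical-vector approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `dat` and its columns are hypothetical stand-ins, and the 80% size is rounded to the nearest whole row.

```r
# Hypothetical example data; any data frame is handled the same way.
set.seed(1)                      # only to make the sketch reproducible
dat <- data.frame(id = 1:50, value = rnorm(50))

n    <- nrow(dat)
n_80 <- round(0.8 * n)           # number of rows in the 80% part

# n_80 TRUE values and (n - n_80) FALSE values, put in random order
keep <- sample(rep(c(TRUE, FALSE), times = c(n_80, n - n_80)))

part80 <- dat[keep, ]            # the 80% sample
part20 <- dat[!keep, ]           # the corresponding 20%
```

Because `keep` and `!keep` are complements, every row ends up in exactly one of the two parts.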
On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
> I need to sample randomly my dataset for 1000 times. The sample need to be
> the 80%. [...] my problem is that not only I need the 80%,
> but I also need the corresponding 20% each time. Is there any way to do
> that?

Hi.

If you use sample() to get the 80% and store the indices, you can also get the remaining cases:

  a <- matrix(1:30, ncol=3)
  i <- sample(10, 8)
  a[sort(i), ]

       [,1] [,2] [,3]
  [1,]    1   11   21
  [2,]    2   12   22
  [3,]    3   13   23
  [4,]    4   14   24
  [5,]    6   16   26
  [6,]    7   17   27
  [7,]    8   18   28
  [8,]   10   20   30

  a[-i, ]

       [,1] [,2] [,3]
  [1,]    5   15   25
  [2,]    9   19   29

Hope this helps.

Petr Savicky.
On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:
> [...] not only I need the 80%,
> but I also need the corresponding 20% each time. Is there any way to
> do that?
> Alternatively, I was thinking to something like setdiff () function
> [...] unfortunately setdiff works just for vectors, do you know a
> similar function for dataframes?

Create an index vector with runif or sample, then use it to get your sample and use negative indexing to get the remainder.

  idx <- sample(1:1000, 800)   # assumes x has 1000 rows
  x[ idx, ]    # 80%
  x[ -idx, ]   # the other 20%

(I think this does presume you have not mucked with the default rownames.)

--
David Winsemius, MD
West Hartford, CT
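For the 1000 repetitions asked about in the original question, one possible sketch is to store only the index vectors and rebuild each 80%/20% pair on demand with negative indexing. The data frame `x`, its columns, and the names `n_rep`, `n_in`, and `idx_list` are assumptions made for illustration, not part of any reply above.

```r
# Hypothetical data with 1000 rows, standing in for the real dataset.
set.seed(42)                     # only to make the sketch reproducible
x <- data.frame(id = 1:1000, y = runif(1000))

n_rep <- 1000                    # number of random splits
n_in  <- round(0.8 * nrow(x))    # rows in each 80% part

# One index vector per repetition; storing indices is cheaper than
# storing 1000 copies of the data.
idx_list <- replicate(n_rep, sample(nrow(x), n_in), simplify = FALSE)

# Rebuild the k-th split when it is needed.
k      <- 1
part80 <- x[idx_list[[k]], ]     # the 80% sample
part20 <- x[-idx_list[[k]], ]    # the rows not in the sample
```

Since each element of `idx_list` holds integer row positions, `x[-idx, ]` drops rows by position. One caveat: if an index vector were ever empty, `x[-idx, ]` would return zero rows rather than all of them, so this pattern relies on `n_in` being at least 1.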
Hi Sarah,
it is not clear to me how to do that. Can you show me, please? Imagine I have a situation like this:

  MeanA <- read.csv("MeanAmf.csv", header=TRUE)
  mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE), ]

Then?
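One way to continue from a snippet like the one above is to keep the sampled row numbers in their own variable, so that setdiff() can be applied to the index vector rather than to the data frame. This is a sketch only: the data frame below is a hypothetical stand-in for the contents of MeanAmf.csv, and the names `idx`, `rest_idx`, and `myrest` are made up for illustration.

```r
# Hypothetical stand-in for read.csv("MeanAmf.csv", header = TRUE)
set.seed(3)                      # only to make the sketch reproducible
MeanA <- data.frame(id = 1:25, amf = runif(25))

# Store the sampled row numbers instead of sampling inside the brackets.
idx      <- sample(seq_len(nrow(MeanA)), 20, replace = FALSE)
mysample <- MeanA[idx, ]

# setdiff() works on vectors, and the row numbers are a vector.
rest_idx <- setdiff(seq_len(nrow(MeanA)), idx)
myrest   <- MeanA[rest_idx, ]    # the rows that were not sampled
```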
Hi everybody,
Thank you all for your suggestions; you have been very helpful. In the end, however, I solved it this way:

  mysample <- MaxDH[sample(1:nrow(MaxDH), 150, replace=FALSE), ]   # draw 150 rows in random order
  A <- mysample[1:120, ]     # first 120 rows (80%)
  B <- mysample[121:150, ]   # remaining 30 rows (20%)

So simple in the end...
Best,
Luca
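The same shuffle-then-split idea can be written without hard-coding 150, 120, and 30, by deriving the sizes from the number of rows. A minimal sketch, assuming a hypothetical stand-in for MaxDH and an 80% share rounded to the nearest row:

```r
# Hypothetical stand-in for the MaxDH data frame.
set.seed(7)                      # only to make the sketch reproducible
MaxDH <- data.frame(id = 1:150, dh = rnorm(150))

n   <- nrow(MaxDH)
n_a <- round(0.8 * n)            # size of the 80% part

shuffled <- MaxDH[sample(n), ]   # all rows in random order
A <- shuffled[seq_len(n_a), ]    # first 80% of the shuffled rows
B <- shuffled[(n_a + 1):n, ]     # remaining 20%
```

The `(n_a + 1):n` range assumes `n_a` is smaller than `n`, which holds for an 80% share of any dataset with at least three rows.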