Guillaume Chapron
2008-Dec-30 14:59 UTC
[R] Randomly remove condition-selected rows from a matrix
Hello all,

I create the following matrix:

m <- matrix(1:20, nrow = 10, ncol = 2)

which looks like:

      [,1] [,2]
 [1,]    1   11
 [2,]    2   12
 [3,]    3   13
 [4,]    4   14
 [5,]    5   15
 [6,]    6   16
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20

Then, I want to randomly remove 2 rows among the ones where m[,1]<8 and m[,2]>12.

I suppose the best way is to use the sample() function. I understand how to do it when I remove among any rows, but I have not been able to do it when I remove among specific rows only. What I could do is split the matrix into two matrices, one with the rows to be sampled and removed, one with the other rows. I would sample and remove, and then merge the two matrices again. But since this part of the code is going to be run many times, I would like to make it as efficient as possible, without creating new objects. Any idea? Thanks!

Cheers

Guillaume
Sarah Goslee
2008-Dec-30 15:08 UTC
Assuming your values aren't always in such neat order, you could use something like:

valtoremove1 <- sample((1:nrow(m))[m[,1] < 8], 1)
valtoremove2 <- sample((1:nrow(m))[m[,2] > 12], 1)

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Daniel Malter
2008-Dec-30 15:30 UTC
Hi,

The approach below uses a function. The nice thing about it is that you can define the cutoff values dynamically (i.e. what is 8 and 12 in your example). The functions extract a row index to remove. Be aware that there is no warning if both return the same row index. You might have to adjust for that.

x=1:10
y=11:20
z=cbind(x,y)
a=function(x,m){which(x==sample(x[x<m],1))}
b=function(y,n){which(y==sample(y[y>n],1))}
z[-c(a(x,8),b(y,12)),]

Cheers,
Daniel
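A rough sketch of one way to adjust for the duplicate-index case mentioned above: draw the first index, then draw the second from the remaining candidates only. The names `pick_one`, `i` and `j` are illustrative and do not come from the original post; sampling positions with `sample.int()` is a substitution made here so that a single remaining candidate is also handled correctly.

```r
x <- 1:10
y <- 11:20
z <- cbind(x, y)

# Pick one element of idx at random; also safe when idx has length 1
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

i <- pick_one(which(x < 8))               # a row with x below the cutoff
j <- pick_one(setdiff(which(y > 12), i))  # a different row with y above the cutoff

# Two distinct rows are removed, so eight rows remain
z[-c(i, j), ]
```

This assumes each condition matches at least one row and that a second, different row is available for `j`.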
Stavros Macrakis
2008-Dec-30 17:20 UTC
I believe this does what you want:

m[-sample(which(m[,1]<8 & m[,2]>12),2),]

Analysis:

Get a boolean vector of rows fitting criteria:
    m[,1]<8 & m[,2]>12

What are their indexes?
    which(...)

Choose two among those indexes:
    sample(...,2)

Choose all except the selected rows from the original:
    m[- ... , ]

-s
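The four steps of the analysis above can also be written one at a time. This is only a sketch on the example matrix from the thread; the intermediate names (`candidates`, `idx`, `drop`, `result`) are made up here for illustration, and it assumes at least two rows match the condition.

```r
m <- matrix(1:20, nrow = 10, ncol = 2)

# Step 1: logical vector, TRUE for rows meeting both criteria
candidates <- m[, 1] < 8 & m[, 2] > 12

# Step 2: row indexes of the matching rows (rows 3 to 7 for this matrix)
idx <- which(candidates)

# Step 3: pick two of those indexes at random
# (assumes length(idx) >= 2)
drop <- sample(idx, 2)

# Step 4: keep every row except the two selected
result <- m[-drop, ]
```

Because negative indexing returns a new matrix, `m` itself is left unchanged unless the result is assigned back to it.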
Guillaume Chapron
2008-Dec-31 17:44 UTC
> I believe this does what you want:
>
> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>
> Analysis:
>
> Get a boolean vector of rows fitting criteria:
> m[,1]<8 & m[,2]>12
>
> What are their indexes?
> which(...)
>
> Choose two among those indexes:
> sample(...,2)

Thanks, but this does not seem to always work. Suppose I sample only one row among the ones matching my criteria, and consider the case where just one row matches. Sure, there is no need to sample, but the instruction would still be executed. If this row index is 15, my instruction becomes sample(15,1), which can give me any row index from 1 to 15, which is not correct. I have to add a condition for the case where only one row matches the criteria.
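One way to cover the single-match case without a separate condition is to sample positions rather than values, in the spirit of the `resample()` example in `?sample`. The sketch below is an assumption about how that could be applied here; `resample`, `remove_random_rows` and `n` are illustrative names, not something proposed in the thread.

```r
# Sample from the elements of x even when x has length 1:
# sample.int() draws positions in 1..length(x), so a single index such as 15
# is returned as-is instead of being expanded to 1:15.
resample <- function(x, size) x[sample.int(length(x), size)]

# Hypothetical helper: remove up to n randomly chosen rows among those
# flagged TRUE in the logical vector `candidates`.
remove_random_rows <- function(m, candidates, n) {
  idx <- which(candidates)
  n <- min(n, length(idx))   # never ask for more rows than match
  if (n == 0) return(m)      # no match: m[-integer(0), ] would select zero rows
  m[-resample(idx, n), , drop = FALSE]
}

m <- matrix(1:40, nrow = 20, ncol = 2)

# Only row 15 satisfies the condition, so row 15 is always the one removed
out <- remove_random_rows(m, m[, 1] == 15, 1)
```

The early return matters because an empty negative index selects no rows at all rather than all of them.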
Wacek Kusnierczyk
2009-Jan-02 19:18 UTC
xxx wrote:
> On Fri, Jan 2, 2009 at 10:07 AM, Wacek Kusnierczyk
> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>> ... 'sample' takes a sample of the specified size from the elements of
>> 'x' using either with or without replacement.
>>
>> x: Either a (numeric, complex, character or logical) vector of
>> more than one element from which to choose, or a positive
>> integer.
>>
>> If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>> 'x >= 1', sampling takes place from '1:x'. _Note_ that this
>> convenience feature may lead to undesired behaviour when 'x' is of
>> varying length 'sample(x)'. See the 'resample()' example below.
>> ...
>> yet the following works, even though x has length 1 and is *not* numeric:
...
>> is this a bug in the code, or a bug in the documentation?
>>
>
> I would guess it's a bug in the documentation.
>

possibly. looking at the r code for sample, it's clear why sample("foo") works:

function (x, size, replace = FALSE, prob = NULL)
{
    if (length(x) == 1 && is.numeric(x) && x >= 1) {
        if (missing(size))
            size <- x
        .Internal(sample(x, size, replace, prob))
    }
    else {
        if (missing(size))
            size <- length(x)
        x[.Internal(sample(length(x), size, replace, prob))]
    }
}

what is also clear from the code is that the function has another, supposedly buggy behaviour due to the smart behaviour of the : operator:

sample(1.1)
# 1, not 1.1

this is consistent with

" If 'x' has length 1, is numeric (in the sense of 'is.numeric') and 'x >= 1', sampling takes place from '1:x'. "

due to the downcast performed by the colon operator, but not with

" x: Either a (numeric, complex, character or logical) vector of more than one element from which to choose, or a positive integer. "

both from ?sample. tfm is seemingly wrong wrt. the implementation, and i find sample(1.1) returning 1 a design flaw. (i guess the note "_Note_ that this convenience feature may lead to undesired behaviour when 'x' is of varying length 'sample(x)'." is supposed to explain away such cases.)

vQ
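The cases discussed above can be checked directly at the prompt. A small sketch, assuming a current version of R in which a non-integer length-1 `x` is still truncated as described in `?sample`; the variable names are only for illustration.

```r
# Length-1 character input: not numeric, so the element itself is returned
a <- sample("foo")     # "foo"

# Length-1 numeric input >= 1: treated as 1:x, so 1.1 is sampled from 1:1
b <- sample(1.1)       # 1

# Length-1 numeric input below 1: not treated as a range
c0 <- sample(0.5)      # 0.5

# The same convenience feature applied to a single row index
d <- sample(15, 1)     # some value in 1:15, not necessarily 15
```

The last line is the situation raised earlier in the thread, where `which()` returns a single index and `sample()` silently draws from `1:15` instead.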