thr3ads.net - R help - [R] sampling rows with values never sampled before [Jun 2015]

C W

2015-Jun-22 16:42 UTC

[R] sampling rows with values never sampled before

Hello R list,

I have a question about sampling unique coordinate values.

Here's what my data look like:
> dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
> dat
      x1  x2
 [1,]  1 3.7
 [2,]  2 3.7
 [3,]  3 3.7
 [4,]  4 3.7
 [5,]  5 3.7
 [6,]  1 2.9
 [7,]  2 2.9
 [8,]  3 2.9
 [9,]  4 2.9
[10,]  5 2.9
[11,]  1 5.2
[12,]  2 5.2
[13,]  3 5.2
[14,]  4 5.2
[15,]  5 5.2


If I sample (1, 3.7), then I don't want (1, 2.9) or (2, 3.7).

I want to avoid repeating either the first or the second coordinate, because
a repeat leads to a matrix that cannot be inverted.

I thought of using sample(), but I'm not sure how to apply it to a data
frame.

Thanks in advance,

Mike


Adams, Jean

2015-Jun-22 17:09 UTC

Mike,

There may be a more efficient way to do this, but this works on your
example.

# mix up the order of the rows
mix <- dat[order(runif(dim(dat)[1])), ]

# get rid of duplicate x1s and x2s
sub <- mix[!duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"]), ]
sub

Jean
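
A self-contained variant of the shuffle-and-filter idea above may help when trying it out. This is only a sketch: the seed, the use of sample(nrow(dat)) for shuffling, and the names keep and picked are assumptions added for illustration, not part of the original reply. Note that a row is also dropped when it repeats a value from a row that was itself dropped, so the result can contain fewer rows than the maximum of 3 possible for this data.

```r
set.seed(1)  # assumption: seed added only so the run is reproducible

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))

# shuffle the rows (sample(n) returns a random permutation of 1:n)
mix <- dat[sample(nrow(dat)), ]

# keep a row only if both its x1 and its x2 appear for the first time
keep <- !duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"])
picked <- mix[keep, , drop = FALSE]  # drop = FALSE keeps a 1-row result a matrix

picked
```

The first shuffled row is always kept, so picked has at least one row; how many more survive depends on the shuffle.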

Daniel Nordlund

2015-Jun-22 18:19 UTC

I am not sure you gave us enough information to solve your real world 
problem.  But I have a few comments and a potential solution.

1. In your example, the unique values in x1 are completely crossed 
with the unique values in x2.
2. Since you don't want duplicates of either number, the maximum 
number of samples that you can take is the minimum number of unique 
values in either vector, x1 or x2 (in this case x2, with 3 unique values).
3. Sample without replacement from the smaller set of unique values first.
4. Sample without replacement from the larger set second.

> x <- 1:5
> xx <- c(3.7, 2.9, 5.2)
> s2 <- sample(xx, 2, replace=FALSE)
> s1 <- sample(x, 2, replace=FALSE)
> samp <- cbind(s1, s2)
> samp
     s1  s2
[1,]  5 3.7
[2,]  1 5.2
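
The four points above could be wrapped into a small function for the fully crossed case. This is a rough sketch, not a tested solution: sample_crossed() is a hypothetical name, the size check is an added assumption, and the approach is only valid when every combination of x1 and x2 actually occurs in the data.

```r
dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))

# sample_crossed() is a hypothetical helper, not from the thread.
# It assumes the first two columns of dat are fully crossed.
sample_crossed <- function(dat, size) {
  u1 <- unique(dat[, 1])
  u2 <- unique(dat[, 2])
  # the largest possible sample is limited by the smaller set of unique values
  max_size <- min(length(u1), length(u2))
  if (size > max_size) {
    stop("size must not exceed ", max_size)
  }
  # sample positions rather than values, so a set of length 1 is handled safely
  s1 <- u1[sample.int(length(u1), size)]
  s2 <- u2[sample.int(length(u2), size)]
  out <- cbind(s1, s2)
  colnames(out) <- colnames(dat)[1:2]
  out
}

set.seed(42)  # assumption: seed added only for reproducibility
samp <- sample_crossed(dat, 3)
samp
```

Because both sets are sampled without replacement, no value can repeat in either column of the result.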

Your actual data is probably larger, and the unique values in each 
vector may not be completely crossed, in which case the task is a little 
harder.  In that case, you could remove values from your data as you 
sample.  This may not be efficient, but it will work.

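smpl <- function(dat, size) {
  mysamp <- numeric(0)
  for (i in seq_len(size)) {
    # pick one of the remaining rows at random
    s <- dat[sample(nrow(dat), 1), ]
    mysamp <- rbind(mysamp, s, deparse.level = 0)
    # remove every row that shares its first or second value with the pick;
    # drop = FALSE keeps dat a matrix even if only one row is left
    dat <- dat[!(dat[, 1] == s[1] | dat[, 2] == s[2]), , drop = FALSE]
  }
  mysamp
}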

This is just an example of how you might approach your real world 
problem.  There is no error checking, and for large samples it may not 
scale well.
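
For reference, here is how the function might be called on the example data. This is a sketch only: the function is repeated so the block runs on its own, the drop = FALSE argument and the seed are additions not present in the original post, and the call still fails if size is larger than the number of rows that can be drawn.

```r
dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))

smpl <- function(dat, size) {
  mysamp <- numeric(0)
  for (i in seq_len(size)) {
    s <- dat[sample(nrow(dat), 1), ]
    mysamp <- rbind(mysamp, s, deparse.level = 0)
    # drop = FALSE (an addition) keeps dat a matrix when one row remains
    dat <- dat[!(dat[, 1] == s[1] | dat[, 2] == s[2]), , drop = FALSE]
  }
  mysamp
}

set.seed(123)  # assumption: seed added only for reproducibility
out <- smpl(dat, 3)
out

# neither column should contain a repeated value
anyDuplicated(out[, "x1"]) == 0
anyDuplicated(out[, "x2"]) == 0
```

On this fully crossed example a sample of size 3 is always available, since each draw removes one x1 value and one x2 value from the pool.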

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Bothell, WA USA

C W

2015-Jun-22 22:13 UTC

Hi Jean,

Thanks!

Daniel,
Yes, you are absolutely right.  I want sampled vectors to be as different
as possible.

I added a little more to the earlier data set.
      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, solving the system of equations would not be
possible.  So I am trying to avoid "similar vectors".
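
One way to see the problem is to compare the rank of the sampled rows with the number of columns. The sketch below assumes the goal is an invertible 3 x 3 matrix built from three sampled rows, which the thread does not state explicitly; the names dat3, bad, and ok are made up for illustration.

```r
# rebuild the three-column example from the table above
dat3 <- cbind(x1 = rep(1:5, 3),
              x2 = rep(c(3.7, 2.9, 5.2), each = 5),
              x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

bad <- dat3[c(1, 6, 11), ]  # same x1 and x3 in every row
ok  <- dat3[c(1, 7, 13), ]  # no value repeated within a column

# qr() reports the numerical rank; rank < 3 means the matrix is singular
rank_bad <- qr(bad)$rank    # 2: column x3 is a multiple of column x1
rank_ok  <- qr(ok)$rank     # 3: full rank, so solve(ok) works

rank_bad
rank_ok
```

A rank check like this could be used to reject a candidate sample before trying to solve the system.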

Thanks,

Mike

