thr3ads.net - R help - [R] Conditional Random selection [Nov 2015]

Ashta

2015-Nov-21 19:30 UTC

[R] Conditional Random selection

Thank you Bert!

What I want is at least 500 samples, obtained by randomly sampling time
periods, so that samples collected in the same time period are
included together.

Your script is doing what I wanted to do!!

Many thanks




On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>   sz <- cumsum(X2[rownums])
>   rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>    -- Clifford Stoll
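
A commented sketch of the same idea, for readers who want to step through it. The helper name `sample_periods` and its `target` argument are hypothetical (they do not appear in the thread), and the seed is arbitrary. It shuffles the eligible row numbers, accumulates X2 in that shuffled order, and keeps every period up to and including the one that first brings the total to the target.

```r
# Data from the original post
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)

# Hypothetical helper, not from the thread: draw eligible periods in random
# order and keep them until the running total of X2 first reaches `target`.
sample_periods <- function(d, target = 500) {
  eligible <- which(d$X1 > 0)
  # Permute by index; sample(x) on a single number would sample from 1:x.
  rownums <- eligible[sample.int(length(eligible))]
  sz <- cumsum(d$X2[rownums])
  if (length(sz) == 0 || sz[length(sz)] < target) {
    stop("eligible periods do not add up to the target")
  }
  # Keep every period up to and including the one that reaches the target.
  keep <- seq_len(which(sz >= target)[1])
  d[rownums[keep], ]
}

set.seed(1)  # arbitrary seed, only so the draw can be repeated
picked <- sample_periods(tab)
picked
sum(picked$X2)
```

As in the one-liner above, the result is a random set of periods, not the smallest set that reaches the target.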
>
>
> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>> Thank you David!
>>
>> I reran your script and it gives me the first three time periods.
>> Is it doing random sampling?
>>
>>       tab.fan
>>   time X1  X2
>> 2    2  5 230
>> 3    3  1 300
>> 5    5  2  10
>>
>>
>>
>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>>> Use dput() to send data to the list as it is more compact:
>>>
>>>> dput(tab)
>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)),
>>> .Names = c("time", "X1", "X2"), class = "data.frame",
>>> row.names = c(NA, -8L))
>>>
>>> You can just remove the lines with X1 = 0 since you don't want to
>>> use them.
>>>
>>>> tab.sub <- tab[tab$X1>0, ]
>>>
>>> Then the following gives you a sample:
>>>
>>>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>
>>> Note that your "solution" of times 6, 7, and 8 will never appear
>>> because the sum of the values is 586.
>>>
>>>
>>> David L. Carlson
>>> Department of Anthropology
>>> Texas A&M University
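
A small check, offered as a sketch, of why the one-liner above keeps returning the leading rows. It shuffles the X2 values, but the resulting TRUE/FALSE vector is applied to the rows of tab.sub in their original order. Because a cumulative sum of positive numbers only increases, the TRUEs always come first, so the selection is always the first few rows and its X2 total need not stay under 500 (the rows reported back, times 2, 3 and 5, add up to 540). The names `prefix_only`, `idx` and `fixed` are illustrative, and the seed is arbitrary.

```r
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)
tab.sub <- tab[tab$X1 > 0, ]

set.seed(42)  # arbitrary seed
prefix_only <- replicate(200, {
  keep <- cumsum(sample(tab.sub$X2)) <= 500
  # TRUE only when the selected rows are exactly the first sum(keep) rows
  identical(keep, seq_along(keep) <= sum(keep))
})
all(prefix_only)

# One way to make the draw random: shuffle the row numbers and take the
# cumulative sum of X2 in that same shuffled order.
idx <- sample(nrow(tab.sub))
fixed <- tab.sub[idx[cumsum(tab.sub$X2[idx]) <= 500], ]
fixed
```

The repaired version stops before the total exceeds 500, which is the "at most 500" reading of the question; the "at least 500" reading needs one more period to be kept.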
>>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>>> Sent: Saturday, November 21, 2015 11:53 AM
>>> To: R help <r-help at r-project.org>
>>> Subject: [R] Conditional Random selection
>>>
>>> Hi all,
>>>
>>> I have a data set that contains samples collected over time. In
>>> each time period the total number of samples is given (X2). The goal
>>> is to select 500 random samples. The selection should be based on
>>> time (select time periods until I reach 500 samples). Also, the time
>>> period should have a value greater than 0 for the X1 variable. X1 is
>>> an indicator variable.
>>>
>>> Select "time" until the sum of X2 is > 500, and only if X1 is > 0
>>>
>>> tab  <- read.table(textConnection(" time   X1 X2
>>> 1      0        251
>>> 2      5        230
>>> 3      1        300
>>> 4      0         25
>>> 5      2         10
>>> 6      3         101
>>> 7      1         300
>>>  8     4         185   "),header = TRUE)
>>>
>>> In the above example, samples from time 1 and 4 will not be selected
>>> (X1 is zero).
>>> So I could reach my target by selecting time 6, 7, and 8, or time 2
>>> and 3, and so on.
>>>
>>> Can anyone help me do that?
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

Ashta

2015-Nov-21 19:52 UTC

[R] Conditional Random selection

Hi Bert and all,
I have a related question. In each time period, the samples were
collected at different locations (S1). I want to count the number of
unique locations (S1) for each unique time period. So in time 1 the
samples were collected from two locations, in time 2 from only one
location, and in time 3 from three locations.

tab  <- read.table(textConnection(" time   S1  rep
1      1       1
1      2       1
1      2       2
2      1       1
2      1       2
2      1       3
2      1       4
3      1       1
3      2       1
3      3       1   "),header = TRUE)

What I want is:

time  S1
    1    2
    2    1
    3    3

Thank you again.




Bert Gunter

2015-Nov-21 19:58 UTC

[R] Conditional Random selection

Time to do your own homework by working through an R tutorial or two.
There are many on the web -- or see the Intro to R tutorial that ships
with R.

?tapply
?unique

Together these give one of many answers to your query.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll



ruipbarradas at sapo.pt

2015-Nov-21 20:38 UTC

[R] Conditional Random selection

Hello,

Try

tapply(tab$S1, tab$time, function(x) length(unique(x)))

Hope this helps,

Rui Barradas
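
For reference, a self-contained sketch that runs the tapply() call on the data from the question. The aggregate() variant is an addition, not something posted in the thread; it returns the same counts as a data frame with time and S1 columns, which is the layout asked for. The names `counts` and `res` are illustrative.

```r
tab <- read.table(text = "time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# Number of distinct locations per time period, as a named vector
counts <- tapply(tab$S1, tab$time, function(x) length(unique(x)))
counts

# Same counts as a data frame with columns time and S1
res <- aggregate(S1 ~ time, data = tab, FUN = function(x) length(unique(x)))
res
```

tapply() returns the counts named by time period, while aggregate() keeps time as an ordinary column, which is convenient if the result is to be merged back into another table.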