thr3ads.net - R help - [R] Sampling problems [Mar 2012]


Oritteropus

2012-Mar-07 16:41 UTC

[R] Sampling problems

Hi,
I need to randomly sample my dataset 1000 times. Each sample needs to be
80% of the data. I know how to do that; my problem is that I need not only
the 80% but also the corresponding 20% each time. Is there any way to do
that?
Alternatively, I was thinking of something like the setdiff() function to
compare my 80% sample with the original dataset and obtain the
corresponding 20%. Unfortunately, setdiff works only for vectors. Do you
know of a similar function for data frames?
Thanks

--
View this message in context:
http://r.789695.n4.nabble.com/Sampling-problems-tp4453752p4453752.html
Sent from the R help mailing list archive at Nabble.com.

Sarah Goslee

2012-Mar-07 20:04 UTC


[R] Sampling problems

You could make a vector containing the number of TRUE values that
makes up 80% of your data, and the number of FALSE values that makes
up 20% of your data. Use sample() to reorder it, then use it to divide
your dataset.

If you had provided a reproducible example, I could write you code.
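
A minimal sketch of the TRUE/FALSE approach described above, assuming a made-up data frame `dat` (the poster's data was not shared, so the names and sizes here are only illustrative):

```r
# Made-up data standing in for the poster's dataset (an assumption).
set.seed(1)
dat <- data.frame(id = 1:10, value = rnorm(10))

n      <- nrow(dat)
n_keep <- round(0.8 * n)          # number of rows in the 80% part

# n_keep TRUE values followed by the remaining FALSE values,
# then reordered at random with sample()
keep <- sample(rep(c(TRUE, FALSE), times = c(n_keep, n - n_keep)))

part80 <- dat[keep, ]             # the 80% sample
part20 <- dat[!keep, ]            # the complementary 20%
```

Because `keep` and `!keep` are exact complements, every row lands in exactly one of the two parts.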

Sarah

On Wed, Mar 7, 2012 at 11:41 AM, Oritteropus <lucasantini85 at
hotmail.com> wrote:
> Hi,
> I need to randomly sample my dataset 1000 times. Each sample needs to be
> 80% of the data. I know how to do that; my problem is that I need not only
> the 80% but also the corresponding 20% each time. Is there any way to do
> that?
> Alternatively, I was thinking of something like the setdiff() function to
> compare my 80% sample with the original dataset and obtain the
> corresponding 20%. Unfortunately, setdiff works only for vectors. Do you
> know of a similar function for data frames?
> Thanks
>
-- 
Sarah Goslee
http://www.functionaldiversity.org

Petr Savicky

2012-Mar-07 20:24 UTC


[R] Sampling problems

On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
> Hi,
> I need to randomly sample my dataset 1000 times. Each sample needs to be
> 80% of the data. I know how to do that; my problem is that I need not only
> the 80% but also the corresponding 20% each time. Is there any way to do
> that?

Hi.

If you use sample() to get the 80% and store the indices, you
can also get the remaining cases with negative indexing:

  a <- matrix(1:30, ncol=3)
  i <- sample(10, 8)
  a[sort(i), ]

       [,1] [,2] [,3]
  [1,]    1   11   21
  [2,]    2   12   22
  [3,]    3   13   23
  [4,]    4   14   24
  [5,]    6   16   26
  [6,]    7   17   27
  [7,]    8   18   28
  [8,]   10   20   30

  a[-i, ]

       [,1] [,2] [,3]
  [1,]    5   15   25
  [2,]    9   19   29
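
Since the question asks for 1000 repetitions, the same idea could be extended by storing one set of indices per repetition. This is only a sketch under assumptions: the names `k`, `idx`, `train1` and `test1` are illustrative, and `replicate()` is one of several ways to repeat the draw.

```r
set.seed(42)
a <- matrix(1:30, ncol = 3)
n <- nrow(a)
k <- round(0.8 * n)               # rows in each 80% sample

# One column of row indices per repetition (a k x 1000 matrix).
idx <- replicate(1000, sample(n, k))

# Split for the first repetition; the other columns work the same way.
train1 <- a[idx[, 1], , drop = FALSE]    # the 80%
test1  <- a[-idx[, 1], , drop = FALSE]   # the remaining 20%
```

Keeping the indices rather than the subsets means each 80%/20% pair can be rebuilt on demand without holding 1000 copies of the data.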

Hope this helps.

Petr Savicky.

David Winsemius

2012-Mar-07 20:24 UTC


[R] Sampling problems

On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:
> Hi,
> I need to randomly sample my dataset 1000 times. Each sample needs to be
> 80% of the data. I know how to do that; my problem is that I need not only
> the 80% but also the corresponding 20% each time. Is there any way to do
> that?
> Alternatively, I was thinking of something like the setdiff() function to
> compare my 80% sample with the original dataset and obtain the
> corresponding 20%. Unfortunately, setdiff works only for vectors. Do you
> know of a similar function for data frames?

Create an index vector with runif or sample, then use it to get your
sample and use negative indexing to get the remainder.

idx <- sample(1:1000, 800)
x[ idx, ]  # 80%
x[ -idx, ] # the other 20%

(I think this does presume you have not mucked with the default  
rownames.)
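
A small sketch of the same idea with the sample size taken from nrow() instead of a hard-coded 1000. The data frame and its row names are made up for illustration. As far as I can tell, integer indices refer to row positions rather than row names, so the split should also hold when the row names are not the defaults.

```r
# Made-up data frame with non-default row names (an assumption).
set.seed(7)
x <- data.frame(v = rnorm(50), row.names = paste0("obs", 1:50))

n   <- nrow(x)
idx <- sample(n, size = round(0.8 * n))   # row positions, not row names

part80 <- x[idx, , drop = FALSE]          # 80%
part20 <- x[-idx, , drop = FALSE]         # the other 20%
```

`drop = FALSE` is used only because this toy data frame has a single column, which would otherwise be simplified to a plain vector.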


-- 

David Winsemius, MD
West Hartford, CT

Oritteropus

2012-Mar-08 09:02 UTC


[R] Sampling problems

Hi Sarah, it is not clear to me how to do that. Could you show me, please?

Imagine I have a situation like this:

MeanA <- read.csv("MeanAmf.csv", header=TRUE)
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Then?



Oritteropus

2012-Mar-08 14:00 UTC


[R] Simple solution

Hi everybody,
Thank you all for your suggestions; you have been very helpful.
In the end, however, I solved it this way:

mysample <- MaxDH[sample(1:nrow(MaxDH), 150, replace=FALSE),]
A <- mysample[1:120,]
B <- mysample[121:150,]

So simple in the end...
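
The shuffle-then-cut idea above could be generalized so that 150, 120 and 30 are not hard-coded. This is only a sketch: `split_80_20` is a hypothetical helper, `MaxDH_demo` is stand-in data, and it assumes the whole dataset is being split.

```r
# Hypothetical helper: shuffle all rows once, then cut at the 80% mark.
split_80_20 <- function(df, frac = 0.8) {
  n        <- nrow(df)
  n_first  <- round(frac * n)
  shuffled <- df[sample(n), , drop = FALSE]   # random row order
  list(A = shuffled[seq_len(n_first), , drop = FALSE],
       B = shuffled[n_first + seq_len(n - n_first), , drop = FALSE])
}

set.seed(123)
MaxDH_demo <- data.frame(id = 1:150, dh = runif(150))  # stand-in data
parts <- split_80_20(MaxDH_demo)
```

Calling the helper inside a loop or `replicate()` would give a fresh 80%/20% pair on each of the 1000 repetitions.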

Best,

Luca

