thr3ads.net - R help - [R] Bootstrap or subsampling using loop? [Nov 2013]

Suparna Mitra

2013-Nov-26 07:51 UTC

[R] Bootstrap or subsampling using loop?

Hello R experts,
  I need to take a random subsample from a data set and count the
frequency of each taxon in it, then repeat this for, say, 1000 replicates
and take the median of the frequency counts. Should I do this by
subsampling in a loop, or with a bootstrap?
My data format is
> str(Data)
'data.frame': 155752 obs. of  2 variables:
 $ ReadName: Factor w/ 155752 levels "HWI-ST884185C1PEWACXX:3:1101:10047:62439#0/2",..: 49 325 800 624 786 77 203 825 249 369 ...
 $ Taxa    : Factor w/ 25 levels "Acidimicrobium",..: 1 1 1 1 1 1 1 1 1 1 ...

If I then take a sample of 10 rows, like
> Data[sample(nrow(Data), 10), ]
                                           ReadName          Taxa
122657 HWI-ST884185C1PEWACXX:4:2105:16386:68246#0/2       Frankia
91721  HWI-ST884185C1PEWACXX:3:2314:16967:14996#0/1   Rhodococcus
62980  HWI-ST884185C1PEWACXX:4:2101:13052:29946#0/1 Mycobacterium
::::
::::

and count the frequency (with the sample stored as Sample, using ddply
from plyr) as:
> counts <- ddply(Sample, .(Sample$Taxa), nrow)
the result looks like
> counts
    Sample$Taxa V1
1   Actinomyces  1
2       Frankia  3
3      Gordonia  1
4 Modestobacter  1
5 Mycobacterium  2
6   Rhodococcus  1
7  Tsukamurella  1

Now I need to do this 1000 times and get the median of the counts (the V1
column) for each taxon. Can you please suggest the quickest way?

I want to do this with really big data: the subsample size will be 1
million rows, with 1000 replicates, drawn from data with 10 million rows.

Thanks a lot for your help.

Mitra
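
The procedure described above (subsample, count per taxon, repeat, take the median) can be sketched in base R without plyr. This is only an illustration under stated assumptions, not a tested solution for the real data: the toy Data frame, the sizes n_rep and n_sub, and the names count_mat and median_counts are all made up for the example. It samples the integer codes of the Taxa factor rather than whole rows, since only Taxa is needed for the counts. Unlike the ddply output, tabulate() also reports taxa that are absent from a replicate, with a count of 0, so the medians include those zeros.

```r
set.seed(1)  # only to make this sketch reproducible

# Toy stand-in for the real data (assumption: the real Data has a factor column Taxa)
taxa_levels <- c("Actinomyces", "Frankia", "Gordonia", "Mycobacterium", "Rhodococcus")
Data <- data.frame(
  ReadName = paste0("read", seq_len(10000)),
  Taxa     = factor(sample(taxa_levels, 10000, replace = TRUE), levels = taxa_levels)
)

n_rep <- 1000  # number of replicates (hypothetical value)
n_sub <- 1000  # rows per subsample (hypothetical value)

# Work on the integer codes of the factor; this avoids copying whole rows
taxa_codes <- as.integer(Data$Taxa)
n_levels   <- nlevels(Data$Taxa)

# One column of per-taxon counts for each replicate (n_levels x n_rep matrix).
# sample() without replacement is subsampling; replace = TRUE would be a bootstrap.
count_mat <- replicate(
  n_rep,
  tabulate(sample(taxa_codes, n_sub), nbins = n_levels)
)
rownames(count_mat) <- levels(Data$Taxa)

# Median count per taxon across all replicates
median_counts <- apply(count_mat, 1, median)
print(median_counts)
```

Because only a 25 x 1000 matrix of counts would be kept for the real data, memory use should be dominated by each 1-million-element draw rather than by the stored results, although this has not been timed on data of that size.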
