thr3ads.net - R help - [R] Comparison of the amount of computation [Apr 2011]

helin_susam

2011-Apr-13 23:12 UTC

[R] Comparison of the amount of computation

Hi dear list,

I want to compare the amount of computation of two functions. For example,
by using this algorithm;

data <- rnorm(n=100, mean=10, sd=3)

# Mean of each full resample (always 100 values)
output1 <- list()
for (i in 1:100) {
  data1 <- sample(100, 100, replace = TRUE)
  statistic1 <- mean(data1)
  output1 <- c(output1, list(statistic1))
}
output1

# Same, but repeated values are dropped before taking the mean
output2 <- list()
for (i in 1:100) {
  data2 <- unique(sample(100, 100, replace = TRUE))
  statistic2 <- mean(data2)
  output2 <- c(output2, list(statistic2))
}
output2

data1 consists of exactly 100 elements, but data2 consists of roughly 55 or
60 elements. So, to get statistic1, for each sample, 100 data points are
used. But, to get statistic2 roughly half of them are used.
I want to prove this difference. Is there any way to do this? Maybe R has
a tool for this, such as Rprof; I tried to use it, but I was not
sure about the result.

Thanks for any help!

Regards,
Helin.

--
View this message in context:
http://r.789695.n4.nabble.com/Comparison-of-the-amount-of-computation-tp3448436p3448436.html
Sent from the R help mailing list archive at Nabble.com.

Petr Savicky

2011-Apr-14 06:58 UTC

On Wed, Apr 13, 2011 at 04:12:39PM -0700, helin_susam wrote:
> Hi dear list,
> 
> I want to compare the amount of computation of two functions. For example,
> by using this algorithm;
> 
> data <- rnorm(n=100, mean=10, sd=3)
> 
> output1 <- list ()
> for(i in 1:100) {
> data1 <- sample(100, 100, replace = TRUE)
> statistic1 <- mean(data1)
> output1 <- c(output1, list(statistic1))
> }
> output1
> 
> output2 <- list()
> for(i in 1:100) {
> data2 <- unique(sample(100, 100, replace=TRUE))
> statistic2 <- mean(data2)
> output2 <- c(output2, list(statistic2))
> }
> output2
> 
> data1 consists of exactly 100 elements, but data2 consists of roughly 55 or
> 60 elements. So, to get statistic1, for each sample, 100 data points are
> used. But, to get statistic2 roughly half of them are used.
> I want to prove this difference. Is there any way to do this?

Hi.

Every number from 1:100 has probability 1 - (1 - 1/100)^100 = 0.6339677
to appear in sample(100, 100, replace=TRUE). So, the expected length
of data2 is 63.39677. If you want to estimate the distribution of the
lengths of data2 using a simulation, then record length(data2). For
example

  n <- 10000
  s <- rep(NA, times=n)
  for (i in 1:n) {
      s[i] <- length(unique(sample(100, 100, replace=TRUE)))
  }
  cbind(table(s))

I obtained

     [,1]
  53    5
  54   16
  55   27
  56   82
  57  165
  58  294
  59  465
  60  672
  61  970
  62 1168
  63 1283
  64 1303
  65 1111
  66  882
  67  626
  68  435
  69  250
  70  143
  71   57
  72   27
  73   14
  74    5
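
The expected length quoted above can also be checked directly. The following is a minimal sketch, not part of the original reply; the names N, p_appear, expected_length and lens, and the seed, are illustrative choices. It uses linearity of expectation: each of the N values appears with probability 1 - (1 - 1/N)^N, so the expected number of distinct values is N times that probability.

```r
# Probability that a given value from 1:N appears at least once
# in N draws with replacement.
N <- 100
p_appear <- 1 - (1 - 1/N)^N

# Expected number of distinct values, by linearity of expectation.
expected_length <- N * p_appear

# Simulated check; set.seed() is only there for reproducibility.
set.seed(1)
lens <- replicate(10000, length(unique(sample(N, N, replace = TRUE))))

c(theoretical = expected_length, simulated = mean(lens))
```

The simulated mean should land close to the theoretical value, with the exact figure depending on the seed.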

In this case, mean(sample(100, 100, replace=TRUE)) and
mean(unique(sample(100, 100, replace=TRUE))) have the same
expected value 50.5. However, eliminating repeated values may,
in general, change the expected value of the sample mean.
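
To illustrate how dropping repeated values can shift the expected sample mean, here is a small sketch with a hypothetical population that is not from the thread: the values 1 and 10, drawn with probabilities 0.75 and 0.25. With 4 draws, the mean of all draws has expected value 3.25, while the mean of the distinct values has expected value 1048/256 = 4.09375, because repeated 1s are collapsed into a single 1.

```r
# Hypothetical population: two values drawn with unequal probabilities.
x <- c(1, 10)
p <- c(0.75, 0.25)
n_draws <- 4
n_rep <- 20000

set.seed(1)  # for reproducibility only

# Mean of all draws in each resample.
m_all <- replicate(n_rep, mean(sample(x, n_draws, replace = TRUE, prob = p)))

# Mean after dropping repeated values in each resample.
m_unique <- replicate(n_rep,
                      mean(unique(sample(x, n_draws, replace = TRUE, prob = p))))

c(all_draws = mean(m_all), distinct_only = mean(m_unique))
```

The two simulated averages should differ noticeably, unlike the symmetric 1:100 case, where both versions have expected value 50.5.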

Hope this helps.

Petr Savicky.

helin_susam

2011-Apr-14 07:40 UTC

Hi Petr,

Your idea looks logical. So, following your idea, can we say that the
expected amount of computation for unique(sample(...)) is smaller than for
sample(...), because the expected length is 63.39677 in the unique case,
while the length is 100 in the non-unique case?

Thanks for the reply,

Helin.

Petr Savicky

2011-Apr-14 13:51 UTC

On Thu, Apr 14, 2011 at 12:40:53AM -0700, helin_susam wrote:
> Hi Petr,
> 
> Your idea looks logical. So, following your idea, can we say that the
> expected amount of computation for unique(sample(...)) is smaller than for
> sample(...), because the expected length is 63.39677 in the unique case,
> while the length is 100 in the non-unique case?

Hi Helin:

The number of operations "in unique(sample(...))" sounds like
you mean the operations needed to compute unique(sample(...)).
Your previous question suggests that you mean something different,
namely to compare computing mean(data1) and mean(data2), when

  data1 <- sample(...)
  data2 <- unique(sample(...))

If you only want to confirm that the number of operations needed
to compute mean(data2) is on average smaller than the number of
operations needed to compute mean(data1), then yes, it is.

However, it is not a way to make some computation more efficient,
since mean(data2) is something different from mean(data1).
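
For a rough empirical look at the cost, one option is system.time(), used here as a simpler alternative to Rprof(). This is only a sketch under assumptions: N and reps are hypothetical values chosen so the timings are measurable, and the results depend on the machine and R version. Since unique() has to examine every drawn value before mean() sees the shorter vector, the second loop is not expected to be cheaper overall.

```r
N <- 1e6    # vector length; hypothetical, much larger than in the thread
reps <- 5   # number of repetitions per timing

set.seed(1)  # for reproducibility of the samples only

# Time the mean of the full resample.
t_plain <- system.time(
  for (i in seq_len(reps)) mean(sample(N, N, replace = TRUE))
)

# Time the mean after removing repeated values; includes the cost of unique().
t_unique <- system.time(
  for (i in seq_len(reps)) mean(unique(sample(N, N, replace = TRUE)))
)

elapsed <- c(plain = t_plain[["elapsed"]], with_unique = t_unique[["elapsed"]])
elapsed
```

To isolate the cost of mean() alone, the sampled vectors could be created before timing and only the mean() calls placed inside system.time().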

Petr.

helin_susam

2011-Apr-14 14:13 UTC

Dear Petr,

Many thanks for your reply. You are absolutely right!

Best wishes,

Helin.
