Hi dear list,

I want to compare the amount of computation of two functions. For example, using this algorithm:

data <- rnorm(n = 100, mean = 10, sd = 3)
# note: 'data' is not used below; sample(100, 100, replace = TRUE)
# draws from the integers 1:100

output1 <- list()
for (i in 1:100) {
  data1 <- sample(100, 100, replace = TRUE)           # 100 draws, duplicates kept
  statistic1 <- mean(data1)
  output1 <- c(output1, list(statistic1))
}
output1

output2 <- list()
for (i in 1:100) {
  data2 <- unique(sample(100, 100, replace = TRUE))   # duplicates removed
  statistic2 <- mean(data2)
  output2 <- c(output2, list(statistic2))
}
output2

data1 consists of exactly 100 elements, but data2 consists of roughly 55 or 60 elements. So, to get statistic1, 100 data points are used for each sample, but to get statistic2 only about half of them are used. I want to prove this difference. Is there any way to do this? Maybe R has a facility for this, such as Rprof; I tried to use it, but I was not sure how.

Thanks for any help!

Regards,
Helin.

-- 
View this message in context: http://r.789695.n4.nabble.com/Comparison-of-the-amount-of-computation-tp3448436p3448436.html
Sent from the R help mailing list archive at Nabble.com.
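One way to make the difference measurable without relying on timings is to record, for each replicate, how many data points are passed to mean(). The sketch below assumes that this element count is an acceptable proxy for the amount of computation; the helper run_replicates() and its arguments are hypothetical names, not part of base R.

```r
# Hypothetical helper: runs B replicates and records, for each one, the
# sample mean and how many elements mean() had to process.
run_replicates <- function(B, use_unique) {
  stats <- numeric(B)
  sizes <- integer(B)
  for (i in seq_len(B)) {
    d <- sample(100, 100, replace = TRUE)
    if (use_unique) d <- unique(d)   # drop repeated values, as for data2
    stats[i] <- mean(d)
    sizes[i] <- length(d)            # number of data points used for this statistic
  }
  list(stats = stats, sizes = sizes)
}

set.seed(1)  # for reproducibility
r1 <- run_replicates(100, use_unique = FALSE)
r2 <- run_replicates(100, use_unique = TRUE)
summary(r1$sizes)  # always 100
summary(r2$sizes)  # fewer than 100, varying from replicate to replicate
```

Averaging r2$sizes over many replicates gives an estimate of the expected number of data points used for statistic2.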
On Wed, Apr 13, 2011 at 04:12:39PM -0700, helin_susam wrote:
> data1 consists of exactly 100 elements, but data2 consists of roughly 55 or
> 60 elements. So, to get statistic1, 100 data points are used for each
> sample, but to get statistic2 only about half of them are used.
> I want to prove this difference. Is there any way to do this?

Hi.

Every number from 1:100 has probability 1 - (1 - 1/100)^100 = 0.6339677 of appearing in sample(100, 100, replace=TRUE), since each of the 100 independent draws misses a given number with probability 1 - 1/100. So, by linearity of expectation, the expected length of data2 is 100 * 0.6339677 = 63.39677.

If you want to estimate the distribution of the lengths of data2 using a simulation, then record length(data2). For example:

n <- 10000
s <- rep(NA, times=n)
for (i in 1:n) {
  # number of distinct values in each replicate
  s[i] <- length(unique(sample(100, 100, replace=TRUE)))
}
cbind(table(s))

I obtained

   [,1]
53    5
54   16
55   27
56   82
57  165
58  294
59  465
60  672
61  970
62 1168
63 1283
64 1303
65 1111
66  882
67  626
68  435
69  250
70  143
71   57
72   27
73   14
74    5

In this case, mean(sample(100, 100, replace=TRUE)) and mean(unique(sample(100, 100, replace=TRUE))) have the same expected value, 50.5 (by symmetry, since every number in 1:100 is equally likely to be kept). However, eliminating repeated values may, in general, change the expected value of the sample mean.

Hope this helps.

Petr Savicky.
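Petr's closed-form value and both claims about the mean can be checked numerically. A minimal sketch in base R; the function name expected_distinct(), the seed, the replicate counts, and the small population x (nine 0s and one 10) are assumptions chosen for illustration.

```r
# Expected number of distinct values in sample(n, n, replace = TRUE).
# A given value is missed by all n independent draws with probability
# (1 - 1/n)^n, so by linearity of expectation:
expected_distinct <- function(n) n * (1 - (1 - 1/n)^n)

expected_distinct(100)   # 63.39677
1 - exp(-1)              # large-n limit of the inclusion probability, about 0.632

# Simulation check, equivalent to the loop above but written with replicate()
set.seed(42)
s <- replicate(10000, length(unique(sample(100, 100, replace = TRUE))))
mean(s)                  # should be close to 63.4

# With the distinct values 1:100, both means have expected value 50.5
m1 <- replicate(10000, mean(sample(100, 100, replace = TRUE)))
m2 <- replicate(10000, mean(unique(sample(100, 100, replace = TRUE))))
c(mean(m1), mean(m2))    # both close to 50.5

# Hypothetical population with repeated values: nine 0s and one 10.
# Here unique() changes the expected value of the mean.
x <- c(rep(0, 9), 10)
m_all    <- replicate(10000, mean(sample(x, 10, replace = TRUE)))
m_unique <- replicate(10000, mean(unique(sample(x, 10, replace = TRUE))))
mean(m_all)              # close to mean(x) = 1
mean(m_unique)           # close to 5 * (1 - 0.9^10), about 3.26
```

In the last example, unique() leaves {0}, {0, 10}, or (very rarely) {10}, so the mean of the unique values is 0, 5, or 10 regardless of how often each value was drawn.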
Hi Petr,

Your idea looks logical. So, based on your idea, can we say that the expected amount of computation in unique(sample(...)) is smaller than in sample(...), because the expected length is 63.39677 in the unique case, while the expected length is 100 in the non-unique case?

Thanks for your reply,
Helin.
On Thu, Apr 14, 2011 at 12:40:53AM -0700, helin_susam wrote:
> Hi Petr,
>
> Your idea looks logical. So, based on your idea, can we say that the
> expected amount of computation in unique(sample(...)) is smaller than in
> sample(...), because the expected length is 63.39677 in the unique case,
> while the expected length is 100 in the non-unique case?

Hi Helin:

The number of operations "in unique(sample(...))" sounds like you mean the operations needed to compute unique(sample(...)). Your previous question suggests that you mean something different, namely comparing the computation of mean(data1) and mean(data2), where

data1 <- sample(...)
data2 <- unique(sample(...))

If you only want to confirm that the number of operations needed to compute mean(data2) is on average smaller than the number of operations needed to compute mean(data1), then yes, it is. However, this is not a way to make a computation more efficient, since mean(data2) is something different from mean(data1).

Petr.
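A rough way to follow up on Petr's distinction is to time both the whole pipeline and mean() alone with base R's system.time(). Rprof(), mentioned in the original question, is a sampling profiler and is better suited to longer computations. This is a sketch, not a benchmark: the replicate counts (B = 20000, 1000 samples, 20 passes) and the names samples1, samples2, t_mean1, t_mean2 are arbitrary assumptions, results vary between machines and runs, and at n = 100 the fixed cost of calling mean() may hide the difference in vector length. The full-pipeline timing also includes the extra work done by unique(), which a count of the elements passed to mean() ignores.

```r
# Rough timing comparison; the numbers depend on the machine and are noisy.
set.seed(7)
B <- 20000

# Full pipelines: drawing the sample (plus unique() in the second case) and the mean
t_plain  <- system.time(for (i in seq_len(B)) mean(sample(100, 100, replace = TRUE)))
t_unique <- system.time(for (i in seq_len(B)) mean(unique(sample(100, 100, replace = TRUE))))

# mean() alone, on pre-generated samples
samples1 <- replicate(1000, sample(100, 100, replace = TRUE), simplify = FALSE)
samples2 <- lapply(samples1, unique)  # the same samples with duplicates removed
t_mean1 <- system.time(for (k in 1:20) lapply(samples1, mean))
t_mean2 <- system.time(for (k in 1:20) lapply(samples2, mean))

# Elapsed wall-clock seconds for each variant
c(plain = t_plain[["elapsed"]], unique = t_unique[["elapsed"]],
  mean1 = t_mean1[["elapsed"]], mean2 = t_mean2[["elapsed"]])
```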
Dear Petr,

Many thanks for your reply. You are completely right!

Best wishes,
Helin.