Black Yellow
2018-Oct-28 20:26 UTC
[R] Gamma Simulation from Exponential Distribution: find sample size given power and sig.Level
I built my function to simulate two gamma distributions X and Y based on the sum of i.i.d exponential distributions. Assume my code is correct about this simulation, I am interested in finding an equal sample size n for X and Y such that n can be determined given 90% power and 5% significance level. My thought process is that I create a sequence of n's value and use a for loop to reiterate each n value in the sequence along with an "if" statement saying that if given the n value, 95% of the time when I replicate the process, the probability that I reject the null hypothesis given the alternative is true is 90%. The problem I am facing is that n value is always the maximum within my n_seq. I am doubting there must be something wrong when I generate my two Gamma distributions. Any input is appreciated. set.seed(3701) n_seq = seq(from = 1, to = 200, by = 1) Z.list = numeric(reps) var.list = numeric(reps) gamma.sim = function(n, mu){ for(j in 1:5e5){ u.list = runif(n = n) x.list = -mu*log(1-u.list) Z.list[j] = sum(x.list) } return(Z.list) } # Generate n generations for X.gamma and Y.gamma gen_p_vals <- function(n = reps){ p_vec <- numeric(reps) for(ii in 1:reps){ x <- gamma.sim(n = 79, mu = 1) y <- gamma.sim(n = 80, mu = 1) t_stat_numerator <- mean(x) - mean(y) - 0 # Test Statistic under null hypothesis s_p <- sqrt(sum((x-mean(x))^2) + sum((y-mean(y))^2)) / (sqrt(2*reps-2)) t_stat <- t_stat_numerator /(s_p*(sqrt((1/reps)+(1/reps)))) p_vec[ii] <- 2* pt(-abs(t_stat), 2*reps-2) } return(p_vec) } power_pvals <- gen_p_vals(n = 10) mean(power_pvals < 0.05) for(j in 2:100){ power_pvals = gen_p_vals(n = n_seq[j]) if (mean(power_pvals < 0.05) == 0.90){ print(j) } } [[alternative HTML version deleted]]
MacQueen, Don
2018-Oct-29 20:00 UTC
[R] Gamma Simulation from Exponential Distribution: find sample size given power and sig.Level
If this is homework, then r-help has a no homework policy. I'm assuming that if it is homework then the focus is on statistical concepts, not on R programming. It looks like your gen_p_vals function should be defined as gen_p_vals <- function(reps = n) instead of n = reps. Why not just use the rgamma() function to simulate your gamma distributions? Why not just use the t.test() function do calculate the t test? But if you really want to calculate the t test manually, use the var() function in the denominator, instead of that x-mean(x) business. It's unnecessary to use return(). Initializing Z.list outside the gamma.sim function does no good. Are you using the variable "n" for different purposes in different places? If so, it's confusing. Although it didn't appear to cause problems this time, sending html formatted email to r-help is generally not a good idea. (and real names are preferred) -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 Lab cell 925-724-7509 ?On 10/28/18, 1:26 PM, "R-help on behalf of Black Yellow" <r-help-bounces at r-project.org on behalf of blackish952 at gmail.com> wrote: I built my function to simulate two gamma distributions X and Y based on the sum of i.i.d exponential distributions. Assume my code is correct about this simulation, I am interested in finding an equal sample size n for X and Y such that n can be determined given 90% power and 5% significance level. My thought process is that I create a sequence of n's value and use a for loop to reiterate each n value in the sequence along with an "if" statement saying that if given the n value, 95% of the time when I replicate the process, the probability that I reject the null hypothesis given the alternative is true is 90%. The problem I am facing is that n value is always the maximum within my n_seq. I am doubting there must be something wrong when I generate my two Gamma distributions. Any input is appreciated. set.seed(3701) n_seq = seq(from = 1, to = 200, by = 1) Z.list = numeric(reps) var.list = numeric(reps) gamma.sim = function(n, mu){ for(j in 1:5e5){ u.list = runif(n = n) x.list = -mu*log(1-u.list) Z.list[j] = sum(x.list) } return(Z.list) } # Generate n generations for X.gamma and Y.gamma gen_p_vals <- function(n = reps){ p_vec <- numeric(reps) for(ii in 1:reps){ x <- gamma.sim(n = 79, mu = 1) y <- gamma.sim(n = 80, mu = 1) t_stat_numerator <- mean(x) - mean(y) - 0 # Test Statistic under null hypothesis s_p <- sqrt(sum((x-mean(x))^2) + sum((y-mean(y))^2)) / (sqrt(2*reps-2)) t_stat <- t_stat_numerator /(s_p*(sqrt((1/reps)+(1/reps)))) p_vec[ii] <- 2* pt(-abs(t_stat), 2*reps-2) } return(p_vec) } power_pvals <- gen_p_vals(n = 10) mean(power_pvals < 0.05) for(j in 2:100){ power_pvals = gen_p_vals(n = n_seq[j]) if (mean(power_pvals < 0.05) == 0.90){ print(j) } } [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.