Padmanabhan, Sudharsha
2003-Aug-19 17:42 UTC
[R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!
Hello,

I am running a few simulations for clinical trial analysis. I want some help
regarding the following.

We know that as the sample size increases, the variance should decrease, but
I am getting some unexpected results. So I ran the code shown below to check
the validity of this.

large<-array(1,c(1000,1000))
small<-array(1,c(100,1000))
for(i in 1:1000){large[i,]<-rnorm(1000,0,3)}
for(i in 1:1000){small[i,]<-rnorm(100,0,3)}}
yy<-array(1,100)
for(i in 1:100){yy[i]<-var(small[i,])}
y1y<-array(1,1000)
for(i in 1:1000){y1y[i]<-var(large[i,])}
mean(yy);mean(y1y);
[1] 8.944
[1] 9.098

This shows that, on average, for 1000 such samples of 1000 Normal numbers,
the variance is higher than that of 100 samples of 1000 random numbers.

Why is this so? Can someone please help me out?

Thanks.

Regards,
~S.
The variance of Xbar decreases as 1/n; the sample variance of X does not.

- tom blackwell - u michigan medical school - ann arbor -

On Tue, 19 Aug 2003, Padmanabhan, Sudharsha wrote:

> We know that as the sample size increases, the variance should decrease,
> but I am getting some unexpected results. [...]
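A minimal sketch of this distinction, assuming sd = 3 as in the original
post: the variance of the sample mean shrinks like 9/n, while the expected
sample variance stays near 9 regardless of n.

set.seed(1)   # for reproducibility
sigma <- 3
for (n in c(100, 1000, 10000)) {
  xbar <- replicate(1000, mean(rnorm(n, 0, sigma)))  # 1000 sample means, size n each
  s2   <- replicate(1000, var(rnorm(n, 0, sigma)))   # 1000 sample variances, size n each
  cat("n =", n, " var(xbar) =", var(xbar),
      " sigma^2/n =", sigma^2 / n,                   # var(Xbar) tracks sigma^2/n
      " mean(s2) =", mean(s2), "\n")                 # E[S^2] stays near 9
}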
On 08/19/03 17:42, Padmanabhan, Sudharsha wrote:

> This shows that, on average, for 1000 such samples of 1000 Normal numbers,
> the variance is higher than that of 100 samples of 1000 random numbers.
> Why is this so?

Don't know, but it could be a fluke. You don't say how many times you did
it. I did the following, with 1000 replicates in each test; you have 100 in
the small test and 1000 in the big one. My numbers look pretty close.

> bigmat <- matrix(rnorm(1000000),1000,1000)   # 1000 rows of 1000 each
> smallmat <- matrix(rnorm(100000),1000,100)   # 1000 rows of 100 each
> mean(apply(bigmat,1,var))    # get variance of each row, then take mean
[1] 0.9999344
> mean(apply(smallmat,1,var))
[1] 0.9967427

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R page: http://finzi.psych.upenn.edu/
Perhaps you were trying for "as sample size increases, variance *of the
mean* decreases" (at least when the variance is finite). If you swap "mean"
and "var" in your code, I think you will get what you are looking for.

-- Tony Plate

At Tuesday 05:42 PM 8/19/2003 +0000, Padmanabhan, Sudharsha wrote:

> We know that as the sample size increases, the variance should decrease,
> but I am getting some unexpected results. [...]
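A sketch of the suggested swap, with the original loops rewritten so each
row is a complete sample: taking the variance across per-sample means makes
the 1/n shrinkage visible.

set.seed(42)
means.small <- replicate(1000, mean(rnorm(100, 0, 3)))   # 1000 means of size-100 samples
means.large <- replicate(1000, mean(rnorm(1000, 0, 3)))  # 1000 means of size-1000 samples
var(means.small)   # close to 9/100  = 0.09
var(means.large)   # close to 9/1000 = 0.009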
First of all, your subscripting is wrong. The first index is for row, and
the second for column, so large[i,] refers to the i-th row of large, not the
i-th column. Also, the code as you provided contains a syntax error (an
extra closing brace). Try:

set.seed(311)  ## Always a good idea to set the seed for a simulation!
large <- matrix(rnorm(1000*1000), 1000, 1000)
small <- matrix(rnorm(100*1000), 100, 1000)
var.large <- apply(large, 2, var)  ## Apply the var function to each column
var.small <- apply(small, 2, var)

The result looks like:

> summary(var.large); summary(var.small)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8617  0.9705  1.0010  1.0020  1.0320  1.1520
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.5846  0.9021  0.9948  0.9990  1.0850  1.5360

as expected: the mean is about the same, but the spread is much smaller for
the larger sample size. This sort of thing can be computed exactly using
basic math stat, BTW.

Andy

> -----Original Message-----
> From: Padmanabhan, Sudharsha [mailto:sudAR_80 at neo.tamu.edu]
> Sent: Tuesday, August 19, 2003 1:43 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!
>
> We know that as the sample size increases, the variance should decrease,
> but I am getting some unexpected results. [...]
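For reference, the exact calculation alluded to: for normal data,
(n-1)*S^2/sigma^2 is chi-squared with n-1 degrees of freedom, so
Var(S^2) = 2*sigma^4/(n-1). A quick check against the simulated spread,
assuming var.large and var.small from the code above (here sigma = 1):

2 / (1000 - 1)   # theoretical Var(S^2) for columns of 'large': ~0.0020
2 / (100 - 1)    # theoretical Var(S^2) for columns of 'small': ~0.0202
var(var.large)   # the simulated values should come out close
var(var.small)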
I think you are confused. As the sample size increases, the variance of an
estimate based on that sample will decrease asymptotically to zero (e.g.,
the standard error of the mean will go to zero). However, the variance of
the sample itself will not change. Any difference you see in your data is
simply due to chance. If you repeat, the larger set may or may not have the
larger variance.

> var(rnorm(10000, 0, 3))
[1] 8.958727
> var(rnorm(10000, 0, 3))
[1] 9.155332
> var(rnorm(10000, 0, 3))
[1] 9.050894
> var(rnorm(10000, 0, 3))
[1] 9.282509
> var(rnorm(100000, 0, 3))
[1] 8.990778
> var(rnorm(100000, 0, 3))
[1] 9.024343
> var(rnorm(100000, 0, 3))
[1] 8.999064
> var(rnorm(100000, 0, 3))
[1] 9.088034

HTH

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> "Padmanabhan, Sudharsha" <sudAR_80 at neo.tamu.edu> 08/19/03 01:42PM >>>
> We know that as the sample size increases, the variance should decrease,
> but I am getting some unexpected results. [...]
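One way to quantify "simply due to chance", assuming the same sample sizes
as in the original post: repeat the comparison many times and count how
often the larger sample happens to show the larger variance.

set.seed(7)
bigger <- replicate(2000, var(rnorm(1000, 0, 3)) > var(rnorm(100, 0, 3)))
mean(bigger)   # close to 1/2: either sample is about equally likely to "win"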
Hi.

There is no reason the variance of a normal should decrease as you take
larger samples. Indeed, in your call itself, you ask for a sample from a
normal with a standard deviation of 3, and so a variance of 9. As expected,
both of your estimates of the variance are close to 9. What should decrease
is the variance of the estimate of the mean, which is the variance of the
sample divided by the number of elements in the sample. That will indeed
decrease as n increases.

Also, a couple of R programming points raised by your example. You can
populate your entire matrix of random numbers with a single call, with good
time savings. (That probably doesn't matter much in this toy example, but
might if you do larger simulations for some problem.) For example:

matrix(rnorm(100000, 0, 3), nr = 100, nc = 1000)

gets you your matrix "small". Similarly, your loop over the rows for taking
variances can be replaced by

yy <- apply(small, 1, var)

which may not be faster, but is certainly easier to read. And of course
you'd want to replace the call to var with a function that calculates the
standard error.

Hope this helps,

Matt Wiener

-----Original Message-----
From: Padmanabhan, Sudharsha [mailto:sudAR_80 at neo.tamu.edu]
Sent: Tuesday, August 19, 2003 1:43 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

> We know that as the sample size increases, the variance should decrease,
> but I am getting some unexpected results. [...]
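R has no built-in standard-error function, so the replacement suggested
above might look like the following hypothetical helper (std.err is an
assumed name, not from the thread):

std.err <- function(x) sd(x) / sqrt(length(x))  # standard error of the mean

std.err(rnorm(100, 0, 3))    # around 3/sqrt(100)  = 0.30
std.err(rnorm(1000, 0, 3))   # around 3/sqrt(1000) = 0.095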
Hi All:

Many servers, routers, and firewalls have their configuration files set
such that if the word HELP appears in the subject line, the message is not
delivered to the addressee but is delivered to the system operator.

Re: [R] Variance Computing- - HELP!!!!!!!!!!!!!!!!!!

Thanks,
Frank
"Padmanabhan, Sudharsha" <sudAR_80 at neo.tamu.edu> We know trhat as the sample size increases, the variance should decrease, Should it? I can paraphrase his test case thus: v100 <- sapply(1:100, function(i) var(rnorm(100, 0, 3))) # We expect the elements of v100 to cluster around 3^2 v1000 <- sapply(1:1000, function(i) var(rnorm(1000, 0, 3))) # We expect the elements of v1000 to cluster around 3^2 too. fivenum(v100) => [1] 6.469134 7.884637 8.916314 10.189463 13.897817 # ^^^^^^^^ fivenum(v1000) => [1] 7.874345 8.692326 8.967684 9.268955 10.503038 # ^^^^^^^^ The population parameter sigma-squared is 3^2 = 9. The estimates are 8.92 in one case and 8.97 in the other; sounds about right to me. Looking at density(v100) and density(v1000) is enlightening. Means and standard deviations: mean(v100) var(v100) => 9.080676 2.376193 mean(v1000) var(v1000) => 8.98147 0.1721246 Are these not pretty much as expected? Not that a t-test is the ideal test for the distributions involved, but it's familiar and since the distribution is pretty bell-shaped, it may be usable as a rough guide to whether to be worried or not.> t.test(v100, v1000)Welch Two Sample t-test data: v100 and v1000 t = 0.6413, df = 100.439, p-value = 0.5228 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.2077100 0.4061231 sample estimates: mean of x mean of y 9.080676 8.981469