An apparatus exists whereby a collection of balls is displaced to the top of a stack by suction. A top level (Level 1) each ball is shifted 1 unit to the left or 1 unit to the right at random with equal probability. The ball then drops down to level 2. At Level 2, each ball is again shifted 1 unit to the left or 1 unit to the right at random. The process continues for 15 levels and the balls are collected at the bottom for a short time until being collected by suction to the top. Can we do a simulation of the process using R with 100 balls, and plot the frequency distribution of the position of the balls at the bottom with respect to the entry position? thx in advance ej
Ethan Johnsons <ethan.johnsons <at> gmail.com> writes:> > An apparatus exists ... > > Can we do a simulation of the process using R with 100 balls, and plot > the frequency distribution of the position of the balls at the bottom > with respect to the entry position? >Can you make a plausible case that this isn't a homework problem? Ben Bolker
R 2.3.1 Windows XP I am surprised at the lack of precision in R, as noted below. I would expect the values to be closer to zero, particularly the later examples where the sample size is quite large.> mean(rnorm(500,0,1))[1] -0.03209727> mean(rnorm(5000,0,1))[1] -0.005991322> mean(rnorm(50000,0,1))[1] -0.0006160524> mean(rnorm(500000,0,1))[1] -0.001254309> mean(rnorm(5000000,0,1))[1] 0.0004633778> mean(rnorm(50000000,0,1))[1] -0.0001325764 I would appreciate any thoughts. Is the lack of precision due to running on a 32-bit system? Thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) jsorkin at grecc.umaryland.edu Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}}
John Sorkin wrote:> R 2.3.1 > Windows XP > > I am surprised at the lack of precision in R, as noted below. I would > expect the values to be closer to zero, particularly the later examples > where the sample size is quite large. > > >> mean(rnorm(500,0,1)) >> > [1] -0.03209727 > >> mean(rnorm(5000,0,1)) >> > [1] -0.005991322 > >> mean(rnorm(50000,0,1)) >> > [1] -0.0006160524 > >> mean(rnorm(500000,0,1)) >> > [1] -0.001254309 > >> mean(rnorm(5000000,0,1)) >> > [1] 0.0004633778 > >> mean(rnorm(50000000,0,1)) >> > [1] -0.0001325764 > > I would appreciate any thoughts. Is the lack of precision due to > running on a 32-bit system? > > Thanks, > John > > > JLooks OK to me. The theoretical SEM in the latter case is sqrt(1/5e7)==0.0001414214.
Indeed, you are correct! In never pays to do stats too early in the morning! John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) jsorkin at grecc.umaryland.edu>>> Peter Dalgaard <p.dalgaard at biostat.ku.dk> 12/9/2006 9:41 AM >>>John Sorkin wrote:> R 2.3.1 > Windows XP > > I am surprised at the lack of precision in R, as noted below. Iwould> expect the values to be closer to zero, particularly the laterexamples> where the sample size is quite large. > > >> mean(rnorm(500,0,1)) >> > [1] -0.03209727 > >> mean(rnorm(5000,0,1)) >> > [1] -0.005991322 > >> mean(rnorm(50000,0,1)) >> > [1] -0.0006160524 > >> mean(rnorm(500000,0,1)) >> > [1] -0.001254309 > >> mean(rnorm(5000000,0,1)) >> > [1] 0.0004633778 > >> mean(rnorm(50000000,0,1)) >> > [1] -0.0001325764 > > I would appreciate any thoughts. Is the lack of precision due to > running on a 32-bit system? > > Thanks, > John > > > JLooks OK to me. The theoretical SEM in the latter case is sqrt(1/5e7)==0.0001414214. Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}}
On 12/9/2006 9:26 AM, John Sorkin wrote:> R 2.3.1 > Windows XP > > I am surprised at the lack of precision in R, as noted below. I would > expect the values to be closer to zero, particularly the later examples > where the sample size is quite large.These are all cases where the theoretical distributions are easy to calculate, and the results you are getting are not particularly unusual:> >> mean(rnorm(500,0,1)) > [1] -0.03209727> pnorm(-0.03209727, sd=1/sqrt(500)) [1] 0.2364660>> mean(rnorm(5000,0,1)) > [1] -0.005991322> pnorm(-0.005991322, sd=1/sqrt(5000)) [1] 0.3359104 [skipped some]>> mean(rnorm(50000,0,1)) > [1] -0.0006160524 >> mean(rnorm(500000,0,1)) > [1] -0.001254309 >> mean(rnorm(5000000,0,1)) > [1] 0.0004633778 >> mean(rnorm(50000000,0,1)) > [1] -0.0001325764> pnorm(-0.0001325764, sd=1/sqrt(50000000)) [1] 0.1742618> > I would appreciate any thoughts. Is the lack of precision due to > running on a 32-bit system?No, the lack of precision seems to be due to the distributions you are sampling from. Duncan Murdoch> > Thanks, > John > > > John Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > Baltimore VA Medical Center GRECC, > University of Maryland School of Medicine Claude D. Pepper OAIC, > University of Maryland Clinical Nutrition Research Unit, and > Baltimore VA Center Stroke of Excellence > > University of Maryland School of Medicine > Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing) > jsorkin at grecc.umaryland.edu > > Confidentiality Statement: > This email message, including any attachments, is for the so...{{dropped}} > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Ducan, First, thank you for your reply. As I previously noted in a reply to Peter Dalgaard, you, and he, are correct. I missed something quite obvious. Sometimes it does not pay to do statistics early in the morning! Again, many thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) jsorkin at grecc.umaryland.edu>>> Duncan Murdoch <murdoch at stats.uwo.ca> 12/09/06 11:39 AM >>>On 12/9/2006 9:26 AM, John Sorkin wrote:> R 2.3.1 > Windows XP > > I am surprised at the lack of precision in R, as noted below. Iwould> expect the values to be closer to zero, particularly the laterexamples> where the sample size is quite large.These are all cases where the theoretical distributions are easy to calculate, and the results you are getting are not particularly unusual:> >> mean(rnorm(500,0,1)) > [1] -0.03209727> pnorm(-0.03209727, sd=1/sqrt(500)) [1] 0.2364660>> mean(rnorm(5000,0,1)) > [1] -0.005991322> pnorm(-0.005991322, sd=1/sqrt(5000)) [1] 0.3359104 [skipped some]>> mean(rnorm(50000,0,1)) > [1] -0.0006160524 >> mean(rnorm(500000,0,1)) > [1] -0.001254309 >> mean(rnorm(5000000,0,1)) > [1] 0.0004633778 >> mean(rnorm(50000000,0,1)) > [1] -0.0001325764> pnorm(-0.0001325764, sd=1/sqrt(50000000)) [1] 0.1742618> > I would appreciate any thoughts. Is the lack of precision due to > running on a 32-bit system?No, the lack of precision seems to be due to the distributions you are sampling from. Duncan Murdoch> > Thanks, > John > > > John Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > Baltimore VA Medical Center GRECC, > University of Maryland School of Medicine Claude D. Pepper OAIC, > University of Maryland Clinical Nutrition Research Unit, and > Baltimore VA Center Stroke of Excellence > > University of Maryland School of Medicine > Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing) > jsorkin at grecc.umaryland.edu > > Confidentiality Statement: > This email message, including any attachments, is for the\...{{dropped}}
Ethan Johnsons <ethan.johnsons <at> gmail.com> writes:> > An apparatus exists whereby a collection of balls is displaced to the > top of a stack by suction. A top level (Level 1) each ball is shifted > 1 unit to the left or 1 unit to the right at random with equal > probability. The ball then drops down to level 2. At Level 2, each > ball is again shifted 1 unit to the left or 1 unit to the right at > random. The process continues for 15 levels and the balls are > collected at the bottom for a short time until being collected by > suction to the top. > > Can we do a simulation of the process using R with 100 balls, and plot > the frequency distribution of the position of the balls at the bottom > with respect to the entry position? >After poking around a bit, I see from your Google groups profile that you are (or at least claim to be) self-teaching yourself biostats -- which explains your really heavy posting record here and in other stats forums and means this isn't necessarily homework. (I know, I could just keep my mouth shut). It would probably be helpful if it were possible for you to find someone local to help with some of these questions, as many of them are basic stats questions ... As for this question -- this process generates a binomial distribution. Some ways of simulating the frequency distribution are (1) use rbinom() to sample a number of left- moves for each of 100 balls -- then the number of right-moves is known, and you can calculate the ending position (2) construct a string of 1 and -1 values for each ball (either use sample(c(-1,1),size=100,replace=TRUE), or something involving sign(runif(100),-1,1)) and use cumsum() to calculate the final position (3) use a for() loop to sample values and calculate the position one step at a time [move <- sample(c(-1,1)); cpos <- cpos+move; etc.] #1 is by far the most efficient, #3 the least, but might be the most transparent. Would you consider "giving back to the community" by writing up some of the most useful advice you've gotten for the R wiki? Ben Bolker