Icabalceta, Jorge L.
2004-Feb-17 17:05 UTC
[R] A log on Bayesian statistics, stochastic cost frontier, Monte Carlo Markov chains, Bayesian P-values
Dear friends,

Over the past weeks I have been asking a lot of questions about how to use R in Bayesian analysis. I am brand new to R, but I am very pleased with it. I started with WinBUGS, but I found WinBUGS to be limited software: not bad, but it has several limitations. By contrast, R allows the analyst to tackle any problem with a huge set of tools for any kind of analysis. I love R. In addition, I want to thank all the people who have lent a hand without hesitation. I have received prompt answers from many people on many questions. At this moment I am a novice in R, but hopefully I will become an expert and be able to help other users the same way you have helped me. I want to give back to the R-help list the log of all the communications I have received regarding my questions. I think this log can help the next person who has questions about Bayesian analysis, and particularly about stochastic cost frontier analysis. In addition, I offer to share the little I know about it with anybody interested.

Jorge

-----Original Message-----
From: Icabalceta, Jorge L.
Sent: Thursday, February 05, 2004 1:53 PM
To: r-help@lists.R-project.org
Subject: rgamma question

I was trying to generate random numbers with a gamma distribution. In R the function is rgamma(n, shape, rate = 1, scale = 1/rate). My question is: if X ~ gamma(alpha, beta) and I want to generate one random number, where do I plug alpha and beta into rgamma? And what is the meaning and use of rate?

Thanks for your attention,
Jorge

Did you look at the help? From ?rgamma:

<quote>
Details:

If 'scale' is omitted, it assumes the default value of '1'.

The Gamma distribution with parameters 'shape' = a and 'scale' = s has density

    f(x) = 1/(s^a Gamma(a)) x^(a-1) e^-(x/s)

for x > 0, a > 0 and s > 0. The mean and variance are E(X) = a*s and Var(X) = a*s^2.
</quote>

Then, depending on how you define "alpha" and "beta", use the above to figure out how to use rgamma.

-sundar

-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard@biostat.ku.dk]
Sent: Thursday, February 05, 2004 2:54 PM
To: Icabalceta, Jorge L.
Cc: r-help@lists.R-project.org
Subject: Re: [R] rgamma question

"Icabalceta, Jorge L." <Icabalceta_j@wlf.state.la.us> writes:

> I was trying to generate random numbers with a gamma distribution. In R the
> function is: rgamma(n, shape, rate = 1, scale = 1/rate). My question is that if
> X~gamma(alpha, beta) and I want to generate one random number, where do I
> plug alpha and beta in rgamma? And, what is the meaning and use of rate?

Well, it depends on your definition of alpha and beta... You need to match up your notation for the gamma density with that given on help(rgamma), which will also tell you what to do with them. The "rate" argument just allows you to specify the scale as its inverse. A large rate corresponds to a narrow distribution. I suspect this is popular notation for interarrival distributions in queuing theory.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
p.dalgaard@biostat.ku.dk

-----Original Message-----
From: Thomas Lumley [mailto:tlumley@u.washington.edu]
Sent: Thursday, February 05, 2004 3:26 PM
To: Icabalceta, Jorge L.
Cc: r-help@lists.R-project.org
Subject: Re: [R] rgamma question

On Thu, 5 Feb 2004, Icabalceta, Jorge L. wrote:

It depends on what you mean by gamma(alpha,beta). It could be rgamma(1,alpha,beta) or rgamma(1,alpha,1/beta), since both of these parameterisations are used.
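[Editor's note: to make the ambiguity concrete, here is a small illustrative sketch, not from the thread, showing that rgamma's scale and rate arguments describe the same family; the values of T and SSE below are hypothetical stand-ins for Jorge's precision draw.]

```r
# The two common gamma parameterizations in R. With shape = a and
# scale = s, E(X) = a*s; with shape = a and rate = b = 1/s, E(X) = a/b.
set.seed(1)
a <- 3
s <- 2

x_scale <- rgamma(1e5, shape = a, scale = s)    # mean close to a*s = 6
x_rate  <- rgamma(1e5, shape = a, rate = 1/s)   # same distribution

mean(x_scale)   # close to 6
mean(x_rate)    # close to 6

# Jorge's precision draw h ~ gamma((T-2)/2, SSE/2): if SSE/2 is a rate
# (the usual convention for a precision in Gibbs sampling), then:
T   <- 100   # hypothetical number of observations
SSE <- 50    # hypothetical sum of squared errors
h <- rgamma(1, shape = (T - 2) / 2, rate = SSE / 2)
```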
If you think the mean of gamma(alpha,beta) is alpha*beta, use the second one; if you think it is alpha/beta, use the first one.

-thomas

-----Original Message-----
From: Rolf Turner [mailto:rolf@math.unb.ca]
Sent: Friday, February 06, 2004 1:40 PM
To: Icabalceta_j@wlf.state.la.us
Subject: RE: [R] rgamma question

> Sorry to bother you. I am sort of confused with the random number
> generation from a gamma distribution. For some reason, if a variable
> X~gamma(shape, scale), I have been told that the mean mu can be either
> mu=shape/scale or mu=shape*scale. Why is that so? This is making me
> have doubts about the random generation I am using:

For crying out loud, READ the messages you have already received. These have made it abundantly clear: it all DEPENDS on how you ***parameterize*** the gamma distribution. If you can't understand this simple fact then you shouldn't be doing this stuff. The help for the gamma distribution also makes this parameterization issue quite clear. READ THE HELP!

> 1.- The first variable I have is the precision (h) (the inverse of
> the variance), distributed as h~gamma((T-2)/2, SSE/2), where T is the
> number of observations and SSE is the sum of squared errors. How do I
> draw a random number in R with this information? How do I plug this
> into rgamma(n,shape,scale)?

As you have been told ***repeatedly***, it depends on how the parameters of the gamma distribution are to be interpreted. If you have ``h ~ gamma(a,b)'', does this mean that E(h) = a*b or E(h) = a/b? (These correspond to the two common parameterizations of the gamma family.) If the first applies, then a = shape and b = scale. If the second applies, then a = shape and b = rate. So R allows for both possibilities.

> 2.- The second variable is L^-1 ~ gamma(T+1, vi-ln(r*)^-1) (please,
> don't mind my not explaining the terms used in the shape and
> scale parameters).
> Again, how do I plug this into rgamma(n,shape,scale)?

Same answer.

> 3.- I am having a problem putting the results of each iteration from
> for(i in 1:11000) into samp. For some reason I get 10000 identical
> values for each column. What am I doing wrong? To see what the
> problem is you can try to run the program. Attached is the data I am
> using, and the program with and without comments.

I haven't bothered to read through the incredibly long-winded load of rubbish that you enclosed. If you are trying to get an 11000 x 10000 matrix, each of whose rows is a random sample of size 10000 (constructed in some arcane --- probably unnecessarily arcane --- fashion), then you might do something like:

    samp <- matrix(0, 11000, 10000)
    for (i in 1:11000) {
        ## construct a random sample of size 10000, called ``xxx''
        samp[i, ] <- xxx
    }

There is probably a much more efficient way of accomplishing this; using a for loop is likely to be unnecessary and slow. However, it will (eventually) work.

cheers,

Rolf Turner
rolf@math.unb.ca

-----Original Message-----
From: Thomas Lumley [mailto:tlumley@u.washington.edu]
Sent: Friday, February 06, 2004 2:09 PM
To: Icabalceta, Jorge L.
Cc: r-help@lists.R-project.org
Subject: RE: [R] rgamma question

On Fri, 6 Feb 2004, Icabalceta, Jorge L. wrote:

> Sorry to bother you. I am sort of confused with the random number generation
> from a gamma distribution. For some reason, if a variable X~gamma(shape,
> scale), I have been told that the mean mu can be either mu=shape/scale or
> mu=shape*scale. Why is that so?

It isn't. The mean is always shape*scale. However, the notation Gamma(alpha,beta) is ambiguous. Some people use beta=scale (in which case mean=alpha*beta) and others use beta=1/scale (in which case mean=alpha/beta). A few people even use alpha=mean, beta=variance, but they usually know who they are.

-thomas

Thomas Lumley
Assoc.
Professor, Biostatistics
tlumley@u.washington.edu
University of Washington, Seattle

-----Original Message-----
From: Rolf Turner [mailto:rolf@math.unb.ca]
Sent: Friday, February 06, 2004 2:15 PM
To: Icabalceta_j@wlf.state.la.us
Subject: RE: [R] rgamma question

One more thing --- I don't know, but I ***suspect*** that the first parameterization is what you want, i.e. the one in which the mean is a*b. This is (by a considerable margin) the ***more*** common of the two common parameterizations of the gamma distribution.

cheers,

Rolf Turner

-----Original Message-----
From: Duncan Murdoch [mailto:dmurdoch@pair.com]
Sent: Thursday, February 12, 2004 11:28 AM
To: Icabalceta, Jorge L.
Cc: r-help@lists.R-project.org
Subject: Re: [R] How do you create a "MCMC" object?

On Thu, 12 Feb 2004 10:06:41 -0600, "Icabalceta, Jorge L." <Icabalceta_j@wlf.state.la.us> wrote:

> I have been running a Gibbs sampler to estimate levels of efficiency in the
> Louisiana shrimp industry. I created a matrix (samp) where I stored the
> results of each iteration for 86 variables. I ran 10,000 iterations, so the
> matrix samp is 10,000 x 86. I want to use the Gelman-Rubin test to check for
> convergence. To do that, I need at least two chains. If I run a second chain
> with different starting values and seed, I could save it to the matrix 'samp2'.
> So, I will have two 10,000 x 86 matrices. I want to use the function
> gelman.diag(x, confidence = 0.95, transform=FALSE), where x is an 'mcmc.list'
> object with more than one chain, and with starting values that are
> overdispersed with respect to the posterior distribution. How do I create an
> mcmc object from these matrices? I need to create an mcmc object with the
> two chains I have stored in 'samp' and 'samp2'.
> Thanks for your help and attention.

I think you're asking about functions from the coda package.
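[Editor's note: a sketch of the diagnostic in question, not from the thread; the chains are simulated stand-ins for 'samp' and 'samp2'. With coda installed, the two-chain version is `gelman.diag(mcmc.list(as.mcmc(samp), as.mcmc(samp2)))`; the basic univariate Gelman-Rubin statistic, without coda's degrees-of-freedom correction, can also be computed by hand.]

```r
# Two stand-in chains (in Jorge's case these would be 10000 x 86
# matrices of Gibbs-sampler output, one column per variable).
set.seed(42)
n <- 10000
samp  <- matrix(rnorm(n * 2), n, 2)
samp2 <- matrix(rnorm(n * 2), n, 2)

# Basic potential scale reduction factor for one variable, two chains.
rhat <- function(chain1, chain2) {
  n <- length(chain1)
  W <- mean(c(var(chain1), var(chain2)))       # within-chain variance
  B <- n * var(c(mean(chain1), mean(chain2)))  # between-chain variance
  v <- (n - 1) / n * W + B / n                 # pooled variance estimate
  sqrt(v / W)                                  # potential scale reduction
}

rhat(samp[, 1], samp2[, 1])   # close to 1 when the chains agree
```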
The normal way to create an MCMC object is to read BUGS output, but it's probably not hard to create one from your own data. There's a function called "as.mcmc" which looks like it should do what you need; if not, you might want to write to the coda maintainer, Martyn Plummer <plummer@iarc.fr>.

Duncan Murdoch

-----Original Message-----
From: Icabalceta, Jorge L.
To: r-help@lists.R-project.org
Sent: 2/16/2004 5:05 PM
Subject: [R] How do we obtain posterior predictive (Bayesian) P-values in R (asking a second time)

Dear friends,

According to Gelman et al. (2003), "...Bayesian P-values are defined as the probability that the replicated data could be more extreme than the observed data, as measured by the test quantity p = Pr[T(y_rep, theta) > T(y, theta) | y]...", where p is the Bayesian P-value, T is the test statistic, y_rep is data from a replicated experiment, y is data from the original experiment, and theta is the function of interest. My question is: how do I calculate p (the Bayesian P-value) in R from the chain I obtained from the Gibbs sampler? I have a matrix 'samp' [10,000 x 86] where I stored the result of each of the 10,000 iterations for the 86 variables of interest.

Something I want to add is that Gelman also states that "...in practice, we usually compute the posterior predictive distribution using simulation. If we already have L simulations from the posterior density of theta, we just draw one y_rep from the predictive distribution for each simulated theta; we now have L draws from the joint posterior distribution, p(y_rep, theta | y). The posterior predictive check is the comparison between the realized test quantities, T(y, theta_L), and the predictive test quantities, T(y_rep, theta_L). The estimated P-value is just the proportion of these L simulations for which the test quantity equals or exceeds its realized value; that is, for which T(y_rep, theta_L) >= T(y, theta_L)..."

Does this mean that the usual p-value applies?
I.e., pv <- 1 - chosen_CDF(parameters)? Can anybody clarify that for me, please? Thanks for your help.

Jorge

-----Original Message-----
From: Davis, Sean (NIH/NHGRI) [mailto:sdavis2@mail.nih.gov]
Sent: Monday, February 16, 2004 2:42 PM
To: 'Icabalceta, Jorge L.'
Subject: RE: [R] How do we obtain posterior predictive (Bayesian) P-values in R (asking a second time)

If you have a matrix as you describe, with rows corresponding to replicates and columns corresponding to variables, then you can get the p-value as:

    m <- rbind(orig, sim)
    p <- apply(m, 2, function(x) sum(x[2:length(x)] > x[1]) / (length(x) - 1))

where orig is the vector of "observed values" for the 86 variables and sim is the 10000 x 86 matrix of simulations. This will count the number of times the simulated replicate exceeded the observed value and then divide by (in your specific case) 10000 to get the proportion of replicates in which the value was greater than the observed value (the p-value). I didn't test this (not in front of a computer with R installed), so try it and see if you get what you want (it should be a vector of length 86 with values between 0 and 1). Of course, you may need to alter the function to compute the correct p-value; as given, it is a one-sided test. Is this what you had in mind, or am I misinterpreting your email?

Sean

To: "Icabalceta, Jorge L." <Icabalceta_j@wlf.state.la.us>
Subject: Re: [R] How do we obtain posterior predictive (Bayesian) P-values in R (asking a second time)

Icabalceta, Jorge L.
wrote:

> Does this mean that the usual p-value applies? I.e., pv <- 1 -
> chosen_CDF(parameters)?

Or "pv <- chosen_CDF(parameters)", depending on which tail you're testing.

> Can anybody clarify that for me please?

Yes, it really is this simple, assuming your sampler is working well. If you have MCMC draws for theta in a vector Post, and you want to find the probability that theta > A, then you can use:

    PV <- sum(Post > A) / length(Post)

You can create confidence limits in a similar way, using quantiles. I'll leave that as an exercise for the reader.

A couple of words of warning:

1. MCMC samplers won't visit the extremes of the distribution very often, so the approximation gets worse the further out into the tail you go. Smoothing the density might help, as long as you're not too far into the tail.

2. A lot of Bayesians (and others!) frown on significance tests. At worst they are useless (because the null hypothesis is silly); at next worst you're just testing whether you have enough data to distinguish an effect from zero (because a priori the null hypothesis is merely unlikely to be true).

Bob

--
Bob O'Hara
Dept. of Mathematics and Statistics
P.O. Box 4 (Yliopistonkatu 5)
FIN-00014 University of Helsinki, Finland
Telephone: +358-9-191 23743  Mobile: +358 50 599 0540  Fax: +358-9-191 22779
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
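[Editor's note: pulling the thread's answer together, a self-contained sketch of a posterior predictive p-value for a single test quantity. The model and all data below are simulated for illustration; this is not Jorge's shrimp-industry model.]

```r
# Posterior predictive check: y_i ~ N(theta, 1), test quantity T(y) = max(y).
set.seed(7)
n <- 50
y <- rnorm(n, mean = 1)   # "observed" data

# Approximate posterior draws of theta (flat prior, known unit variance).
L <- 10000
theta <- rnorm(L, mean = mean(y), sd = 1 / sqrt(n))

# One replicated data set y_rep per posterior draw, then the test quantity.
T_obs <- max(y)
T_rep <- sapply(theta, function(th) max(rnorm(n, mean = th)))

# Proportion of replicates at least as extreme as the observed value.
pv <- mean(T_rep >= T_obs)
pv   # a moderate value suggests the model reproduces this feature of the data
```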