Bert,

I am surprised by your response. Statistics serves two purposes: estimation and hypothesis testing. Sometimes we are fortunate and theory, physiology, physics, or something else tells us what the correct, or perhaps I should say most adequate, model is. Sometimes theory fails us and we wish to choose between two competing models. This is my case. The cell sizes may come from one normal distribution (theory 1) or two (theory 2). Choosing between the models will help us postulate about physiology. I want to use statistics to help me decide between the two competing models, and thus inform my understanding of physiology. It is true that statistics can't tell me which model is the "correct" or "true" model, but it should be able to help me select the more "adequate" or "appropriate" or "closer to the truth" model.

In any event, I still don't know how to fit a single normal distribution and get a measure of fit, e.g. log likelihood.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Bert Gunter <bgunter.4567 at gmail.com> 09/22/15 4:48 PM >>>
I'll be brief in my reply to you both, as this is off topic.

So what? All this statistical stuff is irrelevant baloney (and of questionable accuracy, since based on asymptotics and strong assumptions, anyway). The question of interest is whether a mixture fit better suits the context, which only the OP knows and which none of us can answer.

I know that many will disagree with this -- maybe a few might agree -- but please send all replies, insults, praise, and learned discourse to me privately, as I have already occupied more space on the list than I should.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds <markleeds2 at gmail.com> wrote:
> That's true but if he uses some AIC or BIC criterion that penalizes the
> number of parameters, then he might see something else? This (comparing
> mixtures to not mixtures) is not something I deal with so I'm just
> throwing it out there.
>
> On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Two normals will **always** be a better fit than one, as the latter
>> must be a subset of the former (with identical parameters for both
>> normals).
>>
>> Cheers,
>> Bert
>>
>> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
>> <JSorkin at grecc.umaryland.edu> wrote:
>> > I have data that may be the mixture of two normal distributions (one
>> > contained within the other) vs. a single normal.
>> > I used normalmixEM to get estimates of parameters assuming two normals:
>> >
>> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
>> > GLUT
>> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > summary(mixmdl)
>> > plot(mixmdl,which=2)
>> > lines(density(data[,"GLUT"]), lty=2, lwd=2)
>> >
>> > summary of normalmixEM object:
>> >            comp 1   comp 2
>> > lambda  0.7035179 0.296482
>> > mu     -0.0592302 0.140545
>> > sigma   1.1271620 0.536076
>> > loglik at estimate: -110.8037
>> >
>> > I would like to see if the two normal distributions are a better fit
>> > than one normal. I have two problems:
>> > (1) normalmixEM does not seem to want to fit a single normal (even if I
>> > address the error message produced):
>> >
>> > > mixmdl = normalmixEM(GLUT,k=1)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k, :
>> >   arbmean and arbvar cannot both be FALSE
>> > > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k, :
>> >   arbmean and arbvar cannot both be FALSE
>> >
>> > (2) Even if I had the loglik from a single normal, I am not sure how
>> > many DFs to use when computing the -2LL ratio test.
>> >
>> > Any suggestions for comparing the two-normal vs. one normal distribution
>> > would be appreciated.
>> >
>> > Thanks
>> > John
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

On Tue, 22 Sep 2015, John Sorkin wrote:

> In any event, I still don't know how to fit a single normal distribution
> and get a measure of fit e.g. log likelihood.

Gotta love R:

> y <- rnorm(10)
> logLik(glm(y~1))
'log Lik.' -17.36071 (df=2)

HTH,

Chuck
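
To see what that `logLik()` value is made of, here is a minimal sketch on simulated data (the names `mu_hat`, `sigma_hat`, `ll_glm` and `ll_hand` are illustrative, not from the thread). It assumes the intercept-only Gaussian `glm` reports the log likelihood at the maximum-likelihood estimates, that is, the sample mean and the standard deviation computed with divisor n; the `df=2` then counts the mean and the variance.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
y <- rnorm(10)

# Intercept-only Gaussian model, as suggested above
ll_glm <- logLik(glm(y ~ 1))

# The same quantity by hand, using the MLEs of a single normal
mu_hat    <- mean(y)
sigma_hat <- sqrt(mean((y - mu_hat)^2))   # divisor n, not n - 1
ll_hand   <- sum(dnorm(y, mean = mu_hat, sd = sigma_hat, log = TRUE))

all.equal(as.numeric(ll_glm), ll_hand)    # the two should agree
attr(ll_glm, "df")                        # number of estimated parameters
```

Note that `sd(y)` uses divisor n - 1, so plugging it into `dnorm()` would give a slightly different (non-maximal) log likelihood.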

Charles,

I am not sure that the answer to my question (given a dataset, how can one compare the fit of a model that fits the data to a mixture of two normal distributions with the fit of a model that uses a single normal distribution?) can be based on the glm model you suggest.

I have used normalmixEM to fit the data to a mixture of two normal curves. The model estimates four (perhaps five) parameters: mu1, sd1^2, mu2, sd2^2 (and perhaps lambda, the mixing proportion; the mixing proportion may not need to be estimated, as it may be determined once one specifies mu1, sd1^2, mu2, and sd2^2). Your model fits the data to a model that contains only the mean, and estimates two parameters, mu0 and sd0^2. I am not sure that your model and mine can be considered to be nested. If I am correct, I can't compare the log likelihood values from the two models. I may be wrong. If I am, I should be able to perform a log likelihood ratio test with 2 (or 3, I am not sure which) DFs. Are you suggesting the models are nested? If so, should I use 3 or 2 DFs?

Many thanks,
John
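
On the nesting question, a small sketch may help; it is an illustration under stated assumptions, not anything taken from mixtools. The helper `dmix` and all parameter values are hypothetical. In the usual parameterisation lambda is a free parameter, so the two-component mixture has five parameters against two for the single normal. The sketch shows that the single normal is a special case of the mixture, and that it is reached in more than one way (identical components with any lambda, or lambda equal to 1), which is one reason a naive difference of 3 should not be read as the degrees of freedom of a chi-square reference distribution without further justification.

```r
# Density of a two-component normal mixture (illustrative helper)
dmix <- function(x, lambda, mu1, sd1, mu2, sd2) {
  lambda * dnorm(x, mu1, sd1) + (1 - lambda) * dnorm(x, mu2, sd2)
}

x_grid <- seq(-4, 4, by = 0.5)
single <- dnorm(x_grid, mean = 0.3, sd = 1.2)   # an arbitrary single normal

# Identical components: the mixing proportion no longer matters
same_03 <- dmix(x_grid, 0.3, 0.3, 1.2, 0.3, 1.2)
same_09 <- dmix(x_grid, 0.9, 0.3, 1.2, 0.3, 1.2)

# lambda = 1: the second component's parameters no longer matter
edge <- dmix(x_grid, 1, 0.3, 1.2, -2, 0.5)

c(isTRUE(all.equal(single, same_03)),
  isTRUE(all.equal(single, same_09)),
  isTRUE(all.equal(single, edge)))

# Free parameters: lambda, mu1, sd1, mu2, sd2 versus mu0, sd0
n_par_mixture <- 5
n_par_single  <- 2
n_par_mixture - n_par_single
```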

Hi John:

For the log likelihood in the single case, you can just calculate it directly using the normal density: it is the sum from i = 1 to n of log f(x_i, muhat, sigmahat), where f(x_i, muhat, sigmahat) is the density of the normal with that mean and standard deviation, so you can use dnorm with log = TRUE. Of course you need to estimate the parameters muhat and sigmahat first, but for the single normal case they are just the sample mean and the sample standard deviation (the maximum-likelihood version, which divides by n rather than n - 1).

Note though: if you are going to calculate a log likelihood ratio, make sure you compare apples and apples and not apples and oranges, in the sense that the log likelihood that comes out of the mixture case may or may not include constants such as 1/sqrt(2*pi). So you need to know EXACTLY how the mixture algorithm is calculating its log likelihood.

In fact, it may be better and safer to just calculate the log likelihood for the mixture yourself also, as the sum from i = 1 to n of log[ lambda*f(x_i, mu1hat, sigma1hat) + (1-lambda)*f(x_i, mu2hat, sigma2hat) ]. By calculating it yourself and being consistent, you then know that you will be comparing apples and apples.

As I said earlier, another way is by comparing AICs. In that case, you calculate it in both cases and see which AIC is lower. Lower wins, and it penalizes for the number of parameters. There are asymptotics required in both the LRT approach and the AIC approach so you can pick your poison !!! :).
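
The hand calculation described above can be sketched as follows. This is only an illustration: the data are simulated, the function names `loglik_single` and `loglik_mix` are made up for the example, and the mixture is evaluated at the simulation's true parameter values as a stand-in for estimates that would normally come from a fitting routine such as normalmixEM. Both functions go through `dnorm()`, so the normalising constants are treated identically in the two log likelihoods.

```r
# Log likelihood of a single normal at given parameter values
loglik_single <- function(x, mu, sigma) {
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Log likelihood of a two-component normal mixture: the log is taken of the
# weighted sum of the two densities, observation by observation
loglik_mix <- function(x, lambda, mu1, sigma1, mu2, sigma2) {
  sum(log(lambda * dnorm(x, mu1, sigma1) + (1 - lambda) * dnorm(x, mu2, sigma2)))
}

set.seed(123)
x <- c(rnorm(70, mean = 0, sd = 1), rnorm(30, mean = 3, sd = 0.5))  # simulated

mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))    # ML estimate, divisor n

ll_single <- loglik_single(x, mu_hat, sigma_hat)
# Evaluated at the simulation's true values, standing in for fitted estimates
ll_mix    <- loglik_mix(x, 0.7, 0, 1, 3, 0.5)

c(single = ll_single, mixture = ll_mix)
```

With real data, the value returned by `loglik_mix()` at the fitted estimates can be compared with the log likelihood reported by the fitting routine to confirm that both include the same constants.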

John:

After I sent what I wrote, I read Rolf's intelligent response. I didn't realize that there are boundary issues, so yes, he's correct and my approach is EL WRONGO. I feel very not good that I just sent that email, being that it's totally wrong. My apologies for the noise, and thanks Rolf for the correct response.

Oh, one thing that does still hold in my response is the AIC approach, unless Rolf tells us that it's not valid also. I don't see why it wouldn't be though, because you're not doing a hypothesis test when you go the AIC route.
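
For what the AIC route looks like in practice, here is a hedged sketch. Everything in it is an assumption made for illustration: the data are simulated, the small hand-written EM loop is a stand-in for normalmixEM (it is not the mixtools implementation), and the parameter counts of 2 and 5 follow the usual parameterisation. It uses AIC = 2k - 2 logLik. Whether AIC can be trusted for mixture-versus-single comparisons is exactly the point left open above, so the numbers should be read as a description of the mechanics rather than a validated test.

```r
set.seed(1)
x <- c(rnorm(100, 0, 1), rnorm(100, 5, 1))   # simulated, clearly bimodal

# Single normal: closed-form MLEs
mu0 <- mean(x)
s0  <- sqrt(mean((x - mu0)^2))
ll1 <- sum(dnorm(x, mu0, s0, log = TRUE))

# Two-component mixture: plain EM with a fixed number of iterations
lambda <- 0.5
mu <- unname(quantile(x, c(0.25, 0.75)))     # crude starting values
s  <- rep(sd(x), 2)
for (iter in 1:500) {
  d1 <- lambda * dnorm(x, mu[1], s[1])
  d2 <- (1 - lambda) * dnorm(x, mu[2], s[2])
  w  <- d1 / (d1 + d2)                       # E-step: membership weights
  lambda <- mean(w)                          # M-step: weighted updates
  mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
  s  <- sqrt(c(sum(w * (x - mu[1])^2) / sum(w),
               sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
}
ll2 <- sum(log(lambda * dnorm(x, mu[1], s[1]) +
               (1 - lambda) * dnorm(x, mu[2], s[2])))

# AIC = 2 * (number of parameters) - 2 * log likelihood; lower is preferred
aic1 <- 2 * 2 - 2 * ll1
aic2 <- 2 * 5 - 2 * ll2
c(single = aic1, mixture = aic2)
```

On data as well separated as this simulation the mixture should come out with the lower AIC; with overlapping components, as in the fit reported earlier in the thread, the comparison can be much closer.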