Hello,

I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run the powerTransform function on the full data set I get the following warning:

Warning message:
In sqrt(diag(solve(res$hessian))) : NaNs produced

However, when I analyze the variables in groups, rather than all 40 at a time, I do not get this warning message. Why would this be? And does this mean this warning is safe to ignore?

I would like to add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected?

Thank you so much!

Brittany
I suggest you consult a local statistician. You are (way) over your head statistically here, and statistical matters are off topic on this list. The brief answer to your question is: you are almost certainly producing nonsense.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
-- Clifford Stoll

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dear Brittany,

On Thu, 16 Jul 2015 17:35:38 -0600 Brittany Demmitt <demmitba at gmail.com> wrote:
> I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run the powerTransform function on the full data set I get the following warning.
>
> Warning message:
> In sqrt(diag(solve(res$hessian))) : NaNs produced
>
> However, when I analyze the variables in groups, rather than all 40 at a time, I do not get this warning message. Why would this be? And does this mean this warning is safe to ignore?

No, it is not safe to ignore the warning, and the problem has nothing to do with non-positive values in the data -- when you say that there are no 0s in the data, I assume that you mean that the data values are all positive. The square roots of the diagonal entries of the inverse of the Hessian at the (pseudo-) ML estimates are the SEs of the estimated transformation parameters. If the Hessian can't be inverted, or its inverse has negative diagonal entries (which is what produces the NaNs), that usually implies that the maximum of the (pseudo-) likelihood isn't well defined.

This isn't surprising when you're trying to transform as many as 40 variables at a time to multivariate normality. It's my general experience that people often throw their data into the Box-Cox black box and hope for the best without first examining the data and, e.g., ensuring a reasonable ratio of maximum/minimum values for each variable, checking for extreme outliers, etc. Of course, I don't know that you did that, and it's perfectly possible that you were careful.

> I would like to add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected?

Yes. It's very unlikely that both are right. If, e.g., the variables are multivariate normal within groups, then their marginal distribution is a mixture of multivariate normals, which almost surely isn't itself normal.

I hope this helps,
John

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
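As a rough illustration of the two points in John's reply (where the NaNs come from, and the kind of data screening he mentions), here is a minimal base-R sketch. Everything in it is an assumption made for illustration: the matrices H_ok and H_bad and the data frame dat are made up, not the poster's data, and the code only mimics the sqrt(diag(solve(hessian))) step named in the warning rather than reproducing what powerTransform does internally.

```r
# Hypothetical illustration only: made-up matrices and simulated data.

# 1. Where the NaNs come from.
# A positive-definite Hessian gives positive variances on the diagonal
# of its inverse, so the standard errors are finite.
H_ok <- matrix(c(4, 1,
                 1, 3), nrow = 2, byrow = TRUE)
se_ok <- sqrt(diag(solve(H_ok)))

# This Hessian is invertible but indefinite (eigenvalues 3 and -1), so the
# diagonal of its inverse is negative and sqrt() returns NaN. Without
# suppressWarnings(), R reports "NaNs produced" here.
H_bad <- matrix(c(1, 2,
                  2, 1), nrow = 2, byrow = TRUE)
ev_bad <- eigen(H_bad, symmetric = TRUE, only.values = TRUE)$values
se_bad <- suppressWarnings(sqrt(diag(solve(H_bad))))

print(se_ok)
print(ev_bad)
print(se_bad)

# 2. Simple screening before any power transformation.
set.seed(1)
dat <- data.frame(
  a = rlnorm(100, meanlog = 0, sdlog = 1),  # positive, strongly skewed
  b = runif(100, min = 100, max = 101)      # positive, max/min close to 1
)

# All values must be strictly positive for a Box-Cox transformation.
stopifnot(all(dat > 0))

# Ratio of largest to smallest value for each variable.
ratio <- sapply(dat, function(x) max(x) / min(x))

# Number of points flagged by the usual boxplot rule, per variable.
n_out <- sapply(dat, function(x) length(boxplot.stats(x)$out))

print(ratio)
print(n_out)
```

A variable such as b, whose max/min ratio is close to 1, is barely changed by any power transformation, so its lambda is likely to be poorly determined; that is one plausible way for the joint estimation to end up with a badly behaved Hessian.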
Thank you so much for the explanation. That was very helpful! :-)

Thanks!
Brittany

> On Jul 16, 2015, at 6:16 PM, John Fox <jfox at mcmaster.ca> wrote:
>
> I hope this helps,
> John
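The remark in John's reply about mixtures can be checked with a small simulation. This is only a sketch under stated assumptions: it takes "groups" to mean subsets of observations, uses a single simulated variable rather than 40, and the group means, sample size, and object names (g1, g2, pooled) are all hypothetical.

```r
# Hypothetical simulation: two groups, each normal, with different means.
set.seed(42)
n  <- 200
g1 <- rnorm(n, mean = 0, sd = 1)  # group 1: normal
g2 <- rnorm(n, mean = 6, sd = 1)  # group 2: normal, shifted mean
pooled <- c(g1, g2)               # marginal distribution: a two-component mixture

# Shapiro-Wilk test of normality within each group and for the pooled data.
p_within <- c(g1 = shapiro.test(g1)$p.value,
              g2 = shapiro.test(g2)$p.value)
p_pooled <- shapiro.test(pooled)$p.value

print(p_within)
print(p_pooled)
```

The pooled sample is clearly bimodal, so normality is strongly rejected for it even though each group was drawn from a normal distribution. A transformation estimated from the pooled data is therefore aiming at a different target than one estimated within groups, which is consistent with the two analyses giving different lambda values.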