Saji Ren
2010-Jan-11 05:09 UTC
[R] Problem about Box-Cox transformation (topic in html form)
Hi,

I want to transform my data to make it closer to normal while leaving the ordering of the values unchanged, so I decided to use a Box-Cox transformation. Below is the Q-Q plot of the original data:

http://n4.nabble.com/file/n1011015/start%2Bvalue%2Bproblem%2B02.jpeg

Note that the minimum of my data is -1099, so I added a fixed value of 1200 to the original sample.

I chose the "box.cox.powers" function in package 'car'. Here is the result:

> box.cox.powers(na.exclude(c888.dl.ma080+1200))
Box-Cox Transformation to Normality

 Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
    0.9526   0.0237       40.2638       -2.0036

L.R. test, power = 0:  2014.192   df = 1   p = 0
L.R. test, power = 1:  3.9807   df = 1   p = 0.046

Then I compared the result with the original data, and it really confused me:

http://n4.nabble.com/file/n1011015/start%2Bvalue%2Bproblem.jpeg

The left panel is my original sample; you can see that it is symmetric and the mean is close to 0. It is just that the spread is large (there are outliers). The right panel is the transformed data, and the distribution is obviously not normal.

Can anyone explain that to me?

Thank you in advance.

Saji Ren
Shanghai, China
GoldenHeart Investment Group
Dieter Menne
2010-Jan-11 09:49 UTC
[R] Problem about Box-Cox transformation (topic in html form)
Saji Ren wrote:
> Recently, I want to perform a transformation on my data to make it more
> normal, meanwhile the order statistics is unchanged. So I decided to use a
> box-cox transformation.
> ...
> Then I compared the result with original data, and it really confused me:
> http://n4.nabble.com/file/n1011015/start%2Bvalue%2Bproblem.jpeg
> The left is my original data sample, you can see that it is symmetric and
> the mean is close to 0. It is just that the spread is large (there are
> outliers).
> The right is the transformed data, and the distribution is obviously not
> normal.

This has nothing to do with box.cox.powers.

summary(rnorm(100,0,200)^0.95)

gives about 50 NA, one for each negative number. hist() does not plot them, so only the positive values are shown and you get the skewed distribution.

As an aside, a transformation of 0.95 is rarely worth the pain of having to explain it to your audience later, even if it is "significant". Try

library(fortunes)
fortune(234)
# Hi Peter, fortunes needs a "search"

Please do not use HTML mail.

Dieter
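Dieter's point can be reproduced in a few lines of base R. This is only a sketch on simulated data (the seed and the variable names are arbitrary choices made here, not taken from the thread): it counts how many values are lost when negative numbers are raised to the power 0.95 and confirms that hist() bins only the remaining ones.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 0, sd = 200)  # symmetric around 0, like the original sample

# A negative base raised to a non-integer power is NaN in R
y <- x^0.95

n_neg  <- sum(x < 0)                 # number of negative inputs
n_lost <- sum(is.nan(y))             # number of values lost by the power
c(negative = n_neg, lost = n_lost)

# hist() drops non-finite values, so only the positive half is binned
h <- hist(y, plot = FALSE)
sum(h$counts)
```

Every negative input becomes NaN, and the histogram is built from the positive values alone, which is why it looks right-skewed.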
John Fox
2010-Jan-11 13:56 UTC
[R] Problem about Box-Cox transformation (topic in html form)
Dear Saji Ren,

Dieter Menne has already pointed out that you lost the negative values in the transformation. Another point is that, since you selected the transformation based on the "started" data c888.dl.ma080 + 1200, you should transform c888.dl.ma080 + 1200 and not c888.dl.ma080.

But as Dieter also pointed out, the 0.95 power isn't going to change the distribution of the data much. As well, the problem here is that the distribution is more heavy-tailed than asymmetric, and a Box-Cox transformation isn't going to help.

Regards,
John

--------------------------------
John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
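The two points in this reply can be illustrated with a small base-R sketch. Everything below is an assumption made for illustration: the data are simulated (a scaled t distribution stands in for c888.dl.ma080), the start value is derived from the simulated minimum rather than fixed at 1200, and box_cox() and kurtosis() are helper names introduced here, not functions from 'car'.

```r
# Box-Cox family for strictly positive input; lambda = 0 is the log case
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Plain moment-based sample kurtosis (about 3 for normal data)
kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

set.seed(2)                       # arbitrary seed
x <- 100 * rt(2000, df = 3)       # hypothetical stand-in: symmetric, heavy-tailed

# Start value chosen so that the started data are strictly positive
# (the thread used 1200 because the real minimum was -1099)
start <- abs(min(x)) + 100

# Transform the started data, not the raw data
z <- box_cox(x + start, lambda = 0.9526)

order_kept <- all(diff(z[order(x)]) >= 0)   # ordering of the values is preserved
linear_cor <- cor(x, z)                     # how close to a straight line?
c(before = kurtosis(x), after = kurtosis(z))
```

On data like these, no values are lost once the start is applied, the ordering is preserved, and a power this close to 1 is almost a linear rescaling, so the heavy tails remain.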
Saji Ren
2010-Jan-12 04:22 UTC
[R] Problem about Box-Cox transformation (topic in html form)
Thank you, now I understand. If I plot the distribution of c888.dl.ma080+1200, then I get a normal-looking histogram.