thr3ads.net - R help - [R] Box-Cox Transformation: Drastic differences when varying added constants [May 2010]

Holger Steinmetz

2010-May-16 12:22 UTC

[R] Box-Cox Transformation: Drastic differences when varying added constants

Dear experts,

I have been trying to learn about the Box-Cox transformation and ran into the
following:

When I had to add a constant to make all values of the original variable
positive, I found that the lambda estimates (from the box.cox.powers function)
differed dramatically depending on the specific constant chosen.

In addition, the correlation between the transformed variable and the
original was not 1 (as I think it should be in order to use the transformed
variable meaningfully) but much lower.

With larger added constants (and a right-skewed variable) the lambda estimate
was even negative, and the correlation between the transformed variable and
the original variable was -.91!?

I guess there is something fundamental missing in my current thinking about
Box-Cox...

Best,
Holger


P.S. Here is what I did:

library(car)  # provides box.cox.powers() (superseded by powerTransform() in later car versions)

# Create a right-skewed variable X (mixture of two normals)
x1 = rnorm(120, 0, .5)
x2 = rnorm(40, 2.5, 2)
X = c(x1, x2)

# Add a small constant so that all values are positive
Xnew1 = X + abs(min(X)) + .1
box.cox.powers(Xnew1)
Xtrans1 = Xnew1^.2682     # the lambda estimate from my run (no seed was set)

# Add a larger constant
Xnew2 = X + abs(min(X)) + 1
box.cox.powers(Xnew2)
Xtrans2 = Xnew2^(-.2543)  # the lambda estimate from my run (no seed was set)

# Plot it all: histogram and normal Q-Q plot for each variable
par(mfrow = c(3, 2))
hist(X)
qqnorm(X)
qqline(X, lty = 2)
hist(Xtrans1)
qqnorm(Xtrans1)
qqline(Xtrans1, lty = 2)
hist(Xtrans2)
qqnorm(Xtrans2)
qqline(Xtrans2, lty = 2)

# Pearson correlations among the original and transformed variables
round(cor(cbind(X, Xtrans1, Xtrans2)), 2)
-- 
View this message in context:
http://r.789695.n4.nabble.com/Box-Cox-Transformation-Drastic-differences-when-varying-added-constants-tp2218490p2218490.html
Sent from the R help mailing list archive at Nabble.com.

Peter Ehlers

2010-May-16 17:01 UTC

On 2010-05-16 6:22, Holger Steinmetz wrote:
> Dear experts,
>
> I tried to learn about Box-Cox-transformation but found the following thing:
>
> When I had to add a constant to make all values of the original variable
> positive, I found that
> the lambda estimates (box.cox.powers-function) differed dramatically
> depending on the specific constant chosen.
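Let's say that x is such that 1/x has a Normal distribution,
i.e. lambda = -1.
Then y = (1/x) + b also has a Normal distribution.
But you're expecting 1/(x+b) to also have a Normal distribution,
and it won't: 1/(x+b) is not a linear function of 1/x, so shifting
x before transforming changes which lambda fits best.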
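
A small simulation may make this concrete. It is only a sketch under stated assumptions: it uses boxcox() from the MASS package rather than box.cox.powers(), the helper name estimate_lambda, the lambda grid, the seed, and the simulated data are all made up for illustration, and the exact estimates will vary with the sample.

```r
library(MASS)  # boxcox() lives in MASS, a recommended package shipped with R

# Hypothetical helper: grid-based maximum-likelihood estimate of the
# Box-Cox lambda for a strictly positive vector v (intercept-only model).
estimate_lambda <- function(v, grid = seq(-4, 2, by = 0.01)) {
  prof <- boxcox(v ~ 1, lambda = grid, plotit = FALSE)
  prof$x[which.max(prof$y)]   # lambda with the highest profile log-likelihood
}

set.seed(1)
z <- rnorm(2000, mean = 5, sd = 1)  # Normal and, in practice, positive
x <- 1 / z                          # so 1/x is Normal, i.e. lambda = -1

lam_raw   <- estimate_lambda(x)        # expected to land near -1
lam_shift <- estimate_lambda(x + 0.1)  # same data, small shift added first

c(raw = lam_raw, shifted = lam_shift)
```

The shift is applied before the power, so the two calls are fitting different families of curves to the same data, and there is no reason for the two estimates to agree.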
>
> In addition, the correlation between the transformed variable and the
> original were not 1 (as I think it should be to use the transformed variable
> meaningfully) but much lower.
Again, your expectation is faulty. The relationship between the
original and transformed variables is not linear (otherwise,
why do the transformation?), but cor() computes the Pearson
correlation coefficient by default. Try method='spearman'.
Better yet, plot the transformed variables vs the original
variable for further enlightenment.
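
To illustrate the point about Pearson vs. Spearman, here is a minimal sketch on a made-up grid of positive values (not the original data). The function name box_cox is introduced only for this example. It also shows why a raw power with negative lambda gives a negative correlation, whereas the scaled Box-Cox form (x^lambda - 1)/lambda keeps the original ordering.

```r
# Box-Cox transform in its usual scaled form; log(x) at lambda = 0
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- seq(0.1, 5, length.out = 200)   # illustrative positive values, no ties

raw_pos <- x^0.2682              # raw power, positive lambda: increasing in x
raw_neg <- x^(-0.2543)           # raw power, negative lambda: decreasing in x
bc_neg  <- box_cox(x, -0.2543)   # scaled form: increasing in x again

cor(x, raw_pos)                        # Pearson: below 1, the relation is curved
cor(x, raw_pos, method = "spearman")   # 1
cor(x, raw_neg, method = "spearman")   # -1
cor(x, bc_neg,  method = "spearman")   # 1

# Transformed vs. original: monotone but clearly not a straight line
plot(x, raw_pos, type = "l", ylab = "transformed")
```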

  -Peter Ehlers

Greg Snow

2010-May-19 03:41 UTC

Have you read the original Box-Cox paper (Box and Cox, 1964)?  It covers the
theory for dealing with an offset parameter, i.e. a second lambda that is added
to the variable before the power is applied (though I don't know of any existing
functions that help in estimating both lambdas at the same time).  Another
important point (also in the paper) is that the lambda values used should be
based on sound science, not just on what fits best.
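
For what it's worth, estimating the power and the offset together can be sketched by hand. The code below is an illustration under assumptions, not an established function: the names bc2_loglik and fit_bc2, the shift grid, the lambda range, and the simulated data are all hypothetical. It maximises the Normal profile log-likelihood of y = ((x + shift)^lambda - 1)/lambda over lambda for each candidate shift and keeps the best pair. The grid is deliberately kept away from shift = -min(x), where the fit can become unstable.

```r
# Profile log-likelihood (up to a constant) of the shifted Box-Cox model:
# y = ((x + shift)^lambda - 1) / lambda is assumed Normal(mu, sigma^2).
# The last term is the log-Jacobian of the transformation.
bc2_loglik <- function(lambda, shift, x) {
  u <- x + shift
  stopifnot(all(u > 0))
  y <- if (abs(lambda) < 1e-8) log(u) else (u^lambda - 1) / lambda
  -length(x) / 2 * log(mean((y - mean(y))^2)) + (lambda - 1) * sum(log(u))
}

# Hypothetical helper: for each candidate shift, maximise over lambda,
# then return the (lambda, shift) pair with the highest log-likelihood.
fit_bc2 <- function(x, shifts, lambda_range = c(-3, 3)) {
  fits <- lapply(shifts, function(s) {
    opt <- optimize(bc2_loglik, interval = lambda_range,
                    shift = s, x = x, maximum = TRUE)
    c(lambda = opt$maximum, shift = s, loglik = opt$objective)
  })
  fits <- do.call(rbind, fits)
  fits[which.max(fits[, "loglik"]), ]
}

set.seed(1)
x <- exp(rnorm(300, mean = 1, sd = 0.4)) - 2   # log(x + 2) is Normal
shifts <- -min(x) + seq(0.1, 5, by = 0.1)      # every shift keeps x + shift > 0
best <- fit_bc2(x, shifts)
best
```

Even when this runs cleanly, the point above still applies: a pair chosen purely because it fits best may be hard to interpret.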

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

