thr3ads.net - R help - [R] Using MASS::boxcox for a single variable gives different results than the original paper [Oct 2015]

If this information is useful, please help other people find it:
Share via:

Tal Galili

2015-Oct-12 09:17 UTC

[R] Using MASS::boxcox for a single variable gives different results than the original paper

Hello all,

Given a set of observations, I would like to find lambda for a boxcox
transformation so to get a symmetric/normal result as possible.

I try to use MASS::boxcox, but get different results than when using the
formula from the original box-cox paper (link
<http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).

I probably have made an error somewhere, but I can't figure out where.

Here is an example in which the lambda by MASS::boxcox is 0.42424, while by
the formula from the paper I get 0.40782.








# Toy data
################
set.seed(13241089)
x <- rnorm(1000, 10)
x2 <- x**2 # we want to transform x2 to something more normal
plot(density(x2))

# using MASS::boxcox
################

zpoints <- function(y) {
n <- length(y)
qnorm(ppoints(n))[order(order(y))]
}
mle <- function(BC) {
with(BC, x[which.max(y)])
}

a <- MASS::boxcox(x2 ~ zpoints(x2))
mle(a)
# lambda:
# 0.42424



# using formula from the paper
################

loglik_lambda <- function(l, y) {
GM <- exp(mean(log(y)))
if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
#  if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
sd(x)
}
fo <- function(l) loglik_lambda(l, y = x2)
V_fo <- Vectorize(fo)
V_fo(2)
curve(V_fo, -.5,1.5)
optimize(V_fo, c(-3,3))
# lambda:
# 0.40782

	[[alternative HTML version deleted]]

Tal Galili

2015-Oct-12 13:32 UTC

head link

[R] Using MASS::boxcox for a single variable gives different results than the original paper

After trying this with the function "estimateTransform" from {car}, it
returns values similar to my solution rather than the one from MASS::boxcox:


# Toy data
################
set.seed(13241089)
x <- rnorm(1000, 10)
x2 <- x**2 # we want to transform x2 to something more normal



# using MASS::boxcox
################

mle <- function(BC) {
with(BC, x[which.max(y)])
}

ONES <- rep(1, length(x2))
a <- MASS::boxcox(lm(x2 ~ ONES))
mle(a)
# lambda:
# 0.42424



# using estimateTransform from car
################

# Same result as the paper: !
library(car)
ONES <- rep(1, length(x2))
estimateTransform(X=data.frame(x = ONES), Y = x2)
# lambda:
# 0.40782

(just as with my own function in the previous email)



What am I missing?









----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------


On Mon, Oct 12, 2015 at 12:17 PM, Tal Galili <tal.galili at gmail.com>
wrote:
> Hello all,
>
> Given a set of observations, I would like to find lambda for a boxcox
> transformation so to get a symmetric/normal result as possible.
>
> I try to use MASS::boxcox, but get different results than when using the
> formula from the original box-cox paper (link
> <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
>
> I probably have made an error somewhere, but I can't figure out where.
>
> Here is an example in which the lambda by MASS::boxcox is 0.42424, while
> by the formula from the paper I get 0.40782.
>
>
>
>
>
>
>
>
> # Toy data
> ################
> set.seed(13241089)
> x <- rnorm(1000, 10)
> x2 <- x**2 # we want to transform x2 to something more normal
> plot(density(x2))
>
> # using MASS::boxcox
> ################
>
> zpoints <- function(y) {
> n <- length(y)
> qnorm(ppoints(n))[order(order(y))]
> }
> mle <- function(BC) {
> with(BC, x[which.max(y)])
> }
>
> a <- MASS::boxcox(x2 ~ zpoints(x2))
> mle(a)
> # lambda:
> # 0.42424
>
>
>
> # using formula from the paper
> ################
>
> loglik_lambda <- function(l, y) {
> GM <- exp(mean(log(y)))
> if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
> #  if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
> sd(x)
> }
> fo <- function(l) loglik_lambda(l, y = x2)
> V_fo <- Vectorize(fo)
> V_fo(2)
> curve(V_fo, -.5,1.5)
> optimize(V_fo, c(-3,3))
> # lambda:
> # 0.40782
>
>
>
>
>
>
>
>
>
>
>
>
>
	[[alternative HTML version deleted]]

peter dalgaard

2015-Oct-12 13:43 UTC

head link

[R] Using MASS::boxcox for a single variable gives different results than the original paper

Two things:

A) x2 ~ zpoints(x2) is not right
B) granularity:> a <- MASS::boxcox(x2 ~ 1,lambda=seq(-2,2,1e-5))
> mle(a)[1] 0.40783

-pd

On 12 Oct 2015, at 11:17 , Tal Galili <tal.galili at gmail.com> wrote:
> Hello all,
> 
> Given a set of observations, I would like to find lambda for a boxcox
> transformation so to get a symmetric/normal result as possible.
> 
> I try to use MASS::boxcox, but get different results than when using the
> formula from the original box-cox paper (link
> <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
> 
> I probably have made an error somewhere, but I can't figure out where.
> 
> Here is an example in which the lambda by MASS::boxcox is 0.42424, while by
> the formula from the paper I get 0.40782.
> 
> 
> 
> 
> 
> 
> 
> 
> # Toy data
> ################
> set.seed(13241089)
> x <- rnorm(1000, 10)
> x2 <- x**2 # we want to transform x2 to something more normal
> plot(density(x2))
> 
> # using MASS::boxcox
> ################
> 
> zpoints <- function(y) {
> n <- length(y)
> qnorm(ppoints(n))[order(order(y))]
> }
> mle <- function(BC) {
> with(BC, x[which.max(y)])
> }
> 
> a <- MASS::boxcox(x2 ~ zpoints(x2))
> mle(a)
> # lambda:
> # 0.42424
> 
> 
> 
> # using formula from the paper
> ################
> 
> loglik_lambda <- function(l, y) {
> GM <- exp(mean(log(y)))
> if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
> #  if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
> sd(x)
> }
> fo <- function(l) loglik_lambda(l, y = x2)
> V_fo <- Vectorize(fo)
> V_fo(2)
> curve(V_fo, -.5,1.5)
> optimize(V_fo, c(-3,3))
> # lambda:
> # 0.40782
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Fox, John

2015-Oct-12 13:45 UTC

head link

[R] Using MASS::boxcox for a single variable gives different results than the original paper

Dear Tal,

MASS:boxcox() evaluates the pseudo-log-likelihood at a pre-specified vector of
values of the transformation parameter lambda. In your example,
> head(a$x)[1] -2.000000 -1.959596 -1.919192 -1.878788 -1.838384 -1.797980

Which accounts, I think, for the small difference in the answer.

I hope this helps,
 John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Tal
Galili
> Sent: October 12, 2015 9:32 AM
> To: r-help at r-project.org
> Subject: Re: [R] Using MASS::boxcox for a single variable gives different
results
> than the original paper
> 
> After trying this with the function "estimateTransform" from
{car}, it returns
> values similar to my solution rather than the one from MASS::boxcox:
> 
> 
> # Toy data
> ################
> set.seed(13241089)
> x <- rnorm(1000, 10)
> x2 <- x**2 # we want to transform x2 to something more normal
> 
> 
> 
> # using MASS::boxcox
> ################
> 
> mle <- function(BC) {
> with(BC, x[which.max(y)])
> }
> 
> ONES <- rep(1, length(x2))
> a <- MASS::boxcox(lm(x2 ~ ONES))
> mle(a)
> # lambda:
> # 0.42424
> 
> 
> 
> # using estimateTransform from car
> ################
> 
> # Same result as the paper: !
> library(car)
> ONES <- rep(1, length(x2))
> estimateTransform(X=data.frame(x = ONES), Y = x2) # lambda:
> # 0.40782
> 
> (just as with my own function in the previous email)
> 
> 
> 
> What am I missing?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ----------------Contact
> Details:-------------------------------------------------------
> Contact me: Tal.Galili at gmail.com |
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
>
----------------------------------------------------------------------------------------------
> 
> 
> On Mon, Oct 12, 2015 at 12:17 PM, Tal Galili <tal.galili at
gmail.com> wrote:
> 
> > Hello all,
> >
> > Given a set of observations, I would like to find lambda for a boxcox
> > transformation so to get a symmetric/normal result as possible.
> >
> > I try to use MASS::boxcox, but get different results than when using
> > the formula from the original box-cox paper (link
> >
<http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
> >
> > I probably have made an error somewhere, but I can't figure out
where.
> >
> > Here is an example in which the lambda by MASS::boxcox is 0.42424,
> > while by the formula from the paper I get 0.40782.
> >
> >
> >
> >
> >
> >
> >
> >
> > # Toy data
> > ################
> > set.seed(13241089)
> > x <- rnorm(1000, 10)
> > x2 <- x**2 # we want to transform x2 to something more normal
> > plot(density(x2))
> >
> > # using MASS::boxcox
> > ################
> >
> > zpoints <- function(y) {
> > n <- length(y)
> > qnorm(ppoints(n))[order(order(y))]
> > }
> > mle <- function(BC) {
> > with(BC, x[which.max(y)])
> > }
> >
> > a <- MASS::boxcox(x2 ~ zpoints(x2))
> > mle(a)
> > # lambda:
> > # 0.42424
> >
> >
> >
> > # using formula from the paper
> > ################
> >
> > loglik_lambda <- function(l, y) {
> > GM <- exp(mean(log(y)))
> > if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) ) # 
if(l==0)
> > x <- log(y) else x <- (y^l-1)/ (l )
> > sd(x)
> > }
> > fo <- function(l) loglik_lambda(l, y = x2) V_fo <- Vectorize(fo)
> > V_fo(2)
> > curve(V_fo, -.5,1.5)
> > optimize(V_fo, c(-3,3))
> > # lambda:
> > # 0.40782
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Oct 2015 - Using MASS::boxcox for a single variable gives different results than the original paper

[R] Using MASS::boxcox for a single variable gives different results than the original paper

[R] Using MASS::boxcox for a single variable gives different results than the original paper

[R] Using MASS::boxcox for a single variable gives different results than the original paper

[R] Using MASS::boxcox for a single variable gives different results than the original paper