Tal Galili
2015-Oct-12 09:17 UTC
[R] Using MASS::boxcox for a single variable gives different results than the original paper
Hello all,
Given a set of observations, I would like to find lambda for a boxcox
transformation so to get a symmetric/normal result as possible.
I try to use MASS::boxcox, but get different results than when using the
formula from the original box-cox paper (link
<http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
I probably have made an error somewhere, but I can't figure out where.
Here is an example in which the lambda by MASS::boxcox is 0.42424, while by
the formula from the paper I get 0.40782.
# Toy data
################
set.seed(13241089)
x <- rnorm(1000, 10)
x2 <- x**2 # we want to transform x2 to something more normal
plot(density(x2))
# using MASS::boxcox
################
zpoints <- function(y) {
n <- length(y)
qnorm(ppoints(n))[order(order(y))]
}
mle <- function(BC) {
with(BC, x[which.max(y)])
}
a <- MASS::boxcox(x2 ~ zpoints(x2))
mle(a)
# lambda:
# 0.42424
# using formula from the paper
################
loglik_lambda <- function(l, y) {
GM <- exp(mean(log(y)))
if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
# if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
sd(x)
}
fo <- function(l) loglik_lambda(l, y = x2)
V_fo <- Vectorize(fo)
V_fo(2)
curve(V_fo, -.5,1.5)
optimize(V_fo, c(-3,3))
# lambda:
# 0.40782
[[alternative HTML version deleted]]
Tal Galili
2015-Oct-12 13:32 UTC
[R] Using MASS::boxcox for a single variable gives different results than the original paper
After trying this with the function "estimateTransform" from {car}, it
returns values similar to my solution rather than the one from MASS::boxcox:
# Toy data
################
set.seed(13241089)
x <- rnorm(1000, 10)
x2 <- x**2 # we want to transform x2 to something more normal
# using MASS::boxcox
################
mle <- function(BC) {
with(BC, x[which.max(y)])
}
ONES <- rep(1, length(x2))
a <- MASS::boxcox(lm(x2 ~ ONES))
mle(a)
# lambda:
# 0.42424
# using estimateTransform from car
################
# Same result as the paper: !
library(car)
ONES <- rep(1, length(x2))
estimateTransform(X=data.frame(x = ONES), Y = x2)
# lambda:
# 0.40782
(just as with my own function in the previous email)
What am I missing?
----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------
On Mon, Oct 12, 2015 at 12:17 PM, Tal Galili <tal.galili at gmail.com>
wrote:
> Hello all,
>
> Given a set of observations, I would like to find lambda for a boxcox
> transformation so to get a symmetric/normal result as possible.
>
> I try to use MASS::boxcox, but get different results than when using the
> formula from the original box-cox paper (link
> <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
>
> I probably have made an error somewhere, but I can't figure out where.
>
> Here is an example in which the lambda by MASS::boxcox is 0.42424, while
> by the formula from the paper I get 0.40782.
>
>
>
>
>
>
>
>
> # Toy data
> ################
> set.seed(13241089)
> x <- rnorm(1000, 10)
> x2 <- x**2 # we want to transform x2 to something more normal
> plot(density(x2))
>
> # using MASS::boxcox
> ################
>
> zpoints <- function(y) {
> n <- length(y)
> qnorm(ppoints(n))[order(order(y))]
> }
> mle <- function(BC) {
> with(BC, x[which.max(y)])
> }
>
> a <- MASS::boxcox(x2 ~ zpoints(x2))
> mle(a)
> # lambda:
> # 0.42424
>
>
>
> # using formula from the paper
> ################
>
> loglik_lambda <- function(l, y) {
> GM <- exp(mean(log(y)))
> if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) )
> # if(l==0) x <- log(y) else x <- (y^l-1)/ (l )
> sd(x)
> }
> fo <- function(l) loglik_lambda(l, y = x2)
> V_fo <- Vectorize(fo)
> V_fo(2)
> curve(V_fo, -.5,1.5)
> optimize(V_fo, c(-3,3))
> # lambda:
> # 0.40782
>
>
>
>
>
>
>
>
>
>
>
>
>
[[alternative HTML version deleted]]
peter dalgaard
2015-Oct-12 13:43 UTC
[R] Using MASS::boxcox for a single variable gives different results than the original paper
Two things: A) x2 ~ zpoints(x2) is not right B) granularity:> a <- MASS::boxcox(x2 ~ 1,lambda=seq(-2,2,1e-5)) > mle(a)[1] 0.40783 -pd On 12 Oct 2015, at 11:17 , Tal Galili <tal.galili at gmail.com> wrote:> Hello all, > > Given a set of observations, I would like to find lambda for a boxcox > transformation so to get a symmetric/normal result as possible. > > I try to use MASS::boxcox, but get different results than when using the > formula from the original box-cox paper (link > <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>). > > I probably have made an error somewhere, but I can't figure out where. > > Here is an example in which the lambda by MASS::boxcox is 0.42424, while by > the formula from the paper I get 0.40782. > > > > > > > > > # Toy data > ################ > set.seed(13241089) > x <- rnorm(1000, 10) > x2 <- x**2 # we want to transform x2 to something more normal > plot(density(x2)) > > # using MASS::boxcox > ################ > > zpoints <- function(y) { > n <- length(y) > qnorm(ppoints(n))[order(order(y))] > } > mle <- function(BC) { > with(BC, x[which.max(y)]) > } > > a <- MASS::boxcox(x2 ~ zpoints(x2)) > mle(a) > # lambda: > # 0.42424 > > > > # using formula from the paper > ################ > > loglik_lambda <- function(l, y) { > GM <- exp(mean(log(y))) > if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) ) > # if(l==0) x <- log(y) else x <- (y^l-1)/ (l ) > sd(x) > } > fo <- function(l) loglik_lambda(l, y = x2) > V_fo <- Vectorize(fo) > V_fo(2) > curve(V_fo, -.5,1.5) > optimize(V_fo, c(-3,3)) > # lambda: > # 0.40782 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Fox, John
2015-Oct-12 13:45 UTC
[R] Using MASS::boxcox for a single variable gives different results than the original paper
Dear Tal, MASS:boxcox() evaluates the pseudo-log-likelihood at a pre-specified vector of values of the transformation parameter lambda. In your example,> head(a$x)[1] -2.000000 -1.959596 -1.919192 -1.878788 -1.838384 -1.797980 Which accounts, I think, for the small difference in the answer. I hope this helps, John ----------------------------- John Fox, Professor McMaster University Hamilton, Ontario Canada L8S 4M4 Web: socserv.mcmaster.ca/jfox> -----Original Message----- > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Tal Galili > Sent: October 12, 2015 9:32 AM > To: r-help at r-project.org > Subject: Re: [R] Using MASS::boxcox for a single variable gives different results > than the original paper > > After trying this with the function "estimateTransform" from {car}, it returns > values similar to my solution rather than the one from MASS::boxcox: > > > # Toy data > ################ > set.seed(13241089) > x <- rnorm(1000, 10) > x2 <- x**2 # we want to transform x2 to something more normal > > > > # using MASS::boxcox > ################ > > mle <- function(BC) { > with(BC, x[which.max(y)]) > } > > ONES <- rep(1, length(x2)) > a <- MASS::boxcox(lm(x2 ~ ONES)) > mle(a) > # lambda: > # 0.42424 > > > > # using estimateTransform from car > ################ > > # Same result as the paper: ! > library(car) > ONES <- rep(1, length(x2)) > estimateTransform(X=data.frame(x = ONES), Y = x2) # lambda: > # 0.40782 > > (just as with my own function in the previous email) > > > > What am I missing? > > > > > > > > > > ----------------Contact > Details:------------------------------------------------------- > Contact me: Tal.Galili at gmail.com | > Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | > www.r-statistics.com (English) > ---------------------------------------------------------------------------------------------- > > > On Mon, Oct 12, 2015 at 12:17 PM, Tal Galili <tal.galili at gmail.com> wrote: > > > Hello all, > > > > Given a set of observations, I would like to find lambda for a boxcox > > transformation so to get a symmetric/normal result as possible. > > > > I try to use MASS::boxcox, but get different results than when using > > the formula from the original box-cox paper (link > > <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>). > > > > I probably have made an error somewhere, but I can't figure out where. > > > > Here is an example in which the lambda by MASS::boxcox is 0.42424, > > while by the formula from the paper I get 0.40782. > > > > > > > > > > > > > > > > > > # Toy data > > ################ > > set.seed(13241089) > > x <- rnorm(1000, 10) > > x2 <- x**2 # we want to transform x2 to something more normal > > plot(density(x2)) > > > > # using MASS::boxcox > > ################ > > > > zpoints <- function(y) { > > n <- length(y) > > qnorm(ppoints(n))[order(order(y))] > > } > > mle <- function(BC) { > > with(BC, x[which.max(y)]) > > } > > > > a <- MASS::boxcox(x2 ~ zpoints(x2)) > > mle(a) > > # lambda: > > # 0.42424 > > > > > > > > # using formula from the paper > > ################ > > > > loglik_lambda <- function(l, y) { > > GM <- exp(mean(log(y))) > > if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) ) # if(l==0) > > x <- log(y) else x <- (y^l-1)/ (l ) > > sd(x) > > } > > fo <- function(l) loglik_lambda(l, y = x2) V_fo <- Vectorize(fo) > > V_fo(2) > > curve(V_fo, -.5,1.5) > > optimize(V_fo, c(-3,3)) > > # lambda: > > # 0.40782 > > > > > > > > > > > > > > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.