thr3ads.net - R devel - [Rd] prcomp with previously scaled data: predict with 'newdata' wrong [May 2012]

Jari Oksanen

2012-May-23 10:49 UTC

[Rd] prcomp with previously scaled data: predict with 'newdata' wrong

Hello folks,

it may be regarded as a user error to scale() your data before calling
prcomp() instead of using its 'scale.' argument. However, it is something
users may well do, and it sounds like a legitimate thing to do. In that case
predict() with 'newdata' can give wrong results:

x <- scale(USArrests)
sol <- prcomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] "Mean relative difference: 0.9033485"

Predicting with the same data gives results that differ from the original PCA
of the data.
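
For comparison, a minimal sketch of the route through the 'scale.' argument mentioned above, where prcomp() does the scaling itself and the raw data are passed as 'newdata'. It uses only the built-in USArrests data and should report that the two sets of scores agree:

```r
# Let prcomp() standardise the variables instead of calling scale() first
sol <- prcomp(USArrests, scale. = TRUE)

# predict() applies the stored centre and scale to the raw data
all.equal(predict(sol), predict(sol, newdata = USArrests))
```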

The reason for this behaviour seems to be in these first lines of
stats:::prcomp.default():

    x <- scale(x, center = center, scale = scale.)
    cen <- attr(x, "scaled:center")
    sc <- attr(x, "scaled:scale")

If the input data 'x' have a 'scaled:scale' attribute, it is retained when
scale() is called with the argument "scale = FALSE", as is the case with the
default options of prcomp(). So scale(scale(x, scale = TRUE), scale = FALSE)
will have the 'scaled:center' of the outer scale() (i.e., numerical zero),
but the 'scaled:scale' of the inner scale().
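
The nested call described above can be inspected directly. This is only a sketch for checking the attributes in your own R session; whether the stale 'scaled:scale' attribute is still present may depend on the R version, so no output is shown:

```r
x <- scale(USArrests)                        # inner scale(): centres and scales
y <- scale(x, center = TRUE, scale = FALSE)  # outer scale(): centres only

attr(y, "scaled:center")  # set by the outer call; numerically zero
attr(y, "scaled:scale")   # if non-NULL, it was carried over from x
```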

Function princomp() finds the 'scale' directly instead of looking at the
attributes of the input data, and works as expected:

sol <- princomp(x)
all.equal(predict(sol), predict(sol, newdata=x))
## [1] TRUE
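
As a cross-check for prcomp(), the scores that predict() ought to return for the pre-scaled data can be computed by hand: centre with the stored 'center' and project onto the rotation, with no further rescaling. A minimal sketch (the name 'manual' is only for illustration):

```r
x <- scale(USArrests)
sol <- prcomp(x)

# Centre with the stored centre, then project; no division by any scale
manual <- sweep(x, 2, sol$center) %*% sol$rotation

# Compare values only, ignoring the attributes carried over from scale()
all.equal(manual, sol$x, check.attributes = FALSE)
```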

I don't have any nifty solution to this -- only checking the
'scale.' argument and acting accordingly:

sc <- if (scale.) attr(x, "scaled:scale") else FALSE

Cheers, Jari Oksanen

Jari Oksanen

2012-May-23 11:02 UTC

[Rd] prcomp with previously scaled data: predict with 'newdata' wrong

To correct myself: the stupid solution I suggested won't work, as
'scale.' need not be TRUE or FALSE; it can also be a vector of scales.
The following looks able to handle this, but is neither transparent nor
elegant:

sc <- if (isTRUE(scale.)) attr(x, "scaled:scale") else scale.
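
To see how that line treats the three kinds of 'scale.' value, here is a small sketch that wraps it in a hypothetical helper, resolve_scale(), which is not part of stats and only mimics the relevant lines of prcomp.default():

```r
# Hypothetical helper for illustration only; not a function in stats
resolve_scale <- function(x, scale.) {
  xs <- scale(x, center = TRUE, scale = scale.)
  # Trust the attribute only when scale() was asked to compute the scales
  if (isTRUE(scale.)) attr(xs, "scaled:scale") else scale.
}

x <- scale(USArrests)
resolve_scale(x, FALSE)                  # FALSE, whatever attributes x carries
resolve_scale(USArrests, TRUE)           # column standard deviations
resolve_scale(USArrests, c(1, 2, 3, 4))  # the supplied vector, unchanged
```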

I trust you will find an elegant solution (if you think this is worth fixing).

Cheers, Jari Oksanen

PS. Sorry for the top posting: I cannot help it with the email system I have
on my work desktop.
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
