Michael Chirico
2018-Feb-27 12:18 UTC
[Rd] scale.default gives an incorrect error message when is.numeric() fails on a sparse row matrix (dgeMatrix)
I am attempting to use the lars package with a sparse input feature matrix, but the following fails:

    library(Matrix)
    library(lars)
    data(diabetes)
    attach(diabetes)
    x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
    lars(x, y, intercept = FALSE)
    ## Error in scale.default(x, FALSE, normx) :
    ##   length of 'scale' must equal the number of columns of 'x'

More specifically, scale.default fails:

    normx = new(
      "dgeMatrix",
      x = c(1.00000000000004, 1, 1.00000000000009, 1.00000000000001,
            1.00000000000001, 0.999999999999992, 1.00000000000004,
            0.999999999999975, 1.00000000000006, 1.00000000000006),
      Dim = c(1L, 10L),
      Dimnames = list(NULL, c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                              "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")),
      factors = list()
    )
    scale(x, FALSE, normx)

The problem is that this check fails because is.numeric(normx) is FALSE:

    if (is.numeric(scale) && length(scale) == nc)

So the error message is misleading: in fact, length(scale) is the same as nc.

At a minimum, the error message needs to be repaired; do we also want to attempt as.numeric(normx) (which I believe would have allowed scale to work in this case)?

(I'm aware that there are some import issues in lars, as the offending line to create normx *should* work: is.numeric(sqrt(drop(rep(1, nrow(x)) %*% (x^2)))) is TRUE -- it's simply that lars doesn't import the appropriate S4 methods.)

Michael Chirico
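The misleading branch can be reproduced without lars by evaluating the quoted condition directly, rather than calling scale() (whose checks may differ between R versions). A minimal sketch, assuming the Matrix package is installed; `x` and the all-ones `normx` are illustrative stand-ins for the objects lars builds, and `s` and `xs` are names introduced here:

```r
## Evaluate the check quoted above on a 1 x 10 "dgeMatrix".
## Assumes the Matrix package; the values are illustrative only.
library(Matrix)

x <- matrix(as.numeric(1:50), nrow = 5, ncol = 10)  # any 10-column matrix
nc <- ncol(x)
normx <- new("dgeMatrix", x = rep(1, 10), Dim = c(1L, 10L))

length(normx) == nc                       # TRUE: the length is fine
is.numeric(normx)                         # FALSE: an S4 Matrix object
is.numeric(normx) && length(normx) == nc  # FALSE: the branch that led to the length error

## Coercing first satisfies the same check, and scale() then works:
s <- as.numeric(normx)
is.numeric(s) && length(s) == nc          # TRUE
xs <- scale(x, center = FALSE, scale = s) # dividing by 1 leaves the values unchanged
```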
Martin Maechler
2018-Mar-01 17:52 UTC
[Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix
>>>>> Michael Chirico <michaelchirico4 at gmail.com>
>>>>>     on Tue, 27 Feb 2018 20:18:34 +0800 writes:

Slightly amended 'Subject' (an unimportant mistake: a dgeMatrix is *not* sparse).

MM: modified to commented R code, slightly changed from your post:

    ## I am attempting to use the lars package with a sparse input feature matrix,
    ## but the following fails:
    library(Matrix)
    library(lars)
    data(diabetes) # from 'lars'
    ##UAagghh! not like this -- both attach() *and* as.data.frame() are horrific!
    ##UA attach(diabetes)
    ##UA x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
    x <- as(unclass(diabetes$x), "dgCMatrix")
    lars(x, diabetes$y, intercept = FALSE)
    ## Error in scale.default(x, FALSE, normx) :
    ##   length of 'scale' must equal the number of columns of 'x'

    ## More specifically, scale.default fails as called from lars():
    normx <- new("dgeMatrix",
                 x = c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6)*1e-14, Dim = c(1L, 10L),
                 Dimnames = list(NULL,
                                 c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                                   "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")))
    scale.default(x, center=FALSE, scale = normx)
    ## Error in scale.default(x, center = FALSE, scale = normx) :
    ##   length of 'scale' must equal the number of columns of 'x'

> The problem is that this check fails because is.numeric(normx) is FALSE:
>
>   if (is.numeric(scale) && length(scale) == nc)
>
> So, the error message is misleading. In fact length(scale) is the same as
> nc.

Correct, twice.

> At a minimum, the error message needs to be repaired; do we also want to
> attempt as.numeric(normx) (which I believe would have allowed scale to work
> in this case)?

It seems sensible to allow both 'center' and 'scale' to only have to *obey* as.numeric(.) rather than fulfill is.numeric(.).

Though that is not a bug in scale(), as its help page has always said that 'center' and 'scale' should either be a logical value or a numeric vector.
For that reason I could well claim a bug in 'lars', which should not use

    scale(x, FALSE, normx)

but rather

    scale(x, FALSE, scale = as.numeric(normx))

and then all would work.

> -----------------
> (I'm aware that there are some import issues in lars, as the offending line
> to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) %*%
> (x^2)))) is TRUE -- it's simply that lars doesn't import the appropriate S4
> methods)
> Michael Chirico

Yes, 'lars' has _not_ been updated since Spring 2013, notably because its authors have been saying (for rather more than 5 years, I think) that one should really use

    require("glmnet")

instead.

Your point is still valid that it would be easy to enhance base::scale.default() so that it works in more cases.

Thank you for that. I do plan to consider such a change in R-devel (planned to become R 3.5.0 in April).

Martin Maechler,
ETH Zurich
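One way such a relaxed check could look is sketched below. `scale_relaxed()` is a hypothetical helper, not the code that went into R: it accepts any 'center' or 'scale' that as.numeric() coerces to one value per column, and its error message states both lengths. The usage lines assume the Matrix package and use illustrative values.

```r
## Hypothetical sketch, NOT R's scale.default() source: 'center' and 'scale'
## only need to be coercible by as.numeric(); length mismatches report both lengths.
scale_relaxed <- function(x, center = TRUE, scale = TRUE) {
  x <- as.matrix(x)
  nc <- ncol(x)
  ## coerce a non-logical 'center'/'scale' and check it has one value per column
  as_column_values <- function(v, what) {
    v <- as.numeric(v)
    if (length(v) != nc)
      stop(sprintf("length of '%s' (%d) must equal the number of columns of 'x' (%d)",
                   what, length(v), nc), call. = FALSE)
    v
  }
  if (is.logical(center)) {
    if (center) center <- colMeans(x, na.rm = TRUE)
  } else {
    center <- as_column_values(center, "center")
  }
  if (is.numeric(center)) x <- sweep(x, 2L, center, check.margin = FALSE)
  if (is.logical(scale)) {
    if (scale) {
      ## column root-mean-square, as documented in ?scale
      rms <- function(v) {
        v <- v[!is.na(v)]
        sqrt(sum(v^2) / max(1, length(v) - 1L))
      }
      scale <- apply(x, 2L, rms)
    }
  } else {
    scale <- as_column_values(scale, "scale")
  }
  if (is.numeric(scale)) x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
  if (is.numeric(center)) attr(x, "scaled:center") <- center
  if (is.numeric(scale)) attr(x, "scaled:scale") <- scale
  x
}

## Usage (assumes the Matrix package): a 1 x 3 "dgeMatrix" is now accepted.
library(Matrix)
x <- matrix(c(1, 0, 2, 0, 3, 4), nrow = 2)   # columns (1,0), (2,0), (3,4)
normx <- new("dgeMatrix", x = c(1, 2, 5), Dim = c(1L, 3L))  # the column norms
xs <- scale_relaxed(x, FALSE, normx)          # each column now has unit norm
```

Coercing with as.numeric() instead of testing is.numeric() is what lets S4 objects such as a 1 x p "dgeMatrix" through, while genuine length mismatches are still rejected with a message that names the actual problem.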
Michael Chirico
2018-Mar-02 00:27 UTC
[Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix
Thanks. I know the setup code is a mess; I just duct-taped something together from the examples in lars (which are a mess in turn). In fact, when I messaged Prof. Hastie, he recommended using glmnet. I wonder why lars is kept on CRAN if they have no intention of maintaining it... but I digress.