luke-tierney at uiowa.edu
2015-May-05 13:46 UTC
[Rd] Why is the diag function so slow (for extraction)?
Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure why the change was made many years ago, but this was before names were handled explicitly. It would definitely be better to not force the duplicate, at least in the case where we are sure c() and [ would not dispatch. Best, luke On Mon, 4 May 2015, peter dalgaard wrote:> >> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote: >> >> But I'm still wondering why diag() uses c()...? With it being so slow, I'd >> be inclined to write a qdiag() without the c() and just use that the next >> time I need matrix algebra. Any insight would be appreciated; thanks! > > Well, there are two possibilities: Either it is deliberate or it isn't. > > The latter isn't too unlikely, given that the effect is seen for large matrices. I would appear to be a matter of O(n) (picking out n items) vs. O(n^2) (copying an n x n matrix), but this might drown out in a context involving matrix multiplication and/or inversion, both of which are O(n^3). > > If it is deliberate, the question is why. There could be devils in the details; notice in particular that c() strips off non-name attributes. However, I'm not aware of a situation where such attributes could cause trouble. > > -pd > >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Steve Bronder
2015-May-07 15:49 UTC
[Rd] Why is the diag function so slow (for extraction)?
Is it possible to replace c() with .subset()? Example below
####
####
library(microbenchmark)
diag2 <- function(x,nrow, ncol){
if (is.matrix(x)) {
if (nargs() > 1L)
stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
if ((m <- min(dim(x))) == 0L)
return(vector(typeof(x), 0L))
# replace this part
y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
nms <- dimnames(x)
if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
nms[[1L]][seq_len(m)]),
nms[[2L]][seq_len(m)]))
names(y) <- nm
return(y)
}
if (is.array(x) && length(dim(x)) != 1L)
stop("'x' is an array, but not one-dimensional.")
if (missing(x))
n <- nrow
else if (length(x) == 1L && nargs() == 1L) {
n <- as.integer(x)
x <- 1
}
else n <- length(x)
if (!missing(nrow))
n <- nrow
if (missing(ncol))
ncol <- n
}
nc <- 10
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
runoff <- microbenchmark(
diaga = diag(m),
diagb = diag2(m)
)
Regards,
Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com
On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
> why the change was made many years ago, but this was before names were
> handled explicitly. It would definitely be better to not force the
> duplicate, at least in the case where we are sure c() and [ would not
> dispatch.
>
> Best,
>
> luke
>
> On Mon, 4 May 2015, peter dalgaard wrote:
>
>
>> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com>
wrote:
>>>
>>> But I'm still wondering why diag() uses c()...? With it being
so slow,
>>> I'd
>>> be inclined to write a qdiag() without the c() and just use that
the next
>>> time I need matrix algebra. Any insight would be appreciated;
thanks!
>>>
>>
>> Well, there are two possibilities: Either it is deliberate or it
isn't.
>>
>> The latter isn't too unlikely, given that the effect is seen for
large
>> matrices. I would appear to be a matter of O(n) (picking out n items)
vs.
>> O(n^2) (copying an n x n matrix), but this might drown out in a context
>> involving matrix multiplication and/or inversion, both of which are
O(n^3).
>>
>> If it is deliberate, the question is why. There could be devils in the
>> details; notice in particular that c() strips off non-name attributes.
>> However, I'm not aware of a situation where such attributes could
cause
>> trouble.
>>
>> -pd
>>
>>
>>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa Phone: 319-335-3386
> Department of Statistics and Fax: 319-335-3017
> Actuarial Science
> 241 Schaeffer Hall email: luke-tierney at uiowa.edu
> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
[[alternative HTML version deleted]]
Steve Bronder
2015-May-11 17:10 UTC
[Rd] Why is the diag function so slow (for extraction)?
Sorry if this is a re-post, not sure if my original message got though. Is
it possible to replace c() with .subset()? Or would this throw errors
because of something specific in c() or as.matrix()? There's a pretty nice
speed up by replacing c() with the .subset().
Ex.
----
library(microbenchmark)
diag2 <- function(x,nrow, ncol){
if (is.matrix(x)) {
if (nargs() > 1L)
stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
if ((m <- min(dim(x))) == 0L)
return(vector(typeof(x), 0L))
# replace this part
y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
nms <- dimnames(x)
if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
nms[[1L]][seq_len(m)]),
nms[[2L]][seq_len(m)]))
names(y) <- nm
return(y)
}
if (is.array(x) && length(dim(x)) != 1L)
stop("'x' is an array, but not one-dimensional.")
if (missing(x))
n <- nrow
else if (length(x) == 1L && nargs() == 1L) {
n <- as.integer(x)
x <- 1
}
else n <- length(x)
if (!missing(nrow))
n <- nrow
if (missing(ncol))
ncol <- n
}
# Tests
nc <- 1e04
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
runoff <- microbenchmark(
diaga <- diag(m),
diagb <- diag2(m)
)
runoff
Unit: microseconds
expr min lq mean median uq max
neval
diaga 429033.896 434186.694 512143.6728 503355.5865 572811.11 656035.584
100
diagb 216.112 251.445 536.8531 688.3595 706.98 2437.921
100
Regards,
Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com
On Thu, May 7, 2015 at 11:49 AM, Steve Bronder <sbronder at
stevebronder.com>
wrote:
> Is it possible to replace c() with .subset()? Example below
>
> ####
> ####
>
> library(microbenchmark)
>
>
> diag2 <- function(x,nrow, ncol){
> if (is.matrix(x)) {
> if (nargs() > 1L)
> stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
> if ((m <- min(dim(x))) == 0L)
> return(vector(typeof(x), 0L))
> # replace this part
> y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
> nms <- dimnames(x)
> if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
> nms[[1L]][seq_len(m)]),
>
> nms[[2L]][seq_len(m)]))
> names(y) <- nm
> return(y)
> }
> if (is.array(x) && length(dim(x)) != 1L)
> stop("'x' is an array, but not one-dimensional.")
> if (missing(x))
> n <- nrow
> else if (length(x) == 1L && nargs() == 1L) {
> n <- as.integer(x)
> x <- 1
> }
> else n <- length(x)
> if (!missing(nrow))
> n <- nrow
> if (missing(ncol))
> ncol <- n
> }
>
> nc <- 10
>
> set.seed(1)
> m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
>
>
> runoff <- microbenchmark(
> diaga = diag(m),
> diagb = diag2(m)
> )
>
> Regards,
>
> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbronder at stevebronder.com
>
>
> On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
>
>> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
>> why the change was made many years ago, but this was before names were
>> handled explicitly. It would definitely be better to not force the
>> duplicate, at least in the case where we are sure c() and [ would not
>> dispatch.
>>
>> Best,
>>
>> luke
>>
>> On Mon, 4 May 2015, peter dalgaard wrote:
>>
>>
>>> On 04 May 2015, at 19:59 , franknarf <by.hook.or at
gmail.com> wrote:
>>>>
>>>> But I'm still wondering why diag() uses c()...? With it
being so slow,
>>>> I'd
>>>> be inclined to write a qdiag() without the c() and just use
that the
>>>> next
>>>> time I need matrix algebra. Any insight would be appreciated;
thanks!
>>>>
>>>
>>> Well, there are two possibilities: Either it is deliberate or it
isn't.
>>>
>>> The latter isn't too unlikely, given that the effect is seen
for large
>>> matrices. I would appear to be a matter of O(n) (picking out n
items) vs.
>>> O(n^2) (copying an n x n matrix), but this might drown out in a
context
>>> involving matrix multiplication and/or inversion, both of which are
O(n^3).
>>>
>>> If it is deliberate, the question is why. There could be devils in
the
>>> details; notice in particular that c() strips off non-name
attributes.
>>> However, I'm not aware of a situation where such attributes
could cause
>>> trouble.
>>>
>>> -pd
>>>
>>>
>>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa Phone: 319-335-3386
>> Department of Statistics and Fax: 319-335-3017
>> Actuarial Science
>> 241 Schaeffer Hall email: luke-tierney at uiowa.edu
>> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
[[alternative HTML version deleted]]
Martin Maechler
2015-May-12 12:33 UTC
[Rd] Why is the diag function so slow (for extraction)?
>>>>> Steve Bronder <sbronder at stevebronder.com> >>>>> on Thu, 7 May 2015 11:49:49 -0400 writes:> Is it possible to replace c() with .subset()? It would be possible, but I think "entirely" wrong. .subset() is documented to be an internal function not to be used "lightly" and more to the point it is documented to *NOT* dispatch at all. If you read and understood what Peter and Luke wrote, you'd not special case here: diag() should not work only for pure matrices, but for all "matrix-like" objects for which ``the usual methods'' work, such as as.vector(.), c(.) That's why there has been the c(.) in there. You can always make code faster if you write the code so it only has to work in one special case .. and work there very fast. > Example below > #### > #### > library(microbenchmark)