thr3ads.net - R devel - [Rd] Why is the diag function so slow (for extraction)? [May 2015]

If this information is useful, please help other people find it:
Share via:

luke-tierney at uiowa.edu

2015-May-05 13:46 UTC

[Rd] Why is the diag function so slow (for extraction)?

Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
why the change was made many years ago, but this was before names were
handled explicitly. It would definitely be better to not force the
duplicate, at least in the case where we are sure c() and [ would not
dispatch.

Best,

luke

On Mon, 4 May 2015, peter dalgaard wrote:
>
>> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com>
wrote:
>>
>> But I'm still wondering why diag() uses c()...? With it being so
slow, I'd
>> be inclined to write a qdiag() without the c() and just use that the
next
>> time I need matrix algebra. Any insight would be appreciated; thanks!
>
> Well, there are two possibilities: Either it is deliberate or it isn't.
>
> The latter isn't too unlikely, given that the effect is seen for large
matrices. I would appear to be a matter of O(n) (picking out n items) vs. O(n^2)
(copying an n x n matrix), but this might drown out in a context involving
matrix multiplication and/or inversion, both of which are O(n^3).
>
> If it is deliberate, the question is why. There could be devils in the
details; notice in particular that c() strips off non-name attributes. However,
I'm not aware of a situation where such attributes could cause trouble.
>
> -pd
>
>
-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Steve Bronder

2015-May-07 15:49 UTC

head link

[Rd] Why is the diag function so slow (for extraction)?

Is it possible to replace c() with .subset()? Example below

####
####

library(microbenchmark)


diag2 <- function(x,nrow, ncol){
  if (is.matrix(x)) {
    if (nargs() > 1L)
      stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
    if ((m <- min(dim(x))) == 0L)
      return(vector(typeof(x), 0L))
    # replace this part
    y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
    nms <- dimnames(x)
    if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
nms[[1L]][seq_len(m)]),

nms[[2L]][seq_len(m)]))
      names(y) <- nm
    return(y)
  }
  if (is.array(x) && length(dim(x)) != 1L)
    stop("'x' is an array, but not one-dimensional.")
  if (missing(x))
    n <- nrow
  else if (length(x) == 1L && nargs() == 1L) {
    n <- as.integer(x)
    x <- 1
  }
  else n <- length(x)
  if (!missing(nrow))
    n <- nrow
  if (missing(ncol))
    ncol <- n
}

nc  <- 10

set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)


runoff <- microbenchmark(
  diaga = diag(m),
  diagb = diag2(m)
)

Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com


On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
> why the change was made many years ago, but this was before names were
> handled explicitly. It would definitely be better to not force the
> duplicate, at least in the case where we are sure c() and [ would not
> dispatch.
>
> Best,
>
> luke
>
> On Mon, 4 May 2015, peter dalgaard wrote:
>
>
>>  On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com>
wrote:
>>>
>>> But I'm still wondering why diag() uses c()...? With it being
so slow,
>>> I'd
>>> be inclined to write a qdiag() without the c() and just use that
the next
>>> time I need matrix algebra. Any insight would be appreciated;
thanks!
>>>
>>
>> Well, there are two possibilities: Either it is deliberate or it
isn't.
>>
>> The latter isn't too unlikely, given that the effect is seen for
large
>> matrices. I would appear to be a matter of O(n) (picking out n items)
vs.
>> O(n^2) (copying an n x n matrix), but this might drown out in a context
>> involving matrix multiplication and/or inversion, both of which are
O(n^3).
>>
>> If it is deliberate, the question is why. There could be devils in the
>> details; notice in particular that c() strips off non-name attributes.
>> However, I'm not aware of a situation where such attributes could
cause
>> trouble.
>>
>> -pd
>>
>>
>>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
	[[alternative HTML version deleted]]

Steve Bronder

2015-May-11 17:10 UTC

head link

[Rd] Why is the diag function so slow (for extraction)?

Sorry if this is a re-post, not sure if my original message got though. Is
it possible to replace c() with .subset()? Or would this throw errors
because of something specific in c() or as.matrix()? There's a pretty nice
speed up by replacing c() with the .subset().

Ex.
----
library(microbenchmark)


diag2 <- function(x,nrow, ncol){
  if (is.matrix(x)) {
    if (nargs() > 1L)
      stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
    if ((m <- min(dim(x))) == 0L)
      return(vector(typeof(x), 0L))
    # replace this part
    y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
    nms <- dimnames(x)
    if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
nms[[1L]][seq_len(m)]),

nms[[2L]][seq_len(m)]))
      names(y) <- nm
    return(y)
  }
  if (is.array(x) && length(dim(x)) != 1L)
    stop("'x' is an array, but not one-dimensional.")
  if (missing(x))
    n <- nrow
  else if (length(x) == 1L && nargs() == 1L) {
    n <- as.integer(x)
    x <- 1
  }
  else n <- length(x)
  if (!missing(nrow))
    n <- nrow
  if (missing(ncol))
    ncol <- n
}

# Tests

nc  <- 1e04

set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)

runoff <- microbenchmark(
  diaga <- diag(m),
  diagb <- diag2(m)
)

runoff

Unit: microseconds
expr        min         lq        mean      median        uq        max
neval
diaga 429033.896 434186.694 512143.6728 503355.5865 572811.11 656035.584
100
diagb    216.112    251.445    536.8531    688.3595    706.98   2437.921
100


Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com


On Thu, May 7, 2015 at 11:49 AM, Steve Bronder <sbronder at
stevebronder.com>
wrote:
> Is it possible to replace c() with .subset()? Example below
>
> ####
> ####
>
> library(microbenchmark)
>
>
> diag2 <- function(x,nrow, ncol){
>   if (is.matrix(x)) {
>     if (nargs() > 1L)
>       stop("'nrow' or 'ncol' cannot be specified when
'x' is a matrix")
>     if ((m <- min(dim(x))) == 0L)
>       return(vector(typeof(x), 0L))
>     # replace this part
>     y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
>     nms <- dimnames(x)
>     if (is.list(nms) && !any(sapply(nms, is.null)) &&
identical((nm <-
> nms[[1L]][seq_len(m)]),
>
> nms[[2L]][seq_len(m)]))
>       names(y) <- nm
>     return(y)
>   }
>   if (is.array(x) && length(dim(x)) != 1L)
>     stop("'x' is an array, but not one-dimensional.")
>   if (missing(x))
>     n <- nrow
>   else if (length(x) == 1L && nargs() == 1L) {
>     n <- as.integer(x)
>     x <- 1
>   }
>   else n <- length(x)
>   if (!missing(nrow))
>     n <- nrow
>   if (missing(ncol))
>     ncol <- n
> }
>
> nc  <- 10
>
> set.seed(1)
> m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
>
>
> runoff <- microbenchmark(
>   diaga = diag(m),
>   diagb = diag2(m)
> )
>
> Regards,
>
> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbronder at stevebronder.com
>
>
> On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
>
>> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
>> why the change was made many years ago, but this was before names were
>> handled explicitly. It would definitely be better to not force the
>> duplicate, at least in the case where we are sure c() and [ would not
>> dispatch.
>>
>> Best,
>>
>> luke
>>
>> On Mon, 4 May 2015, peter dalgaard wrote:
>>
>>
>>>  On 04 May 2015, at 19:59 , franknarf <by.hook.or at
gmail.com> wrote:
>>>>
>>>> But I'm still wondering why diag() uses c()...? With it
being so slow,
>>>> I'd
>>>> be inclined to write a qdiag() without the c() and just use
that the
>>>> next
>>>> time I need matrix algebra. Any insight would be appreciated;
thanks!
>>>>
>>>
>>> Well, there are two possibilities: Either it is deliberate or it
isn't.
>>>
>>> The latter isn't too unlikely, given that the effect is seen
for large
>>> matrices. I would appear to be a matter of O(n) (picking out n
items) vs.
>>> O(n^2) (copying an n x n matrix), but this might drown out in a
context
>>> involving matrix multiplication and/or inversion, both of which are
O(n^3).
>>>
>>> If it is deliberate, the question is why. There could be devils in
the
>>> details; notice in particular that c() strips off non-name
attributes.
>>> However, I'm not aware of a situation where such attributes
could cause
>>> trouble.
>>>
>>> -pd
>>>
>>>
>>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
	[[alternative HTML version deleted]]

Martin Maechler

2015-May-12 12:33 UTC

head link

[Rd] Why is the diag function so slow (for extraction)?

>>>>> Steve Bronder <sbronder at stevebronder.com>
>>>>>     on Thu, 7 May 2015 11:49:49 -0400 writes:
    > Is it possible to replace c() with .subset()? 

It would be possible, but I think "entirely" wrong.

.subset() is documented to be an internal function not to be
used "lightly" and more to the point it is documented to *NOT*
dispatch at all.

If you read and understood what Peter and Luke wrote, you'd not
special case here:

diag() should not work only for pure matrices, but for all
"matrix-like" objects for which ``the usual methods'' work,
such
as
   as.vector(.), c(.)

That's why there has been the c(.) in there.

You can always make code faster if you write the code so it only
has to work in one special case .. and work there very fast.

    > Example below
    > ####
    > ####

    > library(microbenchmark)

Possibly Parallel Threads

Search for more possibly parallel threads

R devel - May 2015 - Why is the diag function so slow (for extraction)?

[Rd] Why is the diag function so slow (for extraction)?

[Rd] Why is the diag function so slow (for extraction)?

[Rd] Why is the diag function so slow (for extraction)?

[Rd] Why is the diag function so slow (for extraction)?

Possibly Parallel Threads