thr3ads.net - R devel - [R] package:Matrix handling of data with identical indices [Jul 2006]

If this information is useful, please help other people find it:
Share via:

Thaden, John J

2006-Jul-09 03:45 UTC

[R] package:Matrix handling of data with identical indices

In the Matrix package v. 0.995-11 I see that the dgTMatrix
Class for compressed, sparse, triplet-form matrices handles
Identically indexed data instances by summing their values,
e.g., 

library(Matrix)
(Mt <- new("dgTMatrix", 
   i = as.integer(c(0,0,1,1,4)),
   j = as.integer(c(0,1,2,2,4)),
   x = as.double(1:5),
   Dim = as.integer(c(5,5))))
## 5 x 5 sparse Matrix of class "dgTMatrix"
## [1,] 1 2 . . .
## [2,] . . 7 . .    <--- 7 = 3 + 4.
## [3,] . . . . .
## [4,] . . . . .
## [5,] . . . . 5

# If instead I make a dgCMatrix-class matrix, the first
# instance is overwritten by the second, e.g.,

library(Matrix)
(Mc <- new("dgCMatrix", 
   i = as.integer(c(0,0,1,1,4)),
   p = as.integer(c(0,1,2,4,5)),
   x = as.double(1:5),
   Dim = as.integer(c(5,5))))
## 5 x 5 sparse Matrix of class "dgCMatrix"
##             
## [1,] 1 2 . .
## [2,] . . 4 .   <-- the datum '3' has been lost.
## [3,] . . . .
## [4,] . . . .
## [5,] . . . 5 

# If one arrives at the dgCMatrix via the dgTMatrix class,
# the summed value is of course preserved, e.g.,

(Mtc <- as(Mt, "dgCMatrix"))
## 5 x 5 sparse Matrix of class "dgCMatrix"
##               
## [1,] 1 2 . . .
## [2,] . . 7 . .
## [3,] . . . . .
## [4,] . . . . .
## [5,] . . . . 5

As there is nothing inherent in either compressed, sparse,
format that would prevent recognition and handling of
duplicated index pairs, I'm curious why the dgCMatrix
class doesn't also add x values in those instances?
I wonder also if others might benefit also by being able
to choose how these instances are handled, i.e.,
whether they are summed, averaged or overwritten?  

-John Thaden, Ph.D.
Research Assistant Professor of Geriatrics
University of Arkansas for Medical Sciences
Little Rock AR, USA


Confidentiality Notice: This e-mail message, including any a...{{dropped}}

Douglas Bates

2006-Jul-09 16:06 UTC

head link

[R] package:Matrix handling of data with identical indices

The apparent contradiction here is again a case of inadequate
documentation and checking.

Because the order of the triplets in the triplet form of a sparse
matrix is essentially random it is permissible to have repeated
indices.  As you have seen, the interpretation of repeated indices is
that the value at any index is the sum of the values in the triplets
corresponding to that index.

 It is not permissible to have repeated indices in the compressed
form.  In the compressed form there is a well-defined ordering of the
indices, first by columns then by row within column and the row
indices must be increasing within columns.

Your matrix Mc should be flagged as invalid.  Martin and I should
discuss whether we want to add such a test to the validity method.  It
is not difficult to add the test but there will be a penalty in that
it will slow down all operations on such matrices and I'm not sure if
we want to pay that price to catch a rather infrequently occuring
problem.

There is some documentation of the internal representations of these
formats in the directory $R_LIB/Matrix/doc/UFSparse/.  The User Guides
in that directory are taken directly from the various sparse matrix
packages that Tim Davis at the University of Florida has written.  He
has a book that is scheduled for publication this  September

Tim Davis (2006), Direct Methods for Sparse Linear Systems, SIAM,
Philadelphia, PA

I hope we will be able to refer to that book for details of the
representation and algorithms.

On 7/8/06, Thaden, John J <ThadenJohnJ at uams.edu>
wrote:> In the Matrix package v. 0.995-11 I see that the dgTMatrix
> Class for compressed, sparse, triplet-form matrices handles
> Identically indexed data instances by summing their values,
> e.g.,
>
> library(Matrix)
> (Mt <- new("dgTMatrix",
>    i = as.integer(c(0,0,1,1,4)),
>    j = as.integer(c(0,1,2,2,4)),
>    x = as.double(1:5),
>    Dim = as.integer(c(5,5))))
> ## 5 x 5 sparse Matrix of class "dgTMatrix"
> ## [1,] 1 2 . . .
> ## [2,] . . 7 . .    <--- 7 = 3 + 4.
> ## [3,] . . . . .
> ## [4,] . . . . .
> ## [5,] . . . . 5
>
> # If instead I make a dgCMatrix-class matrix, the first
> # instance is overwritten by the second, e.g.,
>
> library(Matrix)
> (Mc <- new("dgCMatrix",
>    i = as.integer(c(0,0,1,1,4)),
>    p = as.integer(c(0,1,2,4,5)),
>    x = as.double(1:5),
>    Dim = as.integer(c(5,5))))
> ## 5 x 5 sparse Matrix of class "dgCMatrix"
> ##
> ## [1,] 1 2 . .
> ## [2,] . . 4 .   <-- the datum '3' has been lost.
> ## [3,] . . . .
> ## [4,] . . . .
> ## [5,] . . . 5
>
> # If one arrives at the dgCMatrix via the dgTMatrix class,
> # the summed value is of course preserved, e.g.,
>
> (Mtc <- as(Mt, "dgCMatrix"))
> ## 5 x 5 sparse Matrix of class "dgCMatrix"
> ##
> ## [1,] 1 2 . . .
> ## [2,] . . 7 . .
> ## [3,] . . . . .
> ## [4,] . . . . .
> ## [5,] . . . . 5
>
> As there is nothing inherent in either compressed, sparse,
> format that would prevent recognition and handling of
> duplicated index pairs, I'm curious why the dgCMatrix
> class doesn't also add x values in those instances?
> I wonder also if others might benefit also by being able
> to choose how these instances are handled, i.e.,
> whether they are summed, averaged or overwritten?
>
> -John Thaden, Ph.D.
> Research Assistant Professor of Geriatrics
> University of Arkansas for Medical Sciences
> Little Rock AR, USA
>
>
> Confidentiality Notice: This e-mail message, including any attachments, is
for the sole use of the intended recipient(s) and may contain confidential and
privileged information.  Any unauthorized review, use, disclosure or
distribution is prohibited.  If you are not the intended recipient, please
contact the sender by reply e-mail and destroy all copies of the original
message.
>
>

Seemingly Similar Threads

Search for more maybe matching threads

R devel - Jul 2006 - package:Matrix handling of data with identical indices

[R] package:Matrix handling of data with identical indices

[R] package:Matrix handling of data with identical indices

Seemingly Similar Threads