thr3ads.net - R devel - [Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices [Mar 2015]

Martin Maechler

2015-Mar-19 22:02 UTC

[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices

This is a Request For Comment, also BCCed to 390 package maintainers
of reverse dependencies of the Matrix package.

Most users and package authors working with our 'Matrix' package will
be using it for numerical computations, and so will be using
"dMatrix" (d: double precision) matrix objects M, and indirectly,
e.g., for M >= c, will also use "lMatrix" (l: logical, i.e., TRUE/FALSE/NA).
None of the following affects those numerical / logical
computations.

A few others will know that we also have "pattern" matrices (purely
binary: TRUE/FALSE, no NA), notably sparse ones, those "ngCMatrix" etc.,
all starting with "n" (from ``patter[n]``), which play a prominent
role in the internal sparse matrix algorithms, notably of the
(underlying C code) CHOLMOD library in the so-called "symbolic"
Cholesky decomposition and other such operations. Another reason you
may use them is that they are equivalent to incidence matrices of
unweighted (directed or undirected) graphs.

Now, as the subject says, I'm bringing up the topic of what should
happen when these matrices appear in matrix multiplications.
Somewhat by design, but also partly by coincidence, multiplication of
*sparse* pattern matrices in the Matrix package mostly builds on
the CHOLMOD library's `cholmod_ssmult()` function, which implements
"Boolean arithmetic" for them instead of regular arithmetic:
 "+" is logical "or"
 "*" is logical "and".
Once we map TRUE <-> 1 and FALSE <-> 0, the only difference between
boolean and regular arithmetic is that "1+1 = 1" in the (mapped)
boolean arithmetic, because "TRUE | TRUE" is TRUE in the original logic.
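
To make the "1+1 = 1" difference concrete, here is a minimal sketch in
base R. It uses ordinary 0/1 matrices rather than Matrix classes and
emulates the boolean product by thresholding the numeric one; this only
illustrates the arithmetic and is not how CHOLMOD computes it. The names
A, B, num and bool are made up for the example.

```r
# Two small 0/1 matrices standing in for pattern matrices (illustrative only)
A <- matrix(c(1, 1,
              0, 1), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              1, 1), nrow = 2, byrow = TRUE)

# Regular arithmetic: entry (i,j) counts the k with A[i,k] = B[k,j] = 1
num <- A %*% B

# Boolean arithmetic ("+" is OR, "*" is AND): entry (i,j) is TRUE if at
# least one such k exists, emulated here by thresholding the numeric product
bool <- num > 0

num[1, 1]   # 2, since 1*1 + 1*1 = 2
bool[1, 1]  # TRUE, since (TRUE & TRUE) | (TRUE & TRUE) is TRUE
```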

The drawback of using the boolean arithmetic here is the "clash" with
the usual numeric arithmetic, and arithmetic in R where logical is
coerced to integer (and that to "double") when certain numerical
functions/operations are used.

A more severe problem, which I had not been aware of until
relatively recently, is the fact that the CHOLMOD function
cholmod_ssmult(A, B)
treats *both* A and B as "pattern" as soon as one of them is a
(sparse) pattern matrix.
And this is, I say, in clear contrast to what R users would expect:
If you multiply a numeric with a "kind of logical" matrix (a pattern
one), you will expect that the TRUE/FALSE matrix will be treated as a
1/0 matrix because it is combined with a numeric matrix.
So we could say that in this case, the Matrix package behavior is
clearly bogus ... but still it has been the behavior for the last 10
years or so.

RFC 1: "Change 1":
I currently propose to change this behavior for the upcoming release
of Matrix (version 1.2-0),  though I have no idea if dependent
packages would partly fail their checks or otherwise have changed
behavior subsequently.
The change seems sensible, since I think if your package relied on
this behavior, it was inadvertent and accidental.
Still, you may differ in your opinion about this Change 1.

RFC 2: "Change 2":
This change would be more radical, and something I would not plan for
the upcoming release of Matrix, but possibly for an update, say, one or
two months later: it concerns the matrix products when *both*
matrices are pattern. This is a situation where boolean arithmetic may
really make sense, and where packages may indeed have depended on the
current behavior ("T + T |--> T"), although that is currently
only used for *sparse* pattern matrices, not for dense ones.

Further, it may still seem surprising that matrix multiplication does
not behave numerically for a pair of such matrices, and by the
principle of "least surprise" we should provide the boolean arithmetic
matrix products in another way than by the standard %*%,
crossprod() and tcrossprod() functions.
So one possibility could be to change the standard functions to behave
numerically, and e.g., use %&% (replace the numeric "*" by a logical "&") and
crossprod(A,B, boolean=TRUE), tcrossprod(A,B, boolean=TRUE)
for the three boolean arithmetic versions of matrix multiplication.
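
As a rough sketch of how such a split could look from the user's side,
the following base R code leaves %*% and crossprod() numeric and puts the
boolean variants behind separate names. The operator %bool% and the
helpers bool_crossprod() and bool_tcrossprod() are hypothetical stand-ins
defined here for illustration only, assuming 0/1 input matrices; they are
not the Matrix package's implementation of %&% or of a boolean= argument.

```r
# Hypothetical boolean matrix product for 0/1 base R matrices:
# "*" acts as AND, "+" as OR (emulated by thresholding the numeric product)
`%bool%` <- function(x, y) (x %*% y) > 0

# Hypothetical boolean counterparts of crossprod() and tcrossprod()
bool_crossprod  <- function(x, y = x) t(x) %bool% y   # boolean t(x) %*% y
bool_tcrossprod <- function(x, y = x) x %bool% t(y)   # boolean x %*% t(y)

A <- matrix(c(1, 1, 0,
              0, 1, 1), nrow = 2, byrow = TRUE)

crossprod(A)        # numeric: entry (2,2) is 2, entry (1,3) is 0
bool_crossprod(A)   # boolean: entry (2,2) is TRUE, entry (1,3) is FALSE
```

Keeping the two kinds of product under different names means the choice
of arithmetic is visible at the call site rather than depending on the
classes of the operands.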

What do you think about this?   I'm particularly interested to hear
from authors and users of  packages such as 'arules'  which IIRC
explicitly work with sparse pattern matrices.

Thank you for your thoughts and creative ideas,
Martin Maechler, ETH Zurich

Trevor Hastie

2015-Mar-19 23:03 UTC

Hi Martin

I got stung by this last week.
glmnet produces a coefficient matrix of class "dgCMatrix".
If a predictor matrix was created using sparseMatrix as follows,
one gets unexpected results, as this simple example shows.
My fix was easy (I always convert the predictor matrix to class "dgCMatrix" now).

Trevor
> y=Matrix(diag(4))
> y
4 x 4 diagonal matrix of class "ddiMatrix"
     [,1] [,2] [,3] [,4]
[1,]    1    .    .    .
[2,]    .    1    .    .
[3,]    .    .    1    .
[4,]    .    .    .    1
> z=sparseMatrix(1:4,1:4)
> z
4 x 4 sparse Matrix of class "ngCMatrix"

[1,] | . . .
[2,] . | . .
[3,] . . | .
[4,] . . . |
> beta=as(Matrix(1:4),"dgCMatrix")
> y%*%beta
4 x 1 sparse Matrix of class "dgCMatrix"

[1,] 1
[2,] 2
[3,] 3
[4,] 4
> z%*%beta
4 x 1 sparse Matrix of class "ngCMatrix"

[1,] |
[2,] |
[3,] |
[4,] |
 ----------------------------------------------------------------------
  Trevor Hastie                                   hastie at stanford.edu
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231                 Fax: (650) 725-8977
  URL: http://www.stanford.edu/~hastie
  address: room 104, Department of Statistics, Sequoia Hall
           390 Serra Mall, Stanford University, CA 94305-4065
 ----------------------------------------------------------------------

Michael Hahsler

2015-Mar-20 01:15 UTC

Hi Martin,

package arules heavily relies on ngCMatrix and uses multiplication and 
addition for logical operations. I think it makes sense that in a mixed 
operation with one dgCMatrix and one ngCMatrix the ngCMatrix should be 
"promoted" to a dgCMatrix.

The current behavior of %*% and friends is indeed confusing:

 > m <- matrix(sample(c(0,1), 5*5, replace=TRUE), nrow=5)
 > x <- as(m, "dgCMatrix")
 > y <- as(m, "ngCMatrix")
 > x %*% y
5 x 5 sparse Matrix of class "ngCMatrix"

[1,] | | | . |
[2,] | | | . |
[3,] . . | | .
[4,] . . . | .
[5,] | | | | |

 > x %*% x
5 x 5 sparse Matrix of class "dgCMatrix"

[1,] 1 2 1 . 2
[2,] 1 3 1 . 3
[3,] . . 1 2 .
[4,] . . . 1 .
[5,] 1 2 2 1 2

We even explicitly coerce in our code ngCMatrix to dgCMatrix to avoid 
this behavior. I think all these operations probably should result 
consistently in a dgCMatrix.
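
The explicit coercion mentioned above can be sketched as follows,
assuming the Matrix package is attached. The virtual class "dMatrix" is
used as the coercion target here, and the object names z, beta, zd and
res are illustrative only.

```r
library(Matrix)

z    <- sparseMatrix(i = 1:4, j = 1:4)                  # no 'x' given: a pattern matrix
beta <- Matrix(c(1, 2, 3, 4), ncol = 1, sparse = TRUE)  # numeric sparse column

# Coerce the pattern matrix to a numeric one (TRUE becomes 1) before
# multiplying, so that the product is computed in regular arithmetic
zd  <- as(z, "dMatrix")
res <- zd %*% beta
res
```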

I would love to see | and & for position-wise OR and AND for ngCMatrix.

Thanks,
-Michael

-- 
   Michael Hahsler, Assistant Professor
   Department of Engineering Management, Information, and Systems
   Department of Computer Science and Engineering (by courtesy)
   Bobby B. Lyle School of Engineering
   Southern Methodist University, Dallas, Texas

   office: Caruth Hall, suite 337, room 311
   email:  mhahsler at lyle.smu.edu
   web:    http://lyle.smu.edu/~mhahsler

Heather Turner

2015-Mar-20 08:42 UTC

We don't use the pattern matrices, nevertheless the proposed changes
sound good to me. I particularly like the suggestion to treat the
matrices as numeric by default, but provide simple ways to use boolean
arithmetic instead - this means that developers have access to both
forms of arithmetic and it will be more obvious from the code which
arithmetic is being used.

Best wishes,

Heather


Martin Maechler

2015-Mar-20 09:33 UTC

>>>>> Trevor Hastie <hastie at stanford.edu>
>>>>>     on Thu, 19 Mar 2015 16:03:38 -0700 writes:
    > Hi Martin
    > I got stung by this last week.
    > glmnet produces a coefficient matrix of class "dgCMatrix".
    > If a predictor matrix was created using sparseMatrix as follows,
    > one gets unexpected results, as this simple example shows.
    > My fix was easy (I always convert the predictor matrix to class "dgCMatrix" now).

    > Trevor

    >> y=Matrix(diag(4))


Considerably faster (for larger n):

	  Diagonal(4)

If you want a sparse matrix directly, there are the

    .sparseDiagonal()
and .symDiagonal()
functions.
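
A minimal sketch of the two constructions side by side, assuming the
Matrix package is attached; n, D1 and D2 are names chosen for the
example, and no timing is claimed here.

```r
library(Matrix)

n  <- 4
D1 <- Matrix(diag(n))   # builds the dense n x n matrix first, then converts
D2 <- Diagonal(n)       # builds the diagonal matrix directly

# Both represent the same identity matrix
all(as.matrix(D1) == as.matrix(D2))
```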


    >> y
    > 4 x 4 diagonal matrix of class "ddiMatrix"
    > [,1] [,2] [,3] [,4]
    > [1,]    1    .    .    .
    > [2,]    .    1    .    .
    > [3,]    .    .    1    .
    > [4,]    .    .    .    1

There's no problem with 'y', which is a "diagonalMatrix" and only
needs O(n) storage rather than the O(n^2) of diag(n), right?

    >> z=sparseMatrix(1:4,1:4)
    >> z
    > 4 x 4 sparse Matrix of class "ngCMatrix"
            
    > [1,] | . . .
    > [2,] . | . .
    > [3,] . . | .
    > [4,] . . . |
    >> beta=as(Matrix(1:4),"dgCMatrix")
    >> y%*%beta
    > 4 x 1 sparse Matrix of class "dgCMatrix"
      
    > [1,] 1
    > [2,] 2
    > [3,] 3
    > [4,] 4
    >> z%*%beta
    > 4 x 1 sparse Matrix of class "ngCMatrix"
      
    > [1,] |
    > [2,] |
    > [3,] |
    > [4,] |
Yes, the last one is what I consider bogus.

Thank you, Trevor, for the feedback!
Martin



Martin Maechler

2015-Mar-20 10:07 UTC

>>>>> "MH" == Michael Hahsler <mhahsler at lyle.smu.edu>
>>>>>     on Thu, 19 Mar 2015 20:15:37 -0500 writes:
    MH> Hi Martin,
    MH> package arules heavily relies on ngCMatrix and uses multiplication and
    MH> addition for logical operations. I think it makes sense that in a mixed
    MH> operation with one dgCMatrix and one ngCMatrix the ngCMatrix should be
    MH> "promoted" to a dgCMatrix.

    MH> The current behavior of %*% and friends is indeed confusing:

    >> m <- matrix(sample(c(0,1), 5*5, replace=TRUE), nrow=5)
    >> x <- as(m, "dgCMatrix")
    >> y <- as(m, "ngCMatrix")
    >> x %*% y
    MH> 5 x 5 sparse Matrix of class "ngCMatrix"

    MH> [1,] | | | . |
    MH> [2,] | | | . |
    MH> [3,] . . | | .
    MH> [4,] . . . | .
    MH> [5,] | | | | |

    >> x %*% x
    MH> 5 x 5 sparse Matrix of class "dgCMatrix"

    MH> [1,] 1 2 1 . 2
    MH> [2,] 1 3 1 . 3
    MH> [3,] . . 1 2 .
    MH> [4,] . . . 1 .
    MH> [5,] 1 2 2 1 2

Indeed, that is not what one should expect.

    MH> We even explicitly coerce in our code ngCMatrix to dgCMatrix to avoid
    MH> this behavior. I think all these operations probably should result 
    MH> consistently in a dgCMatrix.

Eventually. As I said, it *is* useful to work with boolean
arithmetic in some cases here, so I do want to provide that,
hopefully entirely consistently as well in the future, but
in the longer term not via '%*%'.

    MH> I would love to see | and & for position-wise OR and AND for ngCMatrix.

Well, why don't you look? ;-)

These have worked for a long time already! (I checked a version
from 2008)
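
A minimal sketch of that, assuming the Matrix package is attached; the
matrices a and b and the result names are made up for the example, and
the exact class of the results is not asserted here.

```r
library(Matrix)

a <- sparseMatrix(i = c(1, 2), j = c(1, 2), dims = c(2, 2))  # TRUE at (1,1), (2,2)
b <- sparseMatrix(i = c(1, 2), j = c(1, 1), dims = c(2, 2))  # TRUE at (1,1), (2,1)

both   <- a & b   # elementwise AND: TRUE only at (1,1)
either <- a | b   # elementwise OR:  TRUE at (1,1), (2,1), (2,2)

sum(both)    # number of TRUE entries in the AND
sum(either)  # number of TRUE entries in the OR
```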

Thanks a lot, Michael, for your valuable feedback.
Martin

    MH> Thanks,
    MH> -Michael


    MH> -- 
    MH> Michael Hahsler, Assistant Professor
    MH> Department of Engineering Management, Information, and Systems
    MH> Department of Computer Science and Engineering (by courtesy)
    MH> Bobby B. Lyle School of Engineering
    MH> Southern Methodist University, Dallas, Texas

    MH> office: Caruth Hall, suite 337, room 311
    MH> email:  mhahsler at lyle.smu.edu
    MH> web:    http://lyle.smu.edu/~mhahsler

Dr. Peter Ruckdeschel

2015-Mar-20 19:04 UTC


[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices

Hi Martin,

many thanks to you and Doug for providing the Matrix package
in the first place, and, second, for involving us in this decision.

I have only some minor comments to make:

+ wherever there is a usual function call involved, using an
   argument "boolean" as you proposed seems perfect to me

+ default behaviour and default values in function arguments
   should, even if buggy, stick to the old behaviour for backward
   compatibility for now, but you might still want to change this
   after a sufficiently long announcement period

+ when it comes to arithmetic symbols, something like %&%
   certainly is nice to have, but the unsuspecting user (like me,
   probably) would not know of this, unless it is documented
   in a prominent place

+ although this is against the functional paradigm of R, I would
   --exceptionally-- opt for a global option to change the behaviour
   (a) in function argument defaults and (b), more importantly, in
    binary arithmetic operators like %*%, *, +
    --- this way everybody can have the Matrix flavour he likes
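
To illustrate the last point, here is a minimal base-R sketch of how a
global option could drive the default of a `boolean` argument. The
option name `demo.boolean.products` and the helper `matprod()` are
hypothetical and not part of the Matrix package; plain logical
matrices stand in for pattern matrices:

```r
# 'demo.boolean.products' is a hypothetical option name for illustration;
# it is not an option of the Matrix package.
matprod <- function(a, b,
                    boolean = getOption("demo.boolean.products", FALSE)) {
  num <- (a * 1) %*% (b * 1)             # ordinary product on 0/1 values
  if (isTRUE(boolean)) num > 0 else num  # boolean: positive count is TRUE
}

a <- matrix(c(TRUE, TRUE, FALSE, TRUE), nrow = 2)

res_num <- matprod(a, a)                 # option unset: numeric result

old <- options(demo.boolean.products = TRUE)
res_bool <- matprod(a, a)                # option set: logical result
options(old)                             # restore the previous setting
```

An explicit `boolean = TRUE` or `boolean = FALSE` in the call still
overrides the option, so code that needs one specific behaviour does
not depend on the user's global setting.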
 

just my 2c,
best regards,
Peter



-- 
Dr. habil. Peter Ruckdeschel, Abteilung Finanzmathematik, F3.17
Fraunhofer ITWM, Fraunhofer Platz 1, 67663 Kaiserslautern
Telefon:  +49 631/31600-4699   Fax:  +49 631/31600-5699
E-Mail :  peter.ruckdeschel at itwm.fraunhofer.de
http://www.itwm.fraunhofer.de/abteilungen/finanzmathematik/mitarbeiterinnen/mitarbeiter/dr-peter-ruckdeschel.html
