thr3ads.net - R help - [R] Why does matrix selection behave differently when using which? [Dec 2012]


Asis Hallab

2012-Dec-17 19:22 UTC

[R] Why does matrix selection behave differently when using which?

Dear R community,

I have a medium sized matrix stored in variable "t" and a simple function
"countRows" (see below) to count the number of rows in which a selected
column "C" matches a given value. If I count all rows matching all pairwise
distinct values in the column "C" and sum these counts up, I get the number
of rows of "t". If I delete the "which" calls from function "countRows" the
resulting sum of matching row numbers is much greater than the number of
rows in "t".

The table "t" I use can be downloaded from here:
https://github.com/groupschoof/PhyloFun/archive/test_selector.zip
Unzip the file and read in the table "t" using t <- read.table("test.tbl")

The above function "sumRows" is defined as follows:
sumRows <- function( tbl, ps ) {
  sum(
    sapply(ps,
      function(x) {
        t <- if ( is.na(x) ) {
          tbl[ which( is.na(tbl[ , "Domain.Architecture.Distance" ]) ), , drop=F]
        } else {
          tbl[ which( tbl[ , "Domain.Architecture.Distance" ] == x ), , drop=F]
        }
        nrow(t)
      }
    )
  )
}

What causes the different behavior of sumRows when the which calls are
deleted?
What does which do that I seem not to grasp?
Or is there an error in my test.tbl?
Any help on this subject will be greatly appreciated.
Kind regards and *merry christmas*!

	[[alternative HTML version deleted]]

Berend Hasselman

2012-Dec-17 19:39 UTC

On 17-12-2012, at 20:22, Asis Hallab wrote:
> Dear R community,
>
> I have a medium sized matrix stored in variable "t" and a simple function
> "countRows" (see below) to count the number of rows in which a selected
> column "C" matches a given value. If I count all rows matching all pairwise
> distinct values in the column "C" and sum these counts up, I get the number
> of rows of "t". If I delete the "which" calls from function "countRows" the
> resulting sum of matching row numbers is much greater than the number of
> rows in "t".
>
> The table "t" I use can be downloaded from here:
> https://github.com/groupschoof/PhyloFun/archive/test_selector.zip
> Unzip the file and read in the table "t" using t <- read.table("test.tbl")
>
> The above function "sumRows" is defined as follows:
> sumRows <- function( tbl, ps ) {
>  sum(
>    sapply(ps,
>      function(x) {
>        t <- if ( is.na(x) ) {
>          tbl[ which( is.na(tbl[ , "Domain.Architecture.Distance" ]) ), , drop=F]
>        } else {
>          tbl[ which( tbl[ , "Domain.Architecture.Distance" ] == x ), , drop=F]
>        }
>        nrow(t)
>      }
>    )
>  )
> }
> 
And how are we supposed to call sumRows()?

sumRows(???, ???)

Berend

David Winsemius

2012-Dec-17 20:00 UTC

On Dec 17, 2012, at 11:22 AM, Asis Hallab wrote:
> Dear R community,
>
> I have a medium sized matrix stored in variable "t" and a simple function
> "countRows" (see below) to count the number of rows in which a selected
> column "C" matches a given value. If I count all rows matching all pairwise
> distinct values in the column "C" and sum these counts up, I get the number
> of rows of "t". If I delete the "which" calls from function "countRows" the
> resulting sum of matching row numbers is much greater than the number of
> rows in "t".
>
> The table "t" I use can be downloaded from here:
> https://github.com/groupschoof/PhyloFun/archive/test_selector.zip
What part of "minimal" example are you having difficulty
understanding? That zip file expands to a 1.8 MB file!

> Unzip the file and read in the table "t" using t <- read.table("test.tbl")
Since it has a header line, you will be creating all factors and it's
doubtful you are getting what you want.

Instead:

 t <- read.table("test.tbl", header=TRUE)
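
A small sketch of the difference, using a made-up three-row file rather than test.tbl (the file contents and the variable names `path`, `no_header` and `with_header` are assumptions for illustration). Without header=TRUE the header line is read as a data row, so the column is no longer numeric; whether it comes out as factor or character depends on the R version.

```r
# Write a tiny stand-in file; the real test.tbl is not reproduced here.
path <- tempfile(fileext = ".tbl")
writeLines(c("Domain.Architecture.Distance id",
             "0.1 1",
             "0.5 2",
             "NA 3"), path)

no_header   <- read.table(path)                 # header line becomes row 1
with_header <- read.table(path, header = TRUE)  # header line becomes names

nrow(no_header)     # 4
nrow(with_header)   # 3
is.numeric(no_header$V1)                              # FALSE
is.numeric(with_header$Domain.Architecture.Distance)  # TRUE
```

Note that read.table only guesses header=TRUE by itself when the first line has one field fewer than the rest, so passing it explicitly is the safer habit.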
> The above function "sumRows" is defined as follows:
> sumRows <- function( tbl, ps ) {
>  sum(
>    sapply(ps,
'ps'? What is ps????
>      function(x) {
>        t <- if ( is.na(x) ) {
I suspect that it is not `which` that is the problem, but rather your
understanding of how `if` processes vectors. (This also should be simplified
greatly to avoid stepping through vectors one element at a time.)
>          tbl[ which( is.na(tbl[ , "Domain.Architecture.Distance" ]) ), , drop=F]
You didn't do anything with that result!
>        } else {
>          tbl[ which( tbl[ , "Domain.Architecture.Distance" ] == x ), , drop=F]
>        }
>        nrow(t)
That value will not depend in any manner on what preceded it.  ???? It will
simply be the number of rows in the local copy of "t".

Your goal is _only_ to get a count?

Why not just this:

 sum( tbl[!is.na(tbl$Domain.Architecture.Distance), "Domain.Architecture.Distance" ] == x )

E.g.:
> sum( tbl[!is.na(tbl$Domain.Architecture.Distance), "Domain.Architecture.Distance" ] == 0.99)
[1] 3440

You should probably be creating a factor variable with `cut` to create
reasonable intervals for grouping, and if you do not know this it suggests you
need to do more study of the text or introductory materials. To get a quick look
at the distribution this is useful:

plot( density(tbl[!is.na(tbl$Domain.Architecture.Distance), "Domain.Architecture.Distance" ] ))

(125 KB file so not attached)
> table( cut(tbl$Domain.Architecture.Distance, breaks=(0:10)/10) )
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8] (0.8,0.9]   (0.9,1]
      616      1864       328       103       923      1763      1151      2490      3709     38563
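
The same two steps (a single-value count and the binned table) can be tried on a short invented vector. This is only a sketch: the vector `d` and its values are assumptions, not data from test.tbl. It also shows that sum(..., na.rm = TRUE) gives the same count as filtering with !is.na first, and that useNA = "ifany" keeps the NA entries visible in the table.

```r
# Invented values; the real column lives in test.tbl from the original post.
d <- c(0.05, 0.15, 0.95, 0.99, 0.99, 1, NA)

# Count one value while ignoring NA.
sum(d == 0.99, na.rm = TRUE)          # 2
sum(d[!is.na(d)] == 0.99)             # 2, same count via explicit filtering

# Bin into ten right-closed intervals and tabulate.
# useNA = "ifany" adds an NA cell so every element is accounted for.
bins   <- cut(d, breaks = (0:10) / 10)
counts <- table(bins, useNA = "ifany")

counts[["(0.9,1]"]]                   # 4
sum(counts) == length(d)              # TRUE
```

Because the intervals are right-closed and start at 0, a value of exactly 0 would fall outside every bin and be reported as NA unless include.lowest = TRUE is passed to cut.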
>      }
>    )
>  )
> }
> 
> What does cause the different behavior of sumRows, when the which calls are
> deleted?
> What does which do, I seem not to grasp?
The question ... as yet unanswered ....  is _how_ exactly are you calling that
function. You posted a link to data "t" but there is no code that
calls that function with the data. I do not see anything that would resemble a
"ps"-object.

> Or is there an error in my test.tbl?
(See above.)
> Any help on this subject will be greatly appreciated.
> Kind regards and *merry christmas*!
> 
> 	[[alternative HTML version deleted]]
Please read the Posting Guide and learn to post in plain text.

-- 
David Winsemius
Alameda, CA, USA
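
On the `which` question itself, one behavior worth checking is how R indexes with NA. The sketch below assumes the column contains NA values (the is.na branch in sumRows suggests so, but the thread does not confirm it) and assumes the function is called with ps = unique values of that column, which the original post never states. The toy table and the helper `count_rows` with its `use_which` flag are hypothetical stand-ins, not the poster's code.

```r
# Toy stand-in for the posted table; the values are invented.
tbl <- data.frame(
  Domain.Architecture.Distance = c(0.1, 0.1, 0.5, NA, NA),
  id = 1:5
)
col <- tbl[ , "Domain.Architecture.Distance" ]

# A comparison with NA yields NA, and an NA in a logical index selects
# an all-NA row, so the NA rows are returned alongside the real matches.
nrow( tbl[ col == 0.1, , drop = FALSE ] )           # 4

# which() returns only the positions that are TRUE and drops NA.
nrow( tbl[ which( col == 0.1 ), , drop = FALSE ] )  # 2

# Hypothetical variant of sumRows with the which() step made optional.
count_rows <- function(tbl, ps, use_which = TRUE) {
  col <- tbl[ , "Domain.Architecture.Distance" ]
  sum(sapply(ps, function(x) {
    hit <- if (is.na(x)) is.na(col) else col == x
    if (use_which) hit <- which(hit)
    nrow(tbl[hit, , drop = FALSE])
  }))
}

ps <- unique(col)                        # assumed way of building "ps"
count_rows(tbl, ps, use_which = TRUE)    # 5, equals nrow(tbl)
count_rows(tbl, ps, use_which = FALSE)   # 9, NA rows counted repeatedly
```

Under these assumptions, every non-NA value of x picks up the NA rows once more when `which` is left out, which would explain a total larger than the number of rows.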
