thr3ads.net - R help - [R] Filter a big matrix [Feb 2009]

cruz

2009-Feb-11 12:32 UTC

[R] Filter a big matrix

Hi,

I have a big matrix X with M rows and N columns that I want to split
into smaller ones, each with m (<M) rows and N columns.
The filter rule is based on the values in each column, i.e.

X looks like this:
column name: a, b, c, d, ... etc

a   b   c   d   ...
1   2   3   4   ...
5   6   7   8   ...
9   8   7   6   ...
...   ...   ...   ...

The filter rule with the result that I want is:

X[X$a<5 & X$b<5 & X$c<5 & X$d<5 ...etc ,]
X[X$a<5 & X$b<5 & X$c<5 & X$d>=5 ...etc ,]
X[X$a<5 & X$b<5 & X$c>=5 & X$d<5 ...etc ,]
...   ...   ...
...

with all the possible combinations, which is 2^N (one choice of <5 or >=5 per column)

I tried to use multiple for loops to separate it:

for (i in 1:2)
  for (j in 1:2)
    for (k in 1:2)
      ... ...
        assign(paste(i,j,k,...,sep=""),
               X[if (i==1) paste("X$a<5") else paste("X$a>=5") &
                 if (j==1) paste("X$b<5") else paste("X$b>=5") &
                 ..., ])

# there might be syntax errors, I just want to clearly describe my problem

The problem is that paste("X$a>=5") gives a character value, whereas
X$a>=5 should be logical.

How can I do this?

All thoughts are greatly appreciated.

Many Thanks,
cruz

jim holtman

2009-Feb-11 13:35 UTC

[R] Filter a big matrix

Try something like this.  It sets up the "<" condition and then uses
'xor' to flip it to ">=", assuming that you are using the same values

# test data
n <- 5  # number of columns
m <- 100  # number of rows
x <- matrix(sample(1:10, n * m, TRUE), ncol=n, nrow=m)
# create tests assuming you want to know if things are greater than or
# less than a value
test <- apply(x, 2, "<", 5)
# setup all combinations to test for
comb <- expand.grid(rep(list(c(TRUE, FALSE)), n))
result <- lapply(seq(nrow(comb)), function(.row){
    # apply the test using 'xor' to change the conditions from "<" to ">="
    x[apply(test, 1, function(z) all(xor(z, comb[.row,]))),,drop=FALSE]
})
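
A possibly simpler route to a similar grouping, sketched here under stated assumptions rather than taken from the reply above: encode each row's pattern of "< 5" tests as a string key and split the row indices by that key. The names `key` and `groups` are made up for illustration, and only the patterns that actually occur in the data get a group.

```r
set.seed(1)                      # reproducible test data
n <- 5                           # number of columns
m <- 100                         # number of rows
x <- matrix(sample(1:10, n * m, TRUE), ncol = n, nrow = m)

test <- x < 5                    # logical matrix, same shape as x

# One key per row, e.g. "10010": 1 where the value is < 5, 0 where it is >= 5
key <- apply(test, 1, function(z) paste(as.integer(z), collapse = ""))

# Split the row indices by key, then pull the matching rows out of x
groups <- lapply(split(seq_len(m), key), function(idx) x[idx, , drop = FALSE])
```

Each element of `groups` is a sub-matrix named by its pattern, so `groups[["11111"]]` (if that pattern occurs) would hold the rows in which every column is below 5.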



> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Douglas Bates

2009-Feb-11 13:45 UTC

[R] Filter a big matrix

Wow.  It's hard to know where to begin to comment on this.

Generally we recommend taking the "whole object" approach when
possible.  For example

Xl <- X < 5

performs all the comparisons in one go, returning a logical matrix of
the same dimension as X.  If you want only those rows of X in which
every element is less than 5 you could take the matrix X < 5 and
"apply" the "&" operator across the rows.  There is a complication
here in that "&" is a binary operator, not a summary function, but for
logical values the "prod" summary function has the same effect as
reduction by "&" as long as you convert the result back to a logical
value.  That is

X[as.logical(apply(Xl, 1, prod)),]
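
As a small check of the idea above, on made-up data: `all` is a summary function that reduces logical values by "&" directly, and `rowSums` gives a vectorized variant. This is only a sketch, not part of the original reply; the matrix `X` and the names `keep_prod`, `keep_all` and `keep_sums` are hypothetical.

```r
set.seed(42)
X <- matrix(sample(1:10, 40, TRUE), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
X[1, ] <- 1:4                                 # make sure one row qualifies
Xl <- X < 5

keep_prod <- as.logical(apply(Xl, 1, prod))   # approach described in the text
keep_all  <- apply(Xl, 1, all)                # 'all' reduces by "&"
keep_sums <- rowSums(Xl) == ncol(Xl)          # vectorized, no apply()

identical(keep_prod, keep_all)                # the selections agree
X[keep_all, , drop = FALSE]                   # rows with every value < 5
```

The `drop = FALSE` keeps the result a matrix even when only one row matches.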

However, even before considering that aspect of the calculation it
would be best to back up and consider how you would store the result
and what you would do with it once you got it.  I really would
recommend that you think about how you are approaching the larger
problem of which, I assume, this represents one step.  You are trying
to do something difficult and the code you have outlined indicates
that you have not yet achieved fluency in R.  If indeed this approach
is the best approach to the problem then you should spend some time
reading up on R programming (Robert Gentleman's book "R Programming
for Bioinformatics" would be a good starting point I think) to save
yourself a lot of grief.

For example, paste("foo") is simply "foo".  The "$" operator extracts
a component by name but the name must be a symbol, not the value of a
variable.  If you want a named component where the name is the value
of a variable you must use x[[nm]].
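
To illustrate the difference on a toy example (the data frame `X`, the variable `nm` and the flag `below` are hypothetical, chosen only for this sketch):

```r
X <- data.frame(a = c(1, 5, 9), b = c(2, 6, 8))  # hypothetical data
nm <- "a"

X$nm        # NULL: "$" looks for a column literally called "nm"
X[[nm]]     # the column "a": "[[" uses the value of nm

# Building a logical condition from a column name and a direction flag,
# with no paste() involved
below <- TRUE
cond <- if (below) X[[nm]] < 5 else X[[nm]] >= 5
X[cond, , drop = FALSE]
```

Because `cond` is computed directly, it is already a logical vector and can be used as a row index.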

When you find yourself trying to describe an algorithm as a set of
nested loops where the number of loops is variable you need to rethink
the algorithm.
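
One way to replace a variable number of nested loops, sketched under the assumption that each loop only chooses between two states per column: enumerate the combinations as rows of a table and use a single loop over the rows. The column names in `cols` and the variable `pattern` are hypothetical.

```r
cols <- c("a", "b", "c")   # hypothetical column names

# One row per combination; TRUE stands for "< 5", FALSE for ">= 5"
comb <- expand.grid(rep(list(c(TRUE, FALSE)), length(cols)))
names(comb) <- cols
nrow(comb)                 # 2^length(cols) combinations

for (r in seq_len(nrow(comb))) {
  pattern <- unlist(comb[r, ])   # named logical vector for this combination
  # use 'pattern' here in place of the loop indices i, j, k
}
```

The number of columns can then change without touching the structure of the code.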


Gustaf Rydevik

2009-Feb-11 14:13 UTC

[R] Filter a big matrix

Assuming that I understood your data structure correctly, that all
columns should be tested in your filter, and that exactly one column
should not match the condition, the following should work:


## sample data
X <- matrix(sample(1:200, 10000, replace = TRUE), nrow = 100)
colnames(X) <- 1:100

### Filter function - modify to suit your purpose
## Keeps the rows in which column n is the only one with a value >= 199
filterFunction <- function(n, data) {
    filteredData <- data[rowSums(data >= 199) == 1 & (data[, n] >= 199), , drop = FALSE]
    if (nrow(filteredData) == 0) {
        filteredData <- "NoMatchingRows"
    }
    return(filteredData)
}

## one result per column ('colNames' avoids masking base::names)
colNames <- colnames(X)
lapply(as.list(colNames), filterFunction, X)


Hope it helps.
Best regards,
Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
