thr3ads.net - R help - [R] subset of a matrix [Aug 2009]


Carlos Gonzalo Merino Mendez

2009-Aug-27 16:27 UTC

[R] subset of a matrix

Hello everyone, I would appreciate any help with the following.

My dataset is a list of matrices. For example, if you type

data[[1]]

you get something like:

       [,1] [,2]
361a   A    T
456b   A    G
72145a T    G
........

As you can see, my rows have names, which are character strings containing
numbers and letters. I want something similar to a histogram, per column: that
is, I want to know how many characters occur exactly once in a column, how many
occur exactly twice, and so on. Maybe there is an easy way to do this, but I
wrote my own code, which works perfectly, so don't bother to correct it unless
extremely necessary. Here is the code, so you know exactly what I'm trying to
do:

table <- vector()

for (i in 1:length(data)) {                # each matrix in the list
    for (j in 1:ncol(data[[i]])) {         # each column of that matrix
        t <- table(data[[i]][, j])         # count each character in column j
        table <- c(table, t)               # append these counts
    }
}

# eliminate "-" characters, which should not be included in the analysis
ncount <- table[names(table) != "-"]

sfs <- table(ncount)

And with this code I get something like:

  1   2   3   4   5   6   7   8   9  10 ....
542 125  98  49  47  41  26  31  22  18  ....

which is what I'm looking for.
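For comparison, the same spectrum can be computed without growing a vector
inside nested loops. This is a minimal sketch, not the poster's code: the
two-matrix `data` list is hypothetical toy data in the shape described above,
and `col_counts` is an illustrative name. `table()` counts each character in a
column, `unlist()` pools those counts (keeping the characters as names), and a
second `table()` tallies how many characters occur once, twice, and so on.

```r
# Hypothetical toy data in the shape described above: a list of character
# matrices with mixed digit/letter row names; "-" marks characters to ignore.
data <- list(
  matrix(c("A", "A", "T",  "T", "G", "G"), ncol = 2,
         dimnames = list(c("361a", "456b", "72145a"), NULL)),
  matrix(c("C", "-", "C",  "A", "A", "A"), ncol = 2,
         dimnames = list(c("362a", "457b", "72146a"), NULL))
)

# Count each character within every column of every matrix and pool the
# counts into one named integer vector (names are the characters).
col_counts <- unlist(lapply(data, function(m)
  lapply(seq_len(ncol(m)), function(j) table(m[, j]))))

# Drop "-", then tabulate how many characters occur once, twice, ...
ncount <- col_counts[names(col_counts) != "-"]
sfs <- table(ncount)
print(sfs)
```

On this toy data, `sfs` reports 2, 3 and 1 characters seen exactly once,
twice and three times, respectively.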


Now comes THE problem:

As I said before, my rows have names, and each name is unique. I want to apply
my analysis to subsets of rows in each matrix, namely all rows whose names start
with 3, all that start with 4, all that start with 721. In most cases only the
first character matters, but since the names have different lengths, in some
cases I need the first three characters to tell the groups apart. I want to
integrate this into the loop so that I get a vector (such as the one called
"table" in my code) for each subset analyzed.

I tried using the subset() function, but I couldn't figure out how to use it,
because it seems to be designed to select rows by their values, not by their
row names.
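For a matrix, `subset()` does not actually need column values: its second
argument can be any logical vector, so it can be built from the row names. A
minimal sketch on a hypothetical toy matrix; `grepl("^3", ...)` tests whether
a name starts with "3", and ordinary `[` indexing does the same job.

```r
# Hypothetical toy matrix shaped like data[[1]] in the post.
m <- matrix(c("A", "A", "T",  "T", "G", "G"), ncol = 2,
            dimnames = list(c("361a", "456b", "72145a"), NULL))

# For a matrix, subset()'s second argument can be any logical vector,
# so it can be computed from the row names ("^3" = starts with 3).
starts_3 <- subset(m, grepl("^3", rownames(m)))

# The same kind of selection with ordinary indexing; drop = FALSE keeps a
# one-row result as a matrix instead of collapsing it to a vector.
starts_721 <- m[grepl("^721", rownames(m)), , drop = FALSE]

print(starts_3)
print(starts_721)
```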

I hope someone can help me out, but please bear in mind that I am really new to
R and most commands and parameters are still unfamiliar to me.
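One way to fold the grouping into the loop is to keep, in each matrix, only
the rows whose names start with a given prefix and run the same counting on
that subset. A minimal sketch, assuming the toy data below and a hypothetical
helper `sfs_for_prefix()`; `startsWith()` is base R from version 3.3.0 on
(`grepl(paste0("^", prefix), ...)` does the same in older versions).

```r
# Hypothetical toy data shaped like the post: a list of character matrices
# with mixed digit/letter row names; "-" marks characters to ignore.
data <- list(
  matrix(c("A", "A", "T",  "T", "G", "G"), ncol = 2,
         dimnames = list(c("361a", "456b", "72145a"), NULL)),
  matrix(c("C", "-", "C",  "A", "A", "A"), ncol = 2,
         dimnames = list(c("362a", "457b", "72146a"), NULL))
)

# Hypothetical helper: the original loop, restricted to the rows of each
# matrix whose names start with `prefix`.
sfs_for_prefix <- function(data, prefix) {
  counts <- vector()
  for (i in seq_along(data)) {
    m <- data[[i]]
    part <- m[startsWith(rownames(m), prefix), , drop = FALSE]  # stays a matrix
    for (j in seq_len(ncol(part))) {
      counts <- c(counts, table(part[, j]))  # per-column character counts
    }
  }
  table(counts[names(counts) != "-"])        # drop "-", then tabulate
}

# Prefixes from the post; "721" needs three characters to identify its group.
prefixes <- c("3", "4", "721")
sfs_by_group <- lapply(prefixes, function(p) sfs_for_prefix(data, p))
names(sfs_by_group) <- prefixes
print(sfs_by_group)
```

`sfs_by_group` is a list with one spectrum per prefix. A name such as "72145a"
also starts with "7", so using both "7" and "721" as prefixes would make those
groups overlap.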

Thanks.


      

milton ruser

2009-Aug-27 16:39 UTC


Hi Carlos,

How about this as a first step:

rownames(mydata) <- gsub("361a", "00361a", rownames(mydata))
rownames(mydata) <- gsub("456b", "00456b", rownames(mydata))
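Rather than writing one `gsub()` call per name, the zero-padding can be
applied to all row names at once. A minimal sketch, assuming every name is a
run of digits followed by lowercase letters (as in the example rows); `rn`,
`num`, `suffix`, `padded` and `group` are illustrative names. After padding to
five digits, the first three characters can serve as the group key.

```r
# Row names from the example in the post.
rn <- c("361a", "456b", "72145a")

# Split each name into its leading digits and trailing letters
# (assumes every name is digits followed by lowercase letters).
num    <- sub("[a-z]+$", "", rn)
suffix <- sub("^[0-9]+", "", rn)

# Zero-pad the numeric part to 5 digits and re-attach the letters.
padded <- paste0(sprintf("%05d", as.integer(num)), suffix)
print(padded)   # "00361a" "00456b" "72145a"

# The first three characters of a padded name identify its group.
group <- substr(padded, 1, 3)
print(group)    # "003" "004" "721"
```

The padded names could then be assigned back with
`rownames(mydata) <- padded`.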

good luck

milton

Henrique Dallazuanna

2009-Aug-27 17:00 UTC


Try this:

lapply(data,
       function(r)
           lapply(split(r,
                        # group key: drop letters, zero-pad to 5 digits,
                        # keep the first 3 characters ("361a" -> "003")
                        substr(sprintf("%05d",
                                       as.numeric(gsub("[a-z]", "", row.names(r)))),
                               1, 3)),
                  table))
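As far as I can tell, `split()` on a plain matrix splits its elements,
recycling the group key down each column, so the reply's `table()` pools all
columns of a group into one count per character. The sketch below keeps the
reply's group key but counts per column, which reproduces the original
poster's spectrum separately for each group. It is a sketch under
assumptions: the toy `data` and the helper names `group_key()` and
`sfs_by_group()` are hypothetical.

```r
# Hypothetical toy data (same shape as in the post).
data <- list(
  matrix(c("A", "A", "T",  "T", "G", "G"), ncol = 2,
         dimnames = list(c("361a", "456b", "72145a"), NULL)),
  matrix(c("C", "-", "C",  "A", "A", "A"), ncol = 2,
         dimnames = list(c("362a", "457b", "72146a"), NULL))
)

# Group key as in the reply: strip letters, zero-pad the number to 5 digits,
# keep the first 3 characters ("361a" -> "003", "72145a" -> "721").
group_key <- function(m)
  substr(sprintf("%05d", as.numeric(gsub("[a-z]", "", rownames(m)))), 1, 3)

# For each key: per-column character counts over the matching rows of every
# matrix, "-" dropped, then tabulated (one spectrum per group).
sfs_by_group <- function(data) {
  keys <- unique(unlist(lapply(data, group_key)))
  sapply(keys, function(k) {
    counts <- unlist(lapply(data, function(m) {
      part <- m[group_key(m) == k, , drop = FALSE]
      lapply(seq_len(ncol(part)), function(j) table(part[, j]))
    }))
    table(counts[names(counts) != "-"])
  }, simplify = FALSE)
}

res <- sfs_by_group(data)
print(res)
```

On the toy data this yields one spectrum per key ("003", "004", "721"); in
the "004" group, three characters are seen exactly once after the "-" is
dropped.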



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
