Hello everyone, I would appreciate any help with the following.
My dataset is a list containing matrices. So if you type e.g.
data[[1]]
you get something like:
       [,1] [,2]
361a   A    T
456b   A    G
72145a T    G
........
As you can see, my rows have names, which are character strings containing
numbers and letters. I want something similar to a histogram, per column, i.e. I
want to know how many characters occur exactly once in a column, how many occur
exactly twice, and so on. Maybe there is an easy way to do this, but I wrote my
own code which works perfectly, so don't bother to correct it unless extremely
necessary. I write down the code so you know exactly what I'm trying to do:
table <- vector()
for (i in seq_along(data)) {             # each matrix in the list
  for (j in seq_len(ncol(data[[i]]))) {  # each column of that matrix
    t <- table(data[[i]][, j])           # count each character in column j
    table <- c(table, t)                 # append the counts to the running vector
  }
}
# this line is necessary to eliminate "-" characters, which should not be
# included in the analysis
ncount <- table[names(table) != "-"]
sfs <- table(ncount)
And with this code I get something like:
  1   2   3   4   5   6   7   8   9  10 ....
542 125  98  49  47  41  26  31  22  18 ....
which is what I'm looking for.
Now comes THE problem:
As I said before my rows have names. Each name is unique. I want to apply my
analysis to a subset of rows in each matrix, namely all rows whose names start
with 3, all that start with 4, all that start with 721. In most cases only the
first character is important, but since I have names of different length, in
some cases I need the first three characters to differentiate the groups. I want
to integrate this into the loop so that I get a vector (such as the one called
"table" in my code) for each subset analyzed.
I tried using the subset function, but I couldn't figure out how to use it,
because it's intended to use row values to define the subset, not row names.
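
For reference, rows of a matrix can be selected by name with ordinary logical indexing instead of subset(). A minimal sketch, assuming a made-up toy matrix `m` that reuses the three row names from the example above (the object names `m`, `rows3` and `rows721` are illustrative only):

```r
# Toy matrix standing in for one element of the list (hypothetical data)
m <- matrix(c("A", "A", "T", "T", "G", "G"), ncol = 2,
            dimnames = list(c("361a", "456b", "72145a"), NULL))

# grepl() gives TRUE for every row name that matches the pattern:
# "^3" means "starts with 3", "^721" means "starts with 721"
rows3   <- m[grepl("^3",   rownames(m)), , drop = FALSE]
rows721 <- m[grepl("^721", rownames(m)), , drop = FALSE]
```

The `drop = FALSE` argument keeps the result a matrix even when only one row matches, so column-wise code such as the loop above still works on it.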
I hope someone can help me out, but please bear in mind I am really new at R and
most commands and parameters are really unfamiliar to me.
Thanks.
Hi Carlos,
how about this step first, padding the shorter names with zeros so that all
names have the same length:
rownames(mydata) <- gsub("361a", "00361a", rownames(mydata))
rownames(mydata) <- gsub("456b", "00456b", rownames(mydata))
good luck
milton
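
The same padding can probably be done for all names at once rather than one gsub() call per name. A minimal sketch, assuming every row name is a run of digits followed by lowercase letters and that five digits is the longest numeric part (both are assumptions based on the three example names; `nm`, `num`, `suffix` and `padded` are illustrative names):

```r
# Example row names from the original question
nm <- c("361a", "456b", "72145a")

num    <- as.integer(sub("[a-z]+$", "", nm))  # numeric part, e.g. 361
suffix <- sub("^[0-9]+", "", nm)              # trailing letters, e.g. "a"

# Left-pad the numeric part with zeros to five digits, then re-attach the suffix
padded <- paste0(sprintf("%05d", num), suffix)
padded
# "00361a" "00456b" "72145a"

# After padding, the first three characters separate the groups
substr(padded, 1, 3)
# "003" "004" "721"
```

Note that under this scheme a four-digit name starting with 3 would land in a different group than a three-digit one, so it only fits if the groups really correspond to the padded five-digit code.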
On Thu, Aug 27, 2009 at 12:27 PM, Carlos Gonzalo Merino Mendez <
carlosgmerino@yahoo.com> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Try this:
lapply(data, function(r)
  lapply(split(r,
               substr(sprintf("%05d",
                              as.numeric(gsub("[a-z]", "", row.names(r)))),
                      1, 3)),
         table))
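
As far as I can tell, the call above tabulates the characters of each row group pooled over all columns of a matrix. If the per-column counts and the final table(ncount) step of the original loop are wanted for each group, one possible loop-based sketch is shown below. The toy `data` list and the helper names `group_of` and `sfs_by_group` are made up for illustration; the grouping rule (zero-pad the numeric part to five digits, keep the first three characters) is taken from the call above.

```r
# Hypothetical toy data: a list of two small matrices with named rows
data <- list(
  matrix(c("A", "A", "T", "A",
           "T", "G", "G", "-"), ncol = 2,
         dimnames = list(c("361a", "362b", "456a", "72145a"), NULL)),
  matrix(c("C", "C",
           "G", "T"), ncol = 2,
         dimnames = list(c("399a", "72101b"), NULL))
)

# Group label per row name: strip letters, zero-pad to 5 digits, keep 3 chars
group_of <- function(nm) {
  substr(sprintf("%05d", as.integer(gsub("[a-z]", "", nm))), 1, 3)
}

sfs_by_group <- function(data) {
  counts <- integer(0)    # per-column character counts, all groups together
  groups <- character(0)  # group label belonging to each entry of 'counts'
  for (m in data) {
    g <- group_of(rownames(m))
    for (lab in unique(g)) {
      sub_m <- m[g == lab, , drop = FALSE]   # rows of this group only
      for (j in seq_len(ncol(sub_m))) {
        t <- table(sub_m[, j])
        t <- t[names(t) != "-"]              # "-" is excluded from the analysis
        counts <- c(counts, t)
        groups <- c(groups, rep(lab, length(t)))
      }
    }
  }
  # One frequency table of the counts per group
  lapply(split(counts, groups), table)
}

res <- sfs_by_group(data)
res[["003"]]   # counts for rows whose padded name starts with "003"
```

The result is a named list with one element per group, each in the same form as `sfs` in the original code.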
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O