Hello everyone, I would appreciate any help with the following.

My dataset is a list containing matrices. So if you type e.g.

    data[[1]]

you get something like:

           [,1] [,2]
    361a   A    T
    456b   A    G
    72145a T    G
    ........

As you can see, my rows have names, which are character strings containing numbers and letters. I want something similar to a histogram, per column, i.e. I want to know how many times a character appears exactly once in a column, how many times a character appears exactly twice, and so on. Maybe there is an easy way to do this, but I wrote my own code, which works perfectly, so don't bother to correct it unless extremely necessary. I write down the code so you know exactly what I'm trying to do:

    table <- vector()

    for (i in (1:length(data))){
      for (j in (1:length(data[[i]][1,]))){
        t <- table(data[[i]][,j])
        table <- c(table, t)
      }}

    # this line is necessary to eliminate "-" characters,
    # which should not be included in the analysis
    ncount <- table[names(table) != "-"]

    sfs <- table (ncount)

And with this code I get something like:

      1   2   3   4   5   6   7   8   9  10 ....
    542 125  98  49  47  41  26  31  22  18 ....

which is what I'm looking for.

Now comes THE problem:

As I said before, my rows have names. Each name is unique. I want to apply my analysis to a subset of rows in each matrix, namely all rows whose names start with 3, all that start with 4, and all that start with 721. In most cases only the first character is important, but since I have names of different lengths, in some cases I need the first three characters to differentiate the groups. I want to integrate this into the loop so that I get a vector (such as the one called "table" in my code) for each subset analyzed.

I tried using the subset function, but I couldn't figure out how to use it, because it's intended to use row values to define the subset, not row names.

I hope someone can help me out, but please bear in mind I am really new at R and most commands and parameters are really unfamiliar to me.

Thanks.
Hi Carlos,

how about this step first:

    rownames(mydata) <- gsub("361a", "00361a", rownames(mydata))
    rownames(mydata) <- gsub("456a", "00456a", rownames(mydata))

Good luck,

milton

On Thu, Aug 27, 2009 at 12:27 PM, Carlos Gonzalo Merino Mendez <carlosgmerino@yahoo.com> wrote:
> Hello everyone, I would appreciate any help with the following.
> [...]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
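Since the groups are defined by how the row names begin, one way to select rows by name rather than by value is to match the names against an anchored pattern with grepl() and use the result as a row index. This is only a minimal sketch: the toy matrix `m` and the vector `prefixes` are made up for illustration and do not come from the thread.

```r
# Toy matrix standing in for one element of the poster's list (values are invented).
m <- matrix(c("A", "A", "T", "A",
              "T", "G", "G", "-"),
            ncol = 2,
            dimnames = list(c("361a", "456b", "72145a", "3020c"), NULL))

# Hypothetical group prefixes; "^" anchors each match at the start of the name.
prefixes <- c("3", "4", "721")

# One sub-matrix per prefix; drop = FALSE keeps a matrix even if only one row matches.
subsets <- lapply(prefixes, function(p) {
  m[grepl(paste0("^", p), rownames(m)), , drop = FALSE]
})
names(subsets) <- prefixes

rownames(subsets[["3"]])   # "361a" "3020c"
```

This separates the groups cleanly only if no prefix is itself the beginning of another one; for example, "7" and "721" would both match 72145a.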
Try this:

    lapply(data, function(r)
      lapply(split(r,
                   substr(sprintf("%05d",
                                  as.numeric(gsub("[a-z]", "", row.names(r)))),
                          1, 3)),
             table))

On Thu, Aug 27, 2009 at 1:27 PM, Carlos Gonzalo Merino Mendez <carlosgmerino@yahoo.com> wrote:
> Hello everyone, I would appreciate any help with the following.
> [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
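For comparison, here is a sketch that stays close to the original loop and returns one frequency spectrum per group. As far as I can tell, split() treats a matrix as one long vector, so the one-liner above tabulates all columns of a group together; the version below tabulates column by column, as the original code does. The function name `sfs_by_prefix`, its arguments, and the toy matrix `m1` are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: per-prefix version of the poster's loop.
# data     : list of character matrices with row names
# prefixes : character vector of row-name prefixes defining the groups
# gap      : character excluded from the counts ("-" in the original post)
sfs_by_prefix <- function(data, prefixes, gap = "-") {
  out <- lapply(prefixes, function(p) {
    counts <- integer(0)
    for (m in data) {
      # keep only rows whose name starts with the prefix
      sub <- m[grepl(paste0("^", p), rownames(m)), , drop = FALSE]
      for (j in seq_len(ncol(sub))) {
        tab <- table(sub[, j])                        # character counts in column j
        counts <- c(counts, tab[names(tab) != gap])   # drop the gap character
      }
    }
    table(counts)   # how many characters occur once, twice, ...
  })
  names(out) <- prefixes
  out
}

# Toy input (invented values), one matrix in the list.
m1 <- matrix(c("A", "A", "A", "T",
               "T", "G", "G", "-"),
             ncol = 2,
             dimnames = list(c("361a", "362b", "456b", "72145a"), NULL))

res <- sfs_by_prefix(list(m1), c("3", "4", "721"))
res[["3"]]   # two characters seen once, one character seen twice
```

Each element of `res` plays the role of `sfs` in the original code, restricted to one group of rows.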