debra ragland
2015-Dec-02 15:39 UTC
[R] Looping through multiple sub elements of a list to compare to multiple components of a vector
I think I am making this problem harder than it has to be, so I keep getting stuck on what might be a trivial problem.

I have used the seqinr package to load a protein sequence alignment containing 15 protein sequences:

> library(seqinr)
> x = read.alignment("proteins.fasta", format = "fasta", forceToLower = FALSE)

This automatically loads a list of 4 elements, including the sequences and other information. I store the sequences in a new variable:

> mylist = x$seq

which returns a character vector of 15 strings. I have found that if I split the long character strings into individual characters, it is easy to use lapply to loop over the list. So I use strsplit:

> list.2 = strsplit(mylist, split = NULL)

From this list I can determine which proteins have changes at certain positions by using:

> lapply(list.2, "[", 10) == "L"

This returns a logical TRUE/FALSE vector showing which elements of the list do or do not have the letter L at position 10.

Because each of the protein sequences contains 99 amino acids, I want to automate this process so that I do not have to compare positions one at a time. Most of the changes occur between positions 10 and 95. I have a standard character vector that I wish to use for comparison when looping through the list.

Should I combine everything (the standard amino-acid vector and the list of protein sequences) into one list, or is it better to keep them separate for this comparison? I'm not sure what the output should be, as I need to use it for another statistical test. Would a list of logical vectors be the most suitable output to return?
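Here is a minimal sketch of the window comparison the question asks about. It assumes all sequences have the same length, as they should in an alignment, so they can be stacked into a character matrix. The sequences, `standard.string` and `window` below are hypothetical toy stand-ins, not the poster's data. `sweep()` is just one way to line the standard up against every row. The result is a single logical matrix rather than a list of logical vectors, which is usually easier to summarize afterwards.

```r
# Hypothetical toy data standing in for the aligned protein sequences;
# in the question these would be the 15 strings in mylist.
seqs <- c("MKLLVAAGLT", "MKLIVAAGLS", "MKLLVAPGLT")
standard.string <- "MKLLVAAGLT"  # hypothetical reference sequence

# Stack the split sequences into a character matrix:
# one row per sequence, one column per alignment position.
# This assumes every sequence has the same length.
aln <- do.call(rbind, strsplit(seqs, split = NULL))
standard <- strsplit(standard.string, split = NULL)[[1]]

# Positions to examine (3:9 for this toy data; 10:95 in the question).
window <- 3:9

# Compare each column of the window with the matching letter of the
# standard; sweep() applies "==" column by column (MARGIN = 2).
matches <- sweep(aln[, window, drop = FALSE], 2, standard[window], FUN = "==")
colnames(matches) <- window

# How many sequences differ from the standard at each position.
mismatch.counts <- colSums(!matches)
```

Rows of `matches` follow the order of `seqs`. So `which(!matches[, "4"])` lists the sequences that differ at position 4, much like the `lapply(list.2, "[", 10) == "L"` check in the question, but every position in the window is covered at once.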
Adams, Jean
2015-Dec-02 17:53 UTC
[R] Looping through multiple sub elements of a list to compare to multiple components of a vector
First, a couple of posting tips. It's helpful to provide some example data people can work with. Also, please post in plain text (not HTML).

If you have a single standard for comparison, you might find an approach like this helpful.

# example data
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split=NULL)

# setting a standard for comparison
std.string <- "AABBCC"
standard <- unlist(strsplit(std.string, split=NULL))

sapply(list.2, function(x) x==standard)

This gives you a matrix of logicals. The number of rows equals the length of your original strings (the 99 amino acids), and the number of columns equals the number of strings you're comparing (the 15 sequences).

      [,1]  [,2]  [,3]
[1,]  TRUE  TRUE  TRUE
[2,]  TRUE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE
[4,]  TRUE FALSE  TRUE
[5,]  TRUE  TRUE  TRUE
[6,]  TRUE  TRUE FALSE

Jean
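The question mentions feeding the result into another statistical test. This is a hedged sketch of a few ways the logical matrix from Jean's approach could be summarized. The right summary depends on that test, which the thread doesn't name. The object names below are illustrative only. For the real 99-residue data, `res[10:95, ]` would restrict everything to the window mentioned in the question.

```r
# Jean's example data and comparison, repeated so this block runs on its own.
mylist <- c("AAEBCC", "AABDCC", "AABBCD")
list.2 <- strsplit(mylist, split = NULL)
standard <- unlist(strsplit("AABBCC", split = NULL))
res <- sapply(list.2, function(x) x == standard)  # rows = positions, cols = sequences

# Number of sequences that differ from the standard at each position.
diffs.per.position <- rowSums(!res)

# Number of positions at which each sequence differs from the standard.
diffs.per.sequence <- colSums(!res)

# Row (position) and column (sequence) index of every mismatch.
mismatch.locations <- which(!res, arr.ind = TRUE)

# 0/1 coding (1 = differs from the standard), in case the later test expects numbers.
diff.indicator <- (!res) * 1L
```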