Any better solution than this?

    sum(strsplit("TCGGGGGACAATCGGTAACCCGTCT", "")[[1]] == "G")
Ken Knoblauch
2008-Jul-15 15:40 UTC
[R] counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"
Daren Tan <daren76 <at> hotmail.com> writes:
> Any better solution than this?
> sum(strsplit("TCGGGGGACAATCGGTAACCCGTCT", "")[[1]] == "G")

Try

    table(strsplit("TCGGGGGACAATCGGTAACCCGTCT", ""))

    A C G T
    5 7 8 5

and get all 4 at once.

HTH

-- 
Ken Knoblauch
Inserm U846
Institut Cellule Souche et Cerveau
Département Neurosciences Intégratives
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr
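If only the G count is needed, the tabulated result can be indexed by name. The following is a minimal base R sketch rather than anything posted in the thread; the variable names `dna`, `bases`, `counts` and `g_count` are illustrative, and fixing the factor levels to the four bases is an added assumption so that a base missing from the sequence is reported as 0 instead of raising an error.

```r
# Illustrative names only; not part of the original posts
dna <- "TCGGGGGACAATCGGTAACCCGTCT"

# Split the string into single characters
bases <- strsplit(dna, "")[[1]]

# Tabulate against a fixed alphabet so absent bases still get a count of 0;
# characters outside these levels become NA and are dropped by table()
counts <- table(factor(bases, levels = c("A", "C", "G", "T")))

# Pull out a single base by name
g_count <- counts[["G"]]
print(g_count)
```

For this sequence `g_count` is 8, in line with the table above.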
Henrik Bengtsson
2008-Jul-15 15:43 UTC
[R] counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"
Seems like you can do:

    library("matchprobes")  # on Bioconductor
    countbases("TCGGGGGACAATCGGTAACCCGTCT")[,"G"]

The catch is that it counts only A, C, G, and T, and no other symbols.

/Henrik
Wolfgang Huber
2008-Jul-15 15:59 UTC
[R] counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"
Hi,

And the Bioconductor package "Biostrings" is the place to go for any serious work with sequences.

-- 
Best wishes
Wolfgang

Wolfgang Huber
EBI/EMBL
Cambridge UK
http://www.ebi.ac.uk/huber
Patrick Aboyoun
2008-Jul-15 16:29 UTC
[R] counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"
Henrik,

As Wolfgang mentioned, the Biostrings package in Bioconductor has a number of sequence manipulation functions. The alphabetFrequency function would get you what you need.

    > library(Biostrings)
    > alphabetFrequency(DNAString("TCGGGGGACAATCGGTAACCCGTCT"))
    A C G T M R W S Y K V H D B N - +
    5 7 8 5 0 0 0 0 0 0 0 0 0 0 0 0 0
    > alphabetFrequency(DNAString("TCGGGGGACAATCGGTAACCCGTCT"), baseOnly = TRUE)
        A     C     G     T other
        5     7     8     5     0

Patrick
Riley, Steve
2008-Jul-15 17:28 UTC
[R] counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"
Daren,

Not sure if it is any easier, but another solution is:

    code <- unlist(strsplit("TCGGGGGACAATCGGTAACCCGTCT", ""))
    length(grep("[G]", code))

Steve
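The solutions above all split the string into single characters first. One possible alternative, sketched here in base R and not taken from the thread, is to delete every character that is not a G and measure what remains. The second test string "AAAA" and the names `dna` and `g_counts` are made up for illustration; the point is that this form also works on a whole vector of sequences at once.

```r
# Illustrative input: the sequence from the thread plus a made-up one with no G
dna <- c("TCGGGGGACAATCGGTAACCCGTCT", "AAAA")

# Remove everything that is not "G", then count the characters left in each string
g_counts <- nchar(gsub("[^G]", "", dna))
print(g_counts)
```

This gives 8 for the first sequence and 0 for the second.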