Marcelo Araya
2011-Sep-27 16:51 UTC
[R] searching several subsequences in a single string sequence
Hi all,

I am analyzing bird song element sequences. I would like to know how to count how many times a given subsequence occurs in a single string sequence.

For example, if I have this single sequence:

ABCABAABABABCAB

and I am looking for the subsequence "ABC", what I need to get is that the subsequence is found twice.

Any idea how I can do this?

Thanks in advance,

Marcelo Araya-Salas
Ph.D. Student
Avian Communication and Evolution Lab
Department of Biology
New Mexico State University
Lab: 575-646-4863
Ivan Calandra
2011-Sep-27 17:03 UTC
[R] searching several subsequences in a single string sequence
Hi Marcelo,

Try this:

x <- "ABCABAABABABCAB"
length(gregexpr(pattern="ABC", x)[[1]])

See ?gregexpr for more details (though I admit that this help page is not easy to understand).

HTH,
Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Dept. Mammalogy
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de
**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
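One caveat with the length() approach above: when the pattern does not occur at all, gregexpr returns a single -1, so the length comes out as 1 rather than 0. A minimal sketch of a count that handles this case follows. The name count_matches is a hypothetical helper (not part of base R), and fixed = TRUE reflects my assumption that the subsequence is a literal string rather than a regular expression.

```r
# count_matches is a hypothetical helper name, not a base R function.
# Assumption: the pattern is a literal string, hence fixed = TRUE.
count_matches <- function(pattern, x) {
  # Start positions of all matches in x, or a single -1 if there are none
  pos <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  # Count only real positions so that "no match" gives 0, not 1
  sum(pos > 0)
}

count_matches("ABC", "ABCABAABABABCAB")  # 2
count_matches("ABC", "ABACAB")           # 0
```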
Barry Rowlingson
2011-Sep-27 17:06 UTC
[R] searching several subsequences in a single string sequence
On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya <marceloa27 at gmail.com> wrote:
> I am looking for the subsequence "ABC". What I need to get here is that the
> subsequence is found twice.
>
> Any idea how can I do this?

gregexpr will return the position and length of multiple matches. And you can feed it a vector. So:

> songs=c("ABCABAABABABCAB","ABACAB","ABABCABCBC")
> m="ABC"
> gregexpr(m,songs)
[[1]]
[1]  1 11
attr(,"match.length")
[1] 3 3

[[2]]
[1] -1
attr(,"match.length")
[1] -1

[[3]]
[1] 3 6
attr(,"match.length")
[1] 3 3

- in the first item, it was found at positions 1 and 11
- in the second, it wasn't found at all
- in the third, it was found at positions 3 and 6

So just do some apply-ing to the returned list and get the number of matches in each element (a no-match element holds the single value -1, so it has to be counted as 0 rather than 1). Job done!

Barry

PS bonus points for spotting the hidden prog-rock song title.
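The "apply-ing" step might look something like the sketch below, which reuses the songs vector from this post. The extra patterns "AB" and "BA", the variable names counts and count_table, and the use of fixed = TRUE (treating each subsequence as a literal string) are my assumptions for illustration, not part of the original reply.

```r
songs <- c("ABCABAABABABCAB", "ABACAB", "ABABCABCBC")
m <- "ABC"

# One element of match positions per song; -1 marks "no match"
hits <- gregexpr(m, songs, fixed = TRUE)

# Count only positive positions so that a missing pattern gives 0
counts <- sapply(hits, function(pos) sum(pos > 0))
names(counts) <- songs
counts

# Several subsequences at once: one column per pattern, one row per song.
# "AB" and "BA" are hypothetical extra patterns for illustration.
patterns <- c("ABC", "AB", "BA")
count_table <- sapply(patterns, function(p) {
  sapply(gregexpr(p, songs, fixed = TRUE), function(pos) sum(pos > 0))
})
rownames(count_table) <- songs
count_table
```

Note that gregexpr reports non-overlapping matches. That makes no difference for "ABC", but it would for a pattern that can overlap itself, such as "ABA".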