Hi,
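Maybe the Biostrings package from Bioconductor will help you.
# biocLite() has since been retired; current Bioconductor installs go through BiocManager
install.packages("BiocManager")
BiocManager::install("Biostrings")
library(Biostrings)
?matchPattern
?letterFrequency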
vec1<- "ababbbassdaa"
alphabetFrequency(DNAString(vec1))
#A C G T M R W S Y K V H D B N - +
#5 0 0 0 0 0 0 2 0 0 0 0 1 4 0 0 0
letterFrequency(DNAStringSet(vec1), letters="AC", OR=0)
#     A C
#[1,] 5 0
vec2<- "addffggssbbsbbs"
longestConsecutive(c(vec1,vec2),"b")
#[1] 3 2
matchPattern(DNAString("AB"), DNAString(vec1))
#  Views on a 12-letter DNAString subject
#subject: ABABBBASSDAA
#views:
#    start end width
#[1]     1   2     2 [AB]
#[2]     3   4     2 [AB]
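Note that DNAString() only accepts vec1 because its letters all happen to be IUPAC nucleotide codes; vec2 contains "f", which is not one, so the same calls would be expected to fail on it. If the sequences are not really DNA, a base-R sketch along these lines may be enough. The names letter_counts and count_pattern are made up for illustration, and the pattern is treated as a regular expression, so this assumes it contains no special characters.

```r
vec1 <- "ababbbassdaa"
vec2 <- "addffggssbbsbbs"

# Per-letter counts over a shared alphabet, so the two rows are directly comparable
chars <- strsplit(c(vec1, vec2), "")
alphabet <- sort(unique(unlist(chars)))
letter_counts <- t(sapply(chars, function(x) table(factor(x, levels = alphabet))))
rownames(letter_counts) <- c("vec1", "vec2")
letter_counts

# Count occurrences of a pattern, including overlapping ones.
# The lookahead makes each match zero-width, so overlapping hits are all found.
count_pattern <- function(x, pattern) {
  m <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)
  sapply(m, function(hit) sum(hit > 0))  # gregexpr returns -1 when nothing matches
}
count_pattern(c(vec1, vec2), "ab")
count_pattern(c(vec1, vec2), "abab")
```

The letter counts for vec1 should agree with the alphabetFrequency() output above (5 a's, 4 b's, 2 s's, 1 d).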
Also,
library(seqinr)
lapply(seq(s2c(vec2)), function(i) table(splitseq(s2c(vec2), word=i)))
#[[1]]
#
#a b d f g s
#1 4 2 2 2 4
#
#[[2]]
#
#ad bb bs df fg gs sb
# 1  1  1  1  1  1  1
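As the [[2]] output suggests, splitseq() cuts the sequence into non-overlapping words, so pairs such as "dd" or "ss" that straddle a word boundary are not counted. If overlapping pairs matter, a rough base-R sketch like the following might do; kgrams and compare_kgrams are hypothetical helper names, not part of seqinr or Biostrings.

```r
vec1 <- "ababbbassdaa"
vec2 <- "addffggssbbsbbs"

# All overlapping substrings of length k (a sliding window of width k)
kgrams <- function(x, k) {
  starts <- seq_len(max(nchar(x) - k + 1, 0))
  substring(x, starts, starts + k - 1)
}

# Side-by-side counts of every k-gram seen in either sequence
compare_kgrams <- function(a, b, k) {
  ga <- kgrams(a, k)
  gb <- kgrams(b, k)
  keys <- sort(union(ga, gb))
  rbind(a = vapply(keys, function(g) sum(ga == g), integer(1)),
        b = vapply(keys, function(g) sum(gb == g), integer(1)))
}

table(kgrams(vec2, 2))
compare_kgrams(vec1, vec2, 2)
```

Columns where both rows are non-zero show the pairs the two sequences share, which gives a crude view of how similar they are without doing an alignment.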
---------------------------------------
A.K.
----- Original Message -----
From: ben1983 <ben_thompson at talk21.com>
To: r-help at r-project.org
Cc:
Sent: Friday, April 19, 2013 7:21 AM
Subject: [R] Sequence analysis
Hiya,
I am trying to look at the similarities between a number of sequences. For
example, I am trying to see how similar "ababbbassdaa" is to
"addffggssbbsbbs". I was wondering whether there is some way for me to see
how similar they are in terms of, for example, the number of a's, the number
of b's, how often a and ab are consecutive, how often abab is together, etc.
Any advice would be really useful. Any kind of shove in the right direction
would be amazing! I've tried doing basic alignments, but I think this is
losing quite a lot of information.
Many thanks,
Ben