Hi Debra,
As already noted by Boris, the right packages can be found in Bioconductor,
namely Biostrings (for handling sets of sequences and pairwise alignments) and
msa (for multiple alignments; a package I am maintaining). Your question does
not yet clearly indicate to me whether pairwise or multiple sequence alignments
are the right choice. If I understand you correctly, this is not the point
anyway, since you want a solution how to find out in which positions two aligned
sequences differ, right? The following code snippet demonstrates by means of a
simple example with two DNA strings how to check where two strings in a pairwise
alignment differ:
library(Biostrings)
seq1 <- DNAString("AGCGAGCGA")
seq2 <- DNAString("AGGATCGA")
aln <- pairwiseAlignment(seq1, seq2, type="global",
substitutionMatrix=nucleotideSubstitutionMatrix(4, -1))
which(strsplit(as.character(pattern(aln)), split="")[[1]] !
strsplit(as.character(subject(aln)), split="")[[1]])
Maybe there are more elegant solutions than this one. This is just a
quick approach using basic R. Note that the positions are relative to
the positions in the alignment, which need not be the same as the
positions in your original "non-mutant" sequence if the alignment
algorithm has inserted some gaps in this sequence. If you use multiple
alignments, the approach is basically the same:
library(msa)
seqs <- DNAStringSet(c(seq1="AGCGAGCGA",
seq2="AGGATCGA",
seq3="AGCGAGC"))
aln <- msaMuscle(seqs, order="input")
## seq1 against seq2:
which(strsplit(as.character(aln)[1], split="")[[1]] !
strsplit(as.character(aln)[2], split="")[[1]])
## seq1 against seq3:
which(strsplit(as.character(aln)[1], split="")[[1]] !
strsplit(as.character(aln)[3], split="")[[1]])
I hope this helps.
Best regards,
Ulrich
P.S.: I fully agree with Bert that the Bioconductor support forum would
be a good place to discuss this.
On Nov 24, 2015, at 12:04 PM, debra ragland via R-help<r-help at
r-project.org> wrote:
> I have 15 protein sequences of 99 amino acids each. After doing some
looking around I have found that there are several ways you can read sequences
into R and do pairwise or multiple alignments. I, however, do not know how to
probe changes at specific positions. For instance, I would like to know the best
way to align a standard sequence with one(1) or several mutant sequences and
probe each amino acid position that does not match the standard sequence. In
other words seq1 = "standard amino acid seq" and seq2 = "mutant
seq", align these 2 and then have a way to ask R to report whether there is
a change at position 10, or 11, or 12 and so on such that R reports(for example)
TRUE or FALSE for this question. Where all the sequences that have a reported
TRUE for a change at position X can be grouped against those that do not have a
change at this position.I'm not even sure that R is the best way to do this,
but it's the only language I'm somewhat familiar with.I hope this makes
sense.