thr3ads.net - R help - [R] Probing a protein sequence alignment in R [Nov 2015]

If this information is useful, please help other people find it:
Share via:

Ulrich Bodenhofer

2015-Nov-25 13:45 UTC

[R] Probing a protein sequence alignment in R

Hi Debra,

As already noted by Boris, the right packages can be found in Bioconductor,
namely Biostrings (for handling sets of sequences and pairwise alignments) and
msa (for multiple alignments; a package I am maintaining). Your question does
not yet clearly indicate to me whether pairwise or multiple sequence alignments
are the right choice. If I understand you correctly, this is not the point
anyway, since you want a solution how to find out in which positions two aligned
sequences differ, right? The following code snippet demonstrates by means of a
simple example with two DNA strings how to check where two strings in a pairwise
alignment differ:

    library(Biostrings)
    seq1 <- DNAString("AGCGAGCGA")
    seq2 <- DNAString("AGGATCGA")
    aln <- pairwiseAlignment(seq1, seq2, type="global",
    substitutionMatrix=nucleotideSubstitutionMatrix(4, -1))
    which(strsplit(as.character(pattern(aln)), split="")[[1]] !   
strsplit(as.character(subject(aln)), split="")[[1]])

Maybe there are more elegant solutions than this one. This is just a 
quick approach using basic R. Note that the positions are relative to 
the positions in the alignment, which need not be the same as the 
positions in your original "non-mutant" sequence if the alignment 
algorithm has inserted some gaps in this sequence. If you use multiple 
alignments, the approach is basically the same:

    library(msa)
    seqs <- DNAStringSet(c(seq1="AGCGAGCGA",
seq2="AGGATCGA",
    seq3="AGCGAGC"))
    aln <- msaMuscle(seqs, order="input")
    ## seq1 against seq2:
    which(strsplit(as.character(aln)[1], split="")[[1]] !   
strsplit(as.character(aln)[2], split="")[[1]])
    ## seq1 against seq3:
    which(strsplit(as.character(aln)[1], split="")[[1]] !   
strsplit(as.character(aln)[3], split="")[[1]])

I hope this helps.

Best regards,
Ulrich

P.S.: I fully agree with Bert that the Bioconductor support forum would 
be a good place to discuss this.


On Nov 24, 2015, at 12:04 PM, debra ragland via R-help<r-help at
r-project.org>  wrote:
> I have 15 protein sequences of 99 amino acids each. After doing some
looking around I have found that there are several ways you can read sequences
into R and do pairwise or multiple alignments. I, however, do not know how to
probe changes at specific positions. For instance, I would like to know the best
way to align a standard sequence with one(1) or several mutant sequences and
probe each amino acid position that does not match the standard sequence. In
other words seq1 = "standard amino acid seq" and seq2 = "mutant
seq", align these 2 and then have a way to ask R to report whether there is
a change at position 10, or 11, or 12 and so on such that R reports(for example)
TRUE or FALSE for this question. Where all the sequences that have a reported
TRUE for a change at position X can be grouped against those that do not have a
change at this position.I'm not even sure that R is the best way to do this,
but it's the only language I'm somewhat familiar with.I hope this makes
sense.

R help - Nov 2015 - Probing a protein sequence alignment in R

[R] Probing a protein sequence alignment in R