Kim Milferstedt
2008-May-28 00:02 UTC
[R] function to compute consensus DNA sequence by plurality?
Hello, I am looking for a function that computes a consensus DNA sequence by plurality. I found "consensus" in bio3d which almost does what I need. However, it would be important for me to include ambiguities and not to omit every position that is less than the threshold set in "consensus". Is anybody aware of a package with a function that includes ambiguities in a consensus sequence? Thanks already! Kim Here are a couple of sequences to illustrate what I would like: TGCATACACCGACAACATCCTCGACGACTACACCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG GC TACACC AC A TCCT GA GACT CTGCTACTACG #This is what I get from "consensus" in bio3d with a threshold of 0.85. HGCMTACACCRACRAYRTCCTSGAYGACTWCTGCTACTACG # This is what I would like to get. ___________________________________________ Kim Milferstedt, PhD Postdoctoral Researcher University of Illinois at Urbana-Champaign Department of Microbiology C207 CLSL 601 S. Goodwin Avenue Urbana, IL 61801 phone: 001-217-244-0721 email: milferst at uiuc.edu
Jean lobry
2008-May-28 12:48 UTC
[R] function to compute consensus DNA sequence by plurality?
Kim,
Is is what you want?
tmp <- readLines(textConnection(
"TGCATACACCGACAACATCCTCGACGACTACACCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG"))
library(seqinr)
mat <- matrix(sapply(tmp, s2c), nrow = length(tmp), byrow = TRUE)
bma <- function(nucl){
nucl <- tolower(nucl)
iupac <- sapply(amb(), amb)
iupac$u <- NULL # no RNA
names(iupac)[unlist(lapply(iupac, setequal, nucl))]
}
res <- apply(mat, 2, bma)
toupper(c2s(res))
[1] "HGCMTACACCRACRAYRTCCTSGAYGACTWCWSCTACTACG"
Best,
Jean
--
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo : +33 472 43 27 56 fax : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/