Kim Milferstedt
2008-May-28 00:02 UTC
[R] function to compute consensus DNA sequence by plurality?
Hello, I am looking for a function that computes a consensus DNA sequence by plurality. I found "consensus" in bio3d which almost does what I need. However, it would be important for me to include ambiguities and not to omit every position that is less than the threshold set in "consensus". Is anybody aware of a package with a function that includes ambiguities in a consensus sequence? Thanks already! Kim Here are a couple of sequences to illustrate what I would like: TGCATACACCGACAACATCCTCGACGACTACACCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG GC TACACC AC A TCCT GA GACT CTGCTACTACG #This is what I get from "consensus" in bio3d with a threshold of 0.85. HGCMTACACCRACRAYRTCCTSGAYGACTWCTGCTACTACG # This is what I would like to get. ___________________________________________ Kim Milferstedt, PhD Postdoctoral Researcher University of Illinois at Urbana-Champaign Department of Microbiology C207 CLSL 601 S. Goodwin Avenue Urbana, IL 61801 phone: 001-217-244-0721 email: milferst at uiuc.edu
Jean lobry
2008-May-28 12:48 UTC
[R] function to compute consensus DNA sequence by plurality?
Kim, Is is what you want? tmp <- readLines(textConnection( "TGCATACACCGACAACATCCTCGACGACTACACCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG")) library(seqinr) mat <- matrix(sapply(tmp, s2c), nrow = length(tmp), byrow = TRUE) bma <- function(nucl){ nucl <- tolower(nucl) iupac <- sapply(amb(), amb) iupac$u <- NULL # no RNA names(iupac)[unlist(lapply(iupac, setequal, nucl))] } res <- apply(mat, 2, bma) toupper(c2s(res)) [1] "HGCMTACACCRACRAYRTCCTSGAYGACTWCWSCTACTACG" Best, Jean -- Jean R. Lobry (lobry at biomserv.univ-lyon1.fr) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I, 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE allo : +33 472 43 27 56 fax : +33 472 43 13 88 http://pbil.univ-lyon1.fr/members/lobry/