Hello,

I have two columns of short sequences that I would like to compare position by position, counting the mismatches in each pair and recording that count in a new column. The sequences are part of a data frame that looks like this:

    seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
    seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
    d.f=data.frame(seq1, seq2)

Thank you for your help,
Joseph
One kind of ugly solution:

    # Keep the sequences as character vectors rather than factors,
    # because strsplit() needs character input.
    d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
    d.f[["nMismatch"]] <- with(d.f, {
        # Split each sequence into single characters and compare the pairs
        # position by position. With equal-length sequences, mapply()
        # returns a logical matrix with one column per pair.
        m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
        # Each column sum is the number of mismatches for that pair.
        colSums(m)
    })

Check out the Bioconductor Biostrings package, especially the version available with the development version of R, for DNA string algorithms.

Martin
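For readers who want to run the idea end to end, here is a minimal self-contained sketch of the same approach. It is not Martin's exact code. The helper name `count_mismatches` is hypothetical and introduced only for illustration, and the sketch assumes that the two sequences in each pair have the same length, as they do in the example data. It sums the mismatches per pair directly instead of building a matrix first.

```r
# Example data from the original question.
seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)

# Hypothetical helper (not from the thread): count the positions at which
# two equal-length strings differ.
count_mismatches <- function(a, b) {
  # Assumption: both sequences have the same number of characters.
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Apply the helper to each row's pair of sequences.
d.f$nMismatch <- mapply(count_mismatches, d.f$seq1, d.f$seq2,
                        USE.NAMES = FALSE)

print(d.f$nMismatch)
```

For the three example pairs this gives 1, 2 and 0 mismatches. Because each pair is reduced to a single count inside the helper, the result is a plain vector, and pairs of unequal length stop with an error instead of silently recycling characters.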
It is pretty enough for me. Thanks.

----- Original Message ----
From: Martin Morgan <mtmorgan@fhcrc.org>
To: joseph <jdsandjd@yahoo.com>
Cc: r-help@r-project.org
Sent: Friday, February 22, 2008 6:41:41 PM
Subject: Re: [R] counting sequence mismatches