Hello,
I have two columns of short sequences that I would like to compare pairwise,
counting the number of mismatches in each pair and recording that count in a
new column. The sequences are part of a data frame that looks like this:
seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
"CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
"CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f=data.frame(seq1, seq2)
Thank you for your help,
Joseph
One kind of ugly solution:
> d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
> d.f[["nMismatch"]] <- with(d.f, {
+     m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
+     colSums(m)
+ })
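The same idea can also be written as a small function, which may be easier to reuse. This is only a sketch: the name count_mismatches is hypothetical (it does not come from the thread), and it assumes the two sequences in each pair have the same length, as they do in the example data. It sums the mismatches pair by pair instead of building a matrix first.

```r
# Sketch only: count_mismatches is a hypothetical helper, not from the thread.
# Assumption: a and b are the same length, and each pair has equal nchar.
count_mismatches <- function(a, b) {
  a <- as.character(a)  # guards against factor columns
  b <- as.character(b)
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  # Split each sequence into single characters, compare position by
  # position within each pair, and count the positions that differ.
  mapply(function(x, y) sum(x != y),
         strsplit(a, ""), strsplit(b, ""),
         USE.NAMES = FALSE)
}

seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC", "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC", "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)
d.f$nMismatch <- count_mismatches(d.f$seq1, d.f$seq2)
```

For the three example pairs this gives 1, 2 and 0 mismatches. The as.character() calls mean the function also works if the data frame was built without stringsAsFactors=FALSE.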
Check out the Bioconductor Biostrings package, especially the version
available with the development version of R, for DNA string algorithms.
Martin
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
It is pretty enough for me. Thanks.

----- Original Message ----
From: Martin Morgan <mtmorgan@fhcrc.org>
To: joseph <jdsandjd@yahoo.com>
Cc: r-help@r-project.org
Sent: Friday, February 22, 2008 6:41:41 PM
Subject: Re: [R] counting sequence mismatches