thr3ads.net - R help - [R] counting sequence mismatches [Feb 2008]


joseph

2008-Feb-23 02:16 UTC

[R] counting sequence mismatches

Hello,
I have two columns of short sequences that I would like to compare pair by
pair, counting the mismatches and recording the count in a new column. The
sequences are part of a data frame that looks like this:
seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
"CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
"CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f=data.frame(seq1, seq2)
Thank you for your help,
Joseph

Martin Morgan

2008-Feb-23 02:41 UTC


[R] counting sequence mismatches

One kind of ugly solution

 > d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
 > d.f[["nMismatch"]] <- with(d.f, {
+   m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
+   colSums(m)
+ })
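
The snippet above relies on every sequence having the same length: only then does mapply() simplify its result to a matrix (one column per pair) that colSums() can reduce. What follows is a minimal sketch of a variant that sums inside the mapply() call instead, so it returns one count per pair directly. The helper name count_mismatches is hypothetical and not from this thread, and returning NA for pairs of unequal length is an assumption made for this sketch.

```r
# Hypothetical helper (not from the thread): count position-wise mismatches
# between two vectors of sequences, one count per pair.
count_mismatches <- function(a, b) {
  a <- as.character(a)  # tolerate factor columns as well as character ones
  b <- as.character(b)
  stopifnot(length(a) == length(b))
  mapply(function(x, y) {
    # Assumption of this sketch: a mismatch count is only defined when both
    # sequences in the pair have the same number of characters.
    if (length(x) != length(y)) return(NA_integer_)
    sum(x != y)
  }, strsplit(a, ""), strsplit(b, ""), USE.NAMES = FALSE)
}

seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC", "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC", "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)

d.f$nMismatch <- count_mismatches(d.f$seq1, d.f$seq2)
d.f$nMismatch
# [1] 1 2 0
```

Because the helper converts its inputs with as.character(), it should also work when the data frame was built without stringsAsFactors=FALSE and the columns are factors.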

Check out the Bioconductor Biostrings package, especially the version 
available with the development version of R, for DNA string algorithms.

Martin


joseph

2008-Feb-23 16:18 UTC


[R] counting sequence mismatches

It is pretty enough for me. Thanks.

