thr3ads.net - R help - [R] similarity matrix conversion to dissimilarity [Dec 2004]


Dr. Thomas Isenbarger

2004-Dec-08 21:12 UTC

[R] similarity matrix conversion to dissimilarity

I have a matrix of similarity scores that I want to convert into a 
matrix of dissimilarity scores so that I can apply some clustering 
methods to the data.  That is, high values in my matrix signify 
similarity and low values (zero being the lowest) signify no 
similarity.  What functions/options in R or its packages are available 
for making this kind of transformation of a matrix?

Specifically, I am a molecular biologist.  I have a set of 700+
nucleotide sequences I want to group into clusters based on sequence
similarities.  There is a wide range of sequences in the set, some of
which are homologous to other sequences in the set.  I want to use
clustering to identify these groups.

If the sequences were related and could be trimmed to the same length, I
would do an alignment and then use PHYLIP (or some other distance
method) to create a distance matrix, but since my sequences are
unrelated and cannot be trimmed to the same length, I am at a loss for
what to do.

For a set with so many unrelated sequences of different lengths, the
only thing I have been able to do is an all-against-all BLAST to create
the matrix, but this gives high scores for similarities, not high
scores for dissimilarities.  The only thought I had was to use the
reciprocal of the BLAST score as some perverse measure of distance.
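
The reciprocal idea can be sketched as follows, in Python rather than R and under assumptions of our own: the all-against-all scores sit in a square, symmetric list-of-lists matrix, and a score of 0 means "no BLAST hit". The helper name `reciprocal_dissimilarity` and its `zero_distance` parameter are hypothetical. The sketch makes the two snags of 1/s visible: a zero score has no finite reciprocal, and a sequence's self-score would otherwise give it a non-zero distance to itself, so the diagonal is set to 0 explicitly.

```python
import math


def reciprocal_dissimilarity(sim, zero_distance=math.inf):
    """Turn a square similarity matrix into dissimilarities via 1/s.

    Hypothetical helper: pairs with score 0 (no hit) get `zero_distance`,
    and the diagonal is forced to 0 so each sequence is at distance 0
    from itself.
    """
    n = len(sim)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-distance stays 0
            s = sim[i][j]
            dist[i][j] = 1.0 / s if s > 0 else zero_distance
    return dist


# Toy scores (assumed symmetric); the diagonal holds self-scores.
scores = [
    [200.0, 50.0, 0.0],
    [50.0, 180.0, 10.0],
    [0.0, 10.0, 150.0],
]
d = reciprocal_dissimilarity(scores)
print(d[0][1], d[0][2])
```

Many clustering routines reject infinite entries, so in practice a large finite value might be passed as `zero_distance` instead.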

I am not subscribed to the list, so can I ask for responses directly to 
my email address?

Thank you,
Tom Isenbarger


--
isen at plantpath.wisc.edu
thomas a isenbarger
(608) 265-0850

Doran, Harold

2004-Dec-08 22:43 UTC

[R] similarity matrix conversion to dissimilarity

Dear Sir:

I posed a similar question a few months back and received many
responses. Check the searchable R-help archives on CRAN for those
helpful emails. I searched for 'similarity matrix' and many results
were returned.

Harold

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

(Ted Harding)

2004-Dec-08 23:10 UTC

[R] similarity matrix conversion to dissimilarity

[replying to your personal address as well as the list; but
 I think you should subscribe to the list since this topic
 may well be pursued further]

On 08-Dec-04 Dr. Thomas Isenbarger wrote:
> For a set with so many unrelated sequences of different lengths, the
> only thing I have been able to do is an all-against-all BLAST to create
> the matrix, but this gives high scores for similarities, not high
> scores for dissimilarities.  The only thought I had was to use the
> reciprocal of the BLAST score as some perverse measure of distance.
>
> I am not subscribed to the list, so can I ask for responses directly to
> my email address?

Clearly, any function that "inverts" the measure of "similarity"
(i.e. decreases as "similarity" increases) could be used as a
measure of dissimilarity in general. Indeed, you imply as much
yourself. There is quite a wide choice; the reciprocal could be one.
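
To make the "wide choice" concrete, here is a Python sketch, assuming a square matrix of non-negative scores with self-scores on the diagonal, of three strictly decreasing transforms. Apart from the reciprocal idea raised in the question, the transforms and all names (`to_dissimilarity`, `max_minus`, `reciprocal_plus_one`, `exp_decay`) are illustrative choices of ours, and none of them settles which one is substantively meaningful.

```python
import math

# Toy all-against-all scores (0 = no hit); the diagonal holds self-scores.
scores = [
    [200.0, 50.0, 0.0],
    [50.0, 180.0, 10.0],
    [0.0, 10.0, 150.0],
]


def to_dissimilarity(sim, f):
    """Apply a decreasing map f(s, s_max) off the diagonal; self-distance is 0."""
    s_max = max(max(row) for row in sim)
    n = len(sim)
    return [[0.0 if i == j else f(sim[i][j], s_max) for j in range(n)]
            for i in range(n)]


def max_minus(s, s_max):
    return s_max - s  # the largest score in the matrix would map to 0


def reciprocal_plus_one(s, s_max):
    return 1.0 / (1.0 + s)  # finite even when s == 0


def exp_decay(s, s_max):
    return math.exp(-s / s_max)  # assumes s_max > 0


for f in (max_minus, reciprocal_plus_one, exp_decay):
    d = to_dissimilarity(scores, f)
    print(f.__name__, d[0][1], d[0][2])
```

All three preserve the ordering of the scores (a larger score always gives a smaller dissimilarity); they differ in how the gaps between scores are stretched.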

However, reading between your lines, it seems that you do
not have a substantive interpretation for "dissimilarity".
Yet apparently you have one for "similarity". Otherwise, on
what basis do you claim that your similarity matrix expresses
*substantive* similarity?

But if you can attach an interpretation (in some substantive
terms) to your measure of similarity, can you not then negate
the propositions that this expresses and obtain a measure of
dissimilarity? In that case, the function could be programmed
in R (though it may not be a function of your "similarity", and
you would need to derive it from the data).

If not, why not? Or, if your measure of "similarity" in fact
does not carry a substantive interpretation, then one could
assert that any decreasing function of "similarity" could
be used, and would be as meaningful as your measure of
"similarity". Again, this can be programmed in R.
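
One practical consequence of "any decreasing function" can be checked directly: single-linkage clusters at a cutoff are the connected components of the graph that joins pairs whose dissimilarity is at or below the cutoff, so they depend only on the rank order of the scores. The Python sketch below (hypothetical helper names, toy scores, and a hypothetical similarity threshold of 60) shows two different decreasing transforms giving the same groups when each is cut at the image of that same threshold.

```python
def single_linkage_groups(dist, cutoff):
    """Single-linkage clusters at height `cutoff`: connected components of
    the graph joining i and j whenever dist[i][j] <= cutoff (union-find)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def transform(sim, f):
    """Apply f element-wise (the diagonal is ignored by the grouping)."""
    return [[f(s) for s in row] for row in sim]


def max_minus(s):
    return 200.0 - s  # 200 = largest score in the toy matrix below


def reciprocal_plus_one(s):
    return 1.0 / (1.0 + s)


# Toy scores: pairs 0-1 and 2-3 are the only strong hits.
scores = [
    [200.0, 90.0, 5.0, 0.0],
    [90.0, 180.0, 0.0, 0.0],
    [5.0, 0.0, 150.0, 70.0],
    [0.0, 0.0, 70.0, 160.0],
]
threshold = 60.0  # hypothetical similarity cutoff

g1 = single_linkage_groups(transform(scores, max_minus), max_minus(threshold))
g2 = single_linkage_groups(transform(scores, reciprocal_plus_one),
                           reciprocal_plus_one(threshold))
print(g1)  # [[0, 1], [2, 3]]
print(g2)  # [[0, 1], [2, 3]]
```

The same invariance holds for complete linkage, but not for average linkage (UPGMA) or Ward's method, whose results depend on the actual dissimilarity values, so with those methods the choice of transform can change the clusters.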

Again reading between your lines, it could be inferred that
in the situation you describe ("unrelated sequences" which
"cannot be trimmed to the same length"), while you can derive
a measure of similarity which matches established concepts
for similarity in your field, you cannot match the concepts
for dissimilarity.

If that is the case, R cannot help you with the conceptual
problem.

This may not appear helpful, but it is a sincere attempt
to clarify the issues.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 08-Dec-04                                       Time: 23:10:55
------------------------------ XFMail ------------------------------