Peter Hornbeck
2003-Sep-07 17:28 UTC
[R] Need help with cluster analysis of amino acid sequences
I am just starting to use R and am wanting to use the cluster algorithm for analyzing sequences of amino acids for similarities. The input will be lists of short sequences of 8-15 amino acids.

Let me give you a feel for the sort of data I am interested in.

Amino acids can be classified by a number of different parameters: e.g., charge and hydrophobicity. Each of these qualities could be described by a numerical assignment: charge (perhaps as either 0 or 1), and hydrophobicity (perhaps as a continuum from 0 to 1). The point of the analysis is to cluster those sequences that have similar properties at different positions along the sequence.

My question is: is there a user group of biologists that may be able to provide tips about how to proceed, or perhaps who already have developed algorithms that can be applied/modified to the sort of analysis I need?

Or does anyone have suggestions of other on-line resources that might be helpful?

Thanks, Peter

Peter Hornbeck
Magnolia, MA 01930
(978) 5264867
Deepayan Sarkar
2003-Sep-07 18:02 UTC
[R] Need help with cluster analysis of amino acid sequences
The bioconductor project [1] might be of interest to you, and its mailing list [2] is probably more appropriate for your question.

[1] http://www.bioconductor.org
[2] https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

HTH,
Deepayan
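The encoding described in the original question (a 0/1 charge value and a 0-to-1 hydrophobicity value at each position, followed by clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than something stated in the thread: the Kyte-Doolittle hydropathy scale rescaled to [0, 1], the choice to treat D, E, K and R as charged, the requirement that sequences are already aligned and of equal length, the use of Euclidean distance with single-linkage merging, and the four example sequences, which are hypothetical. In R itself, the same steps would map onto a numeric matrix passed to `dist()`, `hclust()` and `cutree()`.

```python
from itertools import combinations

# Kyte-Doolittle hydropathy values (assumed choice of scale; range -4.5 to 4.5).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: D, E, K, R count as charged (1); all other residues as 0.
CHARGED = set("DEKR")


def encode(seq):
    """Return a flat vector with (charge, hydrophobicity) for each position."""
    features = []
    for aa in seq:
        features.append(1.0 if aa in CHARGED else 0.0)
        features.append((KD[aa] + 4.5) / 9.0)  # rescale to the 0..1 range
    return features


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def single_linkage(vectors, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical, pre-aligned 8-residue sequences.
sequences = ["KKRDEKRE", "RKKEDRKD", "LIVAFLIV", "VILFAVLI"]
if len({len(s) for s in sequences}) != 1:
    raise ValueError("this sketch assumes sequences of equal length")

vectors = [encode(s) for s in sequences]
clusters = single_linkage(vectors, k=2)
print(clusters)
```

Because each position contributes its own pair of features, two sequences end up close only when they have similar properties at the same positions, which is the behaviour the question asks for. Sequences of differing lengths (8 to 15 residues) would need to be aligned or padded first, which this sketch does not attempt.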