Hi Bogdan --
On 04/14/2010 08:19 PM, Bogdan Tanasa wrote:
> Dear all,
>
> please could you suggest any R functions or packages (or external
> programs), that
You'll likely have more luck on the Bioconductor mailing list,
http://bioconductor.org/docs/mailList.html
but...
> a. take as input a large number (> 10 000) of short 20-30 nt
> sequences, and do sequence assembly, to reconstruct larger (extended)
> 30-50 sequences ?
I don't know of any sequence assemblers in R. Velvet would be a first-stop
third-party tool, but it sounds like you have some fairly specific
requirements...
> b. take as input a larger number of sequences (100 000 - 1 mil) and
> cluster these sequences in distinct classes based on the sequence
> similarity ?
The Biostrings package has various functions to calculate edit distance,
which might form the input to familiar R clustering algorithms. See
installation instructions at
http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html
This thread
https://stat.ethz.ch/pipermail/bioconductor/2010-March/032580.html
might suggest some directions.
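
To make the "edit distance as input to clustering" idea concrete, here is a
minimal sketch. It is written in plain Python rather than R so that it is
self-contained, and it does not use Biostrings. The function names
(edit_distance, cluster_by_threshold), the max_dist cutoff and the toy reads
are all made up for illustration. The grouping rule is simple single-linkage
at a fixed threshold, which is only one of many possible clustering choices.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_by_threshold(seqs, max_dist):
    """Single-linkage groups: join sequences whose distance is <= max_dist."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair; merge the two groups when the pair is close enough.
    for i, j in combinations(range(len(seqs)), 2):
        if edit_distance(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return sorted(clusters.values())


# Hypothetical toy reads, not real data.
reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
clusters = cluster_by_threshold(reads, max_dist=1)
for group in clusters:
    print(group)
```

Note that this compares every pair of sequences, so the work grows
quadratically with the number of reads. That is fine for a toy example but
would need a smarter strategy (or a dedicated tool) at 100 000 to 1 million
sequences.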
Martin
>
> thanks a lot,
>
> bogdan
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793