thr3ads.net - R help - [R] sequence clustering and assembly [Apr 2010]

Bogdan Tanasa

2010-Apr-15 03:19 UTC

[R] sequence clustering and assembly

Dear all,

please could you suggest any R functions or packages (or external programs)
that

a. take as input a large number (> 10 000) of short 20-30 nt sequences, and
do sequence assembly, to reconstruct larger (extended) 30-50 sequences?

b. take as input a larger number of sequences (100 000 - 1 mil) and cluster
these sequences in distinct classes based on the sequence similarity?

thanks a lot,

bogdan

David Winsemius

2010-Apr-15 12:24 UTC

On Apr 14, 2010, at 11:19 PM, Bogdan Tanasa wrote:

> Dear all,
>
> please could you suggest any R functions or packages (or external
> programs) that
>
> a. take as input a large number (> 10 000) of short 20-30 nt
> sequences, and do sequence assembly, to reconstruct larger (extended)
> 30-50 sequences?
>
> b. take as input a larger number of sequences (100 000 - 1 mil) and
> cluster these sequences in distinct classes based on the sequence
> similarity?

Most of the discussion about genetics/omics applications occurs on the
Bioconductor mailing list. You should definitely seek it out, get the
base installed, and review their available online resources before
sending your next message to the correct mailing list.

http://www.bioconductor.org/docs

-- 

David Winsemius, MD
West Hartford, CT

Martin Morgan

2010-Apr-15 12:33 UTC

Hi Bogdan --

On 04/14/2010 08:19 PM, Bogdan Tanasa wrote:
> Dear all,
>
> please could you suggest any R functions or packages (or external
> programs), that

likely you'll have more luck on the Bioconductor mailing list,

http://bioconductor.org/docs/mailList.html

but...

> a. take as input a large number (> 10 000) of short 20-30 nt
> sequences, and do sequence assembly, to reconstruct larger (extended)
> 30-50 sequences ?

I don't know of any sequence assemblers in R; Velvet would be a first
stop among third-party tools, but it sounds like you have some fairly
specific requirements...

> b. take as input a larger number of sequences (100 000 - 1 mil) and
> cluster these sequences in distinct classes based on the sequence
> similarity  ?

The Biostrings package has various functions to calculate edit distance,
and the resulting distances might form the input to familiar R clustering
algorithms. See installation instructions at

http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html

This thread

https://stat.ethz.ch/pipermail/bioconductor/2010-March/032580.html

might suggest some directions.
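
As a rough illustration of the idea above (edit distances fed into a standard R clustering routine), here is a minimal sketch. It is not code from this thread. To stay runnable without a Bioconductor install it uses base R's adist() for the Levenshtein distances; Biostrings offers stringDist() for the same purpose. The reads, the linkage method, and the choice of two clusters are made-up assumptions for the example.

```r
# Toy reads (hypothetical data): two families whose members differ from
# the first read of the family by a single substitution.
reads <- c(
  "ACGTACGTACGTACGTACGTAC",
  "ACGTACGTACGAACGTACGTAC",
  "ACGTACGTACGTACGTACGTAA",
  "TTGGCCAATTGGCCAATTGGCC",
  "TTGGCCAATTGGCAAATTGGCC",
  "TTGGCCTATTGGCCAATTGGCC"
)

# Pairwise Levenshtein (edit) distances. adist() ships with base R (utils);
# as.dist() converts the square matrix into the form hclust() expects.
d <- as.dist(adist(reads))

# Average-linkage hierarchical clustering (the linkage is an arbitrary choice).
hc <- hclust(d, method = "average")

# Cut the tree into two groups (k = 2 is assumed known for this toy data).
cluster_id <- cutree(hc, k = 2)

# List the reads that ended up in each cluster.
print(split(reads, cluster_id))
```

Note that a full pairwise distance matrix grows quadratically with the number of reads, so for 100 000 or more sequences this direct approach would probably need subsampling, pre-filtering, or a dedicated external tool.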

Martin

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
