Hi,

I am very new to R and wanted to know if there is a package that, given very long nucleotide sequences, searches for and identifies short (7-10 nt) motifs. I would like to look for enrichment of certain motifs in genomic sequences.

I tried using MEME (not an R package, I know), but the online version only allows sequences of up to 60,000 nucleotides, and that's too short for my needs.

Thanks,
A
Hi Alessia,
you may want to post this kind of question on the Bioconductor mailing list;
it is more appropriate.
http://www.bioconductor.org/docs/mailList.html
About your question:
I'm not 100% sure, but check whether the Biostrings package
can do what you need.
http://www.bioconductor.org/workshops/2008/SeattleNov08/MatchAlign/
Best Regards
Anna
Anna Freni Sterrantino
Ph.D Student
Department of Statistics
University of Bologna, Italy
via Belle Arti 41, 40124 BO.
________________________________
From: Alessia Deglincerti <ald2010@med.cornell.edu>
To: r-help@r-project.org
Sent: Tuesday, 9 December 2008, 16:03:55
Subject: [R] motif search
Hi,
I am very new to R and wanted to know if there is a package that, given very
long nucleotide sequences, searches for and identifies short (7-10 nt) motifs. I
would like to look for enrichment of certain motifs in genomic sequences.
I tried using MEME (not an R package, I know), but the online version only
allows sequences of up to 60,000 nucleotides, and that's too short for my
needs.
Thanks
A
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dear Alessia,

> I am very new to R and wanted to know if there is a package that, given very
> long nucleotide sequences, searches for and identifies short (7-10 nt) motifs. I
> would like to look for enrichment of certain motifs in genomic sequences.
>
> I tried using MEME (not an R package, I know), but the online version only
> allows sequences of up to 60,000 nucleotides, and that's too short for my
> needs.

You may try this:

#
# Load the seqinr package:
#
library(seqinr)
#
# An example FASTA file that ships with seqinr, containing
# the complete genome sequence of Chlamydia trachomatis:
#
fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
# Import the sequence as a single character string:
#
myseq <- read.fasta(fastafile, as.string = TRUE)
nchar(myseq) # 1042519, i.e. a ~1 Mb sequence
#
# Look for the motif "atatatat", allowing overlapping matches:
#
words.pos("atatatat", myseq, extended = TRUE)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
substr(myseq, 236501, 236501 + 8)
#
# Should be
# [1] "atatatata"
#

HTH,

Jean

-- 
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
phone: +33 472 43 27 56     fax: +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/
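Locating a motif answers only half of the original question; the other half is enrichment. Below is a minimal base-R sketch of both steps, assuming a zero-order background model (bases independent, with frequencies taken from the sequence itself). `motif_positions()` and `motif_enrichment()` are hypothetical helper names written for this illustration; they are not functions from seqinr or Biostrings. The Poisson p-value is only a rough screening number: self-overlapping motifs such as "atatatat" occur in clumps, and for real genomic work a higher-order background or comparison with control sequences is usually more appropriate.

```r
# Hypothetical helpers for illustration; only base R is used.

# Start positions of all (possibly overlapping) occurrences of `motif` in `seq`.
motif_positions <- function(motif, seq) {
  motif <- tolower(motif)
  seq <- tolower(seq)
  # Restrict motifs to plain a/c/g/t so they are safe inside a regex.
  stopifnot(grepl("^[acgt]+$", motif))
  # A zero-width lookahead makes gregexpr() report overlapping hits,
  # e.g. "atatatat" starts at both 1 and 3 in "atatatatat".
  hits <- gregexpr(paste0("(?=", motif, ")"), seq, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Observed vs. expected count under an assumed zero-order background model
# (bases independent, frequencies estimated from `seq` itself).
motif_enrichment <- function(motif, seq) {
  motif <- tolower(motif)
  seq <- tolower(seq)
  n <- nchar(seq)
  k <- nchar(motif)
  observed <- length(motif_positions(motif, seq))
  bases <- strsplit(seq, "")[[1]]
  # Non-acgt characters (e.g. "n") become NA and are not counted as any base.
  freq <- table(factor(bases, levels = c("a", "c", "g", "t"))) / n
  # Probability of seeing the motif at one given start position.
  p <- prod(as.numeric(freq[strsplit(motif, "")[[1]]]))
  expected <- (n - k + 1) * p
  # Rough Poisson tail P(X >= observed); not exact for self-overlapping motifs.
  p_value <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)
  c(observed = observed, expected = expected,
    ratio = observed / expected, p_value = p_value)
}

toy <- "ccatatatatatgg"
motif_positions("atatatat", toy)   # 3 5
motif_enrichment("atatatat", toy)
```

With the seqinr example above loaded, `motif_positions("atatatat", myseq[[1]])` should in principle list the same overlapping positions as `words.pos()`, though this has not been checked against that file. The lookahead regex keeps the search in a single pass over the string, which stays practical for sequences well beyond the 60,000 nt MEME web limit.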