Hi, I am very new to R and wanted to know if there is a package that, given very long nucleotide sequences, searches for and identifies short (7-10 nt) motifs. I would like to look for enrichment of certain motifs in genomic sequences. I tried using MEME (not an R package, I know), but the online version only accepts sequences of at most 60000 nucleotides, and that's too short for my needs. Thanks, A
Hi Alessia,

you may want to post this kind of question on the Bioconductor mailing list, which is more appropriate for it:
http://www.bioconductor.org/docs/mailList.html

About your question: I'm not 100% sure, but check whether the Biostrings package can do what you need:
http://www.bioconductor.org/workshops/2008/SeattleNov08/MatchAlign/

Best regards,
Anna

Anna Freni Sterrantino
Ph.D. Student
Department of Statistics, University of Bologna, Italy
via Belle Arti 41, 40124 BO.

________________________________
From: Alessia Deglincerti <ald2010@med.cornell.edu>
To: r-help@r-project.org
Sent: Tuesday, 9 December 2008, 16:03:55
Subject: [R] motif search
Dear Alessia,

> I am very new to R and wanted to know if there is a package that, given very
> long nucleotide sequences, searches and identifies short (7-10nt) motifs.. I
> would like to look for enrichment of certain motifs in genomic sequences.
>
> I tried using MEME (not an R package, I know), but the online version only
> allows sequences up to MAX 60000 nucleotides, and that's too short for my
> needs..

You may try this:

#
# Load the seqinr package:
#
library(seqinr)
#
# A FASTA file example - that ships with seqinr - which contains
# the complete genome sequence of Chlamydia trachomatis:
#
fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
# Import the sequence as a string of characters:
#
myseq <- read.fasta(fastafile, as.string = TRUE)
nchar(myseq) # 1042519, that is, a sequence of about 1 Mb
#
# Look for motif "atatatat", with possible overlap:
#
words.pos("atatatat", myseq, extended = TRUE)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
substr(myseq, 236501, 236501 + 8)
#
# Should be
# [1] "atatatata"
#

HTH,

Jean

--
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
phone: +33 472 43 27 56 fax: +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/
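
For readers without seqinr installed, the overlapping search above can also be approximated in base R. The following is only a minimal sketch, not how words.pos is implemented: find_motif is a hypothetical helper name, the toy sequence is made up, and the "expected" count assumes a crude background model in which bases occur independently at their observed frequencies. It also ignores the reverse strand.

```r
# Find all (possibly overlapping) start positions of a motif in a sequence.
# Base R only; find_motif is a hypothetical helper, not a seqinr function.
find_motif <- function(motif, sequence) {
  # A zero-width lookahead lets the regex engine report overlapping hits.
  hits <- gregexpr(paste0("(?=", motif, ")"), sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Toy sequence, made up for illustration only.
toy   <- "ggatatatatatccatatatatgg"
motif <- "atatatat"
pos   <- find_motif(motif, toy)

# Crude enrichment ratio: observed hits divided by the number expected
# if bases were independent (an assumption, not a proper background model).
n        <- nchar(toy)
k        <- nchar(motif)
freqs    <- table(strsplit(toy, "")[[1]]) / n
p_motif  <- prod(as.numeric(freqs[strsplit(motif, "")[[1]]]))
expected <- (n - k + 1) * p_motif
ratio    <- length(pos) / expected
```

On a real genome one would replace toy with the imported sequence string. A ratio well above 1 only hints at enrichment; a higher-order background model or a dedicated tool would be needed before drawing conclusions.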