mauede at alice.it
2009-Sep-29 16:03 UTC
[R] How can I avoid a for-loop through sapply or lapply ?
By converting a miRNA file from FASTA to character format I get a vector which looks like the following:

> nml
  [1] "hsa-let-7a MIMAT0000062 Homo sapiens let-7a"
  [2] "hsa-let-7b MIMAT0000063 Homo sapiens let-7b"
  [3] "hsa-let-7c MIMAT0000064 Homo sapiens let-7c"
  [4] "hsa-let-7d MIMAT0000065 Homo sapiens let-7d"
  [5] "hsa-let-7e MIMAT0000066 Homo sapiens let-7e"
  [6] "hsa-let-7f MIMAT0000067 Homo sapiens let-7f"
  [7] "hsa-miR-15a MIMAT0000068 Homo sapiens miR-15a"
  [8] "hsa-miR-16 MIMAT0000069 Homo sapiens miR-16"
  [9] "hsa-miR-17 MIMAT0000070 Homo sapiens miR-17"
 [10] "hsa-miR-18a MIMAT0000072 Homo sapiens miR-18a"
 ......................................................
[888] "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*"
[889] "hsa-miR-888* MIMAT0004917 Homo sapiens miR-888*"
[890] "hsa-miR-541* MIMAT0004919 Homo sapiens miR-541*"

My goal is to extract into a vector only the first string of each record, that is, everything preceding the first space from the left.
For the records listed above I would obtain:

  [1] "hsa-let-7a"
  [2] "hsa-let-7b"
  [3] "hsa-let-7c"
  [4] "hsa-let-7d"
  [5] "hsa-let-7e"
  [6] "hsa-let-7f"
  [7] "hsa-miR-15a"
  [8] "hsa-miR-16"
  [9] "hsa-miR-17"
 [10] "hsa-miR-18a"
 ......................................................
[888] "hsa-miR-675*"
[889] "hsa-miR-888*"
[890] "hsa-miR-541*"

I tried using strsplit as follows:

strsplit(nml,"MIMAT[0-9]*")

From this I get a list of character vectors, and I can pick out the string I need through the [[]] operator, as shown in the following.

> strsplit(nml,"MIMAT[0-9]*")[[1]][1]
[1] "hsa-let-7a "
> strsplit(nml,"MIMAT[0-9]*")[[2]][1]
[1] "hsa-let-7b "

Unfortunately the [[]] operator extracts only one list element at a time. In fact:

> strsplit(nml,"MIMAT[0-9]*")[[]][1]
Error in strsplit(nml, "MIMAT[0-9]*")[[]] :
  invalid subscript type 'symbol'

I guess a smart combination of strsplit and sapply or lapply could do the job in a single command line, but I haven't been able to get the syntax right. I would greatly appreciate some help from R language experts.
I know I can use a for-loop to get what I am after, but I definitely wish to learn to use a high-level language as it deserves rather than in C style.

Thank you in advance,
Maura
Steve Lianoglou
2009-Sep-29 16:08 UTC
[R] How can I avoid a for-loop through sapply or lapply ?
Hi,

On Sep 29, 2009, at 12:03 PM, <mauede at alice.it> wrote:

> My goal is to separate into a vector only the first string preceding
> the 1st space starting from the left.
> [...]

pieces <- strsplit(nml, " ")
sapply(pieces, '[', 1)

Or, the same as a one-liner:

sapply(strsplit(nml, " "), '[', 1)

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
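A note on why the sapply() call works: `[` is itself a function in R, so sapply(pieces, '[', 1) evaluates `[`(p, 1), i.e. p[1], for each element p of the list that strsplit() returns. The following is only a minimal sketch of the same idea. The two-element nml is a made-up stand-in for the full 890-element vector, and both fixed = TRUE and the vapply() variant are additions for illustration, not part of the reply above.

```r
# Stand-in for the poster's vector (only two of the 890 records).
nml <- c("hsa-let-7a MIMAT0000062 Homo sapiens let-7a",
         "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*")

# strsplit() returns a list with one character vector per input string.
# fixed = TRUE (an assumption of this sketch) treats " " as a literal
# space rather than as a regular expression.
pieces <- strsplit(nml, " ", fixed = TRUE)

# `[` is an ordinary function: `[`(p, 1) is the same as p[1].
first_sapply <- sapply(pieces, "[", 1)

# vapply() does the same, but stops with an error if any result
# is not a single character string.
first_vapply <- vapply(pieces, `[`, FUN.VALUE = character(1), 1)

print(first_vapply)
```

vapply() is worth considering when the result feeds into further code, since sapply() silently changes its return type when the pieces are not all the same length.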
Charles C. Berry
2009-Sep-29 16:10 UTC
[R] How can I avoid a for-loop through sapply or lapply ?
On Tue, 29 Sep 2009, mauede at alice.it wrote:

> My goal is to separate into a vector only the first string preceding the 1st space starting from the left.
> [...]

sub( "[ ].*", "", nml )

> I tried using strsplit as follows:
> strsplit(nml,"MIMAT[0-9]*")
> [...]

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
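For readers less used to regular expressions: the pattern "[ ].*" matches the first space and everything after it, and sub() replaces that match with an empty string. Because sub() is vectorised over its third argument, no loop or sapply() is needed. A small sketch, assuming a made-up test vector x that also contains an entry without any space, to show that such entries pass through unchanged:

```r
# Made-up test vector: two records from the thread plus one string
# with no space at all (hypothetical, for illustration only).
x <- c("hsa-let-7a MIMAT0000062 Homo sapiens let-7a",
       "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*",
       "no-space-here")

# Delete the first space and everything that follows it.
# Strings without a space do not match and are returned as they are.
ids <- sub("[ ].*", "", x)

print(ids)
```

The "*" at the end of names such as hsa-miR-675* causes no trouble here, because it appears in the data and not in the pattern.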
jim holtman
2009-Sep-29 16:20 UTC
[R] How can I avoid a for-loop through sapply or lapply ?
How about this:

> x <- c("sadfdsaf 24353245", 'wqerwqer 6577', 'xzcv 6587')
> sub("^([^[:space:]]+)[[:space:]].*", "\\1", x)
[1] "sadfdsaf" "wqerwqer" "xzcv"

On Tue, Sep 29, 2009 at 12:03 PM, <mauede at alice.it> wrote:
> My goal is to separate into a vector only the first string preceding the 1st space starting from the left.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
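The [[:space:]] class in this pattern matches tabs and other whitespace as well as plain spaces, which may matter if the FASTA headers are not strictly space-separated. Whether the real file contains tabs is an assumption here; the vector below is made up to show the difference from a pattern that looks for a literal space only.

```r
# Hypothetical input: the first record uses a tab after the ID,
# the second uses an ordinary space.
x <- c("hsa-let-7a\tMIMAT0000062 Homo sapiens let-7a",
       "hsa-let-7b MIMAT0000063 Homo sapiens let-7b")

# Capture the leading run of non-whitespace characters and drop the rest.
by_class <- sub("^([^[:space:]]+)[[:space:]].*", "\\1", x)

# A literal-space pattern cuts at the first space, which in the first
# record comes only after the accession number.
by_space <- sub(" .*", "", x)

print(by_class)
print(by_space)
```

With purely space-separated records, as in the vector shown in the original question, the two patterns give the same result.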
Ted Harding
2009-Sep-29 16:25 UTC
[R] How can I avoid a for-loop through sapply or lapply ?
On 29-Sep-09 16:03:31, mauede at alice.it wrote:
> My goal is to separate into a vector only the first string preceding
> the 1st space starting from the left.
> [...]

Hi Maura,
Example:

  Strings<-c(
  "hsa-let-7a MIMAT0000062 Homo sapiens let-7a",
  "hsa-let-7b MIMAT0000063 Homo sapiens let-7b",
  "hsa-let-7c MIMAT0000064 Homo sapiens let-7c")

  sub(" .*","",Strings)
  # [1] "hsa-let-7a" "hsa-let-7b" "hsa-let-7c"

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Sep-09                                       Time: 17:25:00
------------------------------ XFMail ------------------------------