thr3ads.net - R help - [R] how to load only lines that start with a particular symbol [Sep 2009]

J Chen

2009-Sep-15 20:59 UTC

[R] how to load only lines that start with a particular symbol

Dear all,

I have DNA sequence data which are fasta-formatted as

>gene A;.....
AAAAACCCC
TTTTTGGGG
CCCTTTTTT
>gene B;....
CCCCCAAAA
GGGGGTTTT

I want to load only the lines that start with ">" where the annotation
information for the gene is contained. In principle, I can remove the
sequences before loading or after loading all the lines. I just wonder if
there's a way to load only lines with a particular pattern. The skip
argument in read.table() doesn't work for my purpose.

Thanks in advance,
Jimmy

-- 
View this message in context:
http://www.nabble.com/how-to-load-only-lines-that-start-with-a-particular-symbol-tp25461693p25461693.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2009-Sep-15 21:04 UTC

read in the data with 'readLines' and then use 'grep'

> x
[1] ">gene A;....." "AAAAACCCC"     "TTTTTGGGG"     "CCCTTTTTT"
[5] ">gene B;...."  "CCCCCAAAA"     "GGGGGTTTT"
> x <- x[grep("^>", x)]
> x
[1] ">gene A;....." ">gene B;...."
>
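
For anyone who wants to try this without a file on disk, here is a minimal sketch of the same readLines + grep idea. It assumes the sample records from the question and reads them through a text connection; with a real file you would call readLines() on its path instead. The names fasta and headers are only illustrative.

```r
# Sample FASTA records copied from the question; a text connection stands in
# for the real file here (an assumption made so the example is self-contained).
fasta <- c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
           ">gene B;....", "CCCCCAAAA", "GGGGGTTTT")

con <- textConnection(fasta)
x <- readLines(con)   # reads every line, as in the reply above
close(con)

# Keep only the lines that start with ">" (the annotation lines).
# grep("^>", x, value = TRUE) would give the same result.
headers <- x[grepl("^>", x)]
print(headers)
```

Note that this still reads all lines into memory before filtering, so it is the "remove after loading" route from the question rather than a selective load.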

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

William Dunlap

2009-Sep-15 21:44 UTC

You could use pipe() to call an external program like grep
or perl to filter the lines of interest from the file, so R's input
routine only has to allocate space for those. E.g., the
following makes a sample file, and the readLines(pipe(...))
call reads only the lines starting with ">> " from it. (The
command assumes you don't have grep in PATH, so it gives the full
path to where grep is installed on my Windows machine.)

  > tfile <- tempfile()
  > cat(file=tfile, sep="\n", c(">> Date", ">> Author", "columnA columnB", "1 2", "3 4"))
  > readLines(tfile)
  [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
  [5] "3 4"
  > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
  [1] ">> Date"   ">> Author"

perl can do more complicated processing and filtering than grep.
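
A slightly more portable variant of the pipe() idea might look like the sketch below. It is not from the original reply: it assumes you would rather look grep up with Sys.which() than hard-code a path, it uses shQuote() for the quoting, and it falls back to filtering inside R when no grep is found. The names grep_path, cmd and kept are illustrative only.

```r
# Same sample file as in the example above.
tfile <- tempfile()
cat(file = tfile, sep = "\n",
    c(">> Date", ">> Author", "columnA columnB", "1 2", "3 4"))

grep_path <- Sys.which("grep")   # empty string if grep is not on PATH

if (nzchar(grep_path)) {
  # Quote the program, the pattern and the file name for the shell.
  cmd <- paste(shQuote(grep_path), shQuote("^>> "), shQuote(tfile))
  con <- pipe(cmd)
  kept <- readLines(con)         # R only sees the lines grep lets through
  close(con)
} else {
  # Fallback (assumption): no external grep, so read everything and filter in R.
  all_lines <- readLines(tfile)
  kept <- all_lines[grepl("^>> ", all_lines)]
}
print(kept)
```

The fallback branch loses the memory advantage, since it reads the whole file first; it is only there so the snippet runs on machines without grep.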

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

Gabor Grothendieck

2009-Sep-16 00:50 UTC

In the Windows cmd shell, ^ escapes the next character,
so try this (assuming the data you posted
is in genetest.dat in the current directory):

> readLines(pipe("findstr/b ^> genetest.dat"))
[1] ">gene A;....." ">gene B;...."

On UNIX, replace the command string passed to pipe() with the
corresponding grep command, making sure you escape the >
appropriately for the shell you use.
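
As a rough illustration of the UNIX side, the sketch below assumes a Bourne-style shell with grep on the PATH. It writes the sample data to a temporary genetest.dat (the thread keeps it in the current directory) and puts the pattern in single quotes so the shell does not read > as a redirection. The names datfile, cmd and headers are illustrative.

```r
# Recreate the sample data from the question in a temporary file.
datfile <- file.path(tempdir(), "genetest.dat")
writeLines(c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
             ">gene B;....", "CCCCCAAAA", "GGGGGTTTT"), datfile)

# Single quotes around the pattern keep the shell from treating ">" as
# output redirection; shQuote() protects the file name.
cmd <- paste("grep '^>'", shQuote(datfile))

con <- pipe(cmd)
headers <- readLines(con)   # only the lines grep passes through reach R
close(con)
print(headers)
```

Other shells may need different quoting, which is the caveat in the reply above.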
