thr3ads.net - R help - [R] read only certain parts of a file [Oct 2007]


João Fadista

2007-Oct-09 13:05 UTC

[R] read only certain parts of a file

Dear all,
 
I would like to know how I can read a text file and create a data frame from only
certain parts of the file.
For instance, from this text file:
 
==================================================
Matches For Query 0 (108 bases): 000019_0070

==================================================
Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 000019_0070 Chr15 3 108 43251883 43251778 C 106 95.28 

88 000019_0070 Chr1 4 108 85826948 85826844 C 105 95.24 

==================================================
Matches For Query 1 (124 bases): 000024_1262

==================================================
Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

99 000024_1262 Chr6 16 124 35738256 35738364 F 109 100.00 

 

I would like to have a data frame that has only:

Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 000019_0070 Chr15 3 108 43251883 43251778 C 106 95.28 

88 000019_0070 Chr1 4 108 85826948 85826844 C 105 95.24 

99 000024_1262 Chr6 16 124 35738256 35738364 F 109 100.00 

 
 
Best regards,
João Fadista


Gabor Grothendieck

2007-Oct-09 14:39 UTC


Here are two possibilities.  The first keeps all lines that consist only
of alphanumerics, spaces, underscores and periods, while the second keeps
all lines with exactly 10 fields.  Both then take the unique lines, which
drops the repeated header, and read the result using read.table.

The first one assumes the garbage always has a character not in the set
indicated and the second one assumes the garbage never has 10 fields.
You can probably come up with other rules as well along these lines.


Lines.raw <- "==================================================
Matches For Query 0 (108 bases): 000019_0070

==================================================
Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

89 000019_0070 Chr15 3 108 43251883 43251778 C 106 95.28

88 000019_0070 Chr1 4 108 85826948 85826844 C 105 95.24

==================================================
Matches For Query 1 (124 bases): 000024_1262

==================================================
Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity

99 000024_1262 Chr6 16 124 35738256 35738364 F 109 100.00
"

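# Approach 1: keep lines made up only of alphanumerics, spaces, periods
# and underscores; unique() drops the repeated header line
Lines <- readLines(textConnection(Lines.raw))
Lines <- unique(grep("^[[:alnum:] ._]+$", Lines, value = TRUE))
read.table(textConnection(Lines), header = TRUE)

# or

# Approach 2: keep lines with exactly 10 whitespace-separated fields
# (blank.lines.skip = FALSE keeps idx aligned with Lines)
Lines <- readLines(textConnection(Lines.raw))
idx <- count.fields(textConnection(Lines.raw), blank.lines.skip = FALSE)
Lines <- unique(Lines[idx == 10])
read.table(textConnection(Lines), header = TRUE)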
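
As one more rule along the same lines, here is a minimal sketch that keeps only the lines starting with a numeric score and supplies the column names itself. It is not part of the reply above: the helper name read_matches and the choice of colClasses are assumptions made for illustration. Because it does not call unique(), two genuinely identical match rows would both be kept, and declaring Direction as character avoids read.table turning a column containing only "F" into a logical.

```r
# Hypothetical helper (not from the thread): keep only the rows that begin
# with a numeric score and name the columns explicitly.
read_matches <- function(lines) {
  col_names <- c("Score", "Q_Name", "S_Name", "Q_Start", "Q_End",
                 "S_Start", "S_End", "Direction", "Bases", "identity")
  # Data rows start with digits followed by whitespace; separator lines,
  # "Matches For Query" lines, header lines and blank lines do not.
  data_lines <- grep("^[0-9]+[[:space:]]", lines, value = TRUE)
  # Note: read.table() raises an error if no line matched.
  read.table(text = data_lines, header = FALSE, col.names = col_names,
             colClasses = c("integer", "character", "character",
                            rep("integer", 4), "character",
                            "integer", "numeric"))
}

# Sample input copied from the example in this thread
sample_lines <- c(
  "==================================================",
  "Matches For Query 0 (108 bases): 000019_0070",
  "",
  "==================================================",
  "Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity",
  "",
  "89 000019_0070 Chr15 3 108 43251883 43251778 C 106 95.28 ",
  "",
  "88 000019_0070 Chr1 4 108 85826948 85826844 C 105 95.24 ",
  "",
  "==================================================",
  "Matches For Query 1 (124 bases): 000024_1262",
  "",
  "==================================================",
  "Score Q_Name S_Name Q_Start Q_End S_Start S_End Direction Bases identity",
  "",
  "99 000024_1262 Chr6 16 124 35738256 35738364 F 109 100.00 "
)

matches <- read_matches(sample_lines)
print(matches)
```

For a real file, something like read_matches(readLines("matches.txt")) should work the same way, where "matches.txt" is a placeholder file name.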


