thr3ads.net - R help - [R] extracting data from unstructured (text?) file [Mar 2012]


frauke

2012-Mar-11 19:07 UTC

[R] extracting data from unstructured (text?) file

Dear R community, 

I have the following problem and hope you can help me with it.

My data is saved in thousands of files with an unusual extension consisting of
four digits and a z, for example *.1405z. With list.files I managed to load
this data into R. It looks like this (the row numbers are not in the original
file):

35                             :LATEST STAGE     3.60 FT AT 730 AM CST ON 0102
36                          .ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6
37                   :QPF FORECAST        6AM       NOON        6PM       MDNT
38                   .E1 :0102:              /       3.5/       3.4/       3.5
39                   .E2 :0103:   /       3.5/       3.0/       2.5/       2.1
40                   .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
41                   .E4 :0105:   /       1.2/       1.8/       2.3/       2.7
42                   .E5 :0106:   /       3.0/       3.0/       3.1/       3.3
43                                                    .E6 :0107:   /       3.4
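A minimal sketch of the loading step, assuming all files sit in one directory and their names end in a dot, four digits and a z (as in *.1405z). The function name read_z_files() is hypothetical. Using readLines() keeps each line as a plain character string, which is easier to search and split than a factor column.

```r
# Hypothetical helper: read every *.NNNNz file in a directory as raw lines
read_z_files <- function(data_dir) {
  # Names ending in a dot, exactly four digits and a lowercase z
  files <- list.files(data_dir, pattern = "\\.[0-9]{4}z$", full.names = TRUE)
  # One character vector of lines per file (no factors involved)
  raw <- lapply(files, readLines, warn = FALSE)
  names(raw) <- basename(files)
  raw
}
```

Calling read_z_files(".") would then return a named list with one element per file.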

I need the table in rows 37 to 43 as a matrix, for example:
0102     NA    3.5    3.4    3.5
0103    3.5    3.0    2.5    2.1
0104    1.8    1.5    1.3    1.2
0105    1.2    1.8    2.3    2.7
0106    3.0    3.0    3.1    3.3
0107    3.4     NA     NA     NA

Unfortunately the row numbers vary per file. I can call up each line with
file[40,1] (for line 40, for example). It returns:
[1] .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
38 Levels: .E1 :0102:              /       3.5/       3.4/       3.5 ...
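The "38 Levels:" line suggests the file was read into a data frame whose column became a factor, which is what read.table() did by default before R 4.0.0. A factor prints its level list; as.character() returns just the text of the line. A small sketch with two of the example lines (the data frame name dat is made up); reading the files with readLines() instead avoids factors altogether.

```r
# Two example lines in a data frame with a factor column,
# mimicking read.table()'s default before R 4.0.0
dat <- data.frame(
  V1 = c(".E3 :0104:   /       1.8/       1.5/       1.3/       1.2",
         ".E4 :0105:   /       1.2/       1.8/       2.3/       2.7"),
  stringsAsFactors = TRUE
)

is.factor(dat[1, 1])            # the single cell is still a factor
line <- as.character(dat[1, 1]) # plain text, without the "Levels:" listing
line
```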

So I really have two problems:
1. How do I detect the table in the file (i.e., the line where the table
starts)?
2. How do I break up each line to write the values into a matrix?
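For question 1, one option is to search for the lines by content instead of relying on their positions. This sketch assumes every file uses the same ":QPF FORECAST" header and that the table rows are exactly the lines starting (after optional spaces) with ".E" followed by digits; both patterns are read off the example, not taken from a format specification.

```r
# Example lines from the listing (the .ER line must not be picked up)
lines <- c(
  "                             :LATEST STAGE     3.60 FT AT 730 AM CST ON 0102",
  "                          .ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6",
  "                   :QPF FORECAST        6AM       NOON        6PM       MDNT",
  "                   .E1 :0102:              /       3.5/       3.4/       3.5",
  "                   .E2 :0103:   /       3.5/       3.0/       2.5/       2.1",
  "                                                    .E6 :0107:   /       3.4"
)

# Position of the table header (integer(0) if the file has none)
header_row <- grep(":QPF FORECAST", lines, fixed = TRUE)

# Positions of the table rows: optional spaces, ".E", digits, whitespace
table_rows <- grep("^[[:space:]]*\\.E[0-9]+[[:space:]]", lines)
```

If a file can contain several such blocks, header_row can be used to pick only the rows that follow a particular header.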

Feel free to suggest an entirely different approach if you think that is
helpful. 
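For question 2, a possible sketch: strip everything up to the :MMDD: date comment, split the rest on "/", convert to numbers, and place the values in four slots. The padding rule is an assumption taken from the desired output above: the first row lacks its early slots (NAs go in front) and any short later row lacks its late slots (NAs go at the end). The function name parse_qpf() is hypothetical, and the column names simply copy the header line.

```r
# Hypothetical helper: turn the .E lines of one file into a 4-column matrix
parse_qpf <- function(lines) {
  # Table rows: optional leading spaces, ".E", digits, then whitespace
  rows <- grep("^[[:space:]]*\\.E[0-9]+[[:space:]]", lines, value = TRUE)
  # Date code between the colons, e.g. "0102"
  dates <- sub("^.*:([0-9]{4}):.*$", "\\1", rows)
  # Everything after the date comment, e.g. "   /       3.5/ ..."
  rest <- sub("^.*:[0-9]{4}:", "", rows)
  # Split on "/", trim blanks, drop the empty piece before the first "/"
  vals <- lapply(strsplit(rest, "/", fixed = TRUE), function(x) {
    x <- trimws(x)
    as.numeric(x[x != ""])
  })
  out <- matrix(NA_real_, nrow = length(rows), ncol = 4,
                dimnames = list(dates, c("6AM", "NOON", "6PM", "MDNT")))
  for (i in seq_along(vals)) {
    v <- vals[[i]]
    if (length(v) == 0) next
    if (i == 1) {
      out[i, (5 - length(v)):4] <- v  # assumed: first row lacks early slots
    } else {
      out[i, seq_along(v)] <- v       # assumed: later rows lack late slots
    }
  }
  out
}

# The table from the example (header and data rows)
lines <- c(
  "                   :QPF FORECAST        6AM       NOON        6PM       MDNT",
  "                   .E1 :0102:              /       3.5/       3.4/       3.5",
  "                   .E2 :0103:   /       3.5/       3.0/       2.5/       2.1",
  "                   .E3 :0104:   /       1.8/       1.5/       1.3/       1.2",
  "                   .E4 :0105:   /       1.2/       1.8/       2.3/       2.7",
  "                   .E5 :0106:   /       3.0/       3.0/       3.1/       3.3",
  "                                                    .E6 :0107:   /       3.4"
)
m <- parse_qpf(lines)
m

# For many files, something along these lines:
# lapply(list.files(pattern = "\\.[0-9]{4}z$"), function(f) parse_qpf(readLines(f)))
```

Splitting on "/" rather than on fixed column positions sidesteps the varying indentation seen in row 43; if real files break the padding rule, the column positions of the values relative to the header line would be the next thing to try.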

Thanks a lot! Frauke



--
View this message in context:
http://r.789695.n4.nabble.com/extracting-data-from-unstructured-text-file-tp4464423p4464423.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2012-Mar-11 19:35 UTC


Can you at least provide a subset of two files so we can see how the
data is really stored in the file and what the separators are between
the 'columns' of data? Also, how do you determine where the data
actually starts for the rows that you want to pull off? This will help
in determining how to parse the data.
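One way to check the separators yourself: print() shows control characters in escaped form, so a tab appears as \t rather than as blank space, and grepl() can flag which lines contain one. A small sketch with two made-up lines, only the first of which contains a tab:

```r
# Made-up lines: the first uses a tab after the date, the second only spaces
x <- c(".E2 :0103:\t/       3.5/", ".E3 :0104:   /       1.8/")

print(x)  # print() escapes control characters, so the tab is shown as \t

# TRUE for lines that contain at least one tab
has_tab <- grepl("\t", x, fixed = TRUE)
has_tab
```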



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Vijaya Parthiban

2012-Mar-11 19:41 UTC


Hi Frauke,

Try Unix commands through R's system() function.

Example:
Let's say you have a file called hello.txt containing a matrix like this
(note: the first element is missing):
10 100
2 20 200
3 30 300
4 40 400
5 50 500

You can try something like:

# cut's default field delimiter is TAB; use -d' ' for space-separated files
hello = system("cut -f1 hello.txt", intern = TRUE)
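For comparison, a base-R sketch of the same first-column extraction without an external program, which also works where cut is not available (such as a default Windows setup). It assumes hello.txt is tab-separated, matching cut's default delimiter; the example writes its own copy to a temporary file so it runs on its own.

```r
# Tab-separated example whose first field on line 1 is empty
path <- tempfile(fileext = ".txt")
writeLines(c("\t10\t100", "2\t20\t200", "3\t30\t300",
             "4\t40\t400", "5\t50\t500"), path)

# Split each line on tabs and keep the first field, like `cut -f1`
fields <- strsplit(readLines(path), "\t", fixed = TRUE)
hello <- vapply(fields, function(f) f[1], character(1))
hello
```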

VP.


frauke

2012-Mar-11 20:07 UTC


Thank you for the quick reply! I have attached two files.

http://r.789695.n4.nabble.com/file/n4464511/sample1.1339z sample1.1339z 
http://r.789695.n4.nabble.com/file/n4464511/sample2.1949z sample2.1949z 

