thr3ads.net - R help - [R] Extract from a text file [Jun 2016]

Val

2016-Jun-01 01:26 UTC

[R] Extract from a text file

Thank you so much Jeff. It worked for this example.

When I read the data from a file (c:\data\test.txt), it did not work:

KLEM="c:\data"
KR=paste(KLEM,"\test.txt",sep="")
indta <- readLines(KR, skip=46)  # not interested in the first 46 lines)

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep( pattern, indta )
# Replace the matched portion (entire string) with the first capture string
v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
# Replace the matched portion (entire string) with the second capture string
v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
# Convert the lines just after the first lines to numeric
v3 <- as.numeric( indta[ firstlines + 1 ] )
# put it all into a data frame
result <- data.frame( Group = v1, Mean = v2, SE = v3 )

result
[1] Group Mean  SE
<0 rows> (or 0-length row.names)

Thank you in advance


On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Please learn to post in plain text (the setting is in your email client...
> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
> list.  In conjunction with that, try reading some of the fine material
> mentioned in the Posting Guide about making reproducible examples like this
> one:
>
> # You could read in a file
> # indta <- readLines( "out.txt" )
> # but there is no "current directory" in an email
> # so here I have used the dput() function to make source code
> # that creates a self-contained R object
>
> indta <- c(
> "Mean of weight  group 1, SE of mean  :  72.289037489555276",
> " 11.512956539215610",
> "Average weight of group 2, SE of Mean :  83.940053900595013",
> "  10.198495690144522",
> "group 3 mean , SE of Mean     :                78.310441258245469",
> " 13.015876679555",
> "Mean of weight of group 4, SE of Mean               : 76.967516495101669",
> " 12.1254882985", "")
>
> # Regular expression patterns are discussed all over the internet
> # in many places OTHER than R
> # You can start with ?regex, but there are many fine tutorials also
>
> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
> # For this task the regex has to match the whole "first line" of each set
> #  ^ =match starting at the beginning of the string
> #  .* =any character, zero or more times
> #  "group " =match these characters
> #  ( =first capture string starts here
> #  \\d = any digit (first backslash for R, second backslash for regex)
> #  + =one or more of the preceding (any digit)
> #  ) =end of first capture string
> #  [^:] =any non-colon character
> #  * =zero or more of the preceding (non-colon character)
> #  : =match a colon exactly
> #  " *" =match zero or more spaces
> #  ( =second capture string starts here
> #  [ =start of a set of equally acceptable characters
> #  -+ =either of these characters are acceptable
> #  0-9 =any digit would be acceptable
> #  . =a period is acceptable (this is inside the [])
> #  eE =in case you get exponential notation input
> #  ] =end of the set of acceptable characters (number)
> #  * =number of acceptable characters can be zero or more
> #  ) =second capture string stops here
> #  .* =zero or more of any character (just in case)
> #  $ =at end of pattern, requires that the match reach the end
> #     of the string
>
> # identify indexes of strings that match the pattern
> firstlines <- grep( pattern, indta )
> # Replace the matched portion (entire string) with the first capture string
> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
> # Replace the matched portion (entire string) with the second capture string
> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
> # Convert the lines just after the first lines to numeric
> v3 <- as.numeric( indta[ firstlines + 1 ] )
> # put it all into a data frame
> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>
> Figuring out how to deliver your result (output) is a separate question that
> depends on where you want it to go.
>
>
> On Mon, 30 May 2016, Val wrote:
>
>> Hi all,
>>
>> I have a messy text file, and I want to extract some information from it.
>> Here is the text file (out.txt). One record has two lines: the mean comes
>> in the first line and the SE of the mean is on the second line. Here is
>> a sample of the data.
>>
>> Mean of weight  group 1, SE of mean  :  72.289037489555276
>> 11.512956539215610
>> Average weight of group 2, SE of Mean :  83.940053900595013
>>  10.198495690144522
>> group 3 mean , SE of Mean     :                78.310441258245469
>> 13.015876679555
>> Mean of weight of group 4, SE of Mean               : 76.967516495101669
>> 12.1254882985
>>
>> I want to produce the following table. How do I read the file first and then
>> produce the table?
>>
>>
>> Gr1  72.289037489555276   11.512956539215610
>> Gr2  83.940053900595013   10.198495690144522
>> Gr3  78.310441258245469   13.015876679555
>> Gr4  76.967516495101669   12.1254882985
>>
>>
>> Thank you in advance
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
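
For readers who want to experiment with the quoted solution, here is a minimal sketch that is not part of the original posts. It keeps the pattern from the thread unchanged but swaps the two sub() calls for regexec()/regmatches(), which return both capture groups in one pass. The "Gr" label prefix is an assumption taken from the table requested in the first message, and the whitespace inside the group 4 line was lost to line wrapping in the archive, so a single space is assumed there.

```r
# Sample lines copied from the thread (group 4 spacing is assumed)
indta <- c(
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean :  83.940053900595013",
  "  10.198495690144522",
  "group 3 mean , SE of Mean     :                78.310441258245469",
  " 13.015876679555",
  "Mean of weight of group 4, SE of Mean               : 76.967516495101669",
  " 12.1254882985", "")

# Pattern exactly as posted in the thread
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"

# Indexes of the "first line" of each two-line record
firstlines <- grep(pattern, indta)

# regexec() locates the full match and each capture group;
# regmatches() converts those positions into strings:
# element 1 = full match, element 2 = group number, element 3 = mean
parts <- regmatches(indta[firstlines], regexec(pattern, indta[firstlines]))
caps <- do.call(rbind, parts)

# The SE sits alone on the line after each matched line
result <- data.frame(
  Group = paste0("Gr", caps[, 2]),
  Mean  = as.numeric(caps[, 3]),
  SE    = as.numeric(indta[firstlines + 1]),
  stringsAsFactors = FALSE
)

# Print with enough digits to see the values as they appear in the file
print(result, digits = 15)
```

If the result needs to go to a file, write.table() or write.csv() on `result` would be one option; which one fits depends on where the output is headed, as the reply above notes.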

Jeff Newmiller

2016-Jun-01 02:05 UTC

[R] Extract from a text file

You need to go back and study how I made my solution reproducible and make your
problem reproducible.

You probably also ought to spend some time comparing the regex pattern to your
actual data... the point of this list is to learn how to construct these
solutions yourself.
-- 
Sent from my phone. Please excuse my brevity.
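
Two details in the file-reading code posted in this thread are worth checking before the regex itself. In an R string literal a single backslash starts an escape sequence, so "\test.txt" begins with a tab character, and "c:\data" is, as far as I know, rejected as an unrecognized escape. Base R's readLines() also has no skip argument. The posted lines would therefore not run as written, which suggests the real script differed from what was posted. The following is a minimal sketch of one way to read the file, drop the leading lines, and then compare the pattern against the actual data. The temporary file and its 46 "header line" entries are stand-ins invented for this example, not the poster's data.

```r
# Build a stand-in file: 46 header lines followed by two sample records.
# tempfile() is used so the example runs anywhere; for a real Windows path
# use forward slashes or doubled backslashes, e.g.
# file.path("c:/data", "test.txt")
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  paste("header line", 1:46),
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean :  83.940053900595013",
  "  10.198495690144522"
), tmp)

# Read everything, then drop the first 46 lines by negative indexing
all_lines <- readLines(tmp)
indta <- all_lines[-seq_len(46)]

# Pattern exactly as posted in the thread
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"

# Diagnostics: how many lines were read, and what do they look like?
print(length(indta))
print(head(indta))

# Which lines match the full pattern?
hits <- grepl(pattern, indta)
print(table(hits))

# Lines that mention "group" in any letter case but fail the full pattern
# are the ones to compare against the regex character by character
suspects <- indta[grepl("group", indta, ignore.case = TRUE) & !hits]
print(suspects)
```

A zero-row data frame like the one reported means grep() matched nothing, so `suspects` (or simply printing the first few lines of `indta`) is the place to look for differences such as "Group" with a capital G or a missing colon.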


Bert Gunter

2016-Jun-01 03:27 UTC

[R] Extract from a text file

On Tue, May 31, 2016 at 7:05 PM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> You need to go back and study how I made my solution reproducible and make
> your problem reproducible.
>
> You probably also ought to spend some time comparing the regex pattern to
> your actual data... the point of this list is to learn how to construct these
> solutions yourself.

Ah, if only that were the case.

(or is that just the grumbling of an old curmudgeon?)

Cheers,
Bert

