thr3ads.net - R help - [R] Search and extract string function [Jul 2010]


AndrewPage

2010-Jul-15 14:48 UTC

[R] Search and extract string function

Hi all,

I'm trying to write a function that will search and extract from a long
character string, but with a twist: I want to use the characters before and
the characters after what I want to extract as reference points.  For
example, say I'm working with data entries that look like this:

Drink=Coffee:Location=Office:Time=Morning:Market=Flat

Drink=Water:Location=Office:Time=Afternoon:Market=Up

Drink=Water:Location=Gym:Time=Evening:Market=Closed

Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed


...

For my function, I'd like to find what's located between "Location=" and
":Time=" in every instance, and extract it, to return something like
"Office, Office, Gym, Restaurant".

In a previous discussion I found
(http://tolstoy.newcastle.edu.au/R/help/05/03/0344.html), someone wrote a
function where you could find and substitute characters in a string, based
on "pre" and "post" variables:

interp <- function(x, e = parent.frame(), pre = "\\$", post = "") {
	for (el in ls(e)) {
		tag <- paste(pre, el, post, sep = "")
		if (length(grep(tag, x))) x <- gsub(tag, eval(parse(text = el), e), x)
	}
	x
}
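
For reference, here is a minimal sketch of how interp behaves when called. It assumes the values live in an explicitly supplied environment; the names vals, drink and place are made up for illustration and are not from the earlier discussion. Each "$name" tag in the string is replaced by the value of the variable with that name:

```r
# interp as posted above, reproduced so this sketch runs on its own
interp <- function(x, e = parent.frame(), pre = "\\$", post = "") {
  for (el in ls(e)) {
    # Build the tag to look for, e.g. "\\$drink" for the variable `drink`
    tag <- paste(pre, el, post, sep = "")
    if (length(grep(tag, x))) x <- gsub(tag, eval(parse(text = el), e), x)
  }
  x
}

# Hypothetical environment holding the values to substitute
vals <- list2env(list(drink = "Coffee", place = "Office"))

res <- interp("I had $drink at the $place", e = vals)
print(res)
```

So interp substitutes values into a template; it does not pull a substring out of existing text, which is the operation needed here.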

I'm not sure how to modify it, however, to do what I want it to do.  Any
suggestions?

Thanks in advance,

Andrew

Marc Schwartz

2010-Jul-15 15:42 UTC


> Vec
[1] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
[2] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
[3] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
[4] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"

> gsub(".*Location=(.+):Time=.*", "\\1", Vec)
[1] "Office"     "Office"     "Gym"        "Restaurant"


This returns the back reference, i.e. the text matched by the part of the
pattern within the parens, which lies between the two bounding strings.
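
Two caveats apply to the gsub() approach: an element that does not match the pattern is returned unchanged rather than flagged, and the delimiters are interpreted as regular expressions. A minimal sketch of a reusable helper that avoids both, assuming the delimiters are literal strings and that the first occurrence of each is the one wanted; extract_between is a hypothetical name, not something from this thread or from base R:

```r
# Hypothetical helper: return the text between the literal strings `pre`
# and `post` in each element of `x`, or NA where either one is missing.
extract_between <- function(x, pre, post) {
  out <- rep(NA_character_, length(x))
  # fixed = TRUE treats pre/post as literal text, not regular expressions
  start <- as.integer(regexpr(pre, x, fixed = TRUE))
  has_pre <- !is.na(start) & start > 0
  # Text following the first occurrence of `pre`
  rest <- substring(x[has_pre], start[has_pre] + nchar(pre))
  end <- as.integer(regexpr(post, rest, fixed = TRUE))
  has_post <- end > 0
  idx <- which(has_pre)[has_post]
  # Keep everything before the first occurrence of `post`
  out[idx] <- substring(rest[has_post], 1, end[has_post] - 1)
  out
}

Vec <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat",
         "Drink=Water:Location=Office:Time=Afternoon:Market=Up",
         "Drink=Water:Location=Gym:Time=Evening:Market=Closed",
         "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed")

locations <- extract_between(Vec, "Location=", ":Time=")
print(locations)
```

Returning NA for entries without both delimiters is a design choice made here so that malformed rows are easy to spot; the one-line gsub() call is shorter when every entry is known to be well formed.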

HTH,

Marc Schwartz

Gabor Grothendieck

2010-Jul-15 15:47 UTC


The strapply function in the gsubfn package can do that.  By default it
returns the back reference, i.e. the text matched by the part of the regular
expression between parentheses:
> s <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat",
+ "Drink=Water:Location=Office:Time=Afternoon:Market=Up",
+ "Drink=Water:Location=Gym:Time=Evening:Market=Closed",
+ "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed")
>
> library(gsubfn)
> strapply(s, "Location=(.*):Time", simplify = TRUE)
[1] "Office"     "Office"     "Gym"        "Restaurant"
>
> # since we know that the field we want is composed of
> # word characters and followed by a non-word character
> # we can even avoid specifying :Time by specifying
> # word characters (\\w+) instead:
>
> strapply(s, "Location=(\\w+)", simplify = TRUE)
[1] "Office"     "Office"     "Gym"        "Restaurant"

See http://gsubfn.googlecode.com for more.
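
If installing gsubfn is not an option, the same word-character idea can be sketched in base R with regexpr() and regmatches(). This is an alternative to strapply, not a description of how it works internally, and it assumes an R version that provides regmatches() and Perl-style lookbehind via perl = TRUE:

```r
s <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat",
       "Drink=Water:Location=Office:Time=Afternoon:Market=Up",
       "Drink=Water:Location=Gym:Time=Evening:Market=Closed",
       "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed")

# (?<=Location=) is a lookbehind: it requires "Location=" immediately
# before the match without including it in the matched text.
m <- regexpr("(?<=Location=)\\w+", s, perl = TRUE)

# regmatches() returns only the matched substrings
loc <- regmatches(s, m)
print(loc)
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input when some entries lack a Location field.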
