thr3ads.net - R help - [R] Substring function? [Jul 2010]


Ralf B

2010-Jul-13 12:22 UTC

[R] Substring function?

Hi all,

I would like to detect all strings in the vector 'content' that
contain the strings from the vector 'search'. Here is a code example:

content <- data.frame(urls=c(
  "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
  "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1")
)
search <- data.frame(signatures=c("http://www.google.com/search"))
subset(content, search$signatures %in% content$urls)

Instead of the matching row, I get an empty data frame:

[1] urls
<0 rows> (or 0-length row.names)


What I would like to achieve is the return of
"http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3".
Is that possible? In practice I would like to run this over 1000s of
strings in 'content' and 100s of strings in 'search'. Could I run into
performance issues with this approach and, if so, are there better
ways?

Best,
Ralf

Nikhil Kaza

2010-Jul-13 13:29 UTC


Well, %in% checks whether each element on its left-hand side is exactly
equal to some element on its right-hand side; it is not a substring operator.
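To see the difference, here is a small sketch using the two URLs from the question: `%in%` compares whole strings, whereas `grepl()` with `fixed = TRUE` looks for the signature inside each URL. In the original `subset()` call the condition is a single `FALSE`, which `subset()` recycles over every row, hence the empty result.

```r
# The two URLs from the original question and the signature to look for
urls <- c(
  "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
  "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1"
)
sig <- "http://www.google.com/search"

# %in% tests exact equality with some element of urls: a single FALSE here
exact <- sig %in% urls
print(exact)

# grepl() tests, for each URL, whether it contains sig as a literal substring
contains <- grepl(sig, urls, fixed = TRUE)
print(contains)
```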

To get the result you want, try

content[grepl(search$signatures, content$urls),]
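A slightly more defensive version of the same one-liner, as a sketch: `fixed = TRUE` makes `grepl()` treat the signature as a literal string (otherwise each `.` is a regex wildcard), `drop = FALSE` keeps the result a data frame, and `stringsAsFactors = FALSE` avoids the factor columns that `data.frame()` created by default in R versions before 4.0.0.

```r
content <- data.frame(
  urls = c(
    "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
    "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1"
  ),
  stringsAsFactors = FALSE
)
search <- data.frame(signatures = "http://www.google.com/search",
                     stringsAsFactors = FALSE)

# Literal substring test: which URLs contain the (single) signature?
hits <- grepl(search$signatures, content$urls, fixed = TRUE)

# drop = FALSE keeps a one-column data frame instead of a bare vector
matched <- content[hits, , drop = FALSE]
print(matched)
```

Note that `grepl()` uses only the first element of `pattern` (with a warning if there are more), so this form handles just one signature.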

For multiple search strings, you could try

sapply(search$signatures, grepl, x=content$urls)
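The `sapply()` call returns a logical matrix with one column per signature, so one more step is needed to pull out the matching rows. A minimal sketch, assuming literal (`fixed = TRUE`) matching is what is wanted, collapses the per-signature results with `Reduce()` and `|`. The second signature is hypothetical and only there to show more than one pattern. Each `grepl()` call is vectorised over all URLs, so the R-level loop runs once per signature rather than once per URL/signature pair.

```r
content <- data.frame(
  urls = c(
    "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
    "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1"
  ),
  stringsAsFactors = FALSE
)
# The second signature is hypothetical, for illustration only
signatures <- c("http://www.google.com/search", "http://www.bing.com/search")

# One logical vector per signature: does each URL contain that signature?
per_sig <- lapply(signatures, grepl, x = content$urls, fixed = TRUE)

# A URL matches if it contains at least one signature
hits <- Reduce(`|`, per_sig)
matched <- content[hits, , drop = FALSE]
print(matched)
```

`rowSums(sapply(...)) > 0` gives the same answer as long as there is more than one URL; with a single URL, `sapply()` simplifies to a plain vector and `rowSums()` fails.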




Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.list at gmail.com


Erik Iverson

2010-Jul-13 15:08 UTC


The high-level concept you need is called "regular expressions". R
supports these through several functions, such as grep(), grepl(), sub()
and gsub(); see ?regex for details.
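One way to apply that idea to Ralf's 100s of signatures is to combine them into a single alternation pattern and call `grepl()` once. The sketch below assumes PCRE (`perl = TRUE`) so that each signature can be wrapped in `\Q...\E` and matched literally; it would break for a signature that itself contains `\E`. The second signature is hypothetical. Whether this beats the per-signature loop depends on the data, so it is worth timing both (for example with `system.time()`) on real inputs.

```r
content <- data.frame(
  urls = c(
    "http://www.google.com/search?source=ig&hl=en&rlz=&=&q=stuff&aq=f&aqi=g10&aql=&oq=&gs_rfai=CrrIS3",
    "http://search.yahoo.com/search;_ylt=Atvki9MVpnxuEcPmXLEWgMqbvZx4?p=stuff&toggle=1"
  ),
  stringsAsFactors = FALSE
)
# The second signature is hypothetical, for illustration only
signatures <- c("http://www.google.com/search", "http://www.bing.com/search")

# Wrap each signature in \Q...\E so PCRE treats it literally, then join with "|"
pattern <- paste0("\\Q", signatures, "\\E", collapse = "|")

# One regex pass over all URLs instead of one pass per signature
hits <- grepl(pattern, content$urls, perl = TRUE)
matched <- content[hits, , drop = FALSE]
print(matched)
```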

