library(stringr)

x <- "HollandUKTrade: #Dutch companies striking Olympic gold at London
2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade
@dutchembassyUK"

# Match a hash followed by one or more explicitly listed ASCII letters or digits
str_extract_all(pattern =
"#[1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+",
x)
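As the documentation in ?regexp says, there are shortcuts that avoid
listing every ASCII letter and digit (ranges such as A-Z, or classes
such as [:alnum:]), but how they are interpreted depends on the platform
and locale in ways that are not spelled out; the explicit listing given
above should be robust.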
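
For comparison, here is a rough sketch of the shortcut form. It is not the approach recommended above: it swaps in the POSIX class [[:alnum:]] and uses base R's gregexpr() and regmatches() instead of stringr, on the assumption that the tweet contains only ASCII letters and digits, so the locale should not matter for this input.

```r
# Same tweet as above, written on one line
x <- "HollandUKTrade: #Dutch companies striking Olympic gold at London 2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade @dutchembassyUK"

# [[:alnum:]] matches letters and digits; in some locales it may also
# match non-ASCII letters, which is why the explicit list is more robust
hashtags <- regmatches(x, gregexpr("#[[:alnum:]]+", x))[[1]]
hashtags
# [1] "#Dutch"      "#london2012" "#olympics"   "#sport"
```

gregexpr() finds every match position in x and regmatches() pulls out the matched substrings; the [[1]] takes the result for the first (and only) input string.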
Michael
On Tue, May 8, 2012 at 10:24 AM, Adedoyin-Olowe Mariam
<mariamolowe2008 at yahoo.com> wrote:
> Can someone help me with the code I can use to extract words preceded
> by a hash tag in live tweets downloaded from twitteR?
> An example of what I require is:
> [[9]]
> [1] "HollandUKTrade: #Dutch companies striking Olympic gold at London
> 2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade
> @dutchembassyUK" (Tweet download)
>
> I want a code that will extract this:
> #Dutch companies #london2012, #olympics, #sport
>
> I have used the code listed below with stringr, which returns output I
> did not want:
>> str_extract_all("#<-a-z, #<-A-Z", "[[string1:string10]]")
> [[1]]
> character(0)
>> str_extract_all("#<-a-z, #<-A-Z", "[[string9]]")
> [[1]]
> character(0)
>
>> str_extract_all("#=[1:10]", "#+a-z")
> [[1]]
> character(0)
>> str_extract_all("#=[1:10]", "#+")
> [[1]]
> [1] "#"
>
> Positive help will be highly appreciated.
> Mariam