thr3ads.net - R help - [R] Extracting Hash-tagged word from Tweets [May 2012]

Adedoyin-Olowe Mariam

2012-May-08 14:24 UTC

[R] Extracting Hash-tagged word from Tweets

Can someone help me with code to extract the words preceded by a hash tag
in live tweets downloaded with twitteR?
An example of what I require is:
[[9]]
[1] "HollandUKTrade: #Dutch companies striking Olympic gold at London 2012
http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade
@dutchembassyUK"  (Tweet download)

I want code that will extract this:
#Dutch companies #london2012, #olympics, #sport

I have used the code listed below, from stringr, which returns output I did
not want:

> str_extract_all("#<-a-z, #<-A-Z", "[[string1:string10]]")
[[1]]
character(0)

> str_extract_all("#<-a-z, #<-A-Z", "[[string9]]")
[[1]]
character(0)

> str_extract_all("#=[1:10]", "#+a-z")
[[1]]
character(0)

> str_extract_all("#=[1:10]", "#+")
[[1]]
[1] "#"

Positive help will be highly appreciated.
Mariam

R. Michael Weylandt

2012-May-09 03:04 UTC

[R] Extracting Hash-tagged word from Tweets

x <- "HollandUKTrade: #Dutch companies striking Olympic gold at London
2012 http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade
@dutchembassyUK"

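library(stringr)  # provides str_extract_all()

# The string to search comes first, the regular expression second.
str_extract_all(x,
pattern = "#[1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+")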

As the documentation for ?regexp says, there are some shortcuts to
avoid listing the whole ASCII set, but they are (unstated) platform
and locale dependent; the way given above should be robust.
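
To make the comparison concrete, here is a minimal sketch that runs the explicit character class next to the POSIX shorthand [[:alnum:]]. It uses base R's regmatches() and gregexpr() instead of stringr so that it runs without extra packages; that substitution, and the variable names explicit, shorthand, tags_explicit and tags_shorthand, are assumptions made for illustration and are not part of the original reply.

```r
# Sample tweet from the thread, joined with paste() to keep the lines short.
x <- paste(
  "HollandUKTrade: #Dutch companies striking Olympic gold at London 2012",
  "http://t.co/XsvvXAzT #london2012 #olympics #sport @hollandtrade",
  "@dutchembassyUK"
)

# Explicit class: every allowed character is spelled out.
explicit <- "#[1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+"

# POSIX shorthand: what counts as alphanumeric may vary with the locale.
shorthand <- "#[[:alnum:]]+"

# gregexpr() locates every match; regmatches() pulls the matched text out.
# Both return a list with one element per input string, hence the [[1]].
tags_explicit  <- regmatches(x, gregexpr(explicit, x))[[1]]
tags_shorthand <- regmatches(x, gregexpr(shorthand, x))[[1]]

print(tags_explicit)
# [1] "#Dutch"      "#london2012" "#olympics"   "#sport"

# For this ASCII-only tweet the two patterns find the same tags.
print(identical(tags_explicit, tags_shorthand))
```

With stringr loaded, str_extract_all(x, explicit) should give the same four tags wrapped in a list. The two patterns can differ on non-ASCII text: the explicit class stops a tag at the first accented letter, while the shorthand may or may not continue through it, depending on the locale.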

Michael

