thr3ads.net - R help - [R] Print occurrence / positions of words [Apr 2013]

arun

2013-Apr-22 19:18 UTC

[R] Print occurrence / positions of words

Hi,
Maybe this helps:
vec<- "this is a nice text with nice characters" 
library(stringr)
vec2<-unlist(str_match_all(vec,"\\w+"))

#or
# vec2<-str_split(vec," ")[[1]]

res<-unique(lapply(vec2,function(x) which(!is.na(match(vec2,x)))))
names(res)<- unique(vec2)
res
#$this
#[1] 1
#
#$is
#[1] 2
#
#$a
#[1] 3
#
#$nice
#[1] 4 7
#
#$text
#[1] 5
#
#$with
#[1] 6
#
#$characters
#[1] 8
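
The same result can also be sketched in base R without stringr. This is only an illustration, assuming the words are separated by single spaces; the name `words` is introduced here and is not part of the original answer. Building the factor with levels in order of first appearance keeps the list in text order instead of alphabetical order.

```r
# Base-R sketch; assumes words are separated by single spaces.
vec <- "this is a nice text with nice characters"
words <- strsplit(vec, " ", fixed = TRUE)[[1]]

# split() groups the positions 1..n by word. The explicit factor levels
# keep the words in order of first appearance rather than sorted order.
res <- split(seq_along(words), factor(words, levels = unique(words)))

res$nice
#[1] 4 7
```

`split()` returns a plain named list, so no separate `names(res) <- ...` step is needed.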
A.K.
>Hi,
>I have tried several packages in order to build an R program that takes a text file as input and produces a list of the words in that file. Each word should have a vector with all the positions at which the word occurs in the file.
>As an example, if the text file has the string:
>
>"this is a nice text with nice characters"
>
>The output should be something like:
>$this
>[1] 1
>$is
>[1] 2
>$a
>[1] 3
>$nice
>[1] 4 7
>$text
>[1] 5
>$with
>[1] 6
>$characters
>[1] 8
>A useful post which I came across was
>r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-td4644053.html
>However, it doesn't include the positions of each word.
>A similar function which I found in the documentation is, I guess, "str_locate"; however, I want to count "words" and not "characters".
>Any guidance on what packages / techniques to use for this would be really appreciated.
>Thank you.

S Ellison

2013-Apr-26 14:33 UTC

[R] Print occurrence / positions of words

> >I have tried some different packages in order to build a R program
> which will take as input a text file, produce a list of the 
> words inside  that file. Each word should have a vector with 
> all the places that this  word exist in the file. 
How about

txt <- paste(rep("this is a nice text with nice characters", 3),
"But this is not", collapse=" ")

library(stringr)
txt.vec <-str_split(txt, "[^[:alnum:]_]+")[[1]] 
	#vector of all the words in their original sequence

tapply(seq_along(txt.vec), txt.vec, c)
	#Returns the locations of each word as a list of vectors, sorted alphabetically
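
Since the original question asked for a text file as input, here is a minimal base-R sketch of the same tapply idea applied to a file. The temporary file, its contents, and the names `path`, `words` and `positions` are made up for illustration; `strsplit` stands in for `str_split`, and `simplify = FALSE` is added so that a list is returned even when every word occurs only once.

```r
# Hypothetical input: a temporary file stands in for the poster's text file.
path <- tempfile(fileext = ".txt")
writeLines(c("this is a nice text", "with nice characters"), path)

# Join all lines into one string so positions run across the whole file.
txt <- paste(readLines(path), collapse = " ")

# Split on runs of non-word characters, then drop the empty string that
# appears when the text starts with punctuation or whitespace.
words <- strsplit(txt, "[^[:alnum:]_]+")[[1]]
words <- words[nzchar(words)]

# simplify = FALSE keeps the result a list even if no word is repeated.
positions <- tapply(seq_along(words), words, c, simplify = FALSE)

positions[["nice"]]
#[1] 4 7
```

Wrapping `txt` in `tolower()` before splitting would treat "This" and "this" as the same word, if that is what is wanted.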




S Ellison
