Hi,
Maybe this helps:

vec <- "this is a nice text with nice characters"
library(stringr)
vec2 <- unlist(str_match_all(vec, "\\w+"))
# or
# vec2 <- str_split(vec, " ")[[1]]
res <- unique(lapply(vec2, function(x) which(!is.na(match(vec2, x)))))
names(res) <- unique(vec2)
res
#$this
#[1] 1
#
#$is
#[1] 2
#
#$a
#[1] 3
#
#$nice
#[1] 4 7
#
#$text
#[1] 5
#
#$with
#[1] 6
#
#$characters
#[1] 8

A.K.

> Hi,
> I have tried some different packages in order to build an R program which will take a text file as input and produce a list of the words inside that file. Each word should have a vector with all the places where this word occurs in the file.
> As an example, if the text file has the string:
>
> "this is a nice text with nice characters"
>
> The output should be something like:
> $this
> [1] 1
> $is
> [1] 2
> $a
> [1] 3
> $nice
> [1] 4 7
> $text
> [1] 5
> $with
> [1] 6
> $characters
> [1] 8
> A useful post which I came across here was r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-td4644053.html. However, it doesn't include the positions of each word.
> A similar function which I found through the documentation is, I guess, "str_locate"; however, I want to count "words" and not "characters".
> Any guidance on what packages / techniques to use for that would be really appreciated.
> Thank you.

> I have tried some different packages in order to build an R program
> which will take as input a text file, produce a list of the
> words inside that file. Each word should have a vector with
> all the places where this word occurs in the file.

How about

txt <- paste(rep("this is a nice text with nice characters", 3), "But this is not", collapse=" ")
library(stringr)
txt.vec <- str_split(txt, "[^[:alnum:]_]+")[[1]]
# vector of all the words in their original sequence
tapply(1:length(txt.vec), txt.vec, c)
# Returns a list of vectors of locations of each word, sorted alphabetically

S Ellison
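
For comparison, the same idea can probably be done in base R alone. The following is only a sketch: the function name word_positions is made up for illustration and appears nowhere in the thread, and it assumes that "word" means a run of letters, digits and underscores, as in the replies above. It splits the text into words and then groups the word indices by word, keeping the words in order of first appearance.

```r
# word_positions is a hypothetical helper name, not part of any package.
word_positions <- function(text) {
  # Split on runs of non-word characters (same pattern as the str_split call above)
  words <- strsplit(text, "[^[:alnum:]_]+")[[1]]
  # Drop empty strings, which appear when the text starts with a separator
  words <- words[nzchar(words)]
  # Group the indices 1..n by word; the factor levels keep first-appearance order
  split(seq_along(words), factor(words, levels = unique(words)))
}

res <- word_positions("this is a nice text with nice characters")
res$nice
# [1] 4 7
```

For a file, one option would be to collapse the lines into a single string first, e.g. paste(readLines(path), collapse = " "), where path stands for whatever file is being read.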