Dimitri Liakhovitski
2016-Nov-30 20:51 UTC
[R] stringi behaves differently in 2 similar situations
Hello!

library(stringi)

stri_extract_all_words("me.com", simplify = TRUE)      # returns with a dot
stri_extract_all_words("watch32.com", simplify = TRUE) # removes the dot

Why is the dot removed only in the second case?
How is it possible to ask it NOT to remove the dot in the second case?

Thanks a lot!
--
Dimitri Liakhovitski
Sarah Goslee
2016-Nov-30 21:27 UTC
A dot is treated differently depending on whether it has a digit on neither side, one side, or both sides:

> stri_extract_all_words("me.com", simplify = TRUE)
     [,1]
[1,] "me.com"
> stri_extract_all_words("me1.com", simplify = TRUE)
     [,1]  [,2]
[1,] "me1" "com"
> stri_extract_all_words("me1.2com", simplify = TRUE)
     [,1]
[1,] "me1.2com"

?stri_extract_all_words sent me to ?"stringi-search-boundaries", which suggests that you should spend some time with the user guide:

_Boundary Analysis_ - ICU User Guide,
<URL: http://userguide.icu-project.org/boundaryanalysis>

Depending on your objective, you might be better off with strsplit() separating on whitespace.

Sarah

On Wed, Nov 30, 2016 at 3:51 PM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> Hello!
>
> library(stringi)
>
> stri_extract_all_words("me.com", simplify = TRUE) # returns with a dot
> stri_extract_all_words("watch32.com", simplify = TRUE) # removes the dot
>
> Why is the dot removed only in the second case?
> How is it possible to ask it NOT to remove the dot in the second case?
>
> Thanks a lot!
>
--
Sarah Goslee
http://www.functionaldiversity.org
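The behaviour above follows the Unicode word-boundary rules (UAX #29) that ICU implements. A dot does not break a word when it sits between two letters ("me.com") or between two digits ("1.2"), but it does break between a digit and a letter ("32.com"). If whitespace is the only separator you care about, splitting on whitespace keeps the dot. Below is a minimal sketch comparing the approaches, assuming the stringi package is installed. The extra input "visit watch32.com today" is made up for illustration. The stri_extract_all_regex() variant is a swapped-in alternative that was not suggested in the thread.

```r
# Sketch: ICU word boundaries vs. whitespace-based tokenizing.
# Assumes the stringi package is installed.
library(stringi)

# The first three inputs come from the thread; the last one is illustrative only
x <- c("me.com", "watch32.com", "me1.2com", "visit watch32.com today")

# ICU word-boundary rules: a dot between a digit and a letter ends a word,
# so "watch32.com" is returned as "watch32" and "com"
by_words <- stri_extract_all_words(x)

# Base R: split on runs of whitespace; dots are left alone
by_space <- strsplit(x, "\\s+")

# stringi alternative: extract maximal runs of non-whitespace characters
by_regex <- stri_extract_all_regex(x, "\\S+")

print(by_words)
print(by_space)
print(by_regex)
```

One difference between the two whitespace approaches: strsplit() returns a leading empty string when the input starts with whitespace, whereas extracting "\\S+" runs never produces empty tokens.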