Sascha Wolfer
2016-Mar-03 09:47 UTC
[R] Unicode Text Segmentation Algorithms already implemented in R?
Hello list members,

I am looking for an implementation of Unicode text segmentation (word boundary detection) algorithms in R. You can find information about the algorithms here: http://www.unicode.org/reports/tr29/#Word_Boundaries

The help page for the function "casefuns" from the excellent "Unicode" package says: "Other methods will be added eventually (once the Unicode text segmentation algorithm is implemented for detecting word boundaries)." My simple question is: Are these algorithms already implemented in an R package? I didn't find anything on the web, but I am counting on the power of this list. My Stata-using colleague is already picking at me... (in Stata, the function "ustrword" does exactly what I want to do in R).

Thanks for your help, have a good day, you all!
Sascha W.
Ista Zahn
2016-Mar-03 13:44 UTC
[R] Unicode Text Segmentation Algorithms already implemented in R?
You searched, but did not tell us what you found, nor why it was unsuitable for your undescribed use case. So all we can do is guess: my guess is http://docs.rexamine.com/R-man/stringi/stringi-search-boundaries.html

Best,
Ista
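
For readers who want to try the stringi suggestion, here is a minimal sketch, assuming the stringi package is installed. Its text-boundary functions are built on ICU's break iterators, which follow the UAX #29 rules linked in the original question. The sample sentence and variable names are illustrative only, and whether this matches Stata's "ustrword" in every case has not been checked here.

```r
library(stringi)

# Illustrative input; any character vector works.
text <- "The quick brown fox"

# Extract the word-like segments; whitespace and punctuation are skipped.
words <- stri_extract_all_words(text)[[1]]
print(words)

# Count the words without materialising them.
n_words <- stri_count_words(text)
print(n_words)

# Split at every word boundary instead. Without skip_word_none = TRUE,
# the result also contains the segments between words (e.g. spaces).
pieces <- stri_split_boundaries(text, type = "word", skip_word_none = TRUE)[[1]]
print(pieces)
```

To pick out the n-th word, as "ustrword" does, one could index into the result, e.g. `words[2]`. Locale-specific tailoring can be requested through the `locale` argument if the default rules do not fit the text.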