Mike
2015-Apr-23 20:10 UTC
[R] Need content_transformer() called by tm_map() to change non-letters to spaces
Hello,

In the following code, any characters matching "/|@| \\|" will be changed to a space.

> library(tm)
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> docs <- tm_map(docs, toSpace, "/|@| \\|")

What code would transform all non-letters to a space? (What goes where the xxxxx's are?) It is very difficult to put all non-letters in a string, so I'm doing the opposite of the above.

> toSpace_2 <- content_transformer(function xxxxxxxxxxxxxxxxxxxxxxx))
> docs <- tm_map(docs, toSpace_2, "abcdefghijklmnopqrstuvwxyz")

This needs to be done by a content_transformer() function to maintain the integrity of docs.

Thanks
Jeff Newmiller
2015-Apr-24 04:09 UTC
[R] Need content_transformer() called by tm_map() to change non-letters to spaces
The regex "[^a-zA-Z]" reads as "not a letter". Passing it as the pattern to your existing toSpace transformer should replace every non-letter with a space, so no second transformer is needed.
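A minimal sketch of how the "[^a-zA-Z]" pattern could be plugged into the toSpace transformer from the question. The sample string and the one-document corpus are hypothetical, made up only for illustration, and the tm part runs only if the package is installed.

```r
# Plain function: replace every match of `pattern` in `x` with a single space.
to_space <- function(x, pattern) gsub(pattern, " ", x)

# Hypothetical input, used only for illustration.
sample_text <- "e-mail: me@example.com (2015)"

# "[^a-zA-Z]" matches any single character that is not an ASCII letter.
cleaned <- to_space(sample_text, "[^a-zA-Z]")
print(cleaned)

# With tm installed, the same function can be wrapped in content_transformer()
# and applied through tm_map(), as in the original question.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  docs <- VCorpus(VectorSource(sample_text))  # hypothetical one-document corpus
  toSpace <- content_transformer(to_space)
  docs <- tm_map(docs, toSpace, "[^a-zA-Z]")
  print(as.character(docs[[1]]))
}
```

Note that "[^a-zA-Z]" treats accented letters as non-letters. If those should be kept, the POSIX class form "[^[:alpha:]]" may be a better fit, depending on the locale.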