Hi all,

How is it possible to subset the English text from a data frame containing German and English texts using the textcat package?

> library(textcat)
> dput(data)
structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L), .Label = c("Dieses Buch ist erstaunlich",
"I love this book", "ich liebe dieses Buch", "mehrere bücher in prozess",
"several books in proccess", "This book is amazing"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame")

I want the output to be like the following:

"I love this book" "This book is amazing" "several books in proccess"

Thanks for any help!
Elahe
Robert David Burbidge
2018-Nov-19 11:13 UTC
[R] subset English language using textcat package
Look at the help docs and examples for textcat and sapply:

print(as.character(data$x[sapply(data$x, textcat) == "english"]))

Note, however, that with its default settings textcat classifies "This book is amazing" as Dutch, so you may want to read the help for textcat and change the profile db (argument "p") or the "method".

On 19/11/2018 09:48, Elahe chalabi via R-help wrote:
> Hi all,
>
> How is it possible to subset English text from a df containing German and English texts using textcat package?
>
> > library(textcat)
> > dput(data)
> structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L), .Label = c("Dieses Buch ist erstaunlich",
> "I love this book", "ich liebe dieses Buch", "mehrere bücher in prozess",
> "several books in proccess", "This book is amazing"), class = "factor")), row.names = c(NA,
> -6L), class = "data.frame")
>
> I want the output to be like the following:
>
> "I love this book" "This book is amazing" "several books in proccess"
>
> Thanks for any help!
> Elahe
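
As a follow-up to the suggestion to change the profile db: one possible way to avoid the Dutch misclassification is to restrict the candidate languages to the two that can actually occur. The sketch below assumes the data contain only English and German, and that the "?" in the posted data originally stood for "ü". It uses the bundled TC_char_profiles and the default method, and the names `texts`, `en_de`, `lang` and `english_texts` are chosen here only for illustration. Results on such short strings may still vary, so check the output on your own data.

```r
library(textcat)

# Same sentences as in the question, kept as plain character strings
# (assumption: "b?cher" in the post was originally "bücher")
texts <- c("I love this book", "This book is amazing",
           "several books in proccess", "ich liebe dieses Buch",
           "Dieses Buch ist erstaunlich", "mehrere bücher in prozess")

# Keep only the English and German profiles from the bundled profile db
# (assumption: no other language occurs in the data)
en_de <- TC_char_profiles[names(TC_char_profiles) %in% c("english", "german")]

# textcat() accepts a character vector directly, so sapply() is not needed
lang <- textcat(texts, p = en_de)

# Guard against NA in case a string cannot be classified
english_texts <- texts[!is.na(lang) & lang == "english"]
print(english_texts)
```

With a factor column such as data$x, converting first with as.character(data$x) keeps the input to textcat() a plain character vector.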