thr3ads.net - R help - [R] subset English language using textcat package [Nov 2018]

Elahe chalabi

2018-Nov-19 09:48 UTC

[R] subset English language using textcat package

Hi all, 

How can I subset the English texts from a data frame containing both German and
English texts using the textcat package?



    > library(textcat)
    > dput(data)
    structure(list(x = structure(c(2L, 6L, 5L, 3L, 1L, 4L), .Label = c("Dieses Buch ist erstaunlich",
    "I love this book", "ich liebe dieses Buch", "mehrere bücher in prozess",
    "several books in proccess", "This book is amazing"), class = "factor")),
    row.names = c(NA, -6L), class = "data.frame")

I want the output to be like the following:


    "I love this book"  "This book is amazing"  "several books in proccess"


Thanks for any help!
Elahe

Robert David Burbidge

2018-Nov-19 11:13 UTC

[R] subset English language using textcat package

Look at the help docs and examples for textcat and sapply:

    print(as.character(data$x[sapply(data$x, textcat) == "english"]))

Note, however, that with its default settings textcat classifies
"This book is amazing" as dutch, so you may want to read the help for
textcat and change the profile db ("p") or the "method".
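
One way to act on that suggestion is to restrict the profile db to the two languages actually present in the data, so that dutch is no longer a candidate. The following is only a sketch: it assumes the default db (TC_char_profiles) has entries named "english" and "german", the names en_de, lang and english are made up for illustration, and whether every sentence is then classified correctly has not been verified here.

```r
library(textcat)

# Example data from the question, stored as plain character strings
# (not factors) so that textcat() receives text directly.
data <- data.frame(
  x = c("I love this book", "This book is amazing",
        "several books in proccess", "ich liebe dieses Buch",
        "Dieses Buch ist erstaunlich", "mehrere bücher in prozess"),
  stringsAsFactors = FALSE
)

# Assumption: the default profile db has profiles named "english" and "german".
# Keep only those two, so no other language can be returned.
en_de <- TC_char_profiles[names(TC_char_profiles) %in% c("english", "german")]

# textcat() is vectorised over its input, so sapply() is not required.
lang <- textcat(data$x, p = en_de)

# which() drops any NA results instead of propagating them into the subset.
english <- data$x[which(lang == "english")]
print(english)
```

If the results are still unsatisfactory, the "method" argument of textcat() can be varied in the same call; the available choices are listed in the help page for textcat.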

