thr3ads.net - R help - [R] POS tagging generating a string [Nov 2018]

Robert David Burbidge
2018-Nov-07 06:32 UTC
[R] POS tagging generating a string

Hi Elahe,
You could modify your count_verbs function from your previous post:

  * use scan to extract the tokens (words) from Message
  * use your previous grepl expression to index the tokens that are verbs
  * paste the verbs together to form the entries of a new column.

Here is one solution:

 >>>>>>>>>>>>>>>
library(openNLP)
library(NLP)

df <- data.frame(DocumentID = c(478920L, 510133L, 499497L, 930234L),
                 Message = structure(c(4L, 2L, 3L, 1L), .Label =
                   c("Thank you very much for your nice feedback.\n",
                     "THank you, added it",
                     "Thanks for the well explained article.",
                     "The solution has been updated"),
                   class = "factor"))


dput(df)

# POS-tag a message: treat the whole text as one "sentence" span,
# tokenize it into words, then tag each word token.
tagPOS <- function(x, ...) {
  s <- as.String(x)
  if (s == "") return(list())
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

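verbs <- function(x) {
  tagPOSx <- tagPOS(x)
  # Caution: scan() splits on whitespace only, so punctuation can stay
  # attached to a word (e.g. "you,"), while the Maxent tokenizer usually
  # splits it off; the indices below can then drift out of line with POStags.
  scanx <- scan(text = as.character(x), what = "character", quiet = TRUE)
  n <- length(scanx)
  # Keep the tokens whose tag contains "VB" and join them with "|".
  paste(scanx[(1:n)[grepl("VB", tagPOSx$POStags)]], collapse = "|")
}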

library(dplyr)

df %>% group_by(DocumentID) %>% summarise(verbs = verbs(Message))
<<<<<<<<<<<<<<<<<<<<<

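I'll leave it to you to extract the column of verbs from the result and
join it to the original data.frame by DocumentID (rbind() would append
rows rather than a column, and summarise() returns the groups sorted by
DocumentID rather than in the original row order).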
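For example, a minimal base-R sketch of that last step, assuming the summarise() result has been stored in a data frame called res (a hypothetical name) with columns DocumentID and verbs. Because group_by() + summarise() returns one row per DocumentID sorted by DocumentID, the sketch looks rows up by DocumentID instead of binding by position. The verb strings are placeholders, not real tagger output.

```r
# Minimal stand-in for the original df (Message column omitted for brevity).
df <- data.frame(DocumentID = c(478920L, 510133L, 499497L, 930234L))

# Hypothetical stand-in for the summarise() result: one row per DocumentID,
# sorted by DocumentID. The verb strings are placeholders only.
res <- data.frame(DocumentID = sort(df$DocumentID),
                  verbs = paste0("verbs_", sort(df$DocumentID)),
                  stringsAsFactors = FALSE)

# Look up each original row's DocumentID in res, so every verb string lands
# on the matching row even though the two tables are in different orders.
df$verbs <- res$verbs[match(df$DocumentID, res$DocumentID)]

# Roughly equivalent with dplyr: df <- dplyr::left_join(df, res, by = "DocumentID")
```

Matching on the key rather than on position is what keeps, say, the verbs of document 499497 on its own row, which a positional cbind() of the sorted result would not.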

Btw, I don't think this solution is efficient: I would guess that the
processing scan does in the verbs function duplicates work already done
by annotate in the tagPOS function, so you may want to return a list of
tokens from tagPOS and use that instead of scan.
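A minimal sketch of that suggestion. The names tagPOS2, verbs_from_tags and verbs2 are hypothetical. Besides avoiding a second tokenisation, indexing the tagger's own tokens keeps tokens and tags aligned: scan() splits only on whitespace, while the Maxent tokenizer usually treats punctuation such as "," and "." as separate tokens, so positions in POStags need not match positions in the scan() output. The calls are package-qualified, so the definitions load without attaching any package, but calling tagPOS2() needs NLP, openNLP and the English models from openNLPdata installed.

```r
# Hypothetical variant of tagPOS() that also returns the word tokens found
# by the Maxent tokenizer. Needs NLP, openNLP and openNLPdata when called.
tagPOS2 <- function(x, ...) {
  s <- NLP::as.String(x)
  if (s == "") return(list(tokens = character(0), POStags = character(0)))
  a2 <- NLP::Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- NLP::annotate(s, openNLP::Maxent_Word_Token_Annotator(), a2)
  a3 <- NLP::annotate(s, openNLP::Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  list(tokens = s[a3w],                      # one string per word token
       POStags = unlist(lapply(a3w$features, `[[`, "POS")))
}

# Keep the tokens whose Penn Treebank tag starts with "VB" (VB, VBD, VBG,
# VBN, VBP, VBZ) and join them with "|"; returns "" if there are no verbs.
verbs_from_tags <- function(tokens, tags) {
  stopifnot(length(tokens) == length(tags))
  paste(tokens[grepl("^VB", tags)], collapse = "|")
}

# Hypothetical drop-in replacement for verbs(): no scan() needed.
verbs2 <- function(x) {
  tp <- tagPOS2(x)
  verbs_from_tags(tp$tokens, tp$POStags)
}
```

It would be used exactly like the original: df %>% group_by(DocumentID) %>% summarise(verbs = verbs2(Message)). Splitting the tag filtering into its own small function also makes that step easy to check with hand-written tags, without running the tagger.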

Rgds,
Robert

On 06/11/18 10:26, Elahe chalabi via R-help wrote:
> Hi all,
>
> In my df I would like to generate a new column which contains
> a string showing all the verbs in each row of df$Message.
>
>> library(openNLP)
>> library(NLP)
>> dput(df)
> structure(list(DocumentID = c(478920L, 510133L, 499497L, 930234L),
>     Message = structure(c(4L, 2L, 3L, 1L), .Label = c("Thank you very much
>     for your nice feedback.\n", "THank you, added it", "Thanks for the
>     well explained article.", "The solution has been updated"),
>     class = "factor")), class = "data.frame", row.names = c(NA, -4L))
>
> tagPOS <- function(x, ...) {
>   s <- as.String(x)
>   word_token_annotator <- Maxent_Word_Token_Annotator()
>   a2 <- Annotation(1L, "sentence", 1L, nchar(s))
>   a2 <- annotate(s, word_token_annotator, a2)
>   a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
>   a3w <- a3[a3$type == "word"]
>   POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
>   POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
>   list(POStagged = POStagged, POStags = POStags)
> }
>
> Any help? Thanks in advance!
> Elahe


