Heiman, Thomas J.
2011-May-11 13:17 UTC
[R] filtering out unwanted words in a Term Document Matrix
Hi Y'all,

I am using the text mining package (tm). I am trying to filter out all of the words in a Term Document Matrix that are not in a list of words that I am interested in. I am using the following code:

    z <- tm_intersect(txt.dtm,
      c("communications", "safety", "climate", "blood", "surface",
        "cleanliness", "amenities", "monitoring", "staff", "competency",
        "policy", "procedure", "inconsistency", "physician", "orders",
        "treatment", "times", "care", "plan", "strategies", "concerns",
        "meetings", "equipment", "treatment", "options", "delivery", "care",
        "discharge", "welfare", "violations", "HIPPS", "professionalism",
        "lack", "boundaries crossing", "transportation", "benefits",
        "assistance", "beneficiary", "complaint", "grievance", "inquiry",
        "formal", "data", "processing", "concern", "facility", "abuse",
        "data", "request", "disruptive", "information", "patient",
        "discharge", "transfer", "physical", "ethics", "resolution",
        "professional", "reimbursement", "financial", "request", "status",
        "educational", "material", "forms", "technical", "assistance",
        "staff", "related", "quality", "care", "disruptive", "behavior",
        "special", "needs", "mental", "illness", "noncompliance", "illegal",
        "immigrant", "abusive", "violent", "litigation", "prisoner",
        "corporate", "lockout", "disposition", "discharge", "reason"))

I get the following error:

    no applicable method for 'tm_intersect' applied to an object of class
    "c('TermDocumentMatrix', 'simple_triplet_matrix')"

What am I doing wrong? I'd greatly appreciate any ideas or thoughts on this!

Thank you!

Thomas Heiman, PhD
Info Systems Eng, Sr
The MITRE Corporation | Center for Enterprise Modernization
Office: 703-983-2951 | theiman@mitre.org
Ingo Feinerer
2011-May-16 10:27 UTC
[R] filtering out unwanted words in a Term Document Matrix
> I am using the text mining package (tm). I am trying to filter out all
> of the words in a Term Document Matrix that are not in a list of words
> that I am interested in. I am using the following code:
>
> z<-tm_intersect(txt.dtm, c("communications", "safety", "climate", [...]))
>
> I get the following error:
>
> "no applicable method for 'tm_intersect' applied to an object of class
> "c('TermDocumentMatrix', 'simple_triplet_matrix')" "
>
> What am I doing wrong?

You can directly subset the matrix, e.g.:

    library(tm)
    data(crude)
    m <- TermDocumentMatrix(crude)
    z <- m[c("oil", "zone"), ]
    inspect(z)

Make sure that you subset only on terms that actually occur in the matrix, as otherwise it will not work. You can get all terms via Terms(m).

Best regards,
Ingo Feinerer
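
One way to follow the advice about subsetting only on terms that occur in the matrix is to intersect the word list with Terms(m) first. The following is a minimal sketch on the crude example data, not the method from the thread itself; the variable names wanted and present are made up for illustration, and the short word list stands in for the full one in the original question.

```r
library(tm)
data(crude)
m <- TermDocumentMatrix(crude)

# Illustrative word list (hypothetical); some entries may not occur in m.
wanted <- c("oil", "zone", "safety", "boundaries crossing", "oil")

# Keep only the words that are actual terms of the matrix.
# intersect() also drops duplicate entries from the list.
present <- intersect(wanted, Terms(m))

# Subset the rows (terms) of the term-document matrix by name.
z <- m[present, ]
inspect(z)

# Words that were requested but are not terms of the matrix.
setdiff(wanted, Terms(m))
```

With the default settings each term is a single token, so a multi-word entry such as "boundaries crossing" would presumably never match and ends up in the setdiff() result.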