similar to: using package tm to find phrases

Displaying 20 results from an estimated 300 matches similar to: "using package tm to find phrases"

2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)
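A common culprit is that removeWords matches case-sensitively, so a capitalised "The" survives unless the text is lower-cased first. A minimal sketch, assuming a current version of tm; the two sample sentences are made up for illustration:

library(tm)
docs <- c("The cat sat on the mat", "The dog chased the cat")   # made-up sample text
corp <- VCorpus(VectorSource(docs))
# Lower-case first so "The" matches the lower-case entries in the stopword list
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, c(stopwords("english"), "the"))
inspect(corp)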
2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format Office 2003). All I need to do is read, not write. Thanks, Mark
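One route that needs no extra R package is the antiword command-line tool, which converts .doc files to plain text (tm also ships a readDOC reader built on it). A rough sketch, assuming antiword is installed and that the hypothetical directory word_docs/ holds the files:

# Convert each .doc file to plain text by calling antiword from R
doc_files <- list.files("word_docs", pattern = "\\.doc$", full.names = TRUE)
texts <- vapply(doc_files,
                function(f) paste(system2("antiword", shQuote(f), stdout = TRUE),
                                  collapse = "\n"),
                character(1))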
2012 Jan 27
2
tm package: handling contractions
I tried making a wordcloud of Obama's State of the Union address, using the tm package to process the text:
sotu <- scan(file="c:/R/data/sotu2012.txt", what="character")
sotu <- tolower(sotu)
corp <- Corpus(VectorSource(paste(sotu, collapse=" ")))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stemDocument)
corp <- tm_map(corp,
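Because removePunctuation simply deletes apostrophes, "don't" ends up as "dont"; expanding contractions before stripping punctuation avoids that. A sketch, assuming a current tm; the replacement rules and the sample sentence are illustrative only, not a complete list:

library(tm)
sotu <- "We don't quit, and we can't wait."   # stand-in text, not the actual address
corp <- VCorpus(VectorSource(sotu))
corp <- tm_map(corp, content_transformer(tolower))
expand_contractions <- content_transformer(function(x) {
  x <- gsub("can't", "cannot", x, fixed = TRUE)
  x <- gsub("n't", " not", x, fixed = TRUE)
  x <- gsub("'re", " are", x, fixed = TRUE)
  x
})
corp <- tm_map(corp, expand_contractions)
corp <- tm_map(corp, removePunctuation)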
2016 Apr 05
8
RWeka Error
When I use any function of the RWeka package in RStudio I get an error, "Error in .jnew (name): java.lang.ClassFormatError." Can anyone guide me on this? [[alternative HTML version deleted]]
2009 Jan 11
2
problems installing package XML to a computer without an internet connection
Hello, I am hoping for some advice on how I can install the XML package, which I require to run package tm. Normally I would use the install-package option; however, I have to install the packages on a laptop running XP, and the laptop does not have an internet connection. First I tried the file XML_1.99-0.tar.gz. Below is the error I received: Error in gzfile(file, "r") :
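For what it's worth, a local file can be installed without a network connection by pointing install.packages() at it with repos = NULL; on Windows a pre-built binary (.zip) avoids needing compilers and the libxml2 headers. The .zip file name below is a placeholder for whichever binary is downloaded, and any dependencies have to be installed the same way first:

# Source tarball (needs build tools and the libxml2 headers on the machine)
install.packages("XML_1.99-0.tar.gz", repos = NULL, type = "source")
# Windows binary package (placeholder file name)
install.packages("XML_1.99-0.zip", repos = NULL)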
2019 Sep 19
1
Recent Dovecot on old operating system
Hi, sorry for the dumb question, and please ignore this post if you think it's too far out there; I know it's not the way to go, but for reasons ... Is anyone running a self-compiled recent Dovecot (2.2.36.4) on Debian 7, and does it work? Or do you think it should work? Surprisingly, it actually compiles flawlessly on Debian 7, but I wonder whether it will become a complete mess replacing the existing
2007 Jan 11
0
tm 0.1 uploaded to CRAN
Dear useRs, a first version of tm has just been released on CRAN. tm provides a sophisticated framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. Advanced metadata management is implemented for collections of text documents to alleviate the
2007 Jan 11
0
tm 0.1 uploaded to CRAN
Dear useRs, a first version of tm has just been released on CRAN. tm provides a sophisticated framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. Advanced metadata management is implemented for collections of text documents to alleviate the
2010 Mar 18
0
error while using "tm" package
I have recently started using the "tm" package by I. Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs) I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions This error
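That rowSums() complaint usually means the matrix handed to the weighting function has collapsed to fewer than two dimensions, for example when the corpus is empty or the filters remove almost everything. A sketch with the current option spellings (bounds/wordLengths replace the older minDocFreq/minWordLength); the two documents are stand-ins:

library(tm)
docs <- c("first example document", "second example document with more words")   # stand-ins
tmp <- VCorpus(VectorSource(docs))
tdm <- TermDocumentMatrix(tmp,
                          control = list(weighting   = weightTfIdf,
                                         bounds      = list(global = c(2, Inf)),
                                         wordLengths = c(3, Inf)))
dim(tdm)   # a zero-row or zero-column result here explains downstream errors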
2007 Jun 08
1
data mining/text mining?
Dear R-user, could anybody tell me the key difference between data mining and text mining? Please also list some packages for data/text mining, and give me an example of text mining with R (any related materials will be highly appreciated), because the vignette written by Ingo Feinerer seems too concise for me. Thanks _____________________________________________ Dr. Ruixin ZHU Shanghai
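Roughly, data mining works on data that is already structured (tables of cases and variables), while text mining first has to turn free text into such a structure, typically a term-document matrix. A minimal, hedged example of the text-mining side with tm, using two made-up sentences:

library(tm)
docs <- c("text mining turns documents into data",
          "data mining looks for patterns in data")   # made-up sentences
corp <- VCorpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
tdm <- TermDocumentMatrix(corp)
findFreqTerms(tdm, 2)   # terms occurring at least twice, here "data" and "mining"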
2016 Apr 05
0
RWeka Error
Read the Posting Guide mentioned at the bottom of this email. Highlights you should be sure to address:
* HTML-formatted email gets messed up on the R mailing lists, so post in plain text. Yes, you can and need to do this.
* Make sure the problem occurs in R by trying it without RStudio. Sometimes RStudio interferes with R, and you have to ask elsewhere about such problems.
* Give us details
2011 May 21
1
DocumentTermMatrix error
Hi all, I have tried to create a DocumentTermMatrix with the tm package, but I get this error: Error in tolower(txt) : invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'. I tried doing this as shown in http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining), with this R code:
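The "invalid input ... in 'utf8towcs'" message means some bytes in the text are not valid in the encoding R thinks it has; re-encoding to UTF-8 before tolower() is the usual cure. A sketch, with a stand-in string in place of the real documents:

library(tm)
txt <- "PROD Z LAHKO GNETNO MELJNO GLINO"   # stand-in for the problematic text
# Convert from the native encoding to UTF-8, replacing undecodable bytes
txt <- iconv(txt, from = "", to = "UTF-8", sub = "byte")
corp <- VCorpus(VectorSource(txt))
corp <- tm_map(corp, content_transformer(tolower))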
2010 Jan 18
4
Index indexed words
Hello, we would like to create Google- or Firefox-like "search hints". If someone types "abc", the search system should suggest some possible completions. I think Firefox does it by indexing the first 3 characters of the domain name; if you enter part of one, you get some hints. Thank you very much Marcus
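As a language-agnostic illustration of that idea, sketched here in R (every name and candidate string below is made up): index the candidates by their first three characters, then filter the matching bucket by the full prefix typed so far.

terms <- c("abcdef.com", "abc.org", "abandon.net", "xyz.io")   # made-up candidates
prefix_index <- split(terms, substr(terms, 1, 3))              # bucket by first 3 characters
suggest <- function(typed) {
  hits <- prefix_index[[substr(typed, 1, 3)]]
  hits[startsWith(hits, typed)]                                # refine within the bucket
}
suggest("abc")   # returns "abcdef.com" and "abc.org"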
2016 Mar 04
2
GSOC 2016 project on Ranking
Hello Sir, I am a third-year student in the Department of Mathematics at IIT Kharagpur. I have good experience in Information Retrieval and Machine Learning, and I have read many chapters of the book Introduction to Information Retrieval. Recently I have been working on a project on tagging questions on a Q&A forum by ranking the tags and using probabilistic inference. I also have software development
2011 May 11
1
filtering out unwanted words in a Term Document Matrix
Hi Y'all, I am using the text mining package (tm). I am trying to filter out all of the words in a Term Document Matrix that are not in a list of words that I am interested in. I am using the following code: z<-tm_intersect(txt.dtm, c("communications", "safety", "climate", "blood", "surface", "cleanliness",
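With current tm there are two straightforward ways to keep only a fixed word list: pass it as a dictionary when the matrix is built, or subset an existing matrix by its term names. A sketch; the two documents are stand-ins and the word list is shortened from the post:

library(tm)
keep <- c("communications", "safety", "climate", "blood", "surface", "cleanliness")
docs <- c("safety and climate communications", "blood on the surface")   # stand-in documents
corp <- VCorpus(VectorSource(docs))
# Option 1: restrict the matrix to the word list when it is built
dtm_keep <- DocumentTermMatrix(corp, control = list(dictionary = keep))
# Option 2: subset an existing matrix by term names
dtm <- DocumentTermMatrix(corp)
dtm_sub <- dtm[, intersect(colnames(dtm), keep)]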
2005 Oct 08
1
*wildcard* support?
Hello, first I wanted to say thanks for a great piece of software; thanks, Olly and everyone else who has contributed! I know that Xapian supports right-truncation, if that's the proper name for wildcard support, as in a search for "xapia*". I don't believe Xapian supports wildcards on both sides of a term, correct? Is this something that is technically unfeasible, unpalatable
2002 Nov 17
1
SVD for reducing dimensions
Hi all, this is probably simple and I'm just doing something stupid, sorry about that :-) I'm trying to convert words (strings of letters) into a fairly small-dimensional space (say 10, but anything between about 5 and 50 would be OK), which I will call a feature vector. The distance between two words represents the similarity of the
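One simple-minded way to get there: encode each word as a vector of letter counts and let SVD project those vectors into k dimensions. Everything below (the words, the letter-count encoding, k = 3) is an illustrative choice, not a recommendation:

words <- c("kitten", "sitting", "mitten", "banana")   # made-up examples
# One row per word, one column per letter of the alphabet
counts <- t(sapply(words, function(w) table(factor(strsplit(w, "")[[1]], levels = letters))))
k <- 3
s <- svd(counts)
features <- s$u[, 1:k] %*% diag(s$d[1:k])   # k-dimensional feature vectors, one row per word
rownames(features) <- words
dist(features)   # small distances correspond to similar letter profiles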
2013 Sep 26
0
R hangs at NGramTokenizer
Hi: I am trying to construct a Document-Term Matrix from a corpus. The commands I used are:
> library(parallel)
> library(tm)
> library(RWeka)
> library(topicmodels)
> library(RTextTools)
> cl=makeCluster(detectCores())
> invisible(clusterEvalQ(cl, library(tm)))
> invisible(clusterEvalQ(cl, library(RWeka)))
> invisible(clusterEvalQ(cl, library(topicmodels)))
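The usual recipe for n-gram term matrices with tm and RWeka is a small wrapper around NGramTokenizer passed through the control list, and a commonly suggested first check when the parallel version hangs is to run it sequentially (no cluster) with mc.cores set to 1, since forked workers and rJava often interact badly. A sketch with two stand-in documents:

library(tm)
library(RWeka)
options(mc.cores = 1)   # commonly suggested workaround when tm's parallel apply clashes with rJava
docs <- c("one two three four", "two three four five")   # stand-in corpus
corp <- VCorpus(VectorSource(docs))
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
tdm <- TermDocumentMatrix(corp, control = list(tokenize = BigramTokenizer))
inspect(tdm)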
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian)Text
Hi, I bumped into a serious issue while trying to analyse some texts in Bulgarian language (with the tm package). I import a tab-separated csv file, which holds a total of 22 variables, most of which are text cells (not factors), using the read.delim function: data<-read.delim("bigcompanies_ascii.csv", header=TRUE, quote="'",
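Two things tend to matter for Cyrillic text: read the file with an explicit encoding so the characters survive, and supply your own stopword vector, since tm's built-in stopwords() does not ship a Bulgarian list. A sketch; the column name and the two example stopwords are placeholders:

library(tm)
data <- read.delim("bigcompanies_ascii.csv", header = TRUE, quote = "'",
                   fileEncoding = "UTF-8", stringsAsFactors = FALSE)
bg_stopwords <- c("и", "на")                      # example Bulgarian stopwords; supply a full list
corp <- VCorpus(VectorSource(data$text_column))   # text_column is a hypothetical column name
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, bg_stopwords)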
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian)Text
Hi, I bumped into a serious issue while trying to analyse some texts in Bulgarian language (with the tm package). I import a tab-separated csv file, which holds a total of 22 variables, most of which are text cells (not factors), using the read.delim function: data<-read.delim("bigcompanies_ascii.csv", header=TRUE, quote="'",