Similar to: tm package - remove stopwords failing

Displaying 20 results from an estimated 300 matches similar to: "tm package - remove stopwords failing"

2006 Oct 03
new to R: don't understand errors
Hello all, I'm brand new to R, and I'm trying to quickly learn the rudiments for a couple of projects here at work. I'm working with the lsa package and trying to generate various semantic spaces. I seem to do well with small collections of clean text files, but now that I am trying to work with larger collections of less-than-perfect files, I'm getting errors
2007 Aug 18
Problem with lsa package (data.frame) on Windows XP
Dear R team, The following piece of code (using the lsa package) works fine on my Mac OS X machine, but when I run the same code on Windows XP, it no longer works. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
2009 Nov 12
Package "tm" fails to remove "the" when removing stopwords
I am using code that previously worked to remove stopwords with package "tm". Even manually adding "the" to the stopword list does not remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)
2011 Sep 12
findFreqTerms vs minDocFreq in Package 'tm'
I am using the 'tm' package for text mining and am facing an issue with finding frequently occurring terms. From the definitions, it appears that findFreqTerms and minDocFreq are equivalent and both try to identify the documents with terms appearing more than a specified threshold. However, I am getting drastically different results with the two. I have given the results from both the
2012 Oct 25
Text mining
Kind regards. I am currently writing a function to plot a word cloud; the code I have is the following: library(twitteR) library(tm) library(wordcloud) library(RXKCD) library(RColorBrewer) tweets=searchTwitter("@afflorezr", n=1500) generateCorpus= function(tweets,my.stopwords=c(),min.freq){ #Install the textmining library require(tm) require(wordcloud)
2012 Dec 13
Term matrix size and memory. TM package
Hi everyone! I am having problems with the size of the term matrix I get. The commands I use are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-
2013 Sep 26
R hangs at NGramTokenizer
Hi: I am trying to construct a Document-Term Matrix from a corpus. The commands I used are: > library(parallel) > library(tm) > library(RWeka) > library(topicmodels) > library(RTextTools) > cl=makeCluster(detectCores()) > invisible(clusterEvalQ(cl, library(tm))) > invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels))) >
2010 Oct 11
topicmodels error
I am trying to fit an LDA model to a TermDocumentMatrix with the topicmodels package, but R says: > Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : > x is of class "TermDocumentMatrix" "simple_triplet_matrix" > class(TDM) > [1] "TermDocumentMatrix" "simple_triplet_matrix" I tried using a matrix instead, but it doesn't work: > MAT
2012 Jan 13
Troubles with stemming (tm + Snowball packages) under MacOS
Dear all, I am having some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5, R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions). I have installed all the needed packages (tm, rJava, RWeka, Snowball) plus dependencies. I have deactivated AWT (as described in
2012 Feb 26
tm_map help
Hi all, I am trying to do some text mining with Twitter and I am getting the error: Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) : 'names' attribute [1] must be the same length as the vector [0] when I use tm_map. Has anyone had or seen this error before? The code I have is shown below, and this error only occurs with #qantas, hashtags like #asx,
2008 Mar 25
Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
In case someone else runs into this: I found the problem. It was related to having some zero-length text files. Make sure you have valid (non-empty) data files for loading into the document-term matrix. Alex ---------- Forwarded message ---------- From: Alex McKenzie <> Date: Mar 25, 2008 2:07 AM Subject: Error "... x must be atomic" when using lsa (latent
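The fix described in this post amounts to checking the corpus directory for empty files before building the matrix. A minimal sketch, assuming all documents sit directly in a single directory; the helper name `find_empty_files` and the demo directory are hypothetical, and only base R functions (`list.files`, `file.info`) are used.

```r
# Hypothetical helper: return the paths of zero-byte regular files in a
# corpus directory (subdirectories are skipped via the isdir column).
find_empty_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  files[!info$isdir & !is.na(info$size) & info$size == 0]
}

# Demo with a temporary directory standing in for a real corpus folder.
corpus_dir <- file.path(tempdir(), "corpus_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("some text", file.path(corpus_dir, "doc1.txt"))
invisible(file.create(file.path(corpus_dir, "doc2.txt")))  # zero-length file

empty <- find_empty_files(corpus_dir)
print(basename(empty))
```

Any files it reports can be removed or filled in before the directory is passed to lsa's `textmatrix()`.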
2010 Mar 18
Error while using "tm" package
I have recently started using the "tm" package by Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs), I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error
2006 Oct 04
FW: new to R: don't understand errors
Hello Jerad, > It was suggested I contact you for possible help with this issue. Well, > as you can see from the emails below, that is what I was told at R-help. > Any insight into my lsa problems (also listed below) would be of great > help. From what I see, the problem probably does lie within the text files: for performance reasons, it was not possible to include any
2008 Mar 25
Error "... x must be atomic" when using lsa (latent semantic analysis) package
Hello, I'm trying to use the "lsa" (latent semantic analysis) package, and running into a problem that seems to be related to the number of documents being processed. Here's the code I'm running (after loading the lsa and rstem packages), and the error message: > SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text snippets >
2011 May 21
DocumentTermMatrix error
Hi all, I have tried to create a DocumentTermMatrix with the tm package, but I get this error: Error in tolower(txt) : invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' I tried doing this as shown in (An Introduction to Text Mining), with this R code:
2011 May 20
DocumentTermMatrix - text mining
Hi All, I have a data.frame like the one below. I would like to do some text mining on it to possibly find some patterns between Opis, ACklasifikacija and Vodja. I looked at the tm package, which looks promising, more specifically DocumentTermMatrix or TermDocumentMatrix. But I cannot figure out how to convert my data from a data.frame to a Corpus or VCorpus. Globina
2010 Feb 16
tm package
Hi, I'm using version 0.5.1 of the tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation)
2013 Oct 08
How to check the accuracy of maxent?
I was going through this example of maxent use: # LOAD LIBRARY library(maxent) # READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent")) corpus <- Corpus(VectorSource(data$Title[1:150])) matrix <- DocumentTermMatrix(corpus) # TRAIN/PREDICT
2009 Aug 07
xtable, sweave and resizebox
Does anyone know how to resize a table produced by xtable? My table is too big and I would like to resize it with resizebox, but that gives an error when I try it. Using it like this works fine: \SweaveOpts{echo=false} <<results=tex>>= xtable(stats0,caption='Número de transacções dos artigos frequentes e infrequentes',label='tab:INEStats') @ but the size is too
2011 Sep 13
SVD Memory Issue
I am trying to perform Singular Value Decomposition (SVD) on a Term Document Matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the TDM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12 GB dual-core machine with Windows XP and don't think I can increase the memory