search for: wordstem

Displaying 6 results from an estimated 6 matches for "wordstem".

2009 Jul 07
1
wordStem problems in R 2.9, Fedora 11; Linux Kernel 2.6.29.5-191.fc11.i586
Dear All, I just updated from Fedora 9 to Fedora 11, kernel version 2.6.29.5-191.fc11.i586. I'm running R 2.9. I successfully installed package Rstem from source (it always ran fine for me in F9). However: > wordStem(c("This","is","a","test")) Error in wordStem(c("This", "is", "a", "test")) : VECTOR_ELT() can only be applied to a 'list', not a 'character' Any idea what causes this / how I can fix this? RR
2008 Jul 28
1
RStem with portuguese language
Greetings, I have R 2.7.1 on Mac OS and I believe UTF encoding is already installed. At least: > Sys.getenv() shows several variables, including: LANG "pt_PT.UTF-8" I installed the Rstem and tm packages and when I try the following code: > wordStem(c("aberração","aberrações"), language="portuguese") [1] "aberra?\xc3" "aberra??" Warning message: In wordStem(c("aberração", "aberrações"), language = "portuguese") : Currently, only 'english' is tested. You...
2011 Jun 09
2
Coercing Output from mget() into Proper Data Frame
...data frame of stems and their frequencies stem_freq_list <- function(freqFile) { stem_dict <- new.env(parent=emptyenv(), hash=TRUE) freq_dist <- read.csv(freqFile,header=TRUE) words <- as.character(freq_dist[,1]) freqs <- as.numeric(freq_dist[,2]) stems <- wordStem(words, language="english") uniq_stems <- c() # make a hash table of stems and their frequencies for (i in 1:length(words)) { word <- words[i]; stem <- stems[i]; freq <- freqs[i] if (exists(stem, envir=stem_dict)) { cnt <- get(s...
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it stems only the last word of each document. (The problem is the same for wordStem, and stemDocument does not work at all.) An example: > path <- c("c:/path/to/directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US", load = TRUE)) > inspect(corp) A corp...
2010 Aug 02
1
Multibyte characters in (row) names
I have an array with names which contain multibyte characters. When I try to write the array to a file using write.table and row.names = T I receive an error message when the first such name is encountered, saying that I have not specified the option to generate NA instead. I really would be satisfied if the row name in the file were exactly what is displayed when I print the array on the
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I am writing to ask the following: I am working on a text mining project and need to import a set of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation, etc. I can do the latter with the datasets that ship with the TM library. What I cannot manage is importing text from some external source, even though functions exist