thr3ads.net - search: "wordstem"

Displaying 6 results from an estimated 6 matches for "wordstem".

wordStem problems in R 2.9, Fedora 11; Linux Kernel 2.6.29.5-191.fc11.i586

2009 Jul 07

Dear All,
I just updated from Fedora 9 to Fedora 11, kernel version 2.6.29.5-191.fc11.i586. I'm running R 2.9. I successfully installed package Rstem from source (it always ran fine for me in F9). However:

> wordStem(c("This","is","a","test"))
Error in wordStem(c("This", "is", "a", "test")) :
  VECTOR_ELT() can only be applied to a 'list', not a 'character'

Any idea what causes this / how I can fix this?
RR

RStem with portuguese language

2008 Jul 28

Greetings,
I have R 2.7.1 on Mac OS and I believe UTF-8 encoding is already enabled. At least, Sys.getenv() shows several variables, including LANG "pt_PT.UTF-8". I installed the Rstem and tm packages, and when I try the following code:

> wordStem(c("aberração","aberrações"), language="portuguese")
[1] "aberra?\xc3" "aberra??"
Warning message:
In wordStem(c("aberração", "aberrações"), language = "portuguese") :
  Currently, only 'english' is tested. You...
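The stray `\xc3` in that output is consistent with a stemmer that cuts words at byte offsets rather than character boundaries: in UTF-8, `ç` (U+00E7) and `ã` (U+00E3) each take two bytes, both starting with `0xc3`, so a byte-level cut can leave half a character behind. This is an assumption about the cause, not a diagnosis of Rstem itself; the base R sketch below only illustrates the byte/character mismatch (`validUTF8()` needs R 3.3.0 or later).

```r
# Build the word from Unicode escapes so the string is UTF-8 regardless of
# the encoding this script is saved in.
word <- "aberra\u00e7\u00e3o"

n_chars <- nchar(word, type = "chars")  # 9 characters
n_bytes <- nchar(word, type = "bytes")  # 11 bytes: U+00E7 and U+00E3 use 2 bytes each

bytes <- charToRaw(word)
print(bytes[7:10])  # c3 a7 c3 a3: the UTF-8 encodings of U+00E7 and U+00E3

# Keeping only the first 7 bytes splits U+00E7 in half and leaves a lone
# 0xc3 byte, which is not valid UTF-8 on its own.
truncated <- rawToChar(bytes[1:7])
print(validUTF8(truncated))  # FALSE
```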

Coercing Output from mget() into Proper Data Frame

2011 Jun 09

...data frame of stems and their frequencies
stem_freq_list <- function(freqFile) {
  stem_dict <- new.env(parent=emptyenv(), hash=TRUE)
  freq_dist <- read.csv(freqFile,header=TRUE)
  words <- as.character(freq_dist[,1])
  freqs <- as.numeric(freq_dist[,2])
  stems <- wordStem(words, language="english")
  uniq_stems <- c()
  # make a hash table of stems and their frequencies
  for (i in 1:length(words)) {
    word <- words[i]; stem <- stems[i]; freq <- freqs[i]
    if (exists(stem, envir=stem_dict)) {
      cnt <- get(s...
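The post is truncated, but its goal (summing word frequencies per stem in a hashed environment, then turning the result into a data frame) can be sketched in base R. The `words`, `stems` and `freqs` vectors below are made-up stand-ins for the CSV columns and the `wordStem()` output. `mget()` returns a named list, so `unlist()` is what turns it into a plain vector that `data.frame()` accepts as a column.

```r
# Hypothetical inputs standing in for the CSV columns and the wordStem() result.
words <- c("running", "runs", "run", "tests", "testing")
stems <- c("run", "run", "run", "test", "test")
freqs <- c(3, 2, 5, 4, 1)

# Hash table mapping each stem to its accumulated frequency.
stem_dict <- new.env(parent = emptyenv(), hash = TRUE)
for (i in seq_along(words)) {
  stem <- stems[i]
  # Current total for this stem, or 0 the first time the stem is seen.
  old <- if (exists(stem, envir = stem_dict, inherits = FALSE)) {
    get(stem, envir = stem_dict, inherits = FALSE)
  } else {
    0
  }
  assign(stem, old + freqs[i], envir = stem_dict)
}

# mget() returns a named list; unlist() flattens it into a numeric vector.
keys <- sort(ls(stem_dict))
counts <- mget(keys, envir = stem_dict)
stem_freq <- data.frame(stem = keys,
                        freq = unlist(counts, use.names = FALSE),
                        stringsAsFactors = FALSE)
print(stem_freq)
```

If the environment is not needed for anything else, `tapply(freqs, stems, sum)` computes the same per-stem totals in one call.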

Stemming functions only work on the last word of plain text documents

2011 Sep 05

Hello,
I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus with the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example:

> path <- c("c:\path\to\directory") # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US", load = T))
> inspect(corp)
A corp...
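One plausible explanation for "only the last word is stemmed" is that the stemmer receives each document (or line) as a single string and treats it as one long word, so only its ending is changed. Under that assumption, the fix is to split the text into tokens, stem each token, and paste the result back together. In the base R sketch below, `toy_stem()` and `stem_text()` are hypothetical helpers; `toy_stem()` just strips a few English suffixes so the example runs without any stemming package, and a real stemmer would be passed in its place.

```r
# Hypothetical toy stemmer: strips a trailing "ing", "ed" or "s".
# It exists only to keep the example self-contained; it is not Snowball.
toy_stem <- function(words) {
  sub("(ing|ed|s)$", "", words)
}

# Stem every word of a document instead of the document as a whole.
stem_text <- function(text, stemmer = toy_stem) {
  tokens <- strsplit(text, "[[:space:]]+")[[1]]  # split on runs of whitespace
  tokens <- tokens[nzchar(tokens)]               # drop an empty leading token
  paste(stemmer(tokens), collapse = " ")
}

doc <- "walked dogs barking"

# Applying the stemmer to the whole string changes only its end.
whole <- toy_stem(doc)      # "walked dogs bark"
# Applying it token by token stems every word.
per_word <- stem_text(doc)  # "walk dog bark"
```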

Multibyte characters in (row) names

2010 Aug 02

I have an array with names which contain multibyte characters. When I try to write the array to a file using write.table and row.names = T, I receive an error message when the first such name is encountered, saying that I have not specified the option to generate NA instead. I really would be satisfied if the row name in the file were exactly what is displayed when I print the array on the

Help with the text mining package (TM)

2009 Jul 17

Dear all, I am writing to ask about the following: I am doing a text mining project and need to import a set of texts in order to preprocess them, that is, remove stopwords, do stemming, remove punctuation, etc. I can do the latter with the datasets that ship with the TM library. What I cannot manage is to import text from any source, even though there are functions
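The preprocessing steps described in this post (importing raw text, then lowercasing, removing punctuation and removing stopwords before stemming) can be sketched in plain base R, independently of tm. The two generated files, the tiny stopword list and the helper name `preprocess()` are assumptions for illustration; a real project would read its own directory and use a complete stopword list.

```r
# Write two small text files to a temporary directory to simulate a corpus.
corpus_dir <- file.path(tempdir(), "corpus_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The cat sat on the mat.", file.path(corpus_dir, "a.txt"))
writeLines("A dog, a cat, and a bird!", file.path(corpus_dir, "b.txt"))

# Hypothetical, deliberately tiny stopword list.
stopwords_demo <- c("the", "a", "and", "on")

# Lowercase, replace punctuation with spaces, split into words, drop stopwords.
preprocess <- function(text, stopwords) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", " ", text)
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  words[!words %in% stopwords]
}

# Import every .txt file in the directory, one string per document.
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)
docs <- vapply(files, function(f) paste(readLines(f), collapse = " "),
               character(1))

tokens <- lapply(docs, preprocess, stopwords = stopwords_demo)
print(tokens)
```

Stemming would then be applied to each element of `tokens`, one word at a time.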
