thr3ads.net - search: "corpus"

Displaying 20 results from an estimated 297 matches for "corpus".

It's not a tm problem: your doc.corpus is empty

2014 Jun 17

It is not a problem with tm, nor with SnowfallC, nor with mcapply (judging by the path you are on Linux; on Windows, according to the manual, mcapply does not work well). You are not defining properly the objects you pass. You pass doc.corpus instead of corpus (or you assign to corpus instead of to doc.corpus). Debug your programs whenever an object error comes up, since the Error tells you what you are passing. For now you have it fixed at http://rpubs.com/ricardo/Temp But once again, inspect the objects step by step and let us know (well, already...

Loop over many data frames

2015 Apr 10

...the advice. Apparently I am not applying it correctly, because the object I get does not contain what I want. To explain: when I run txt <- vector('list', length = length(names)) #names is the vector where I had already stored the list of txt files for(i in seq_along(txt)){ txt[[i]] <- Corpus(VectorSource(names[i])) } I get the object txt: > class(txt) [1] "list" if I extract only the first object of that list: > txt[[1]] <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> if I want to see the content of the first corpus > inspect(txt[[1]]) <...

merging corpora and metadata

2011 Nov 17

Greetings! I lose all my metadata after concatenating corpora. This is an example of what happens:
> meta(corpus.1)
   MetaID cid fid selfirst selend fname
1       0   1  11     2169   2518 WCPD-2001-01-29-Pg217.scrb
2       0   1  14     9189   9702 WCPD-2003-01-13-Pg39.scrb
3       0   1  14     2109   2577 WCPD-2003-01-13-Pg39.scrb
.... ....
17      0   1 114    17863...

It's not a tm problem: your doc.corpus is empty

2014 Jun 18

...riginal----- > From: r-help-es-bounces at r-project.org [mailto:r-help-es-bounces at r-project.org] On behalf of Ruben Tobalina Ramirez > Sent: Tuesday, 17 June 2014 20:25 > To: Ricardo Alberich > CC: r-help-es > Subject: Re: [R-es] It's not a tm problem: your doc.corpus is empty > > Good afternoon Ricardo, > > thanks for the quick reply. I copied your code and it still > gives me the same error. I went to my code and changed the 'doc.corpus' to > 'corpus' (it was a mistake made while cleaning up the code) and the error persists. > I don't know, I have tried bu...

Text mining

2012 Oct 25

Kind regards. I am currently writing a function to plot a word cloud; the code I have is the following: library(twitteR) library(tm) library(wordcloud) library(RXKCD) library(RColorBrewer) tweets=searchTwitter('@afflorezr', n=1500) generateCorpus= function(tweets,my.stopwords=c(),min.freq){ #Install the textmining library require(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and s...

Loop over many data frames

2015 Apr 12

Jorge, dear R-help contributors, I have been trying to use a script for one of the steps in my analysis, which is to transform each of the corpora in my workspace into a TermDocumentMatrix object. I have a vector called bNames that lists all the corpora I want to convert to TDMs, and I wrote the following commands: tdm.n1 <- vector('list', length = length(bNames)) for(i in seq_along(tdm.n1)){ tdm.n1.[[i]] <- TermDocume...

Loop over many data frames

2015 Apr 10

Hello everyone! I am working on a text mining project and, because of the resources available to me, I had to split the project's input text files into many small files. After transforming each of these files into a separate corpus, I can clean each corpus, look for n-grams, build each termDocumentMatrix and finally combine everything into a single TDM. But I am stuck on the step of transforming each of the files into a corpus by means of a loop. That is, instead of doing this countless times: #...

It's not a tm problem: your doc.corpus is empty

2014 Jun 18

...-es-bounces@r-project.org] On behalf of Isidro Hidalgo > > Sent: Wednesday, 18 June 2014 12:46 > > To: 'Ruben Tobalina Ramirez'; 'Ricardo Alberich' > > CC: 'r-help-es' > > Subject: RE: [R-es] It's not a tm problem: your doc.corpus is empty > > > > I think what you want to do needs this line of code right > > after loading the tm package: > > > > inmortal = unlist(strsplit(inmortal, " ", fixed = T)) > > > > That way, you work with words, and NOT with the sentences en...

Size of the term matrix and memory. TM package

2012 Dec 13

...library(Rstem) library(Snowball) # reads the UTF-8 document and converts it to ASCII txt <- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8") txt = iconv(txt, to="ASCII//TRANSLIT") # builds a corpus corpus <- Corpus(VectorSource(txt)) # converts to lowercase corpus <- tm_map(corpus, tolower) # strips whitespace corpus <- tm_map(corpus, stripWhitespace) # removes punctuation corpus <- tm_map(corpus, removePunctu...

strings plots

2010 Feb 01

...not in a beautiful way, I have to improve it, especially the labels and coordinates) with number inputs like: 110,248,245,151,175,165,163,52,213,315,164,276,273,273,175,220,284,216,213,278,245,157,278,248 My problem appears when I want to create such plots with inputs composed of strings like: CORPUS,CORPUS,CORPUS,CORPUS,CORPUS,CORPUS,CORPUS,CORPUS,OVARY,OVARY,OVARY,OVARY,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,PERITONEUM,UTERUS,UTERUS,UTERUS for creating plots of the distributions of the different 'words'. (e.g. bar named corpus that tel...

Can't pass file name as parameter to Corpus function

2009 Nov 03

I'm working on a small project to extract high-frequency terms from a document and then display those terms in a web page. To this end, I have to pass the file name as a parameter to the Corpus function to build a corpus of only one document. I can build the corpus using the code below interactively in R. But when I call the function with a file name as the parameter, I get the error message "Error in eval(expr, envir, enclos) : object 'strFileName' not found" test<...

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

2009 Jan 15

...got another mine like memory problem. What is the best way to solve this memory problem: increasing the physical RAM, or trying other recipes, etc.? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus)...

S3 objects in S4 slots

2009 Sep 15

Hello, I am the maintainer of the stringkernels package and have come across a problem with using S3 objects in my S4 classes. Specifically, I have an S4 class with a slot that takes a text corpus as a list of character vectors. tm (version 0.5) saves corpora as lists with a class attribute of c("VCorpus", "Corpus", "list"). I don't actually need the class-specific attributes, I only care about the list itself. Here's a simplified example of my problem:...

Extracting information from text data

2011 Jan 24

....0 on Windows XP. I am trying to produce an n X m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s)...

logistic mixed effects models with lmer

2007 Dec 28

...18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part + m17: after_part + first_rep + is_open + is_disc + poly(wfreq, m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean + m17: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) + m18: (1 | speaker) + (1 | corpus) + (1 | ref) m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part + m18: after_part + first_rep + is_open + is_disc + poly(wfreq, m17: 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) + m18: pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange, m17: 2)...

Get a list of all terms in an indexed corpus

2010 Oct 08

Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuitio...

TM Package - Corpus function - Memory Allocation Problems

2010 Aug 17

I'm using R 2.11.1 on Win XP (32-bit) with 3 GB of RAM. My data is (only) 16.0 MB. I want to create a VCorpus object using the Corpus function in the tm package but I'm running into memory allocation issues: "Error: cannot allocate vector of size 372 Kb". My data is stored in a csv file which I've imported with "read.csv" and then used the following to create the Corpus (but i...

Samba and active Directory

2010 May 14

...l = 65 local master = No domain master = No wins support = Yes idmap uid = 10000-20000 idmap gid = 10000-20000 winbind separator = + winbind use default domain = Yes [test] comment = test path = /test valid users = ABC+corpus, ABC+ahu read only = No [/code] The user ABC+corpus also exists locally and I am able to log on to the share with his Directory password, but not with the user ABC+ahu. If I just do useradd ahu, I am able to log on with this user! What am I doing wrong? I also want that users from the director...

read.table() fails with https in R 3.6 but not in R 3.5

2019 May 04

In versions of R prior to 3.6.0 the following invocation succeeds, returning the data frame shown:
> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
   Dekade   Anzahl
1    1900 11467254
2    1910 13023370
3    1920 13434601
4    1930 13296355
5    1940 12121250
6    1950 13191131
7    1960 10587420
8    1970 10944129
9    1980 11279439
10   1990 12052652
But in version 3....

Help with the text mining package (TM)

2009 Jul 17

Dear all, I am writing to ask the following: I am doing a text mining project and need to import a series of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation marks, etc. I can do the latter with the datasets that ship with the TM library. What I cannot manage is to import text from some source, even though there are functions
