thr3ads.net - similar to: "text mining"

Displaying 20 results from an estimated 400 matches similar to: "text mining"

2009 Oct 02

text mining

The following code is derived from a paper titled "Text Mining Infrastructure in R" (http://www.jstatsoft.org/v25/i05/paper). The example below seems to load some default documents for analysis, some sort of Latin text. I cannot for the life of me figure out how to load my own document, let alone an entire corpus. I have searched the above paper as well as related documentation.

2009 Jan 09

[R] how to build TermDocMatrix in tm text mining package of R

Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

2010 Apr 23

Library (tm) Error: could not find function "TermDocMatrix".

Hi List, I have the following code and error. I have tried other code and get the same problem.
> reut21578 <- system.file("texts", "crude", package = "tm")
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain)))
A corpus with 20 text documents
> (r <- Corpus(DirSource(reut21578), readerControl =

2011 Jan 24

Extracting information from text data

Hi R-Users, Thanks in advance. I am using R-2.12.0 on Windows XP. I am trying to produce an n x m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path),

2009 Jul 17

Help with the text mining package (TM)

Dear all, I am writing to ask the following: I am doing a text mining project and need to import a series of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation marks, etc. I can do the latter with the datasets that ship with the TM library. What I cannot manage is to import text from any source, even though there are functions

2011 Feb 10

Help using "tm" text mining package - preprocessing

Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain)). I can pre-process each individual document using tolower(textd[[1]]); however, when I try to run tmTolower() I get a no such command

2009 Jan 15

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

Hi Gurus, thanks to your good help, I have managed to start using the text mining package "tm" in R under Win XP. However, while running the tm package, I ran into another problem, this time with memory. What is the best way to solve this memory problem: adding physical RAM, or some other remedy? ############################### ###### my R

2009 Jan 10

Help needed for Loading "tm" package

Howdy Gurus again. Thanks to Tony.Breyal, I was able to write the following script for analyzing a text document, but I got an error with the "tm" package. I don't know why I got the error from the R script below. I think I followed the process in the R tm manual. I use R v2.8.1 and tm_0.3-3.zip under Win XP. Thanks in advance, Kum Hwang > # setting directory > my.path

2009 Aug 17

reading in MS Word files

I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format Office 2003). All I need to do is read, not write. Thanks, Mark ------------------------------------------------------------ Mark

2012 May 29

package tm: reading XML files

Dear fellow R users, I'm using the package tm for text mining, and have a problem with reading in a corpus from XML files. When I copy the example from "Introduction to the tm package" of the small reuters subset "crude", everything goes well, and I get a corpus with the required meta data. When I read in the entire reuters21578 corpus in XML format however (or a

2009 Oct 15

Problems with rJava and tm packages

I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but R 2.8.1 is relatively successful at loading the text, while the rJava package was messed up somehow: library(tm) > library(rJava) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the

2009 Dec 22

Reading PDF files

Hi: I need to do text mining on PDF files. I understand there is a readPDF command in tm that can be used. Have read the 2008 posts on converting PDF files to text by Tony Breyal and others. Wondering if the procedure has been standardized in any tutorial or otherwise? Being new to R, I was able to follow only part of the discussion. Any way to get a set of step by step instructions

2009 Mar 30

Help with tm association analysis and Rgraphviz installation.

THANK YOU IN ADVANCE. Question 1: I saved two txt files in C:\textfile. Each txt file contains only one text column, and both have 100 records. I know the term “research” occurs 49 times, so I want to find out which other words are correlated with this word, but I got tons of associations of ‘1’. I tried other terms, and no

2007 May 05

importing data

Hello, I need to import a data set. I have never imported data files with R. I have always worked on simulated data. I have looked at R Data Import/Export manual. It is a bit peculiar because my data base is already an R object called "japan". I guess it is not yet a data set, and I don't know how to manipulate variables from it. When I type "japan", here is an extract

2009 Aug 05

reading and frequency analysis of Spanish text

For an historical paper I'm working on, I have some Spanish plaintext, presently in the form of a Word .doc file, http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc and also some ciphered text from the same original source. The ultimate goal is to use some frequency analysis of letters and word lengths in the plaintext to help decode the

2009 Apr 17

question about the Text Mining package tm

Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files: short.txt, myTextFile.txt, and myTextFile.csv. short.txt contains one line: THE CAT IN THE HAT\n. myTextFile contains some tweets from Twitter. The first few lines of myTextFile.txt are: @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON

2011 May 18

text mining problem using TM package

Hi, I’m using R (TM package) for text mining and I’m having problems filtering articles out of my data set by local meta data. Here is the code:
data <- ("C:/… /19970331")
rs <- ReutersSource(data , encoding = "UTF-8")
RC <- VCorpus(DirSource(data), readerControl = list(reader = readRCV1asPlain,
language = "en_US",
load =

2006 Nov 15

PAM authentication to Active Directory

Hello list, I want to authenticate (only authenticate) through Active Directory with PAM. I googled around, and everything I found, whether forum posts or howtos, talks about winbind and joining the Linux machine to the Windows domain. I do not wish to do this; I only want to get PAM to authenticate with the AD, and then everything else is local. Should I use pam_winbind ?

2009 Nov 03

Can't pass file name as parameter to Corpus function

I'm working on a small project to extract high-frequency terms from a document and then display those terms in a web page. To this end, I have to pass the file name as a parameter to the Corpus function to build a corpus of only one document. I can build the corpus interactively in R using the code below. But when calling the function with a file name as the parameter, I got the error message saying

2008 Jul 29

Bug in sd() and var() in handling vectors of NA (R version 2.7.1)?

In the previous versions of R (2.6.1), when a vector of NA was given to the functions 'sd' or 'var' with parameter na.rm = TRUE, it used to return NA. Now (2.7.1) it returns an ERROR : Example in 2.6.1: > sd(c(NA, NA, NA, NA), na.rm = TRUE) [1] NA Example in 2.7.1: > sd(c(NA, NA, NA, NA), na.rm = TRUE) Error in var(x, na.rm = na.rm) : paires d'éléments
