thr3ads.net - similar to: "tm package - how to transform a TermDocMatrix to a data.frame"

library(tm) Error: could not find function "TermDocMatrix".

2010 Apr 23

Hi list, I have the following code and error. I have tried other code and I get the same problem.

    > reut21578 <- system.file("texts", "crude", package = "tm")
    > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain)))
    A corpus with 20 text documents
    > (r <- Corpus(DirSource(reut21578), readerControl =
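The error is consistent with a newer tm release: as far as I know, from tm 0.5 onward the constructor is `TermDocumentMatrix()` and `TermDocMatrix()` no longer exists. A minimal sketch, assuming a current tm installation and two made-up documents in place of the Reuters corpus, that builds the matrix and converts it to a data.frame (the question in this page's title):

```r
# Sketch: build a term-document matrix with tm and convert it to a data.frame.
# Assumes a current tm release, where TermDocumentMatrix() replaced TermDocMatrix().
library(tm)

# Two toy documents (hypothetical data standing in for a real corpus)
docs <- c("the cat in the hat", "the cat sat on the mat")
corpus <- VCorpus(VectorSource(docs))

# Rows are terms, columns are documents
tdm <- TermDocumentMatrix(corpus)

# The matrix is stored sparsely; densify it before converting
tdm_df <- as.data.frame(as.matrix(tdm))
print(tdm_df)
```

Densifying is fine for small corpora; for large ones the dense matrix can get big, so it may be worth filtering terms first.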

[R] How to build a TermDocMatrix in the tm text mining package of R

2009 Jan 09

Howdy gurus, I'd like to ask how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert it into a TermDocMatrix. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

Help installing Rstem package

2007 Oct 21

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20071021/3a2e8c5b/attachment.pl

Help needed for Loading "tm" package

2009 Jan 10

Howdy gurus, again. Thanks to Tony.Breyal, I was able to write the following script for analyzing a text document, but I got an error with the "tm" package. I don't know why I got the error from the R script below; I think I followed the process in the R tm manual. I use R v2.8.1 and tm_0.3-3.zip under Win XP. Thanks in advance, Kum Hwang

    > # setting directory
    > my.path

GSoC 2016 - Introduction

2016 May 05

Hello, thanks James for the reply. That cleared up a few things. Apologies for replying late; I have exams going on. I was going through the previous clustering API to understand how it worked, and it seems that the termlists used for the distance metrics are built with TF-IDF weighting and compared with cosine similarity, which is very similar to the approach I would need
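A small R sketch of TF-IDF weighting followed by cosine similarity between documents, using a toy count matrix and the common tf × log(N / df) variant. This variant is an assumption for illustration; Xapian's own weighting scheme may differ.

```r
# Toy term-document count matrix: rows = terms, columns = documents (hypothetical data)
counts <- matrix(c(2, 1, 0,
                   0, 1, 1,
                   1, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("cat", "hat", "mat"), c("d1", "d2", "d3")))

n_docs <- ncol(counts)
df  <- rowSums(counts > 0)   # number of documents containing each term
idf <- log(n_docs / df)      # inverse document frequency
tfidf <- counts * idf        # idf recycles down the rows: row i is scaled by idf[i]

# Cosine similarity between documents (columns): dot products over vector lengths
norms   <- sqrt(colSums(tfidf^2))
cos_sim <- crossprod(tfidf) / outer(norms, norms)
print(round(cos_sim, 3))
```

A term that appears in every document gets idf = 0 under this variant, so it drops out of the similarity entirely.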

Help with tm association analysis and Rgraphviz installation.

2009 Mar 30

Help with tm association analysis and Rgraphviz installation. Thank you in advance. Question 1: I saved two txt files in C:\textfile. Each txt file contains only one text column, and both have 100 records. I know the term “research” occurs 49 times, so I want to find out which other words are correlated with it, but I got tons of associations of ‘1’. I tried other terms, and no
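One possible explanation, assuming the corpus was built with one document per file (so it has only two documents): as far as I know, tm's findAssocs() reports correlations between term frequencies across documents, and with only two documents the correlation between any two terms whose counts differ between the documents is always +1 or -1. A base-R sketch with made-up counts:

```r
# Hypothetical counts for three terms in a two-document corpus
counts <- rbind(research = c(30, 19),
                data     = c(5, 12),
                method   = c(2, 7))
colnames(counts) <- c("file1", "file2")

# Correlation of each term with "research" across the two documents
assoc <- apply(counts, 1, function(v) cor(v, counts["research", ]))
print(assoc)  # every value is +1 or -1: two points always lie on a line
```

If that is the cause, treating each of the 100 records as its own document would presumably give the correlations room to vary.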

Remove punctuation characters

2006 May 09

Hi, I want to remove all punctuation characters from a string. I was trying to use a regular expression, but it doesn't work. Here is a sample of what I want:

    str <- 'ABD - remove de punct, and dot characters.'
    str <- gsub('[:punct:]','',str)
    str
    "'ABD remove de punct and dot characters"

Is there any function that does this kind of thing? Thanks to
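In R's regular expressions a character class such as `[:punct:]` is only recognised inside a bracket expression, so the pattern needs double brackets, `[[:punct:]]`. A sketch on the string from the post; the second gsub() is an extra step (not in the post) that collapses the double space left where the dash was removed:

```r
str <- 'ABD - remove de punct, and dot characters.'

# [:punct:] must sit inside a bracket expression: [[:punct:]]
no_punct <- gsub("[[:punct:]]", "", str)

# Removing the dash leaves two adjacent spaces; collapse runs of whitespace
no_punct <- gsub("[[:space:]]+", " ", no_punct)
print(no_punct)  # "ABD remove de punct and dot characters"
```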

K-means clustering

2016 Jul 26

Hello, I've been working on the K-means clustering algorithm recently, and for the past week I have been stuck on a problem I can't find a solution to. Since we represent documents as TF-IDF vectors, they are very sparse (a typical corpus can have around 5000 terms), so it gets really difficult to represent these sparse vectors in a way that would be
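One common trick, sketched here in R only to show the arithmetic (the thread concerns Xapian's C++ code, and the named R vectors below are a stand-in for whatever sparse structure that code would use): keep each document as its non-zero TF-IDF weights only, keep centroids dense, and compute the squared Euclidean distance as ||x - c||² = ||c||² + Σ over the document's non-zero terms of (x_i² - 2·x_i·c_i), so each distance touches only the document's own terms.

```r
# Dense centroid over a hypothetical 5-term vocabulary
vocab <- c("cat", "hat", "mat", "sat", "bat")
centroid <- setNames(c(0.5, 0.2, 0, 0.1, 0.4), vocab)
centroid_sq_norm <- sum(centroid^2)  # precompute once per centroid

# Sparse document: only its non-zero TF-IDF weights, keyed by term
doc <- c(cat = 1.2, sat = 0.7)

# Squared distance using only the document's non-zero entries
sparse_sq_dist <- function(doc, centroid, centroid_sq_norm) {
  c_i <- centroid[names(doc)]        # centroid weights at the document's terms
  centroid_sq_norm + sum(doc^2 - 2 * doc * c_i)
}
d_sparse <- sparse_sq_dist(doc, centroid, centroid_sq_norm)

# Cross-check against the ordinary dense computation
dense_doc <- setNames(numeric(length(vocab)), vocab)
dense_doc[names(doc)] <- doc
d_dense <- sum((dense_doc - centroid)^2)
```

Centroids do become dense after averaging, but there are only k of them, so storing them densely is usually affordable.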

Extracting information from text data

2011 Jan 24

Hi R-users, thanks in advance. I am using R-2.12.0 on Windows XP. I am trying to produce an n × m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below:

    > my.corpus <- Corpus(DirSource(my.path),
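Since the tm code is cut off here, a base-R sketch of the same n × m matrix may help as a cross-check. It assumes one plain-text file per document; the directory and file contents are made up for the demo, and the tokenisation (lower-case, punctuation stripped, split on whitespace) is a simple assumption, not what tm does internally.

```r
# Hypothetical demo files written to a temporary directory
my.path <- file.path(tempdir(), "docs_demo")
dir.create(my.path, showWarnings = FALSE)
writeLines("The cat in the hat.", file.path(my.path, "d1.txt"))
writeLines("The cat sat on the mat.", file.path(my.path, "d2.txt"))

files <- list.files(my.path, pattern = "\\.txt$", full.names = TRUE)

# Tokenise each file: lower-case, replace punctuation, split on whitespace
tokens <- lapply(files, function(f) {
  txt <- tolower(paste(readLines(f), collapse = " "))
  words <- strsplit(gsub("[[:punct:]]", " ", txt), "[[:space:]]+")[[1]]
  words[nzchar(words)]
})
names(tokens) <- basename(files)

# Cross-tabulate: rows = words (n), columns = documents (m)
term <- unlist(tokens, use.names = FALSE)
doc  <- rep(names(tokens), lengths(tokens))
tdm  <- unclass(table(term, doc))
print(tdm)
```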

sorting matrix output alphabetically

2008 Oct 18

Hello, I have been using the tm package to create a TermDocMatrix, which I have saved as a matrix so that I can view word frequencies. Below is a section of the code I have used and an excerpt of the output: What I want is to view the output alphabetically: rather than the results being sorted by frequency as below, an alphabetical list would be generated. This
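Once the frequencies are in an ordinary matrix with the terms as row names, reordering them alphabetically is a matter of indexing with `order(rownames(...))`. A sketch with a hypothetical one-column frequency matrix:

```r
# Hypothetical term-frequency matrix, rows currently ordered by frequency
m <- matrix(c(12, 9, 5, 3), ncol = 1,
            dimnames = list(c("oil", "price", "barrel", "crude"), "freq"))

# Reorder rows by term name; drop = FALSE keeps the one-column matrix shape
m_sorted <- m[order(rownames(m)), , drop = FALSE]
print(m_sorted)
```

Note that `order()` on character vectors follows the current locale's collation, which can matter for mixed-case or accented terms.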

Abundance data ordination in R

2007 Apr 01

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070401/33921c2a/attachment.pl

GSoC 2016 - Introduction

2016 May 01

Before going ahead with the tests you mentioned above, I would like to clarify a few higher-level things I am still unsure about. 1) As discussed during the IRC interview, it was suggested that I first write a plain K-means clustering implementation and then add the PSO module as an option that improves clustering quality at the cost of speed.

GSOC-2016 Project : Clustering of search results

2016 Mar 06

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
> >
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,

p-values pvclust maximum distance measure

2010 Jul 20

Hi, I am new to clustering and was wondering why pvclust with "maximum" as the distance measure nearly always gives p-values above 95%. I wrote an example program that demonstrates this effect and uploaded a PDF showing the results. Here is the code that produces the PDF file:
-------------------------------------------------------------------------------------
s <-
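This does not explain the p-value behaviour, but for reference: the "maximum" method of R's dist() is the Chebyshev distance, max over i of |x_i - y_i|, so only the single largest coordinate difference between two observations counts. A quick check on made-up data:

```r
x <- rbind(a = c(0, 0, 0),
           b = c(3, 4, 1),
           c = c(1, 1, 5))

# "maximum" = largest absolute coordinate difference (Chebyshev distance)
d_max <- dist(x, method = "maximum")
print(d_max)

# The same value for rows a and b, computed by hand
manual_ab <- max(abs(x["a", ] - x["b", ]))
```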

k-means with Euclidean distance but no coordinates

2001 Dec 13

Hi, I'm trying to build a thesaurus that will give sensible values for rare words. I suspect the best algorithm to use is k-means, although I'm not sure about that (I would have preferred a k-dimensional space with a binary cluster in each dimension, so a word can belong to 0..k clusters, but I digress...). I can measure the strength of correlation between words fairly easily by counting
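If only pairwise distances between words are available, one swapped-in approach (not something proposed in the post) is classical multidimensional scaling, `cmdscale()`, to recover coordinates whose Euclidean distances approximate the given ones, followed by ordinary `kmeans()` on those coordinates. The words and distance values below are hypothetical:

```r
set.seed(1)  # kmeans uses random starts

# Hypothetical symmetric distance matrix between six words
# (e.g. derived from 1 - correlation of co-occurrence counts)
words <- c("car", "auto", "truck", "apple", "pear", "plum")
d <- matrix(0.9, 6, 6, dimnames = list(words, words))
d[1:3, 1:3] <- 0.2   # the vehicle words are close to each other
d[4:6, 4:6] <- 0.2   # the fruit words are close to each other
diag(d) <- 0

# Classical MDS: 2-D coordinates whose Euclidean distances approximate d
coords <- cmdscale(as.dist(d), k = 2)

# k-means on the recovered coordinates
fit <- kmeans(coords, centers = 2, nstart = 10)
print(fit$cluster)
```

Alternatively, `pam()` from the cluster package accepts a dissimilarity object directly, which skips the embedding step.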

question about the Text Mining package tm

2009 Apr 17

Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files: short.txt, myTextFile.txt and myTextFile.csv. short.txt contains one line (THE CAT IN THE HAT\n). myTextFile contains some tweets from Twitter. The first few lines of myTextFile.txt are: @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON

buglet in dist() ?

2007 Sep 02

The first line of dist() says

    if (!is.na(pmatch(method, "euclidian")))

Shouldn't that be "euclidean"?
---------------------
R version 2.5.1 (2007-06-27)
i486-pc-linux-gnu
locale:
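As far as I can tell from the R sources, the line is intentional: it catches the misspelling "euclidian" (or a prefix of it, such as "euclid") and maps it to "euclidean" before the usual method lookup, so both spellings produce the same distances. A quick check on made-up points:

```r
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d_correct  <- dist(x, method = "euclidean")
d_misspelt <- dist(x, method = "euclidian")  # accepted via the pmatch() line

# Compare only the distances (the objects' "call" attributes differ)
same <- identical(as.vector(d_correct), as.vector(d_misspelt))
print(as.vector(d_correct))
```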

Document clustering for R

2005 Sep 12

I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but clara only supports two distance metrics, Euclidean and Manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the cluster package in R to support other distance metrics, such as cosine distance, or if there was an API for
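In base R, `hclust()` accepts any dissimilarity object, so a metric that isn't built in, such as cosine distance, can be computed by hand and passed in through `as.dist()`. A minimal sketch with made-up document vectors; hierarchical clustering is swapped in here for clara, which as far as I know works from the raw data matrix rather than from a dissimilarity.

```r
# Hypothetical document vectors: rows = documents, columns = term weights
docs <- rbind(d1 = c(2, 1, 0, 0),
              d2 = c(4, 2, 0, 1),
              d3 = c(0, 0, 3, 1),
              d4 = c(0, 1, 5, 2))

# Cosine similarity: dot products divided by the product of vector lengths
norms   <- sqrt(rowSums(docs^2))
cos_sim <- tcrossprod(docs) / outer(norms, norms)

# Turn similarity into a dissimilarity and cluster hierarchically
cos_dist <- as.dist(1 - cos_sim)
hc <- hclust(cos_dist, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

Note that 1 - cosine similarity is not a true metric (it can violate the triangle inequality), which is fine for average-linkage hclust but worth keeping in mind for methods that assume Euclidean geometry.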

Windows 7 Issues

2010 Mar 16

I have been trying to join my Windows 7 machines to a Samba domain, but it always fails. I can join a Windows 7 machine to a Samba domain, but then I get an error: _netr_ServerAuthenticate3: netlogon_creds_server_check failed. Rejecting auth request from client USER machine account USER$ The machine does join, though. Then, when I try to log in as a user, I can't, and the same error is shown in the

how to extract options for a function call

2011 Apr 18

Hi, I'm having some difficulty formulating this question, but what I want is to extract the options associated with a parameter of a function, e.g. method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN") in the optim function. So I would like to have a vector with c("Nelder-Mead", "BFGS", "CG",
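In R, a function's default arguments are stored unevaluated in `formals()`, so the choices can be recovered by evaluating the default of `method`. A minimal sketch for optim (recent R versions also list "Brent", so no exact output is shown):

```r
# The default of 'method' is stored as an unevaluated call c("Nelder-Mead", ...);
# evaluating it yields the character vector of choices
opts <- eval(formals(optim)$method)
print(opts)
```

The same pattern should work for any function whose argument default is a literal `c(...)` of choices, since it only evaluates the stored default expression.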
