thr3ads.net - similar to: "package tm: reading XML files"

Displaying 20 results from an estimated 300 matches similar to: "package tm: reading XML files"

2010 Feb 16

tm package

Hi, I'm using version 0.5.1 of the tm package with R 2.10.1. It looks to me as if after the following
    reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain))
    reuters21578 <- tm_map(reuters21578, stripWhitespace)
    reuters21578 <- tm_map(reuters21578, tolower)
    reuters21578 <- tm_map(reuters21578, removePunctuation)

2007 Jan 11

tm 0.1 uploaded to CRAN

Dear useRs, a first version of tm has just been released on CRAN. tm provides a sophisticated framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation and eases the usage of heterogeneous text formats in R. An advanced metadata management is implemented for collections of text documents to alleviate the

2009 Dec 11

readHTML within tm package

I'm hoping to work with the tm package with some html documents. In the documentation and in the tutorial material it says that there is a readHTML routine that can be used to read HTML documents into a corpus. However, when I try to use that routine I get an error. When I run getReaders (below) readHTML isn't listed.
    > getReaders()
    [1] "readDOC"

2010 Feb 04

How to read HTML or TEXT file with tm package

URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>

2012 Jan 13

Troubles with stemming (tm + Snowball packages) under MacOS

Dear all, I have some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5; R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions). I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as written in

2006 Nov 04

Ferret 0.10.6 released (and some benchmarks)

Hey folks, ** Description ** Firstly for those who don't know, Ferret is a full-text search library which makes adding search to your application a breeze. It's much faster than MySQL full-text search as well as most other search libraries out there. It allows you to do Boolean (+ruby + rails -jewelry) and phrase queries ("the quick brown fox") as well as some more

2012 Jun 01

Dependencies on recommended packages

Dear all, I've recently had some issues getting my package to successfully "check". This was on R-Forge, so it's not obvious for me to provide SessionInfo or the likes (if necessary, Stefan can chime in?). After some research (mainly by Stefan Theussler, driving force behind R-Forge), this turned out to be the root cause: On R-Forge, the version of R installed was the

2011 May 18

text mining problem using TM package

Hi, I’m using R (TM package) for text mining and I’m having problems filtering articles out of my data set by local meta data. Here is the code:
    data <- ("C:/… /19970331")
    rs <- ReutersSource(data , encoding = "UTF-8")
    RC <- VCorpus(DirSource(data), readerControl = list(reader = readRCV1asPlain,
    language = "en_US",
    load =

2012 Mar 06

R 2.15.0 alpha: R CMD check --as-cran / tools:::..check_package_CRAN_incoming() crash

For what it's worth, with
    R --no-init-file CMD check --as-cran ${pkg}_${version}.tar.gz
on R version 2.15.0 alpha (2012-03-03 r58572) on Windows I just managed to generate a crash:
    Checking package affxparser...
    * using log directory 'X:/affxparser,BioC-devel/R2.15.0/affxparser.Rcheck'
    * using R version 2.15.0 alpha (2012-03-03 r58572)
    * using platform: x86_64-pc-mingw32 (64-bit)

2016 Aug 19

KMeans - Evaluation Results

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

2010 Apr 23

Library (tm) Error: could not find function "TermDocMatrix".

Hi List, I have the following code and error. I have tried other code and I get the same problem.
    > reut21578 <- system.file("texts", "crude", package = "tm")
    > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain)))
    A corpus with 20 text documents
    > (r <- Corpus(DirSource(reut21578), readerControl =

2009 Oct 13

tm: Why does adding local metadata take so long?

I'm running tm 0.5 on R 2.9.2 on a MacBook Pro 17" unibody early 2009 2.93 GHz 4GB RAM. I have a directory with 1697 plain text files on the Mac, that I want to analyze with the tm package. I have read the documents into a corpus, Corpus_3compounds, as follows:
    # Assign directory to a character vector
    dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT"
    # Put the

2009 Jan 10

Help needed for Loading "tm" package

Howdy Gurus again. Thanks to Tony.Breyal, I was able to write the following script for analyzing a text document. But I got an error with the "tm" package. I don't know why I got the error from the R script below. I think I followed the process in the R tm manual. I use R v2.8.1 and tm_0.3-3.zip under Win XP. Thanks in advance, Kum Hwang
    > # setting directory
    > my.path

2009 Oct 15

Problems with rJava and tm packages

I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but using R 2.8.1 is relatively successful loading the text but the rJava package was messed up somehow:
    > library(tm)
    > library(rJava)
    Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the

2011 Jun 27

Extracting certain text using tm package

I have used "tm" package to import a set of text documents using the following command: text <- Corpus(DirSource("."),readerControl = list(language ="ansi")) I would like to extract only a certain portion of the text in each document using certain keywords. For example, I would like to include all the text between key words <Start Text> and <End

2009 Jan 09

[R] how to build TermDocMatrix in tm text mining package of R

Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix file, etc. How can I build a TermDocMatrix of a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

2009 Apr 17

question about the Text Mining package tm

Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files short.txt myTextFile.txt myTextFile.csv short.txt contains one line: THE CAT IN THE HAT\n myTextFile contains some tweets from Twitter. The first few lines of myTextFile.txt are: @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON

2010 Jan 22

Invalid input error in tm package

Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files. I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold)
    surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))
    Warning messages:
    1: In readLines(y, encoding = x$Encoding) : incomplete final

2011 Feb 10

Help using "tm" text mining package - preprocessing

Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader =readPlain)) I can pre-process each individual document using tolower(textd[[1]]) however, when I try to run tmTolower() I get a no such command
