similar to: package tm: reading XML files

Displaying 20 results from an estimated 300 matches similar to: "package tm: reading XML files"

2010 Feb 16
0
tm package
Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation)
2007 Jan 11
0
tm 0.1 uploaded to CRAN
Dear useRs, a first version of tm has just been released on CRAN. tm provides a sophisticated framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation and eases the usage of heterogeneous text formats in R. An advanced metadata management is implemented for collections of text documents to alleviate the
2009 Dec 11
0
readHTML within tm package
I'm hoping to work with the tm package with some html documents. In the documentation and in the tutorial material it says that there is a readHTML routine that can be used to read HTML documents into a corpus. However, when I try to use that routine I get an error. When I run getReaders (below) readHTML isn't listed. > getReaders() [1] "readDOC"
2010 Feb 04
1
How to read HTML or TEXT file with tm package
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
Dear all, I have some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5 R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions) I have installed all the needed packages (tm, rJava, rWeka, Snowball) + dependencies. I have deactivated AWT (as written in
2006 Nov 04
0
Ferret 0.10.6 released (and some benchmarks)
Hey folks, ** Description ** Firstly for those who don't know, Ferret is a full-text search library which makes adding search to your application a breeze. It's much faster than MySQL full-text search as well as most other search libraries out there. It allows you to do Boolean (+ruby + rails -jewelry) and phrase queries ("the quick brown fox") as well as some more
2012 Jun 01
1
Dependencies on recommended packages
Dear all, I've recently had some issues getting my package to successfully "check". This was on R-Forge, so it's not obvious for me to provide SessionInfo or the likes (if necessary, Stefan can chime in?). After some research (mainly by Stefan Theussler, driving force behind R-Forge), this turned out to be the root cause: On R-Forge, the version of R installed was the
2011 May 18
0
text mining problem using TM package
Hi, I’m using R (TM package) for text mining and I’m having problems filtering articles out of my data set by local meta data. Here is the code: data <- ("C:/… /19970331") rs <- ReutersSource(data , encoding = "UTF-8") RC <- VCorpus(DirSource(data), readerControl = list(reader = readRCV1asPlain, language = "en_US", load =
2012 Mar 06
0
R 2.15.0 alpha: R CMD check --as-cran / tools:::..check_package_CRAN_incoming() crash
For what it's worth, with R --no-init-file CMD check --as-cran ${pkg}_${version}.tar.gz on R version 2.15.0 alpha (2012-03-03 r58572) on Windows I just managed to generate a crash: Checking package affxparser... * using log directory 'X:/affxparser,BioC-devel/R2.15.0/affxparser.Rcheck' * using R version 2.15.0 alpha (2012-03-03 r58572) * using platform: x86_64-pc-mingw32 (64-bit)
2016 Aug 19
2
KMeans - Evaluation Results
On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a
2010 Apr 23
2
Library (tm) Error: could not find function "TermDocMatrix".
Hi List, I have the following code and error. I have tried other code and I get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl =
2009 Oct 13
0
tm: Why does adding local metadata take so long?
I'm running tm 0.5 on R 2.9.2 on a MacBook Pro 17" unibody early 2009 2.93 GHz 4GB RAM. I have a directory with 1697 plain text files on the Mac, that I want to analyze with the tm package. I have read the documents into a corpus, Corpus_3compounds, as follows: # Assign directory to a character vector dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT" # Put the
2009 Jan 10
1
Help needed for Loading "tm" package
Howdy Gurus again. Thanks to Tony.Breyal, I was able to write the following script for analyzing a text document. But I got an error with the "tm" package. I don't know why I got the error from the R script below. I think I followed the process in the R tm manual. I use R v2.8.1 and tm_0.3-3.zip under Win XP. Thanks in advance, Kum Hwang > # setting directory > my.path
2009 Oct 15
1
Problems with rJava and tm packages
I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but using R 2.8.1 is relatively successful loading the text but the rJava package was messed up somehow: library(tm) > library(rJava) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the
2011 Jun 27
0
Extracting certain text using tm package
I have used "tm" package to import a set of text documents using the following command: text <- Corpus(DirSource("."),readerControl = list(language ="ansi")) I would like to extract only a certain portion of the text in each document using certain keywords. For example, I would like to include all the text between key words <Start Text> and <End
2009 Jan 09
1
[R} how to build TermDocMatrix in tm text mining package of R
Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix, etc. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.
2009 Apr 17
0
question about the Text Mining package tm
Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files short.txt myTextFile.txt myTextFile.csv short.txt contains one line: THE CAT IN THE HAT\n myTextFile contains some tweets from Twitter. The first few lines of myTextFile.txt are: @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON
2010 Jan 22
1
Invalid input error in tm package
Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold) *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final
2011 Feb 10
2
Help using "tm" text mining package - preprocessing
Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader =readPlain)) I can pre-process each individual document using tolower(textd[[1]]) however, when I try to run tmTolower() I get a no such command