Dear useRs,

A first version of tm has just been released on CRAN.

tm provides a sophisticated framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation, and eases the usage of heterogeneous text formats in R. Advanced metadata management is implemented for collections of text documents to ease working with large document sets enriched with metadata.

The package ships with native support for handling
 *) the Reuters-21578 dataset,
 *) the Reuters Corpus Volume 1 dataset,
 *) Gmane RSS feeds,
 *) e-mails, and
 *) several classic file formats (e.g., plain text or CSV text).

tm provides easy access to preprocessing and manipulation mechanisms, such as
 *) whitespace removal,
 *) stemming, or
 *) conversion between file formats (e.g., Reuters21578 to plain text).

Furthermore, a generic filter architecture is available to
 *) filter documents by certain criteria, or
 *) perform full-text search.

The package supports exporting document collections to term-document matrices, as frequently used in the text mining literature. This allows straightforward integration of existing methods for classification, clustering, visualization, etc.

The package is designed in a modular way to enable easy integration of new file formats, parsers, transformations, and filter operations.

Best regards,
Ingo Feinerer

_______________________________________________
R-packages mailing list
R-packages at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-packages