Hello,

I have been using R for text mining, and I would now like to use it for large-scale text processing and for experiments with topic modeling. I have started reading tutorials and working through some of them. Here is my understanding of each of the tools:

1) R text mining toolbox: meant for local (client-side) text processing; it uses the XML library.
2) Hive: Hadoop-interactive; provides the framework to call map/reduce and also provides the DFS interface for storing files on the DFS.
3) RHIPE: R and Hadoop Integrated Programming Environment.
4) Elastic MapReduce with R: a MapReduce framework for those who do not have their own clusters.
5) Distributed Text Mining with R: an attempt to make the move from local to server-side processing seamless, i.e., from R-tm to R-distributed-tm.

I have the following questions and points of confusion about the above packages:

1) Hive, RHIPE, and the distributed text mining toolbox require you to have your own cluster. Right?
2) If I have just one computer, how would DFS work in the case of Hive?
3) Are we facing the problem of duplication of effort across the above packages?

I am hoping to get insights on the above questions in the next few days. Your timely response would be much appreciated.

Thanks and regards,
Shivani
--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao