Hello,
I have been using R for text mining already. I now want to use R for
large-scale text processing and for experiments with topic modeling. I
started reading tutorials and working through some of them. Here is my
understanding of each of the tools:
1) R text mining toolbox: meant for local (client-side) text processing;
it uses the XML library
2) Hive: Hadoop interactive; provides the framework to call map/reduce and
also provides the DFS interface for storing files on the DFS.
3) RHIPE: R Hadoop integrated environment
4) Elastic MapReduce with R: a MapReduce framework for those who do not have
their own clusters
5) Distributed Text Mining with R: an attempt to make a seamless move from
local to server-side processing, from R-tm to R-distributed-tm
I have the following questions about the above packages:
1) Hive and RHIPE and the distributed text mining toolbox need you to have
your own clusters. Right?
2) If I have just one computer, how would the DFS work in the case of Hive?
3) Are we facing a problem of duplicated effort across the above
packages?
I am hoping to get insights on the above questions in the next few days.
A timely response would be very helpful.
Thanks and Regards,
Shivani
--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao <http://web.ics.purdue.edu/%7Esgrao>