Hi,
The document helps a lot, thanks. I need to know how to work with Hadoop and
R in a parallel cluster environment.
HIVE is a new system on top of Hadoop that uses a SQL derivative to query
it. http://hadoop.apache.org/hive/
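
For context, here is a rough sketch of the map/reduce model that Hadoop
Streaming exposes to scripting languages: a mapper emits key/value pairs, the
framework sorts them by key, and a reducer aggregates each key's values. It is
only an illustration under stated assumptions. It is written in Python rather
than R, it runs in a single process instead of on a cluster, and the names
mapper, reducer and run_local are made up for this example. They are not part
of Hadoop, Hive, or the HadoopStreaming package.

```python
from itertools import groupby


def mapper(lines):
    """Emit one (word, 1) pair per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts for each key.

    The pairs must already be sorted by key, which is what the
    framework's sort/shuffle step provides to a real reducer.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in one process (hypothetical helper)."""
    return dict(reducer(sorted(mapper(lines))))
```

Calling run_local on a list of text lines returns a dictionary of word counts.
On a real cluster the mapper and reducer would be separate scripts that read
lines from stdin and write tab-separated key/value lines to stdout.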
Regards,
Ajay
On Fri, Jul 31, 2009 at 7:23 PM, Avram Aelony <aavram@mac.com> wrote:
>
>
> I am not sure if I understood your question, but you may want to look at
> http://cran.r-project.org/web/packages/HadoopStreaming/HadoopStreaming.pdf
> Regards,
>
> Avram
>
>
>
> On Friday, July 31, 2009, at 02:39PM, "Ajay ohri" <ohri2007@gmail.com> wrote:
> >Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure
> >built on top of Hadoop that provides tools to enable easy data
> >summarization, ad hoc querying and analysis of large datasets stored in
> >Hadoop files. It provides a mechanism to put structure on this data, and
> >it also provides a simple query language called QL, which is based on SQL
> >and which enables users familiar with SQL to query this data. At the same
> >time, this language also allows traditional map/reduce programmers to
> >plug in their custom mappers and reducers to do more sophisticated
> >analysis that may not be supported by the built-in capabilities of the
> >language.
> >
> >Is there any package, currently out or in development, that looks into
> >combining R-like matrix capabilities with Hive-like big data abilities in
> >a remote/parallel HPC environment?
> >
> >Regards,
> >
> >Ajay
> >
> >
> >______________________________________________
> >R-help@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
> >
> >
>