thr3ads.net - R help - [R] Using R with Hadoop/Hive for Big Data [Jul 2009]

Ajay ohri

2009-Jul-31 21:39 UTC

[R] Using R with Hadoop/Hive for Big Data

Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure
built on top of Hadoop. It provides tools for easy data summarization,
ad hoc querying, and analysis of large datasets stored in Hadoop files. It
also provides a mechanism to put structure on this data and a simple query
language called QL, which is based on SQL and lets users familiar with SQL
query the data. At the same time, the language lets traditional map/reduce
programmers plug in their custom mappers and reducers for more sophisticated
analysis that the built-in capabilities of the language may not support.
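
To make the "custom mappers and reducers" idea concrete, here is a minimal
sketch of a streaming-style mapper and reducer. It is written in Python
rather than R, and it is only an illustration: the function names
(map_lines, reduce_pairs, run) and the word-count task are assumptions
chosen for the example, not anything taken from Hive or from an R package.
A script of this shape reads lines on standard input and writes
tab-separated key/value lines on standard output, which is the general
contract that Hadoop streaming (and, as far as I understand it, Hive's
TRANSFORM clause) expects from an external script.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_pairs(pairs):
    """Reducer: sum the counts per key.

    The pairs must already be sorted by key, which is how a
    map/reduce framework delivers them to a reducer.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run(stream_in, stream_out):
    """Run map, a local sort standing in for the shuffle, then reduce."""
    mapped = sorted(map_lines(stream_in))
    for key, total in reduce_pairs(mapped):
        stream_out.write(f"{key}\t{total}\n")
```

To try it locally, call run(sys.stdin, sys.stdout) at the end of the script
and pipe a text file into it. On a real cluster the mapper and reducer
would normally be two separate scripts, with the framework doing the sort
between them.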

Is there any package, currently available or in development, that combines
R-like matrix capabilities with Hive-like big data abilities on a
remote/parallel HPC system?

Regards,

Ajay


Ajay ohri

2009-Aug-01 11:27 UTC

[R] Using R with Hadoop/Hive for Big Data

Hi,

The document helps a lot, thanks. I need to know how to work with Hadoop and
R in a parallel cluster environment.

Hive is a new system built on top of Hadoop that lets you query the stored
data with a SQL derivative: http://hadoop.apache.org/hive/

Regards,

Ajay


On Fri, Jul 31, 2009 at 7:23 PM, Avram Aelony <aavram@mac.com> wrote:
>
>
> I am not sure if I understood your question, but you may want to look at
> http://cran.r-project.org/web/packages/HadoopStreaming/HadoopStreaming.pdf
> Regards,
>
> Avram
