Displaying 2 results from an estimated 2 matches for "hadoopstream".

2009 Jul 31
Using R with Hadoop/Hive for Big Data
Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called QL which is based on SQL and which enables users familiar with
2010 Feb 24
Sparse KMeans/KDE/Nearest Neighbors?
Hi, I have a dataset (the Netflix dataset) which is basically ~18k columns and a variable number of rows, but let's assume 25 thousand for now. The dataset is very sparse. I was wondering how to do kmeans/nearest neighbors or kernel density estimation on it. I tried using the spMatrix function in the "Matrix" package. I think I'm able to create the matrix but as soon as I pass