thr3ads.net - search: "hadoopstream"

2009 Jul 31

Using R with Hadoop/Hive for Big Data

Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called QL, which is based on SQL and which enables users familiar with

2010 Feb 24

Sparse KMeans/KDE/Nearest Neighbors?

Hi, I have a dataset (the Netflix dataset) which is basically ~18k columns and a variable number of rows, but let's assume 25 thousand for now. The dataset is very sparse. I was wondering how to do kmeans/nearest neighbors or kernel density estimation on it. I tried using the spMatrix function in the "Matrix" package. I think I'm able to create the matrix, but as soon as I pass
