Displaying 2 results from an estimated 2 matches for "hadoopstream".
2009 Jul 31
1
Using R with Hadoop/Hive for Big Data
Hive <http://hadoop.apache.org/hive/> is a data warehouse infrastructure
built on top of Hadoop that provides tools to enable easy data
summarization, ad hoc querying, and analysis of large datasets stored in
Hadoop files. It provides a mechanism to impose structure on this data, and it
also provides a simple query language called QL, which is based on SQL and
which enables users familiar with
2010 Feb 24
1
Sparse KMeans/KDE/Nearest Neighbors?
hi,
I have a dataset (the netflix dataset) which has roughly 18k columns and
a variable number of rows, but let's assume 25 thousand for now. The
dataset is very sparse. I was wondering how to do kmeans/nearest neighbors
or kernel density estimation on it.
I tried using the spMatrix function in the "Matrix" package. I think I'm able to
create the matrix but as soon as I pass