Hi R'ers,

I have a dataset that forms a matrix of 7502 x 1426 (rows x columns). The data is in CSV format and is around 68 MB in size. This dataset is less than 10% of our full data.

I have been following the anomaly detection method described at http://www.mattpeeples.net/kmeans.html . It has been running for more than 24 hours and still has not completed the calculation. I did manage to run it on a smaller dataset (i.e., 2100 rows x 1426 columns), which took around 12 hours.

I have a few questions and would appreciate your expert guidance.

1) Is there a better open source tool (e.g., RStudio) that covers the whole workflow in one place: preparing data, building models, validating models, testing models and presenting data? I am looking for a tool that will let me do the same as in the link above (Matt Peeples' blog).

2) Is there an open source tool that can do the above and run on top of the Hadoop ecosystem?

3) Can we use RStudio for Windows as a client running on top of the Hadoop ecosystem? If yes, please point me to a site with use cases or samples.

Thanks and Regards,
Truong Phan