thr3ads.net - similar to: "cluster benchmark datasets"

2016 Mar 10

2

Introduction and Doubts

TF-IDF is the most widely used weighting scheme; it is easy to understand and has been used in other frameworks such as Lucene and in many other places. Okapi BM25 (implemented in Xapian) is theoretically a better measure than TF-IDF, and I am looking into various other weighting schemes which are already in Xapian or can be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term
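As a rough illustration of the TF-IDF weighting this post refers to, here is a minimal Python sketch. It assumes the common textbook variant, tf × log(N / df); Lucene and Xapian each use their own formulas, and the toy corpus and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cluster", "benchmark", "data"],
    ["cluster", "kmeans", "kmeans"],
    ["benchmark", "compile", "speed"],
]
weights = tf_idf(docs)
```

A term that is frequent in one document but rare across the corpus ("kmeans" here) receives a higher weight than a term shared by several documents ("cluster").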

Re: Hot swap CPU -- "build" is not a good CPU benchmark

2005 Jul 01

1

From: Peter Arremann <loony at loonybin.org> > It is a valid benchmark though :-) Compile speed is actually a good measure for any integer app that is small enough to run in a large cache... Image processing, oil companies for their simulations, CAD... they all act very similar to a compile benchmark - if a compile is twice as fast, a software image rendering is usually

kmeans clustering

2006 Jun 29

1

Hello R list members, I'm a bioinformatics student from Leiden University (Netherlands). We were asked to make a program with different clustering methods. The problem we are experiencing is the following. We have a matrix with data like the following: research1 research2 research3 etc. sample1 0.5 0.2 0.4 sample2 0.4

Cluster prediction from factor/numeric datasets

2007 Jul 23

1

Hi all, I have a dataset with numeric and factor columns of data which I developed a Gower Dissimilarity Matrix for (Daisy) and used Agglomerative Nesting (Agnes) to develop 20 clusters. I would like to use the 20 clusters to determine cluster membership for a new dataset (using predict) but cannot find a way to do this (no way to "predict" in the cluster package). I know I can use

Get distribution of positive/negative examples for each cluster

2010 Jul 21

1

Dear R experts, I have a labeled data set. Each data point is assigned a binary label, 0 or 1. Assume that I use some clustering algorithm to group the data into clusters (using some features of the data). Now I want to know how many data points are labeled 0/1 in each cluster. For example, assume that I have 9 labeled data points grouped into three clusters. The ids of the clusters are 1, 2, and 3. The dataset is
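The counting described in this post can be sketched as below. This is a Python illustration rather than the R the thread asks about (in R itself, a cross-tabulation such as `table(cluster, label)` expresses the same idea). The nine data points, their labels, and the function name `label_counts_by_cluster` are made up for the example, since the poster's actual dataset is cut off in the preview.

```python
from collections import Counter, defaultdict


def label_counts_by_cluster(cluster_ids, labels):
    """Count how often each label occurs within each cluster."""
    counts = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        counts[cid][lab] += 1
    # Convert to plain dicts for easy inspection.
    return {cid: dict(c) for cid, c in counts.items()}


# Hypothetical data: 9 points in clusters 1, 2, 3 with binary labels.
cluster_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
labels = [0, 1, 1, 0, 0, 0, 1, 0, 1]
counts = label_counts_by_cluster(cluster_ids, labels)
```

Each entry of `counts` maps a cluster id to the number of 0-labeled and 1-labeled points in that cluster; a label that never occurs in a cluster is simply absent from that cluster's dict.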

K-means recluster data with given cluster centers

2010 Jan 11

1

Dear R user, I have several large data sets. Over time additional new data sets will be created. I want to cluster all the data in a similar/identical way with the k-means algorithm. With the first data set I will find my cluster centers and save the cluster centers to a file [1]. This first data set is huge, it is guaranteed that cluster
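One way to read this request is that later data sets should only be assigned to the centers saved from the first run, without re-estimating them. The Python sketch below shows that assignment step alone (nearest center by squared Euclidean distance). It is an assumption about what the poster wants, since the preview is truncated; the centers, points, and the function name `assign_to_centers` are hypothetical.

```python
def assign_to_centers(points, centers):
    """Return, for each point, the index of its nearest center."""
    def sq_dist(p, c):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


# Hypothetical centers saved from the first data set, and a new data set.
centers = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, -1.0), (9.0, 11.0), (4.0, 4.0)]
assignment = assign_to_centers(new_points, centers)
```

Because the centers are never updated, every new data set is partitioned by the same boundaries as the first one, which is what makes the clusterings comparable over time.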

References verifying accuracy of R for basic statistical calculations and tests

2006 Jul 14

2

Hi, > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Corey Powell > > Do you know of any references that verify the accuracy of R > for basic statistical calculations and tests. The results of > these studies should indicate that R results are the same as > the results of other statistical packages to a certain number > of decimal places on some benchmark

Getting individual co-ordinate points in k medoids cluster

2024 Sep 17

1

Hello, I am using k-medoids in R to generate sets of clusters for datasets through time. I can plot the individual clusters OK, but what I cannot find is a way of pulling out the co-ordinates of the individual points in the cluster diagrams - none of the kmed$... info sets seems to be this. Below is an example of a k-medoids program using the built-in US arrests dataset - this is not the data I am

cluster results using fanny

2005 May 13

2

Hi, I am using fanny and I get strange results. I am wondering if someone out there can help me understand why this happens. First of all, in most of my tries it gives me a result in which each object has equal membership in all clusters. I have read that that means "the clustering is entirely fuzzy". Looking at the graphics it is really difficult to understand how objects with so

Benchmarks for LLVM-generated Binaries

2016 Sep 01

3

Hi, I've lately been wondering where benchmarks for LLVM-generated binaries are hosted, and whether they're tracked over time. I'm asking because I'm thinking of where to put some benchmarks I've written using the open source Google benchmarking library [0] to test certain costs of XRay-instrumented binaries, the XRay runtime, and other related measurements (effect of

R-beta: mlbench-0.1 --- machine learning benchmark problems

1997 Jun 09

1

I've made a package from some benchmark datasets for use with R and uploaded it to CRAN. Here's the Index entry: mlbench-0.1.tar.gz: A collection of artificial and real-world machine learning benchmark problems, including, e.g., the Boston housing data from the UCI repository. Written/packaged by Fritz Leisch <Friedrich.Leisch at ci.tuwien.ac.at> Original data sets from

Remove error data and clustering analysis

2009 Mar 27

2

Hi all, I'd like to do a clustering analysis on my dataset. The example data are as follows: Dataset 1: 500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495, 3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149; Dataset 2: 506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466, 476, 482, 1201, 469, 502; I have many datasets like that. Basically, every

cluster/snow question

2008 Sep 08

1

Dear R Users, I am attempting to use the snow package for clustering. Is there a way to identify, in the environment of each node, a rank for that node and also the total size of the cluster? By way of analogy, I am looking for the functions in snow equivalent to mpi.comm.rank() and mpi.comm.size() from Rmpi, in case that makes things clearer. Thanks in advance, Tolga Generally, this

[LLVMdev] Representing -ffast-math at the IR level

2012 Apr 16

0

Duncan, I have some issues with representing this as a single "fast" mode flag, which mostly boil down to the fact that this is a very C-centric view of the world. And, since C compilers are not generally known for their awesomeness on issues of numerics, I'm not sure that's a good idea. Having something called a "fast" or "relaxed" mode implies that it is

Data Extraction - benchmark()

2012 Nov 22

1

Hi Berend, I see you are one of the contributors to the rbenchmark package. I am sorry that I am bothering you again. I have tried to run your code (slightly tweaked) involving the benchmark function, and I am getting the following error message. What am I doing wrong? Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), : could not find function

cluster size

2009 Dec 11

1

Hi r-help, I am doing k-means clustering with stats. I tried clustering into five clusters using: kcl1 <- kmeans(as1[,c("contlife","somlife","agglife","sexlife", "rellife","hordlife","doutlife","symtlife","washlife",

determining optimal # of clusters for a given dataset (e.g. between 2 and K)

2006 Apr 19

1

Hi: I'm clustering a microarray dataset with a large # of samples. I would like your opinion on the best way to automatically determine the optimal # of clusters. Currently I am using the "cluster" package, clustering with "clara", examining the average silhouette width at various numbers of clusters. I'd like opinions on whether any newer packages offer

Clustering algorithms don't find obvious clusters

2010 Jun 11

2

I have a directed graph which is represented as a matrix of the form
0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0
Each row corresponds to an author (A, B, C, D) and the values say how many times this author has cited the other authors. Hence the first row says that author A has cited author B four times and author D one time. Thus the matrix represents two groups of authors: (A,B) and (C,D) who cite
