Displaying 20 results from an estimated 70000 matches similar to: "cluster benchmark datasets"
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most widely used weighting scheme; it is easy to understand and has
been used in other frameworks like Lucene and many other places.
Okapi BM25 (implemented in Xapian) is theoretically a better/improved measure
than tf-idf, and
I am looking into various other weighting schemes which are available in Xapian
or can be implemented, like TF-ICF (term frequency inverse corpus
frequency), TF-RF (term
2005 Jul 01
1
Re: Hot swap CPU -- "build" is not a good CPU benchmark
From: Peter Arremann <loony at loonybin.org>
> It is a valid benchmark though :-) compile speed is actually a
> good measure for any integer app that is small enough to run in
> large cache... Image processing, oil companies for their simulations,
> cad... they all act very similar to compile benchmark - if a compile
> is twice as fast, a software image rendering is usually
2006 Jun 29
1
kmeans clustering
Hello R list members,
I'm a bioinformatics student from Leiden University
(Netherlands). We were asked to make a program with different
clustering methods. The problem we are experiencing is the following.
We have a matrix with data like the following:
research1 research2 research3 etc.
sample1 0.5 0.2 0.4
sample2 0.4
2007 Jul 23
1
Cluster prediction from factor/numeric datasets
Hi all,
I have a dataset with numeric and factor columns of data which I developed a
Gower Dissimilarity Matrix for (Daisy) and used Agglomerative Nesting
(Agnes) to develop 20 clusters.
I would like to use the 20 clusters to determine cluster membership for a
new dataset (using predict) but cannot find a way to do this (no way to
"predict" in the cluster package).
I know I can use
2010 Jul 21
1
Get distribution of positive/negative examples for each cluster
Dear R experts,
I have a labeled data set. Each data point is assigned a binary label, 0 or 1.
Assume that I use some clustering algorithm to group the data into clusters
(using some features of the data). Now I want to know how many data points are
labeled as 0/1 in each cluster.
For example, assume that I have 9 labeled data points grouped into three clusters.
The ids of the clusters are 1, 2, and 3. The dataset is
2010 Jan 11
1
K-means recluster data with given cluster centers
Dear R user,
I have several large data sets. Over time additional new data sets will be created.
I want to cluster all the data in a similar/identical way with the k-means algorithm.
With the first data set I will find my cluster centers and save the cluster centers to a file [1].
This first data set is huge, it is guaranteed that cluster
2006 Jul 14
2
References verifying accuracy of R for basic statistical calculations and tests
Hi,
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Corey Powell
>
> Do you know of any references that verify the accuracy of R
> for basic statistical calculations and tests? The results of
> these studies should indicate that R results are the same as
> the results of other statistical packages to a certain number
> of decimal places on some benchmark
2024 Sep 17
1
Getting individual co-ordinate points in k medoids cluster
Hello, I am using k medoids in R to generate sets of clusters for datasets
through time. I can plot the individual clusters OK but what I cannot find
is a way of pulling out the co-ordinates of the individual points in the
cluster diagrams - none of the kmed$... info sets seems to be this.
Beneath is an example of a k medoid prog using the built in US arrests
dataset - this is not the data I am
2005 May 13
2
cluster results using fanny
Hi,
I am using fanny and I am getting strange results. I am wondering if
someone out there can help me understand why this happens.
First of all in most of my tries, it gives me a result in which each
object has equal membership in all clusters. I have read that that
means "the clustering is entirely fuzzy". Looking at the graphics it
is really difficult to understand how objects with so
2016 Sep 01
3
Benchmarks for LLVM-generated Binaries
Hi,
I've lately been wondering where benchmarks for LLVM-generated binaries are hosted, and whether they're tracked over time. I'm asking because I'm thinking of where to put some benchmarks I've written using the open source Google benchmarking library [0] to test certain costs of XRay-instrumented binaries, the XRay runtime, and other related measurements (effect of
1997 Jun 09
1
R-beta: mlbench-0.1 --- machine learning benchmark problems
I've made a package from some benchmark datasets for use with R and
uploaded it to CRAN.
Here's the Index entry:
mlbench-0.1.tar.gz:
A collection of artificial and real-world machine learning
benchmark problems, including, e.g., the Boston housing
data from the UCI repository.
Written/packaged by Fritz Leisch <Friedrich.Leisch at ci.tuwien.ac.at>
Original data sets from
2009 Mar 27
2
Remove error data and clustering analysis
Hi, all,
I'd like to do a clustering analysis on my dataset. The example data
are as follows:
Dataset 1:
500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495,
3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149;
Dataset 2:
506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466,
476, 482, 1201, 469, 502;
I have many datasets like that. Basically, every
2008 Sep 08
1
cluster/snow question
Dear R Users,
I am attempting to use the snow package for clustering. Is there a way to
identify, in the environment of each node, a rank for that node and also,
the total size of the cluster?
By way of analogy, I am looking for the functions in snow equivalent to
mpi.comm.rank() and mpi.comm.size() from Rmpi, in case that makes things
clearer.
Thanks in advance,
Tolga
Generally, this
2012 Apr 16
0
[LLVMdev] Representing -ffast-math at the IR level
Duncan,
I have some issues with representing this as a single "fast" mode flag, which mostly boil down to the fact that this is a very C-centric view of the world. And, since C compilers are not generally known for their awesomeness on issues of numerics, I'm not sure that's a good idea.
Having something called a "fast" or "relaxed" mode implies that it is
2012 Nov 22
1
Data Extraction - benchmark()
Hi Berend,
I see you are one of the contributors to the rbenchmark package.
I am sorry that I am bothering you again. I have tried to run your code (slightly tweaked) involving the benchmark function, and I am getting the following error message. What am I doing wrong?
Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), :
could not find function
2009 Dec 11
1
cluster size
hi r-help,
I am doing k-means clustering with the stats package. I tried clustering into five clusters using:
kcl1 <- kmeans(as1[,c("contlife","somlife","agglife","sexlife",
"rellife","hordlife","doutlife","symtlife","washlife",
2006 Apr 19
1
determining optimal # of clusters for a given dataset (e.g. between 2 and K)
Hi:
I'm clustering a microarray dataset with a large # of samples. I would like your opinion on the best way to automatically determine the optimal # of clusters. Currently I am using the "cluster" package, clustering with "clara", examining the average silhouette width at various numbers of clusters. I'd like opinions on whether any newer packages offer
2010 Jun 11
2
Clustering algorithms don't find obvious clusters
I have a directed graph which is represented as a matrix of the form
0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0
Each row corresponds to an author (A, B, C, D) and the values say how many
times this author has cited the other authors. Hence the first row says
that author A has cited author B four times and author D one time. Thus the
matrix represents two groups of authors: (A,B) and (C,D) who cite