Displaying 20 results from an estimated 4000 matches similar to: "Clustering with R - efficient processing of large sparse data sets (text data)"
2011 May 11
4
How to document man/*.Rd pages with images?
Hi,
I'm trying to figure out how to put images into my package's help
documentation. I've gotten to the point where I can put the images in the
/inst/doc/ directory. I have also gotten to the point where I have package
checks without any warnings. I couldn't find the terms "picture," "image,"
or "graphic" in a text search within the Writing R Extensions: 2 Writing R
documentation files
2011 Aug 19
1
Build a package - check error
Dear R-users
I am slowly migrating my mex files (MATLAB - Fortran and C) to R. To get my
own functions available in an R session I have decided to learn how to build an
R package. I chose a simple example with a few Fortran and R functions
(wrapper).
The Fortran sources are located at src and the R functions at R (as
recommended). The building process went ok but R CMD check did not. The
error
2001 Nov 21
2
distances from points to line
Dear all,
I have discovered that there are many things that I used to do in my GIS
which are easily done directly in R, for example calculating interpoint
distances using geoR and picking out points inside a polygon using splancs.
I now wonder, is there a function to create a line object like a
watercourse and then calculate the distances between many points in space
and this line?
I couldn't
2011 Apr 06
5
Need a more efficient way to implement this type of logic in R
I have cobbled together the following logic. It works but is very
slow. I'm sure that there must be a better R-specific way to implement
this kind of thing, but have been unable to find/understand one. Any
help would be appreciated.
hh.sub <- households[c("HOUSEID","HHFAMINC")]
for (indx in 1:length(hh.sub$HOUSEID)) {
if ((hh.sub$HHFAMINC[indx] == '01')
2008 Feb 21
0
extending code to handle more variables
useR's,
Consider the variables defined below:
yvals <- c(25,30,35)
x1 <- c(1,2,3)
x2 <- c(3,4,5)
x3 <- c(6,7,8)
x <- as.data.frame(cbind(x1,x2,x3))
delta <- c(2.5, 1.5, 0.5)
h <- delta/2
vars <- 3
xk1 <- seq(min(x1)-0.5, max(x1)+0.5, 0.5)
xk2 <- seq(min(x2)-0.5, max(x2)+0.5, 0.5)
xk3 <- seq(min(x3)-0.5, max(x3)+0.5, 0.5)
xks <- list(xk1,xk2,xk3)
xk <-
2011 May 16
1
pam() clustering for large data sets
Hello everyone,
I need to do k-medoids clustering for data which consists of 50,000
observations. I have computed distances between the observations
separately and tried to use those with pam().
I got the "cannot allocate vector of length" error and I realize this
job is too memory intensive. I am at a bit of a loss on what to do at
this point.
I can't use clara(), because I
2009 Dec 16
0
R graphics
Graphics about...
Bayesian
ChemPhys
Cluster
Distributions
Econometrics
Environmetrics
ExperimentalDesign
Finance
Genetics
gR
Graphics
HighPerformanceComputing
MachineLearning
MedicalImaging
Multivariate
NaturalLanguageProcessing
Optimization
Pharmacokinetics
Psychometrics
Robust
SocialSciences
Spatial
Survival
TimeSeries
Other
URL: http://bm2.genes.nig.ac.jp/RGM2/index.php?clear=all
--
Share
2011 Jun 30
1
"non-efficient" sparse file copying with rsync
Hi all!
There is a need to copy sparse files (precisely VMware disk images) with rsync. For that purpose I'm using the -S (--sparse) option and they are copied just fine (the checksums of the original and the file at the destination are the same). However, as it is said in the manual:
[quote] -S, --sparse Try to handle sparse files efficiently so they take up less space on the destination.
2006 Jun 05
1
Survey - twophase
Dear WizaRds,
I am struggling with the use of twophase in package survey. My goal
is to compute a simple example in two phase sampling:
phase 1: I sample n1=1000 circuit boards and find 80 non functional
phase 2: Given the n1=1000 sample I sample n2=100 and find 15 non
functional. Let's say, phase 2 shows this result together with phase 1:
...................phase1........
2007 Nov 16
1
Efficient way to compute power of a sparse matrix
Dear all,
I would like to compute the power of a square non-symmetric matrix. This is
a part of a simulation study. Matrices are quite large (e.g., 900 by
900), and contain many zeros (more than 99%). I have tried the function
mtx.exp of the Biodem package:
library(Biodem)
m <- matrix(0, 900, 900)
i <- sample(1:900, 3000, replace = T)
j <- sample(1:900, 3000, replace = T)
for(x in 1:3000)
2006 Jun 10
3
sparse matrix, rnorm, malloc
Hi,
I'm sorry for any cross-posting. I've reviewed the archives and could
not find an exact answer to my question below.
I'm trying to generate very large sparse matrices (< 1% non-zero
entries per row). I have a sparse matrix function below which works
well until the row/col count exceeds 10,000. This is being run on a
machine with 32G memory:
sparse_matrix <-
2008 Nov 25
1
Efficient passing through big data.frame and modifying select
> -----Original Message-----
> From: William Dunlap
> Sent: Tuesday, November 25, 2008 9:16 AM
> To: 'johannes_graumann at web.de'
> Subject: Re: [R] Efficient passing through big data.frame and
> modifying select fields
>
> > Johannes Graumann johannes_graumann at web.de
> > Tue Nov 25 15:16:01 CET 2008
> >
> > Hi all,
> >
> >
2010 Dec 14
1
Installing R-packages in Windows
Hi there,
I have the following problem and I hope somebody might help me.
First of all: I am using WinXP SP3 (English and/or German) with R in
Version 2.10.0.
Now I am trying to install some packages but unfortunately I am getting
a weird error. No matter which package I am trying to install - I get
nearly the same error.
It looks like this:
2004 Jun 29
1
PAM clustering: using my own dissimilarity matrix
Hello,
I would like to use my own dissimilarity matrix in a PAM clustering with
method "pam" (cluster package) instead of a dissimilarity matrix created
by daisy.
I read data from a file containing the dissimilarity values using
"read.csv". This creates a matrix (alternatively: an array or vector)
which is not accepted by "pam": A call
2009 Jan 26
2
Help with clustering
I am going to try out a tentative clustering of some feature vectors.
The range of values spanned by the three items making up the feature vector is quite different:
Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is in-between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)
In order to spread out Item-2 even further I might try to
2001 Jan 09
2
PAM clustering (using triangular matrix)
Hi,
I'm trying to use a similarity matrix (triangular) as input for pam() or
fanny() clustering algorithms.
The problem is that these algorithms can only accept a dissimilarity
matrix, normally generated by daisy().
However, daisy only accepts 'data matrix or dataframe. Dissimilarities
will be computed between the rows of x'.
Is there any way to say to that your data are already a
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org>
wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
>
2004 Dec 09
1
more clustering questions
Sorry to bother you kind folks again with my questions. I am trying to
learn as much as I can about all this, and I will admit that I don't
have the proper background, but I hope that someone can at least point
me in the correct direction.
I have created a test matrix for what I want to do:
s1 s2 s3 s4 s5
s1 10 5 0 8 7
s2 5 10 0 0 5
s3 0 0 10 0 0
s4 8 0 0 10 0
s5 7
2010 Oct 19
2
Clustering with ordinal data
Hello
I've been asked to help evaluate a vegetation data set, specifically to
examine it for community similarity. The initial problem I see is that the
data is ordinal. At best this only captures a relative ranking of
abundance and ordinal ranks are assigned after data collection. I've
been trying to find a procedure in R that can handle ordinal based
classification and so far have
2004 Feb 13
3
Calculate Closest 5 Cases?
I've only begun investigating R as a substitute for SPSS.
I have a need to identify for each CASE the closest (or most similar) 5
other CASES (not including itself as it is automatically the closest). I
have a fairly large matrix (50000 cases by 50 vars). In SPSS, I can use Correlate > Distances to generate a matrix of similarity, but only on a small sample. The entire matrix can not