similar to: Clustering with R - efficient processing of large sparse data sets (text data)

Displaying 20 results from an estimated 4000 matches similar to: "Clustering with R - efficient processing of large sparse data sets (text data)"

2011 May 11
4
How to document man/*.Rd pages with images?
Hi, I'm trying to figure out how to put images into my package's help documentation. I've gotten to the point where I can put the images in the /inst/doc/ directory. I have also gotten to the point where the package checks complete without any warnings. I couldn't find the terms "picture," "image," or "graphic" in a text search within Writing R Extensions: 2 Writing R documentation files
2011 Aug 19
1
Build a package - check error
Dear R-users, I am slowly migrating my mex files (MATLAB - Fortran and C) to R. To make my own functions available in R, I have decided to learn how to build an R package. I chose a simple example with a few Fortran and R functions (wrappers). The Fortran sources are located in src and the R functions in R (as recommended). The build process went OK, but R CMD check did not. The error
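A minimal sketch of the R side of such a wrapper, assuming a hypothetical Fortran subroutine mysub(n, res) compiled from src/ into a package called mypkg; the real routine and package names will differ:

# R wrapper for a hypothetical Fortran subroutine mysub(n, res) in src/
# (names are placeholders; adjust to the actual routine and package)
mywrap <- function(n) {
  out <- .Fortran("mysub",
                  n   = as.integer(n),
                  res = double(n),
                  PACKAGE = "mypkg")
  out$res
}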
2001 Nov 21
2
distances from points to line
Dear all, I have discovered that there are many things that I used to do in my GIS which are easily done directly in R, for example calculating interpoint distances using geoR and picking out points inside a polygon using splancs. I now wonder, is there a function to create a line object like a watercourse and then calculate the distances between many points in space and this line? I couldn't
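One possible approach, sketched with the sf package (which appeared long after this thread); the coordinates below are invented purely for illustration:

library(sf)
# a watercourse as a linestring and three arbitrary points (coordinates made up)
line <- st_sfc(st_linestring(rbind(c(0, 0), c(5, 2), c(9, 7))))
pts  <- st_sfc(st_point(c(1, 3)), st_point(c(6, 1)), st_point(c(8, 8)))
st_distance(pts, line)  # one point-to-line distance per point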
2011 Apr 06
5
Need a more efficient way to implement this type of logic in R
I have cobbled together the following logic. It works but is very slow. I'm sure that there must be a better R-specific way to implement this kind of thing, but have been unable to find or understand one. Any help would be appreciated.
hh.sub <- households[c("HOUSEID","HHFAMINC")]
for (indx in 1:length(hh.sub$HOUSEID)) {
  if ((hh.sub$HHFAMINC[indx] == '01')
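The loop body is cut off in this excerpt, so the details are guesswork, but the usual fix is to replace the row-by-row loop with a vectorised comparison over the whole column; a sketch, assuming the loop recodes HHFAMINC into a hypothetical derived column:

hh.sub <- households[c("HOUSEID", "HHFAMINC")]
# vectorised test over the whole column instead of looping over rows;
# 'income.low' and the chosen codes are placeholders for illustration only
hh.sub$income.low <- hh.sub$HHFAMINC %in% c("01", "02", "03")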
2008 Feb 21
0
extending code to handle more variables
useR's, Consider the variables defined below:
yvals <- c(25,30,35)
x1 <- c(1,2,3)
x2 <- c(3,4,5)
x3 <- c(6,7,8)
x <- as.data.frame(cbind(x1,x2,x3))
delta <- c(2.5, 1.5, 0.5)
h <- delta/2
vars <- 3
xk1 <- seq(min(x1)-0.5, max(x1)+0.5, 0.5)
xk2 <- seq(min(x2)-0.5, max(x2)+0.5, 0.5)
xk3 <- seq(min(x3)-0.5, max(x3)+0.5, 0.5)
xks <- list(xk1,xk2,xk3)
xk <-
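One way to avoid writing xk1, xk2, xk3, ... by hand is to build the whole list with lapply over the columns of x; a small sketch of that idea using the x defined above:

# build the break-point sequences for every column of x in one step
xks <- lapply(x, function(col) seq(min(col) - 0.5, max(col) + 0.5, by = 0.5))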
2011 May 16
1
pam() clustering for large data sets
Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I
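One workaround (a sketch only, assuming the raw data sit in a numeric matrix x and that Euclidean distances are acceptable) is to run pam() on a manageable subsample and then attach every remaining observation to its nearest medoid:

library(cluster)
set.seed(1)
idx <- sample(nrow(x), 5000)            # cluster a 5,000-row subsample
fit <- pam(x[idx, ], k = 10)            # k chosen only for illustration
# assign all observations to the nearest medoid (squared Euclidean distance)
d2 <- outer(rowSums(x^2), rowSums(fit$medoids^2), "+") - 2 * x %*% t(fit$medoids)
cl <- max.col(-d2)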
2009 Dec 16
0
R graphics
Graphics about... Bayesian ChemPhys Cluster Distributions Econometrics Environmetrics ExperimentalDesign Finance Genetics gR Graphics HighPerformanceComputing MachineLearning MedicalImaging Multivariate NaturalLanguageProcessing Optimization Pharmacokinetics Psychometrics Robust SocialSciences Spatial Survival TimeSeries Other URL: http://bm2.genes.nig.ac.jp/RGM2/index.php?clear=all
2011 Jun 30
1
"non-efficient" sparse file copying with rsync
Hi all! There is a need to copy sparse files (precisely VMware disk images) with rsync. For that purpose I'm using the -S (--sparse) option and they are copied just fine (the checksums of the original and the file at the destination are the same). However, as the manual says: [quote] -S, --sparse Try to handle sparse files efficiently so they take up less space on the destination.
2006 Jun 05
1
Survey - twophase
Dear WizaRds, I am struggling with the use of twophase in the survey package. My goal is to compute a simple example in two-phase sampling:
phase 1: I sample n1=1000 circuit boards and find 80 non-functional.
phase 2: given the n1=1000 sample, I sample n2=100 and find 15 non-functional.
Let's say phase 2 shows this result together with phase 1: ... phase1 ...
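A rough sketch of how that toy example might be set up with twophase(); the data frame below is a hypothetical reconstruction of the numbers quoted above, and the call may need adjusting to the real design:

library(survey)
set.seed(1)
# 1000 boards, 80 non-functional at phase 1; 100 re-sampled at phase 2, 15 non-functional
boards <- data.frame(fail1 = rep(c(1, 0), c(80, 920)))
boards$in2 <- seq_len(1000) %in% sample(1000, 100)
boards$fail2 <- NA
boards$fail2[boards$in2] <- rep(c(1, 0), c(15, 85))
des <- twophase(id = list(~1, ~1), subset = ~in2, data = boards)
svymean(~fail2, des)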
2007 Nov 16
1
Efficient way to compute power of a sparse matrix
Dear all, I would like to compute powers of a square non-symmetric matrix. This is part of a simulation study. The matrices are quite large (e.g., 900 by 900) and contain many zeros (more than 99%). I have tried the function mtx.exp of the Biodem package:
library(Biodem)
m <- matrix(0, 900, 900)
i <- sample(1:900, 3000, replace = T)
j <- sample(1:900, 3000, replace = T)
for(x in 1:3000)
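A sketch of the same computation using sparse storage from the Matrix package; the random fill below only mimics the dimensions quoted, and repeated squaring keeps the number of multiplications down (though high powers may densify the result):

library(Matrix)
set.seed(1)
i <- sample(900, 3000, replace = TRUE)
j <- sample(900, 3000, replace = TRUE)
m <- sparseMatrix(i = i, j = j, x = 1, dims = c(900, 900))
# matrix power by repeated squaring, staying in sparse storage
mat_pow <- function(m, k) {
  res <- Diagonal(nrow(m))
  while (k > 0) {
    if (k %% 2 == 1) res <- res %*% m
    m <- m %*% m
    k <- k %/% 2
  }
  res
}
m10 <- mat_pow(m, 10)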
2006 Jun 10
3
sparse matrix, rnorm, malloc
Hi, I'm sorry for any cross-posting. I've reviewed the archives and could not find an exact answer to my question below. I'm trying to generate very large sparse matrices (< 1% non-zero entries per row). I have a sparse matrix function below which works well until the row/col count exceeds 10,000. This is being run on a machine with 32G memory: sparse_matrix <-
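For what it is worth, the Matrix package can generate such matrices directly in sparse storage; a sketch with roughly 0.5% normal-distributed non-zeros (dimensions chosen only to illustrate going past 10,000):

library(Matrix)
set.seed(1)
# 20,000 x 20,000 with ~0.5% non-zero entries drawn from rnorm, never stored densely
m <- rsparsematrix(20000, 20000, density = 0.005, rand.x = rnorm)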
2008 Nov 25
1
Efficient passing through big data.frame and modifying select
> -----Original Message-----
> From: William Dunlap
> Sent: Tuesday, November 25, 2008 9:16 AM
> To: 'johannes_graumann at web.de'
> Subject: Re: [R] Efficient passing through big data.frame and
> modifying select fields
>
> > Johannes Graumann johannes_graumann at web.de
> > Tue Nov 25 15:16:01 CET 2008
> >
> > Hi all,
> >
2010 Dec 14
1
Installing R-packages in Windows
Hi there, I have the following problem and I hope somebody might help me. First of all: I am using WinXP SP3 (English and/or German) with R version 2.10.0. Now I am trying to install some packages but unfortunately I am getting a weird error. No matter which package I try to install, I get nearly the same error. It looks like this:
2004 Jun 29
1
PAM clustering: using my own dissimilarity matrix
Hello, I would like to use my own dissimilarity matrix in a PAM clustering with the function "pam" (cluster package) instead of a dissimilarity matrix created by daisy. I read the dissimilarity values from a file using "read.csv". This creates a matrix (alternatively: an array or vector) which is not accepted by "pam": A call
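A small sketch of how this is usually handled: coerce the square matrix to a "dist" object and tell pam() it is already a dissimilarity (the file name and k are placeholders):

library(cluster)
dmat <- as.matrix(read.csv("dissim.csv", header = FALSE))  # hypothetical file
d <- as.dist(dmat)                 # treat the square matrix as a dissimilarity object
fit <- pam(d, k = 3, diss = TRUE)
fit$clustering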
2009 Jan 26
2
Help with clustering
I am going to try out a tentative clustering of some feature vectors. The ranges of values spanned by the three items making up the feature vector are quite different:
Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)
In order to spread out Item-2 even further I might try to
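A sketch of the standard alternative to hand-tuned spreading: standardise each item so all three contribute on a comparable scale before computing distances (feats stands in for the feature-vector data frame):

# 'feats' is a stand-in for a data frame with the three items as columns
feats_std <- scale(feats)          # centre to mean 0, scale to unit variance
d  <- dist(feats_std)
hc <- hclust(d, method = "average")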
2001 Jan 09
2
PAM clustering (using triangular matrix)
Hi, I'm trying to use a similarity matrix (triangular) as input for the pam() or fanny() clustering algorithms. The problem is that these algorithms can only accept a dissimilarity matrix, normally generated by daisy(). However, daisy only accepts a 'data matrix or dataframe. Dissimilarities will be computed between the rows of x'. Is there any way to say that your data are already a
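One common workaround (a sketch, assuming the similarities are scaled to [0, 1] and sit in a matrix sim with only the lower triangle filled) is to mirror the triangle, convert similarity to dissimilarity, and pass the result straight to pam() or fanny():

library(cluster)
sim <- as.matrix(sim)
sim[upper.tri(sim)] <- t(sim)[upper.tri(sim)]  # mirror the lower triangle
d <- as.dist(1 - sim)                          # similarity -> dissimilarity
fit <- fanny(d, k = 3)                         # k is a placeholder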
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
2004 Dec 09
1
more clustering questions
Sorry to bother you kind folks again with my questions. I am trying to learn as much as I can about all this, and I will admit that I don't have the proper background, but I hope that someone can at least point me in the correct direction. I have created a test matrix for what I want to do:
   s1 s2 s3 s4 s5
s1 10  5  0  8  7
s2  5 10  0  0  5
s3  0  0 10  0  0
s4  8  0  0 10  0
s5  7
2010 Oct 19
2
Clustering with ordinal data
Hello, I've been asked to help evaluate a vegetation data set, specifically to examine it for community similarity. The initial problem I see is that the data are ordinal. At best this only captures a relative ranking of abundance, and ordinal ranks are assigned after data collection. I've been trying to find a procedure in R that can handle ordinal-based classification and so far have
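One possibility (a sketch, with veg standing in for the vegetation data frame) is to declare the columns as ordered factors and let daisy()'s Gower coefficient handle them, then cluster the resulting dissimilarities:

library(cluster)
veg[] <- lapply(veg, function(col) factor(col, ordered = TRUE))  # ordinal ranks
d  <- daisy(veg, metric = "gower")   # Gower dissimilarity copes with ordered factors
hc <- hclust(d, method = "average")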
2004 Feb 13
3
Calculate Closest 5 Cases?
I've only begun investigating R as a substitute for SPSS. I have a need to identify, for each CASE, the closest (or most similar) 5 other CASES (not including itself, as it is automatically the closest). I have a fairly large matrix (50000 cases by 50 vars). In SPSS, I can use Correlate > Distances to generate a similarity matrix, but only on a small sample. The entire matrix cannot
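A sketch of one way to do this in R without building the full 50,000 x 50,000 distance matrix, using the FNN package (x stands in for the 50,000 x 50 numeric matrix; distances here are Euclidean rather than SPSS's chosen similarity measure):

library(FNN)
nn <- get.knn(x, k = 5)   # k nearest neighbours of every row, excluding the row itself
nn$nn.index[1, ]          # the 5 closest cases to case 1
nn$nn.dist[1, ]           # their distances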