similar to: Clustering with R - efficient processing of large sparse data sets (text data)

Displaying 20 results from an estimated 4000 matches similar to: "Clustering with R - efficient processing of large sparse data sets (text data)"

2011 May 11
4
How to document man/*.Rd pages with images?
Hi, I'm trying to figure out how to put images into my package's help documentation. I've gotten to the point where I can put the images in the /inst/doc/ directory. I have also gotten to the point where the package checks complete without any warnings. I couldn't find the terms "picture," "image," or "graphic" in a text search within Writing R Extensions: 2 Writing R documentation files
2011 Aug 19
1
Build a package - check error
Dear R-users, I am slowly migrating my mex files (MATLAB - Fortran and C) to R. To make my own functions available in R, I have decided to learn how to build an R package. I chose a simple example with a few Fortran and R functions (wrappers). The Fortran sources are located in src and the R functions in R (as recommended). The build process went OK, but R CMD check did not. The error
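A minimal sketch of the R side of such a wrapper, assuming a hypothetical Fortran subroutine mysub(n, res) compiled from src/ into a package called mypkg; the real routine and package names will differ:

# R wrapper for a hypothetical Fortran subroutine mysub(n, res) in src/
# (names are placeholders; adjust to the actual routine and package)
mywrap <- function(n) {
  out <- .Fortran("mysub",
                  n   = as.integer(n),
                  res = double(n),
                  PACKAGE = "mypkg")
  out$res
}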
2001 Nov 21
2
distances from points to line
Dear all, I have discovered that there are many things that I used to do in my GIS which are easily done directly in R, for example calculating interpoint distances using geoR and picking out points inside a polygon using splancs. I now wonder, is there a function to create a line object like a watercourse and then calculate the distances between many points in space and this line? I couldn't
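One possible approach, sketched with the sf package (which appeared long after this thread); the coordinates below are invented purely for illustration:

library(sf)
# a watercourse as a linestring and three arbitrary points (coordinates made up)
line <- st_sfc(st_linestring(rbind(c(0, 0), c(5, 2), c(9, 7))))
pts  <- st_sfc(st_point(c(1, 3)), st_point(c(6, 1)), st_point(c(8, 8)))
st_distance(pts, line)  # one point-to-line distance per point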
2011 Apr 06
5
Need a more efficient way to implement this type of logic in R
I have cobbled together the following logic. It works but is very slow. I'm sure that there must be a better R-specific way to implement this kind of thing, but have been unable to find or understand one. Any help would be appreciated.
hh.sub <- households[c("HOUSEID","HHFAMINC")]
for (indx in 1:length(hh.sub$HOUSEID)) {
  if ((hh.sub$HHFAMINC[indx] == '01')
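The loop body is cut off in this excerpt, so the details are guesswork, but the usual fix is to replace the row-by-row loop with a vectorised comparison over the whole column; a sketch, assuming the loop recodes HHFAMINC into a hypothetical derived column:

hh.sub <- households[c("HOUSEID", "HHFAMINC")]
# vectorised test over the whole column instead of looping over rows;
# 'income.low' and the chosen codes are placeholders for illustration only
hh.sub$income.low <- hh.sub$HHFAMINC %in% c("01", "02", "03")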
2008 Feb 21
0
extending code to handle more variables
useR's, Consider the variables defined below:
yvals <- c(25,30,35)
x1 <- c(1,2,3)
x2 <- c(3,4,5)
x3 <- c(6,7,8)
x <- as.data.frame(cbind(x1,x2,x3))
delta <- c(2.5, 1.5, 0.5)
h <- delta/2
vars <- 3
xk1 <- seq(min(x1)-0.5, max(x1)+0.5, 0.5)
xk2 <- seq(min(x2)-0.5, max(x2)+0.5, 0.5)
xk3 <- seq(min(x3)-0.5, max(x3)+0.5, 0.5)
xks <- list(xk1,xk2,xk3)
xk <-
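One way to avoid writing xk1, xk2, xk3, ... by hand is to build the whole list with lapply over the columns of x; a small sketch of that idea using the x defined above:

# build the break-point sequences for every column of x in one step
xks <- lapply(x, function(col) seq(min(col) - 0.5, max(col) + 0.5, by = 0.5))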
2011 May 16
1
pam() clustering for large data sets
Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I
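One workaround (a sketch only, assuming the raw data sit in a numeric matrix x and that Euclidean distances are acceptable) is to run pam() on a manageable subsample and then attach every remaining observation to its nearest medoid:

library(cluster)
set.seed(1)
idx <- sample(nrow(x), 5000)            # cluster a 5,000-row subsample
fit <- pam(x[idx, ], k = 10)            # k chosen only for illustration
# assign all observations to the nearest medoid (squared Euclidean distance)
d2 <- outer(rowSums(x^2), rowSums(fit$medoids^2), "+") - 2 * x %*% t(fit$medoids)
cl <- max.col(-d2)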
2009 Dec 16
0
R graphics
Graphics about... Bayesian ChemPhys Cluster Distributions Econometrics Environmetrics ExperimentalDesign Finance Genetics gR Graphics HighPerformanceComputing MachineLearning MedicalImaging Multivariate NaturalLanguageProcessing Optimization Pharmacokinetics Psychometrics Robust SocialSciences Spatial Survival TimeSeries Other URL: http://bm2.genes.nig.ac.jp/RGM2/index.php?clear=all
2011 Jun 30
1
"non-efficient" sparse file copying with rsync
Hi all! There is a need to copy sparse files (precisely VMware disk images) with rsync. For that purpose I'm using the -S (--sparse) option and they are copied just fine (the checksums of the original and the file at the destination are the same). However, as the manual says: [quote] -S, --sparse Try to handle sparse files efficiently so they take up less space on the destination.
2006 Jun 05
1
Survey - twophase
Dear WizaRds, I am struggling with the use of twophase in the survey package. My goal is to compute a simple example in two-phase sampling:
phase 1: I sample n1=1000 circuit boards and find 80 non-functional.
phase 2: given the n1=1000 sample, I sample n2=100 and find 15 non-functional.
Let's say phase 2 shows this result together with phase 1: ... phase1 ...
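A rough sketch of how that toy example might be set up with twophase(); the data frame below is a hypothetical reconstruction of the numbers quoted above, and the call may need adjusting to the real design:

library(survey)
set.seed(1)
# 1000 boards, 80 non-functional at phase 1; 100 re-sampled at phase 2, 15 non-functional
boards <- data.frame(fail1 = rep(c(1, 0), c(80, 920)))
boards$in2 <- seq_len(1000) %in% sample(1000, 100)
boards$fail2 <- NA
boards$fail2[boards$in2] <- rep(c(1, 0), c(15, 85))
des <- twophase(id = list(~1, ~1), subset = ~in2, data = boards)
svymean(~fail2, des)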
2007 Nov 16
1
Efficient way to compute power of a sparse matrix
Dear all, I would like to compute powers of a square non-symmetric matrix. This is part of a simulation study. The matrices are quite large (e.g., 900 by 900) and contain many zeros (more than 99%). I have tried the function mtx.exp of the Biodem package:
library(Biodem)
m <- matrix(0, 900, 900)
i <- sample(1:900, 3000, replace = T)
j <- sample(1:900, 3000, replace = T)
for(x in 1:3000)
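A sketch of the same computation using sparse storage from the Matrix package; the random fill below only mimics the dimensions quoted, and repeated squaring keeps the number of multiplications down (though high powers may densify the result):

library(Matrix)
set.seed(1)
i <- sample(900, 3000, replace = TRUE)
j <- sample(900, 3000, replace = TRUE)
m <- sparseMatrix(i = i, j = j, x = 1, dims = c(900, 900))
# matrix power by repeated squaring, staying in sparse storage
mat_pow <- function(m, k) {
  res <- Diagonal(nrow(m))
  while (k > 0) {
    if (k %% 2 == 1) res <- res %*% m
    m <- m %*% m
    k <- k %/% 2
  }
  res
}
m10 <- mat_pow(m, 10)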
2006 Jun 10
3
sparse matrix, rnorm, malloc
Hi, I'm sorry for any cross-posting. I've reviewed the archives and could not find an exact answer to my question below. I'm trying to generate very large sparse matrices (< 1% non-zero entries per row). I have a sparse matrix function below which works well until the row/col count exceeds 10,000. This is being run on a machine with 32G memory: sparse_matrix <-
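For what it is worth, the Matrix package can generate such matrices directly in sparse storage; a sketch with roughly 0.5% normal-distributed non-zeros (dimensions chosen only to illustrate going past 10,000):

library(Matrix)
set.seed(1)
# 20,000 x 20,000 with ~0.5% non-zero entries drawn from rnorm, never stored densely
m <- rsparsematrix(20000, 20000, density = 0.005, rand.x = rnorm)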
2008 Nov 25
1
Efficient passing through big data.frame and modifying select
> -----Original Message-----
> From: William Dunlap
> Sent: Tuesday, November 25, 2008 9:16 AM
> To: 'johannes_graumann at web.de'
> Subject: Re: [R] Efficient passing through big data.frame and
> modifying select fields
>
> > Johannes Graumann johannes_graumann at web.de
> > Tue Nov 25 15:16:01 CET 2008
> >
> > Hi all,
> >
2010 Dec 14
1
Installing R-packages in Windows
Hi there, I have the following problem and I hope somebody might help me. First of all: I am using WinXP SP3 (English and/or German) with R version 2.10.0. Now I am trying to install some packages but unfortunately I am getting a weird error. No matter which package I try to install, I get nearly the same error. It looks like this:
2004 Jun 29
1
PAM clustering: using my own dissimilarity matrix
Hello, I would like to use my own dissimilarity matrix in a PAM clustering with the function "pam" (cluster package) instead of a dissimilarity matrix created by daisy. I read the dissimilarity values from a file using "read.csv". This creates a matrix (alternatively: an array or vector) which is not accepted by "pam": A call
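A small sketch of how this is usually handled: coerce the square matrix to a "dist" object and tell pam() it is already a dissimilarity (the file name and k are placeholders):

library(cluster)
dmat <- as.matrix(read.csv("dissim.csv", header = FALSE))  # hypothetical file
d <- as.dist(dmat)                 # treat the square matrix as a dissimilarity object
fit <- pam(d, k = 3, diss = TRUE)
fit$clustering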
2009 Jan 26
2
Help with clustering
I am going to try out a tentative clustering of some feature vectors. The ranges of values spanned by the three items making up the feature vector are quite different:
Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)
In order to spread out Item-2 even further I might try to
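A sketch of the standard alternative to hand-tuned spreading: standardise each item so all three contribute on a comparable scale before computing distances (feats stands in for the feature-vector data frame):

# 'feats' is a stand-in for a data frame with the three items as columns
feats_std <- scale(feats)          # centre to mean 0, scale to unit variance
d  <- dist(feats_std)
hc <- hclust(d, method = "average")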
2001 Jan 09
2
PAM clustering (using triangular matrix)
Hi, I'm trying to use a similarity matrix (triangular) as input for the pam() or fanny() clustering algorithms. The problem is that these algorithms can only accept a dissimilarity matrix, normally generated by daisy(). However, daisy only accepts a 'data matrix or dataframe. Dissimilarities will be computed between the rows of x'. Is there any way to say that your data are already a
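One common workaround (a sketch, assuming the similarities are scaled to [0, 1] and sit in a matrix sim with only the lower triangle filled) is to mirror the triangle, convert similarity to dissimilarity, and pass the result straight to pam() or fanny():

library(cluster)
sim <- as.matrix(sim)
sim[upper.tri(sim)] <- t(sim)[upper.tri(sim)]  # mirror the lower triangle
d <- as.dist(1 - sim)                          # similarity -> dissimilarity
fit <- fanny(d, k = 3)                         # k is a placeholder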
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
2004 Dec 09
1
more clustering questions
Sorry to bother you kind folks again with my questions. I am trying to learn as much as I can about all this, and I will admit that I don't have the proper background, but I hope that someone can at least point me in the correct direction. I have created a test matrix for what I want to do:
   s1 s2 s3 s4 s5
s1 10  5  0  8  7
s2  5 10  0  0  5
s3  0  0 10  0  0
s4  8  0  0 10  0
s5  7
2010 Oct 19
2
Clustering with ordinal data
Hello, I've been asked to help evaluate a vegetation data set, specifically to examine it for community similarity. The initial problem I see is that the data are ordinal. At best this only captures a relative ranking of abundance, and ordinal ranks are assigned after data collection. I've been trying to find a procedure in R that can handle ordinal-based classification and so far have
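One possibility (a sketch, with veg standing in for the vegetation data frame) is to declare the columns as ordered factors and let daisy()'s Gower coefficient handle them, then cluster the resulting dissimilarities:

library(cluster)
veg[] <- lapply(veg, function(col) factor(col, ordered = TRUE))  # ordinal ranks
d  <- daisy(veg, metric = "gower")   # Gower dissimilarity copes with ordered factors
hc <- hclust(d, method = "average")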
2004 Feb 13
3
Calculate Closest 5 Cases?
I've only begun investigating R as a substitute for SPSS. I have a need to identify, for each CASE, the closest (or most similar) 5 other CASES (not including itself, as it is automatically the closest). I have a fairly large matrix (50000 cases by 50 vars). In SPSS, I can use Correlate > Distances to generate a similarity matrix, but only on a small sample. The entire matrix cannot
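A sketch of one way to do this in R without building the full 50,000 x 50,000 distance matrix, using the FNN package (x stands in for the 50,000 x 50 numeric matrix; distances here are Euclidean rather than SPSS's chosen similarity measure):

library(FNN)
nn <- get.knn(x, k = 5)   # k nearest neighbours of every row, excluding the row itself
nn$nn.index[1, ]          # the 5 closest cases to case 1
nn$nn.dist[1, ]           # their distances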