similar to: Cluster analysis, factor variables, large data set

Displaying 20 results from an estimated 3000 matches similar to: "Cluster analysis, factor variables, large data set"

2008 Jul 29
2
About clustering techniques
Hello R users, I have been playing with a dataset for some time to do some cluster analysis. The data set consists of 14 columns, the geographical coordinates and the monthly temperatures, in annual files: latitude - longitude - temperature 1 -..... - temperature 12. I have missing values in some cases; at some points there may be 8 valid monthly values and four invalid ones. I don't want to
2008 Feb 19
3
simple usage of "for"
Hi list, I have a data frame I would like to loop over. To begin with, I would like cross-tabulations using the first variable in the data frame, which is called "meriter". > table(meriter[[1]], meriter[[3]]) ja nej Annan 0
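One way to cross-tabulate the first column against every other column is to loop over the column names. This is a minimal sketch, not a reply from the thread; the small data frame below is hypothetical stand-in data, since the poster's actual columns are cut off in the preview.

```r
# Hypothetical stand-in for the poster's data frame "meriter"
meriter <- data.frame(
  group = c("Annan", "Annan", "X", "Y"),
  q1    = c("ja", "nej", "ja", "ja"),
  q2    = c("nej", "nej", "ja", "nej")
)

# Cross-tabulate the first column against every other column,
# keeping the tables in a list named after the columns
others <- names(meriter)[-1]
tabs <- lapply(others, function(v) table(meriter[[1]], meriter[[v]]))
names(tabs) <- others

# The equivalent plain for loop, printing each table as it goes
for (v in others) {
  cat("\n==", v, "==\n")
  print(table(meriter[[1]], meriter[[v]]))
}
```

Keeping the tables in a list (rather than only printing them) makes it easy to pass them on to, say, `chisq.test()` later.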
2005 Apr 20
1
make check failure -- R 2.1.0 Windows XP SP2
I compiled R 2.1.0 under Windows XP SP2 as a preliminary to rebuilding a custom package for use with R 2.1.0. The compile completed successfully, and I was able to run demo(graphics) successfully. But make check and make check-recommended fail. > version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 1.0 year 2005
2006 Apr 10
2
passing known medoids to clara() in the cluster package
Greetings, I have had good success using the clara() function to perform a simple cluster analysis on a large dataset (1 million+ records with 9 variables). Since the clara function is a wrapper to pam(), which will accept known medoid data - I am wondering if this too is possible with clara() ... The documentation does not suggest that this is possible. Essentially I am trying to
2010 Oct 18
1
make error for R 2.13.0 (and 2.12.0)
Regarding Tengfei Yin's post about an error trying to install "cluster" in 2.13.0, I have gotten an error with this package when trying to install the released version of 2.12.0. Here is the output on an Ubuntu Linux system: begin installing recommended package cluster * installing *source* package 'cluster' ... ** libs make[3]: Entering directory
2012 Feb 25
5
which is the fastest way to make data.frame out of a three-dimensional array?
foo <- rnorm(30*34*12) dim(foo) <- c(30, 34, 12) I want to make a data.frame out of this three-dimensional array. Each dimension will be a variable (column) in the data.frame. I know how this can be done in a very slow way using for loops, like this: x <- rep(seq(from = 1, to = 30), 34) y <- as.vector(sapply(1:34, function(x) {rep(x, 30)})) month <- as.vector(sapply(1:12,
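A loop-free sketch, assuming the goal is one row per array cell with its three indices plus the value: `expand.grid()` varies its first argument fastest, which is the same column-major order in which R stores the array, so the index columns line up with `as.vector(foo)`. This is one possible approach, not necessarily the one suggested in the thread.

```r
# Array with the dimensions from the post: 30 x 34 x 12
foo <- rnorm(30 * 34 * 12)
dim(foo) <- c(30, 34, 12)

# expand.grid() varies x fastest, then y, then month, matching the
# column-major storage order of foo, so rows align with as.vector(foo)
df <- expand.grid(x     = seq_len(dim(foo)[1]),
                  y     = seq_len(dim(foo)[2]),
                  month = seq_len(dim(foo)[3]))
df$value <- as.vector(foo)

head(df)
```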
2003 Sep 30
2
cluster & mgcv update
Hello, after reinstalling the whole OS and R as well, I tried update.packages() and got the following error message concerning the mgcv update: atlas2-base is installed, and blas as well (on Debian). I haven't found lf77blas; I assume it's a library or something similar associated with blas. Any suggestions on how to solve that? Thanks, Martin * Installing *source* package
2013 Apr 09
1
sorting the VAR model output according to variable names??
I was wondering if one can have the coefficients of VAR model sorted according to variable names rather than lags. If you notice below, the output is sorted according to lags. >VAR(cbind(fossil,labour),p=2,type="const") VAR Estimation Results: ======================= Estimated coefficients for equation fossil: =========================================== Call: fossil = fossil.l1
2010 Apr 29
2
getting random integers
I want 100 integers. Each integer, x, can be in the range 1 <= x <= 10. Does the following code give 1 and 10 the same chances of being selected as 2:9? round(runif(100, min = 1, max = 10)) -- Hans Ekbrand
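The rounding approach is not uniform: `runif(min = 1, max = 10)` spreads values over an interval of width 9, and only [1, 1.5) rounds to 1 and [9.5, 10] rounds to 10, so each end value gets width 0.5 (probability about 0.056) while 2 through 9 each get width 1 (about 0.111). A short sketch that shows the bias empirically and uses `sample.int()` with replacement for uniform integers instead:

```r
set.seed(1)

# round(runif(...)) under-represents the end points 1 and 10:
# each end gets an interval of width 0.5, each middle value width 1
biased <- round(runif(1e6, min = 1, max = 10))
p <- prop.table(table(biased))
round(p, 3)   # ends near 0.056, middle values near 0.111

# Uniform integers in 1..10: draw with replacement
x <- sample.int(10, 100, replace = TRUE)
```

`sample(1:10, 100, replace = TRUE)` is an equivalent spelling of the last line.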
2011 Nov 03
2
query about counting rows of a dataframe
Dear R users, I have got the following data frame, called my_df: gender day_birth month_birth year_birth labour 1 F 22 10 2001 1 2 M 29 10 2001 2 3 M 1 11 2001 1 4 F 3 11
2009 Jan 19
1
reference category for factor in regression
Hi all, I am struggling with a strange issue in R that I have not encountered before and I am not sure how to resolve this. The model looks like this, with all irrelevant variables left out: LABOUR - a dummy variable NONLABOUR = 1 - LABOUR AGE - a categorical variable / factor VOTE - a dummy variable glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
2010 Sep 30
7
how to make list() return a list of *named* elements
If I combine elements into a list b <- c(22.4, 12.2, 10.9, 8.5, 9.2) my.c <- sample.int(round(2*mean(b)), 5) my.list <- list(b, my.c) the names of the elements seem to get lost in the process: > str(my.list) List of 2 $ : num [1:5] 22.4 12.2 10.9 8.5 9.2 $ : int [1:5] 11 8 6 9 20 If I explicitly name the elements at list-creation, I get what I want: my.list <- list(b=b,
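`list()` only names elements that are passed as `name = value`, so explicit naming is the base R answer. If automatic naming is wanted, a small wrapper can capture the unevaluated arguments with `substitute()` and use their deparsed text as names. The helper `named_list()` below is hypothetical, a sketch rather than a function from base R or from the thread.

```r
b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
my.c <- sample.int(round(2 * mean(b)), 5)

# Hypothetical helper (not part of base R): behaves like list(), but
# names each unnamed element after the expression that was passed in
named_list <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  unnamed <- nms == ""
  nms[unnamed] <- vapply(exprs[unnamed],
                         function(e) paste(deparse(e), collapse = " "),
                         character(1))
  setNames(list(...), nms)
}

my.list <- named_list(b, my.c)
str(my.list)
```

Explicit names still win: `named_list(z = 1, b)` gives the names `"z"` and `"b"`.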
2009 Apr 29
3
how to word-wrap text in labels in plots?
c <- structure(c(2L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 3L, 2L, 5L), .Label = c("foo", + "bar", "a really really long variable label mostly here to show the need of word-wrapping text in labels", + "a not so important value", "baz"), class = "factor") plot(c) Is there a way to get the long variable labels to automatically wrap so that all
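One way to get wrapped labels is to rewrite the factor levels themselves, splitting each with `strwrap()` and rejoining the pieces with newlines before plotting. A minimal sketch, assuming a wrap width of 20 characters (an arbitrary choice); the factor is rebuilt from the post and renamed `x` to avoid masking `c()`. Very crowded axes may still need a smaller `cex.names` or wider margins.

```r
# The factor from the post, renamed to x
x <- structure(c(2L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 3L, 2L, 5L),
  levels = c("foo", "bar",
    "a really really long variable label mostly here to show the need of word-wrapping text in labels",
    "a not so important value", "baz"),
  class = "factor")

# Break a label into lines shorter than `width` characters
wrap_label <- function(s, width = 20) {
  paste(strwrap(s, width = width), collapse = "\n")
}
levels(x) <- vapply(levels(x), wrap_label, character(1), USE.NAMES = FALSE)

plot(x)   # plot.factor() draws a bar plot using the wrapped levels
```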
2011 May 16
1
pam() clustering for large data sets
Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I
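A quick size estimate explains the allocation failure: a `dist` object stores only the lower triangle of the n x n dissimilarity matrix, that is n(n - 1)/2 doubles of 8 bytes each. The sketch below computes that lower bound; pam() may well need additional working memory on top of it, which this estimate does not try to account for.

```r
# Bytes needed just to hold a "dist" object for n observations:
# n * (n - 1) / 2 dissimilarities, 8 bytes per double
dist_bytes <- function(n) n * (n - 1) / 2 * 8

gib <- dist_bytes(50000) / 2^30
gib   # roughly 9.3 GiB for the 50,000 observations in the post
```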
2006 May 23
1
standardization of values before call to pam() or clara()
Greetings, I am experimenting with the cluster package and am starting to scratch my head with regard to the *best* way to standardize my data. Both functions can pre-standardize columns in a dataframe. According to the manual: Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. This
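The quoted rule differs from `scale()`, which divides by the standard deviation, and the divisor is the mean absolute deviation, not the median-based `mad()` from stats. A base R sketch of the rule as quoted, useful for standardizing data yourself before calling pam() or clara(); the helper name `meanabs_standardize()` and the toy matrix are hypothetical, and missing values are not handled.

```r
# Standardize each column as the quoted manual text describes:
# subtract the column mean, then divide by the mean of |x - mean(x)|
meanabs_standardize <- function(m) {
  m <- as.matrix(m)
  centred <- sweep(m, 2, colMeans(m))        # subtract column means
  mean_abs_dev <- colMeans(abs(centred))     # mean absolute deviation
  sweep(centred, 2, mean_abs_dev, "/")
}

m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 100))
z <- meanabs_standardize(m)
z
```

Each standardized column then has mean 0 and mean absolute deviation 1, so a single outlier (like 100 in column `b`) inflates the divisor less than it would inflate a standard deviation.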
2005 Dec 24
8
Prototype OOP example
Hi, here is what I have: Labour = Class.create(); Labour.prototype = { initialize:function(name){ this.name = name; } } What I want to do is create a class called "Worker" which will inherit from "Labour", and the signature of its "initialize" is "function(name, position)". What should I do? Thank you all very much for the
2010 Aug 26
1
daisy(): space allocation issue
Hi, I'm trying to apply the function daisy() to a 10000x10 data.frame, but I do not have enough memory (error message: cannot allocate vector of length 1476173280). I didn't imagine I would be unable to work with a matrix of just 10000 observations... I have set --max-mem-size=2G in Rgui (I'm not able to set more memory). How can I solve this issue? Separating observations depending on
2006 Apr 05
1
"partitioning cluster function"
Hi All, for the function "bclust" (e1071), the argument "base.method" is explained as "must be the name of a partitioning cluster function returning a list with the same components as the return value of 'kmeans'". As I understand it, there are three partitioning cluster functions in R: "clara, pam, fanny". I then checked each of them to
2007 Nov 01
2
plotting a comparison of two scores, including the labels in the plot
Hello r-help! I have data with two kinds of ratings on the status of 100 occupations. The first kind of rating is on the perceived "objective" status that these occupations have in society at large, and the second kind of rating is on the status that the respondents think these occupations *should* have. The ratings were originally integer values in the range 1-9, but in the current data,
2010 Jul 19
2
Grouping and stacking bar plot for categorical variables
Hi all, I have a series of categorical variables that look just like this: welfare=sample(c("less", "same", "more"), 1000, replace=TRUE) education=sample(c("less", "same", "more"), 1000, replace=TRUE) defence=sample(c("less", "same", "more"), 1000, replace=TRUE) egp=sample(c("salariat",
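For the stacking part, one option is to count each variable over a fixed set of answer levels, bind the counts into a matrix with one column per variable, and hand that to `barplot()`. This sketch covers only the three variables shown in full; how the poster wanted to group by `egp` is cut off in the preview, so that part is not attempted, and the seed is arbitrary.

```r
set.seed(42)
welfare   <- sample(c("less", "same", "more"), 1000, replace = TRUE)
education <- sample(c("less", "same", "more"), 1000, replace = TRUE)
defence   <- sample(c("less", "same", "more"), 1000, replace = TRUE)

# Fix the answer order so every variable is counted over the same levels
answers <- c("less", "same", "more")
counts <- sapply(list(welfare = welfare, education = education,
                      defence = defence),
                 function(v) table(factor(v, levels = answers)))

# One stacked bar per variable; beside = TRUE would give grouped bars
barplot(counts, legend.text = rownames(counts))
```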