thr3ads.net - similar to: "Cluster analysis, factor variables, large data set"

Displaying 20 results from an estimated 3000 matches similar to: "Cluster analysis, factor variables, large data set"

About clustering techniques

2008 Jul 29

Hello R users, I have been playing with a dataset for some time to do some cluster analysis. The data set consists of 14 columns, geographical coordinates and monthly temperatures, in annual files: latitude - longitude - temperature 1 - ..... - temperature 12. I have missing values in some cases; maybe there are 8 valid monthly values at some points and four invalid ones. I don't want to
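
Missing months do not necessarily force dropping rows: daisy() in the cluster package accepts NAs and, as its documentation describes, leaves missing entries out of the dissimilarities involving that row. The result can go straight into pam(). A minimal sketch, assuming a made-up data frame shaped like the one described (lat, lon, 12 monthly temperatures) and an arbitrary k = 3:

```r
library(cluster)

set.seed(42)
n <- 50
# Made-up stand-in for the real data: coordinates plus 12 monthly temperatures
temps <- matrix(rnorm(n * 12, mean = 15, sd = 5), nrow = n,
                dimnames = list(NULL, paste0("temp", 1:12)))
sites <- data.frame(lat = runif(n, 35, 45), lon = runif(n, -10, 5), temps)

# Mimic the missing data: four invalid months at a few sites
for (i in c(3, 17, 29)) sites[i, 2 + sample(12, 4)] <- NA

# daisy() tolerates NAs; stand = TRUE puts coordinates and temperatures
# on a comparable scale before computing Euclidean dissimilarities
d <- daisy(sites, metric = "euclidean", stand = TRUE)
fit <- pam(d, k = 3)
table(fit$clustering)
```

If two rows shared no non-missing columns their dissimilarity would itself be NA, so very sparse rows may still need attention.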

simple usage of "for"

2008 Feb 19

Hi list, I have a data frame I would like to loop over. To begin with, I would like cross-tabulations using the first variable in the data frame, which is called "meriter". > table(meriter[[1]], meriter[[3]]) ja nej Annan 0
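
A `for` loop over column positions is enough here, since `meriter[[i]]` picks the i-th column. A minimal sketch with a small made-up data frame standing in for `meriter` (its column names and values are hypothetical):

```r
# Made-up stand-in for the real "meriter" data frame
meriter <- data.frame(
  region = c("north", "south", "north", "south", "north"),
  answer = c("ja", "nej", "ja", "ja", "nej"),
  group  = c("Annan", "Annan", "B", "B", "B")
)

tabs <- list()
# Cross-tabulate the first column against every other column
for (i in seq_along(meriter)[-1]) {
  tab <- table(meriter[[1]], meriter[[i]])
  tabs[[names(meriter)[i]]] <- tab   # keep each table, named by column
  print(tab)
}
```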

make check failure -- R 2.1.0 Windows XP SP2

2005 Apr 20

I compiled R 2.1.0 under Windows XP SP2 as a preliminary to rebuilding a custom package for use with R 2.1.0. The compile completed successfully, and I was able to run demo(graphics) successfully. But make check and make check-recommended fail. > version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 1.0 year 2005

passing known medoids to clara() in the cluster package

2006 Apr 10

Greetings, I have had good success using the clara() function to perform a simple cluster analysis on a large dataset (1 million+ records with 9 variables). Since the clara function is a wrapper around pam(), which will accept known medoids, I am wondering whether this is also possible with clara() ... The documentation does not suggest that this is possible. Essentially I am trying to
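
As the post notes, clara()'s documentation does not offer a way to supply medoids. If the medoids are already known, though, there is little clustering left to do on the full data: each record only has to be assigned to its nearest medoid, which takes n × k distance computations instead of a full dissimilarity matrix. A minimal base-R sketch, assuming Euclidean distance (clara()'s default metric) and data scaled the same way as when the medoids were found; `assign_to_medoids()` is a hypothetical helper, not part of the cluster package:

```r
# Hypothetical helper: label each row of x with the index of its nearest medoid
assign_to_medoids <- function(x, medoids) {
  x <- as.matrix(x)
  medoids <- as.matrix(medoids)
  # n x k matrix of squared Euclidean distances, one column per medoid
  d2 <- matrix(
    vapply(seq_len(nrow(medoids)),
           function(j) rowSums(sweep(x, 2, medoids[j, ])^2),
           numeric(nrow(x))),
    nrow = nrow(x)
  )
  # Column with the smallest distance is the nearest medoid
  max.col(-d2, ties.method = "first")
}

x <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.2, 4.9))
known <- rbind(c(0, 0), c(5, 5))
assign_to_medoids(x, known)   # 1 1 2 2
```

Memory grows only with n × k, so a million rows and a handful of medoids is cheap compared with any n × n approach.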

make error for R 2.13.0 (and 2.12.0)

2010 Oct 18

Regarding Tengfei Yin's post about an error trying to install "cluster" in 2.13.0, I have gotten an error with this package when trying to install the released version of 2.12.0. Here is the output on an Ubuntu Linux system: begin installing recommended package cluster * installing *source* package 'cluster' ... ** libs make[3]: Entering directory

which is the fastest way to make data.frame out of a three-dimensional array?

2012 Feb 25

foo <- rnorm(30*34*12) dim(foo) <- c(30, 34, 12) I want to make a data.frame out of this three-dimensional array. Each dimension will be a variable (column) in the data.frame. I know how this can be done in a very slow way using for loops, like this: x <- rep(seq(from = 1, to = 30), 34) y <- as.vector(sapply(1:34, function(x) {rep(x, 30)})) month <- as.vector(sapply(1:12,
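
No loop is needed: R stores arrays in column-major order, so `as.vector(foo)` runs through the first index fastest, and `expand.grid()` also varies its first argument fastest. Pairing the two gives one row per cell. A minimal sketch (the column names x, y, month and value are assumptions):

```r
foo <- rnorm(30 * 34 * 12)
dim(foo) <- c(30, 34, 12)

# One row per cell; index columns line up with column-major storage order
df <- expand.grid(x = 1:30, y = 1:34, month = 1:12)
df$value <- as.vector(foo)
head(df)
```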

cluster & mgcv update

2003 Sep 30

Hello, After reinstalling the whole OS and R as well, I ran update.packages() and got the following error message concerning the mgcv update. atlas2-base is installed, and so is blas (on Debian). I haven't found lf77blas; I assume it's a library or something similar associated with blas. Any suggestions on how to solve this? Thanks, Martin * Installing *source* package

sorting the VAR model output according to variable names??

2013 Apr 09

I was wondering whether the coefficients of a VAR model can be sorted according to variable names rather than lags. As you can see below, the output is sorted according to lags. >VAR(cbind(fossil,labour),p=2,type="const") VAR Estimation Results: ======================= Estimated coefficients for equation fossil: =========================================== Call: fossil = fossil.l1
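
A named coefficient vector can be reordered after estimation by splitting each name into its variable and lag parts. A minimal sketch, assuming the coefficients of one equation have already been pulled out into a named numeric vector with names of the form `variable.lN` as in the output above; the values are made up:

```r
# Made-up coefficients, named the way the printed VAR output labels them
b <- c(fossil.l1 = 0.52, labour.l1 = 0.11, fossil.l2 = -0.23,
       labour.l2 = 0.05, const = 1.30)

nm <- names(b)
has_lag <- grepl("\\.l[0-9]+$", nm)
# Variable part ("fossil", "labour", "const") and lag number (NA for const)
var_name <- sub("\\.l[0-9]+$", "", nm)
lag <- as.integer(ifelse(has_lag, sub("^.*\\.l([0-9]+)$", "\\1", nm), NA))

# Sort by variable (in order of first appearance), then by lag
b_sorted <- b[order(factor(var_name, levels = unique(var_name)), lag)]
names(b_sorted)  # "fossil.l1" "fossil.l2" "labour.l1" "labour.l2" "const"
```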

getting random integers

2010 Apr 29

I want 100 integers. Each integer, x, can be in the range 1 <= x <= 10. Does the following code give 1 and 10 the same chances to be selected as 2:8? round(runif(100, min = 1, max = 10)) -- Hans Ekbrand
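
No: round(runif(100, 1, 10)) gives 1 and 10 only about half the chance of each of 2 to 9, because only draws in [1, 1.5) round to 1 and only draws in [9.5, 10] round to 10, while every interior integer gets an interval of width 1. sample.int() draws uniformly from 1:10 instead. A short sketch that also checks the bias empirically (the sample size of one million is arbitrary):

```r
set.seed(123)
# Uniform integers 1..10, each with probability 1/10
x <- sample.int(10, 100, replace = TRUE)

# Empirical check of the round(runif()) approach
n <- 1e6
r <- round(runif(n, min = 1, max = 10))
p <- as.vector(table(factor(r, levels = 1:10))) / n
round(p, 3)   # about 0.056 for 1 and 10, about 0.111 for 2 to 9
```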

query about counting rows of a dataframe

2011 Nov 03

Dear R users, I have got the following data frame, called my_df: gender day_birth month_birth year_birth labour 1 F 22 10 2001 1 2 M 29 10 2001 2 3 M 1 11 2001 1 4 F 3 11

reference category for factor in regression

2009 Jan 19

Hi all, I am struggling with a strange issue in R that I have not encountered before, and I am not sure how to resolve it. The model looks like this, with all irrelevant variables left out: LABOUR: a dummy variable; NONLABOUR = 1 - LABOUR; AGE: a categorical variable (factor); VOTE: a dummy variable. glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
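
Whatever the specific problem in this model turns out to be, the usual way to choose which level of a factor serves as the reference category under R's default treatment contrasts is relevel(). A minimal sketch with made-up data; the AGE levels are hypothetical:

```r
set.seed(7)
d <- data.frame(
  VOTE = rbinom(200, 1, 0.5),
  AGE  = factor(sample(c("18-29", "30-49", "50+"), 200, replace = TRUE))
)
levels(d$AGE)   # by default the first level, "18-29", is the reference

# Make "50+" the reference category instead
d$AGE <- relevel(d$AGE, ref = "50+")
fit <- glm(VOTE ~ AGE, family = binomial, data = d)
names(coef(fit))   # "(Intercept)" "AGE18-29" "AGE30-49"
```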

how to make list() return a list of *named* elements

2010 Sep 30

If I combine elements into a list b <- c(22.4, 12.2, 10.9, 8.5, 9.2) my.c <- sample.int(round(2*mean(b)), 5) my.list <- list(b, my.c) the names of the elements seem to get lost in the process: > str(my.list) List of 2 $ : num [1:5] 22.4 12.2 10.9 8.5 9.2 $ : int [1:5] 11 8 6 9 20 If I explicitly name the elements at list creation, I get what I want: my.list <- list(b=b,
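
list() only names the elements that are given names in the call. Besides naming them explicitly, the names can be taken from the object names with mget(), or attached afterwards with setNames(). A short sketch of the three options:

```r
b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
my.c <- sample.int(round(2 * mean(b)), 5)

# Option 1: name the elements in the call
l1 <- list(b = b, my.c = my.c)
# Option 2: look the objects up by name; mget() returns a named list
l2 <- mget(c("b", "my.c"), envir = environment())
# Option 3: attach the names afterwards
l3 <- setNames(list(b, my.c), c("b", "my.c"))

str(l2)
```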

how to word-wrap text in labels in plots?

2009 Apr 29

c <- structure(c(2L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 3L, 2L, 5L), .Label = c("foo", "bar", "a really really long variable label mostly here to show the need of word-wrapping text in labels", "a not so important value", "baz"), class = "factor") plot(c) Is there a way to get the long variable labels to automatically wrap so that all
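
strwrap() breaks a string into lines shorter than a given width, so wrapping each factor level and rejoining the pieces with "\n" gives multi-line labels that plot() draws under the bars. A minimal sketch; `wrap_labels()`, the width of 20 characters and the margin size are assumptions, and the factor is called `f` here to avoid confusion with `c()`:

```r
f <- structure(c(2L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 3L, 2L, 5L),
  .Label = c("foo", "bar",
    "a really really long variable label mostly here to show the need of word-wrapping text in labels",
    "a not so important value", "baz"),
  class = "factor")

# Hypothetical helper: wrap each label to lines shorter than `width` characters
wrap_labels <- function(x, width = 20) {
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

levels(f) <- wrap_labels(levels(f))
op <- par(mar = c(8, 4, 2, 1))   # extra bottom margin for multi-line labels
plot(f)
par(op)
```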

pam() clustering for large data sets

2011 May 16

Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I
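
Rough arithmetic explains the allocation failure: pam() on precomputed distances needs the full lower triangle, n(n - 1)/2 values stored as 8-byte doubles, before any clustering work starts. A sketch of that estimate (it counts only the dissimilarity object, so actual peak memory will be higher):

```r
# Bytes for a full dissimilarity object: n * (n - 1) / 2 doubles of 8 bytes each
dist_bytes <- function(n) n * (n - 1) / 2 * 8

dist_bytes(50000)          # 9999800000 bytes
dist_bytes(50000) / 2^30   # roughly 9.3 GiB
```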

standardization of values before call to pam() or clara()

2006 May 23

Greetings, I am experimenting with the cluster package and am starting to scratch my head with regard to the *best* way to standardize my data. Both functions can pre-standardize columns in a data frame. According to the manual: Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. This
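
The standardization quoted from the manual is easy to reproduce by hand, which helps when comparing it with alternatives such as scale() (which divides by the standard deviation instead). Note that "mean absolute deviation" here is not R's mad(), which is the median absolute deviation. A minimal sketch; `standardize_meanabs()` is a hypothetical helper name:

```r
# Center each column on its mean, then divide by its mean absolute deviation
standardize_meanabs <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))
  mean_abs_dev <- colMeans(abs(centered))
  sweep(centered, 2, mean_abs_dev, "/")
}

set.seed(3)
x <- cbind(a = rnorm(100, mean = 50, sd = 10), b = runif(100))
z <- standardize_meanabs(x)
round(colMeans(z), 10)   # both columns centred at 0
colMeans(abs(z))         # both columns have mean absolute deviation 1
```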

Prototype OOP example

2005 Dec 24

Hi, Here is what I want to do: Labour = Class.create(); Labour.prototype = { initialize:function(name){ this.name = name; } } I want to create a class called "Worker" that inherits from "Labour", with "initialize" having the signature "function(name, position)". What should I do? Thank you all very much for the

daisy(): space allocation issue

2010 Aug 26

Hi, I'm trying to apply the function daisy() to a 10000x10 data.frame, but I do not have enough memory (error message: cannot allocate vector of length 1476173280). I didn't expect to be unable to work with a matrix of just 10000 observations... I have set --max-mem-size=2G in Rgui (I'm not able to set more). How can I solve this issue? Separating observations depending on
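
A quick arithmetic check on the error message: a dissimilarity object for n observations holds n(n - 1)/2 values, so 10,000 rows need 49,995,000 of them (about 400 MB as doubles), far below the length in the error. Solving n(n - 1)/2 = 1476173280 gives exactly n = 54336, which suggests, assuming the failed allocation is the dissimilarity vector itself, that daisy() received about 54 thousand rows rather than 10,000. A small sketch of that calculation:

```r
# Values stored by a "dist"/"dissimilarity" object for n observations
n_pairs <- function(n) n * (n - 1) / 2
# Inverse: the n for which n * (n - 1) / 2 equals len
n_from_length <- function(len) (1 + sqrt(1 + 8 * len)) / 2

n_pairs(10000)             # 49995000
n_from_length(1476173280)  # 54336
```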

"partitioning cluster function"

2006 Apr 05

Hi All, For the function "bclust"(e1071), the argument "base.method" is explained as "must be the name of a partitioning cluster function returning a list with the same components as the return value of 'kmeans'". In my understanding, there are three partitioning cluster functions in R, which are "clara, pam, fanny". Then I checked each of them to

plotting a comparison of two scores, including the labels in the plot

2007 Nov 01

Hello r-help! I have data with two kinds of ratings on the status of 100 occupations. The first kind of rating is on the perceived "objective" status that these occupations have in society at large, and the second kind of rating is on the status that the respondents think these occupations *should* have. The ratings were originally integer values in the range 1-9, but in the current data,
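
One way to show both ratings together with the occupation names is a scatter plot of one rating against the other, with text() placing each label next to its point and a 45-degree reference line marking equal ratings. A minimal sketch; the occupation names and ratings are made up:

```r
set.seed(11)
# Made-up data: mean ratings on the 1-9 scale for a few occupations
occ <- data.frame(
  occupation = c("doctor", "teacher", "plumber", "lawyer", "cleaner"),
  actual = runif(5, 1, 9),
  should = runif(5, 1, 9)
)

plot(occ$actual, occ$should, xlim = c(1, 9), ylim = c(1, 9), pch = 19,
     xlab = "Perceived actual status", ylab = "Status it should have")
abline(0, 1, lty = 2)   # points above the line: "should" rated higher than "actual"
text(occ$actual, occ$should, labels = occ$occupation, pos = 3, cex = 0.8)
```

With 100 occupations the labels will overlap; a smaller `cex` or labelling only selected points may be needed.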

Grouping and stacking bar plot for categorical variables

2010 Jul 19

Hi all, I have a series of categorical variables that look just like this: welfare=sample(c("less", "same", "more"), 1000, replace=TRUE) education=sample(c("less", "same", "more"), 1000, replace=TRUE) defence=sample(c("less", "same", "more"), 1000, replace=TRUE) egp=sample(c("salariat",
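
For variables that share the same answer categories, a counts table with one column per variable can be passed straight to barplot(), which stacks the categories by default and groups them side by side with beside = TRUE. A minimal sketch using the simulated variables from the post (egp is left out because its definition is cut off):

```r
set.seed(5)
lev <- c("less", "same", "more")
welfare   <- sample(lev, 1000, replace = TRUE)
education <- sample(lev, 1000, replace = TRUE)
defence   <- sample(lev, 1000, replace = TRUE)

# 3 x 3 matrix: rows are answer categories (fixed order), columns are variables
counts <- sapply(list(welfare = welfare, education = education, defence = defence),
                 function(v) table(factor(v, levels = lev)))

barplot(counts, legend.text = lev)                 # stacked
barplot(counts, beside = TRUE, legend.text = lev)  # grouped
```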