thr3ads.net - similar to: "cluster"

Displaying 20 results from an estimated 4000 matches similar to: "cluster"

2006 Oct 17

cluster in R

Hi, is there a good summary of clustering methods in R? It seems there are many packages involving it. I also have two questions on clustering: 1. Is there a way of evaluating the effectiveness (or separation) of a clustering (rather than by visualization)? 2. Is there a search method (like genetic search) which can help find the subset of attributes that gives the best separation? Thanks,

2005 Aug 08

computationally singular

Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix > S<-cov(mymatrix) # this is fine > D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S)) Error in solve.default(cov, ...) : system is computationally

2011 May 27

network package in R

Hi there, I need a network builder that can change the node size and color; I am not sure whether the network package in R can do this. The other functions I wanted have been found in that package. BTW, if there is another package in R relating to this, please suggest it too. Thanks, Weiwei -- Weiwei Shi, Ph.D Research Scientist "Did you always know?" "No, I did not. But I

2005 Aug 12

need help

Hi there: I think I need to rephrase my question, since last time I did not get any reply. I think the question is not that hard; probably I did not make it clear: I want to find cases like 35, 90, 330, 330, 335 from the rest which look like 3, 3, 3, 3.2, 3.3 4, 4.4, 4.5, 4.6, 4.7 .... Basically there is one (or more) big 'gap' in the cases I seek. Thanks, weiwei

2005 Jul 13

read.table

Hi, I have a question on read.table. I have a dataset with 273,000 lines and 195 columns. I used read.table to load the data into R: trn<-read.table('train1.dat', header=F, sep='|', na.strings='.') I found it takes forever. Then I ran 1/10 of the data (test) using read.table again, and this time it finished quickly. So, there might be something wrong in my data

2005 Jul 07

randomForest

> From: Weiwei Shi > > it works. > thanks, > > but: (just curious) > why i tried previously and i got > > > is.vector(sample.size) > [1] TRUE Because a list is also a vector: > a <- c(list(1), list(2)) > a [[1]] [1] 1 [[2]] [1] 2 > is.vector(a) [1] TRUE > is.numeric(a) [1] FALSE Actually, the way I initialize a list of known length is by

2005 Oct 11

a problem in random forest

Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data looks like this: > dim(trn3) [1] 7361 209 > dim(val3) [1] 7427 209 > mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[, 1:208], ytest=val3[,209], importance=T) my test data has 7427 observations but after prediction, > dim(mg.rf2$votes)

2005 Oct 04

generalized linear model and missing handling

Hi, I have a dataset and want to build a generalized linear model on it. Unfortunately, complete.cases(df) returns null, which means I have to find a way to "fill" those missing values. One way, following my previous post, is to replace them with the median (or with the most frequent level in the categorical case), but I am wondering if there are other ways, when glm or something like it is

2005 Dec 15

question on write.table

Hi, I have a question on write.table: I have a data.frame called t7 as below: > dim(t7) [1] 14015184 6 > t7[1:5,] uci uce par line graphical.forms stems 1 0 0 0 0 active activ 2 0 0 0 0 policy polici 3 0 0 0 0 wc PC 4 0 0 0 0 eff elf 5 0 0 0 0 icn ICC I want to write the

2006 Jun 03

time series clustering

Dear Listers: I have a problem requiring time-series clustering, since the clusters will change with time (data that is too old needs to be removed while new data comes in). I am wondering if there is a paper or reference on this topic, and whether there is an implementation in R. Thanks, Weiwei

2007 Apr 11

how to reverse a list

Hi there: I am wondering if there is a quick way to "reverse" a list like this: t0 <- list(a=1, b=1, c=2, d=1) reverse t0 to t1 > t1 $`1` [1] "a" "b" "d" $`2` [1] "c" Thanks.
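
One possible approach, sketched here as an illustration rather than taken from any reply in the thread: group the element names by their values with base R's `split()`. The list `t0` is copied from the question; everything else is an assumption.

```r
# Example list from the question
t0 <- list(a = 1, b = 1, c = 2, d = 1)

# unlist() flattens the values into a vector; split() then groups the
# element names by those values, giving one entry per distinct value.
t1 <- split(names(t0), unlist(t0))

print(t1)
```

This assumes every element of `t0` holds a single value; lists with longer or nested elements would need a different treatment.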

2007 Apr 24

intersect more than two sets

Hi, I searched the archives and did not find a good solution to this. Assume I have 10 sets and I want the character elements common to all of them. How could I do that?
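
A minimal sketch of one way this might be done, assuming the sets are collected in a list: base R's `Reduce()` can fold `intersect()` over any number of sets. The three sets below are hypothetical stand-ins for the ten mentioned in the question.

```r
# Hypothetical character sets (the question mentions 10; 3 are shown here)
sets <- list(
  c("a", "b", "c", "d"),
  c("b", "c", "d", "e"),
  c("c", "d", "f")
)

# Reduce() applies intersect() pairwise from left to right,
# so the result holds only the elements present in every set.
common <- Reduce(intersect, sets)

print(common)
```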

2007 Jun 25

a string to environment or function

Hi, I am wondering how to write a function Fun that makes the following work: t0 <- (paste("hgu133a", "ENTREZID", sep="")) xx <- as.list(Fun(t0)) # make it work like xx<-as.list(hgu133aENTREZID) Thanks,
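
As a hedged sketch, base R's `get()` looks up an object by its name given as a string, which may be what `Fun` needs to do. The object `hgu133aDEMO` below is a hypothetical stand-in so the example runs without any annotation package; with the real `hgu133aENTREZID`, the package providing it would have to be loaded first.

```r
# Hypothetical stand-in for an annotation object (not the real hgu133aENTREZID)
hgu133aDEMO <- c(`1007_s_at` = "780", `1053_at` = "5982")

# Fun() resolves a name given as a string to the object it refers to,
# searching from the caller's environment.
Fun <- function(name, envir = parent.frame()) get(name, envir = envir)

t0 <- paste("hgu133a", "DEMO", sep = "")
xx <- as.list(Fun(t0))

print(names(xx))
```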

2005 Jul 08

"more" and "tab" functionalities in R under linux

Hi, forgive me if it is due to my "laziness" :) I am wondering if there are functionalities in R that work like "more" and "tab" in linux: more(one.data.frame) so I can browse through it. Sometimes I can use one.data.frame[1:100,], but it is still not as good as "more" in linux. tab: can I use tab to auto-complete a defined object name in R so I don't

2005 Oct 05

pca in dimension reduction

Hi there: I am wondering if anyone here can provide an example of using PCA for dimension reduction on a dataset. The dataset can be n*q (n>=q or n<=q). As for dimension reduction, are there implementations of other methods such as ICA, Isomap, Locally Linear Embedding...? Thanks, weiwei
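
A small illustration of the PCA part of the question, using base R's `prcomp()` on the built-in `USArrests` data. The dataset and the choice of two components are assumptions made only for the example.

```r
# Built-in 50 x 4 dataset, used purely for illustration
x <- USArrests

# Centre and scale the variables, then compute principal components
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Keep the scores on the first two components as the reduced dataset
reduced <- pca$x[, 1:2]

# Proportion of variance explained by each component
print(summary(pca)$importance["Proportion of Variance", ])
print(dim(reduced))
```

How many components to keep is a judgment call; the proportion of variance explained is one common guide.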

2005 Oct 11

an error in my using of nnet

Hi there: I am trying nnet as follows: > mg.nnet<-nnet(x=trn3[,r.v[1:100]], y=trn3[,209], size=5, decay = 5e-4, maxit = 200) # weights: 511 initial value 13822.108453 iter 10 value 7408.169201 iter 20 value 7362.201934 iter 30 value 7361.669408 iter 40 value 7361.294379 iter 50 value 7361.045190 final value 7361.038121 converged Error in y - tmp : non-numeric argument to binary operator

2007 Oct 29

how to split data.frame by row?

Hi, if I have a 20 x 3 data.frame, how can I split it into 10 x 6 (moving the lower 10 x 3 part into columns) or 5 x 12? Thanks
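
One way this could be sketched in base R, assuming the row count divides evenly: cut the rows into equal blocks and bind the blocks side by side with `cbind()`. The data frame `df` and its column names are hypothetical.

```r
# Hypothetical 20 x 3 data frame
df <- data.frame(x = 1:20, y = 21:40, z = 41:60)

# 10 x 6: place the lower 10 rows next to the upper 10 rows
half <- nrow(df) / 2
wide10 <- cbind(df[1:half, ], df[(half + 1):nrow(df), ])

# 5 x 12: split the rows into 4 consecutive blocks of 5, then bind them
k <- 4
blocks <- split(df, rep(seq_len(k), each = nrow(df) / k))
wide5 <- do.call(cbind, unname(blocks))

print(dim(wide10))
print(dim(wide5))
```

The resulting column names repeat (`x`, `y`, `z`, `x`, ...), so they may need renaming before further use.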

2008 Aug 24

similarity between two gene lists with varied length

Dear listers, a little off-topic: I am looking for, and want to compare, algorithms which can calculate the "distance" or "similarity" between two gene lists of different lengths. Any paper, any implementation in R and any suggestion is welcome! Thanks,

2006 Apr 07

a statistics question

Hi there, I have a statistics question on a classification problem: Suppose I have 1000 binary variables and one binary dependent variable. I want to find a way similar to PCA, in which I can find a couple of combinations of those variables to discriminate best according to the dependent variable. It is not only for dimension reduction, but more important, for finding best way to construct

2007 May 01

dlda{supclust} 's output

Hi, I am using the dlda algorithm from the supclust package, and I am wondering if the output can be a continuous probability instead of a discrete class label (zero or one), since it puts some restriction on the covariance matrix compared with lda, while the latter can. Thanks,
