thr3ads.net - similar to: "memory problem in handling large dataset"

Displaying 20 results from an estimated 10000 matches similar to: "memory problem in handling large dataset"

2005 Jul 25

cluster

Dear listers: I have a question on the clustering methods available in R. I am trying to down-sample the majority class in a classification problem on an imbalanced dataset. Since I don't want to lose information from the original dataset, I don't want to use naive down-sampling: I think using clustering on the majority class' side to select "representative" samples might

2005 Jul 07

randomForest

> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE

Because a list is also a vector:

> a <- c(list(1), list(2))
> a
[[1]]
[1] 1

[[2]]
[1] 2

> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE

Actually, the way I initialize a list of known length is by

2007 Apr 23

Random Forest

Hi, I am trying to print out my confusion matrix after having created my random forest. I have put in this command: fit<-randomForest(MMS_ENABLED_HANDSET~.,data=dat,ntree=500,mtry=14, na.action=na.omit,confusion=TRUE) but I can't get it to give me the confusion matrix. Does anyone know how this works? Thanks! Ruben

2007 Jun 18

source a specific function

Dear Listers: Suppose I have a .R source file that contains more than one function, and I want to load only one of those functions. How could I do that? (Removing the rest after sourcing is not what I intend, because my workspace might already contain some of the others and I don't want to change it: i.e., I only want to change my workspace by adding one function from an R source file.) Thanks,
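
One possible approach to the question above, as a minimal sketch: source the file into a scratch environment and copy out only the function that is wanted. The temporary file and the names `f`, `g`, and `scratch` are hypothetical stand-ins, not taken from the original post.

```r
# Hypothetical source file with two functions (stand-in for the poster's .R file)
src <- tempfile(fileext = ".R")
writeLines(c("f <- function(x) x + 1",
             "g <- function(x) x * 2"), src)

# Evaluate the file in a scratch environment instead of the workspace
scratch <- new.env()
source(src, local = scratch)

# Copy only the wanted function into the workspace
f <- get("f", envir = scratch)
```

The copied function keeps the scratch environment as its enclosure, so any helper it calls from the same file would still be found there without being added to the workspace.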

2008 Aug 24

similarity between two gene lists with varied length

Dear listers, a little off-topic: I am looking for, and would like to compare, algorithms that can calculate a "distance" or "similarity" between two gene lists of different lengths. Any paper, any implementation in R, and any suggestion is welcome! Thanks, -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. "Did you always know?" "No, I did not. But I believed..."

2005 Jan 12

gbm

Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. thanks, Ed

2005 Oct 11

a problem in random forest

Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data looks like this:

> dim(trn3)
[1] 7361 209
> dim(val3)
[1] 7427 209
> mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[, 1:208], ytest=val3[,209], importance=T)

my test data has 7427 observations but after prediction,

> dim(mg.rf2$votes)

2007 Jun 12

pretty report

Dear Listers: I have a couple of data frames to report, and each corresponds to a different condition, e.g. conditions=c(10, 15, 20, 25). In this example, four data frames need to be exported in a "pretty" report. I know Perl has a module for exporting data to Excel and, after googling, I found that R does not. So I am wondering if there is a package in R for generating good reports. I

2007 Jan 24

Cronbach's alpha

Dear Listers: I used cronbach{psy} to evaluate internal consistency, and one set of variables gave me alpha=-1.1003, while others gave alpha=-0.2, alpha=0.89, and so on. I am interested in knowing how to interpret 1. a negative value 2. a negative value less than -1. I also want to re-mention my previous question about how to evaluate the consistency of a set of variables and about the total

2006 Apr 07

a statistics question

Hi there, I have a statistics question on a classification problem: Suppose I have 1000 binary variables and one binary dependent variable. I want to find a way similar to PCA, in which I can find a couple of combinations of those variables to discriminate best according to the dependent variable. It is not only for dimension reduction, but more important, for finding best way to construct

2007 Jan 28

help with RandomForest classwt option

Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". From checking rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford. I did not find the down sampling option. I am wondering if you have any experience dealing with this problem. Do you

2005 Jan 25

multi-class classification using rpart

Hi, I am trying to build a multi-class classification tree using rpart. I used the MASS package's data set fgl as a test and it works well. However, when I used my small-sampled data as below, the program seems to take forever. I am not sure whether it is just slow or whether there is something wrong with my code or data manipulation. Please advise! The data is described as the output from str()

2009 Jul 22

margins defined in randomForest and supclust

Hi there, How can I resolve a conflict over the same object name between two packages, for example margins in both randomForest and supclust? When both libraries are installed, supclust will complain that "margins" is defined in randomForest. I can only solve it by restarting R, which is very inconvenient. Is there a clever way? Thanks, Weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc.

2006 Jul 27

memory problems when combining randomForests [Broadcast]

You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html

Andy

From: Eleni Rapsomaniki
>
> Dear all,
>
> I am trying to train a randomForest using all my control data
> (12,000 cases, ~ 20 explanatory variables, 2 classes).
> Because

2005 Jul 21

RandomForest question

Hello, I'm trying to find out the optimal number of splits (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with each up to 4 levels but also some numeric variables) and 575 cases. I've seen that although there are only 32 explanatory variables the best classification performance is reached when

2006 Oct 18

selectively load some objects from old workspace

Dear Listers: I have a question on loading objects from a workspace: suppose I have two workspaces for two approaches. My old workspace has some objects I need in the new workspace, but I don't want to load the whole old workspace and then remove most of the old objects to get what I want. Is there an easier way to do something like this: load "some needed obj" from old workspace, which has been
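
A minimal sketch of one way this could be done, assuming the old workspace was written with `save()` or `save.image()`: load it into a scratch environment and copy across only the objects that are needed. The file and the object names `needed` and `junk` are hypothetical.

```r
# Hypothetical old workspace holding one wanted and one unwanted object
old_ws <- tempfile(fileext = ".RData")
old <- new.env()
assign("needed", 1:5, envir = old)
assign("junk", "not wanted", envir = old)
save(list = c("needed", "junk"), file = old_ws, envir = old)

# Load everything into a scratch environment, not the current workspace
scratch <- new.env()
load(old_ws, envir = scratch)

# Bring over only the object that is needed
needed <- get("needed", envir = scratch)
```

The whole file is still read from disk, so this keeps the workspace clean but does not reduce the time or memory needed for the load itself.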

2007 Aug 28

how to calculate mean into a list

Dear Listers: I have this task: suppose a0 is a list of 10 data.frames, and I want to calculate something like this > (a0[[1]]+a0[[2]]+..+a0[[10]])/10 Thanks. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. "Did you always know?" "No, I did not. But I believed..." ---Matrix III
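
The element-wise average described above can be sketched with `Reduce()`, assuming every data frame in the list is numeric and has the same dimensions. The contents of `a0` below are made up for illustration.

```r
# Hypothetical stand-in for a0: 10 numeric data frames with identical shape
a0 <- lapply(1:10, function(i) data.frame(x = c(i, 2 * i), y = c(-i, 0)))

# Sum the data frames element-wise, then divide by how many there are
a0_mean <- Reduce(`+`, a0) / length(a0)
```

Using `length(a0)` instead of a hard-coded 10 keeps the expression valid if the list grows or shrinks.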

2007 Apr 12

problems in loading MASS

Hi, there: After I upgraded my R to 2.4.1, I tried to use MASS for the first time and got the following error message:

> install.packages("MASS")
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/universal/contrib/2.4/VR_7.2-33.tgz'
Content type 'application/x-gzip' length 995260 bytes
opened URL

2006 Jun 03

time series clustering

Dear Listers: I have a problem that requires time-series clustering, since the clusters will change with time (data that is too old needs to be removed while new data comes in). I am wondering whether there is a paper or reference on this topic and whether there is some kind of implementation in R? Thanks, Weiwei -- Weiwei Shi, Ph.D "Did you always know?" "No, I did not. But I

2005 Aug 04

some thoughts on outlier detection, need help!

Dear listers: I have an idea for outlier detection and I need to implement it in R first. I hope I can get some input from all the gurus here. I chose a distance-based approach:
step 1: calculate the distance between every pair of rows in a data frame. To account for scaling among different variables, I chose Mahalanobis distance, using the variance as the scaler.
step 2: Let k be the number of
