similar to: randomForest memory footprint

Displaying 20 results from an estimated 700 matches similar to: "randomForest memory footprint"

2011 Feb 25
1
speed up process
Dear users, I have a double for loop that does exactly what I want, but it is quite slow. It is not so bad with this simplified example, but with the real data it is slow. Can anyone help me improve it? The data and the code for foo_reg() are available at the end of the email; I preferred to go directly to the problematic part. Here is the code (I tried to simplify it, but I cannot do it too much or else it
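The real foo_reg() and data are only in the truncated tail of the message, so the following is a generic, hypothetical sketch of the usual speed-up: a nested for loop that fills a matrix cell by cell can often be replaced by outer(), which evaluates a vectorised function on every (i, j) pair at once.

# slow_fill() stands in for the original double loop; f stands in for foo_reg()
slow_fill <- function(x, y, f) {
  out <- matrix(NA_real_, length(x), length(y))
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      out[i, j] <- f(x[i], y[j])
    }
  }
  out
}
x <- runif(200); y <- runif(300)
f <- function(a, b) (a - b)^2
m1 <- slow_fill(x, y, f)
m2 <- outer(x, y, f)        # same result, no explicit loop
all.equal(m1, m2)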
2006 Feb 20
3
Boxplot Help for Neophyte
R helpers, I am getting to grips with R but came across a small problem today that I could not fix by myself. I have 3 text files, each with a single column of data. I read them in using: myData1<-scan("C:/Program Files/R/myData1.txt") myData2<-scan("C:/Program Files/R/myData2.txt") myData3<-scan("C:/Program Files/R/myData3.txt") I wanted to produce a
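The question is cut off, but a minimal sketch of the usual answer, with rnorm() standing in for the three scanned files: boxplot() accepts a list (or a data frame), so the three vectors can be plotted side by side without further reshaping.

myData1 <- rnorm(50)            # stand-ins for the scan() results
myData2 <- rnorm(50, mean = 1)
myData3 <- rnorm(50, mean = 2)
boxplot(list(myData1 = myData1, myData2 = myData2, myData3 = myData3),
        ylab = "value")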
2009 Jan 06
5
Using apply for two datasets
I can run a one-sample t-test on an array, for example a matrix myData1, with the following: apply(myData1, 2, t.test) Is there a similar way, using apply() or something else, to run a 2-sample t-test on datasets from two groups, myData1 and myData2, without looping? TIA, Gang
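One hedged possibility, assuming myData1 and myData2 have the same number of columns: iterate over the column index with sapply() and call t.test() on each pair of columns, which avoids an explicit for loop.

myData1 <- matrix(rnorm(100), ncol = 5)              # simulated stand-ins
myData2 <- matrix(rnorm(100, mean = 0.5), ncol = 5)
pvals <- sapply(seq_len(ncol(myData1)),
                function(i) t.test(myData1[, i], myData2[, i])$p.value)
pvals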
2012 Jul 03
1
insert missing dates
Hello, I have data frames. mydata1 <-data.frame(value=c(15,20,25,30,45,50),dates=c("2005-05-25 07:00:00 ","2005-05-25 19:00:00","2005-06-25 07:00:00","2005-06-25 19:00:00 ","2005-07-25 07:00:00","2005-8-25 19:00:00")) or mydata2 <-data.frame(value=c(15,20,25,30,45,50),dates=c("2005-05-25 00:00:00 ","2005-05-25
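The message is truncated before it says what the complete date grid should be; assuming a regular 12-hour grid between the first and last timestamps, one sketch is to build the full sequence with seq() and merge() it back, leaving NA where a date was missing.

mydata1 <- data.frame(value = c(15, 20, 25, 30, 45, 50),
                      dates = as.POSIXct(c("2005-05-25 07:00:00",
                                           "2005-05-25 19:00:00",
                                           "2005-06-25 07:00:00",
                                           "2005-06-25 19:00:00",
                                           "2005-07-25 07:00:00",
                                           "2005-08-25 19:00:00"), tz = "UTC"))
full   <- data.frame(dates = seq(min(mydata1$dates), max(mydata1$dates),
                                 by = "12 hours"))
merged <- merge(full, mydata1, all.x = TRUE)   # NA value where a date was absent
head(merged)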
2012 May 15
4
reading data into R
Hi, I am really new to R, so this is really beginner stuff! I created a very small data set in Excel and then converted it to a .csv file. I am able to open the data in R using the command "read.table("mydata1.csv", sep=",", header=T)" and it works just fine. But when I want to work on the data (e.g. calculate the mean of variable "X"), R says
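The likely culprit (a guess, since the error message is cut off): read.table() prints the data but the result is never assigned, so there is no object to compute on afterwards. A self-contained sketch, writing a small stand-in CSV first:

tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(X = c(1, 2, 3), Y = c(4, 5, 6)), tmp, row.names = FALSE)
mydata <- read.table(tmp, sep = ",", header = TRUE)  # assign, don't just print
mean(mydata$X)                                       # assumes a column named X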
2011 Jul 11
1
GLS - Plotting Graphs with 95% conf interval
Hi, I am trying to plot the original data with the line of the model using the predict function. I want to add SEs to the graph, but I am not sure how to get them, as the predict function for gls does not appear to accept an se.fit=TRUE argument. Here is my code so far: f1<-formula(MaxNASC40_50~hu3+flcmax+TidalFlag) vf1Exp<-varExp(form=~hu3) B1D<-gls(f1,correlation=corGaus(form=Lat~Lon,
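predict() for gls objects indeed returns no standard errors. A common workaround, sketched here with simulated data rather than the poster's acoustic survey variables, is to rebuild the fixed-effects design matrix for the new data and combine it with vcov() of the fit.

library(nlme)
set.seed(1)
d <- data.frame(x = runif(50))
d$y <- 2 + 3 * d$x + rnorm(50)
fit  <- gls(y ~ x, data = d)
newd <- data.frame(x = seq(0, 1, length.out = 25))
X    <- model.matrix(~ x, data = newd)
newd$fit <- as.vector(X %*% coef(fit))
newd$se  <- sqrt(diag(X %*% vcov(fit) %*% t(X)))
plot(y ~ x, data = d)
with(newd, {
  lines(x, fit)
  lines(x, fit + 1.96 * se, lty = 2)   # approximate 95% confidence band
  lines(x, fit - 1.96 * se, lty = 2)
})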
2012 Jun 06
3
problem about set operation and computation after split
Hi, I have run into some problems in R; please help me. 1. How can I do an intersect operation among several groups in one list, without a loop statement? (I think the result may be a list) Create the data: myData <- data.frame(product = c(1,2,3,1,2,3,1,2,2), year=c(2009,2009,2009,2010,2010,2010,2011,2011,2011),value=c(1104,608,606,1504,508,1312,900,1100,800)) mySplit<- split(myData,myData$year)
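One loop-free possibility for part 1, reusing the myData/mySplit objects defined in the question: extract the product column from each year and fold intersect() over the list with Reduce().

myData <- data.frame(product = c(1,2,3,1,2,3,1,2,2),
                     year    = c(2009,2009,2009,2010,2010,2010,2011,2011,2011),
                     value   = c(1104,608,606,1504,508,1312,900,1100,800))
mySplit <- split(myData, myData$year)
Reduce(intersect, lapply(mySplit, function(d) d$product))  # products present in every year: 1 2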
2013 Jan 10
1
Semi Parametric Bootstrap
Greetings to you all, I am performing a semi-parametric bootstrap in R on Gamma-distributed data and Binomial-distributed data. The main challenge I am facing is that the residual variance depends on the mean (if I am correct). I strongly suspect that the script below may be wrong because of this mean-variance relationship. #####R code####### fit1s
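The poster's script is cut off, so the following is only a heavily hedged sketch of one semi-parametric scheme for the Gamma case: because the variance grows with the square of the mean, resampling raw residuals breaks that relationship, so scale-free residuals (here the ratios y/mu) are resampled and multiplied back onto the fitted means, which also keeps the bootstrap responses positive.

set.seed(42)
x   <- runif(100, 1, 5)
y   <- rgamma(100, shape = 5, rate = 5 / exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = Gamma(link = "log"))
mu  <- fitted(fit)
r   <- y / mu                                # scale-free residuals
boot_coef <- replicate(200, {
  ystar <- mu * sample(r, replace = TRUE)    # semi-parametric resample
  coef(glm(ystar ~ x, family = Gamma(link = "log")))
})
apply(boot_coef, 1, sd)                      # bootstrap SEs of the coefficients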
2008 Apr 15
2
How can I import user-defined missings from Spss?
Hi, I can import SPSS datasets via library(foreign) with read.spss or via library(Hmisc) with spss.get. But no matter which way I import the data, user-defined missing values from SPSS are always lost (it makes no difference whether they are a single value, a range, or any combination of them; they are always ignored). Is there any way in R to find out whether a value was user-defined missing
2010 Jan 22
1
confidence intervals for mean (GLM)
Dear useRs, How could I obtain the confidence intervals for the means of my treatments, when my data was fitted to a GLM? I need the CI's for the Poisson and Negative Binomial distributions. Here's what I have: mydata1 <- data.frame('treatments'=gl(4,20), 'value'=rpois(80, 1)) model1 <- glm(value ~ treatments, data=mydata1, family=poisson) means1 <-
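One commonly used approach, shown as a sketch on the Poisson example above: predict on the link scale with se.fit = TRUE, form a Wald interval there, and back-transform with the inverse link. The same pattern works for a negative binomial fit from MASS::glm.nb.

set.seed(1)
mydata1 <- data.frame(treatments = gl(4, 20), value = rpois(80, 1))
model1  <- glm(value ~ treatments, data = mydata1, family = poisson)
newd    <- data.frame(treatments = factor(levels(mydata1$treatments)))
pr      <- predict(model1, newdata = newd, type = "link", se.fit = TRUE)
data.frame(treatment = newd$treatments,
           mean  = exp(pr$fit),
           lower = exp(pr$fit - 1.96 * pr$se.fit),
           upper = exp(pr$fit + 1.96 * pr$se.fit))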
2018 Mar 15
3
stats 'dist' euclidean distance calculation
Hello, I am working with a matrix of multilocus genotypes for ~180 individual snail samples, with substantial missing data. I am trying to calculate the pairwise genetic distance between individuals with the 'dist' function from the stats package, using euclidean distance. I took a subset of this dataset (3 samples x 3 loci) to test how the euclidean distance is calculated: 3x3 subset used
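The subset itself is cut off, but the behaviour being tested is documented in ?dist: for a pair of rows with NAs, the usable squared differences are summed and the sum is scaled up by ncol/number_used before taking the square root. A small stand-in check:

m <- rbind(s1 = c(1, 2, 3),
           s2 = c(2, NA, 5),
           s3 = c(4, 0, NA))
dist(m, method = "euclidean")
# Manual version of the s1-s2 entry: only columns 1 and 3 are usable
sqrt(sum((1 - 2)^2, (3 - 5)^2) * 3 / 2)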
2011 Feb 28
0
Fwd: Re: speed up process
Dear Jim, Here again is exactly what I did, together with the output of Rprof (with this reduced dataset and a simpler function, it is much faster here than in real life). Thank you again for your help! ## CODE ## mydata1<- structure(list(species = structure(1:8, .Label = c("alsen","gogor", "loalb", "mafas", "pacyn", "patro",
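For readers following the thread, the general Rprof pattern looks like this (a generic sketch with a stand-in workload, not the poster's foo_reg() run):

Rprof(tmp <- tempfile())
x <- matrix(rnorm(500 * 500), 500)
for (i in 1:20) solve(x %*% t(x))       # stand-in for the slow double loop
Rprof(NULL)
head(summaryRprof(tmp)$by.self)         # where the time was actually spent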
2008 Jun 24
1
Error Handling
Hi All, The for-loop below stopped when the error ("Cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance") occurred. I assigned a row of NA values to the data frame "m1" manually and reset "j" in the for-loop every time the error returned. I'm wondering if there is a function that can detect an error or failure, so the
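try() or tryCatch() does exactly this. A hedged sketch of the usual pattern, with a made-up computation standing in for the original model fit:

results <- vector("list", 10)
for (j in 1:10) {
  results[[j]] <- tryCatch({
    if (j %% 4 == 0)                       # simulate the occasional failure
      stop("Non-positive definite approximate variance-covariance")
    c(estimate = rnorm(1), se = runif(1))  # stand-in for the real fit
  }, error = function(e) c(estimate = NA, se = NA))
}
m1 <- do.call(rbind, results)
m1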
2010 Nov 11
2
Kolmogorov Smirnov Test
I'm using ks.test(mydata, dnorm) on my data. I know some of my different variable samples (mydata1, mydata2, etc.) must be normally distributed, but the p-value is always < 2.0^-16 (the 2.0 can change but not the exponent). I want to test mydata against a normal distribution. What could I be doing wrong? I tried instead using rnorm to create a normal distribution: y = rnorm
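Two things commonly go wrong here, shown below as a sketch: ks.test() needs the cumulative distribution function pnorm rather than the density dnorm, and the mean and sd of the reference normal have to be supplied, otherwise a standard normal (mean 0, sd 1) is assumed. Note also that estimating those parameters from the same data makes the resulting p-value only approximate.

set.seed(1)
mydata1 <- rnorm(200, mean = 50, sd = 10)   # stand-in data
ks.test(mydata1, "pnorm")                   # wrong reference: p-value ~ 0
ks.test(mydata1, "pnorm", mean = mean(mydata1), sd = sd(mydata1))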
2013 May 02
0
Data in packages: save or write.table?
Hi all, I am trying to understand Writing R Extensions, Section 1.1.5 (data): I include two datasets in a package, one using 'save', the other using 'write.table': --- 8< ---- myData1 <- data.frame(x=1:10) write.table(myData1,file="myData1.txt") myData2 <- data.frame(x=2:10) save(myData2,file="myData2.Rdata") --- 8< ---- Then R CMD check asks me to
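The check message is cut off, but on the save-versus-write.table question itself, a sketch of the layout usually recommended: an .rda produced by save() (conventionally one object per file, named after the object) is the most compact option for data/; plain-text files written with write.table() are also accepted there, but they are read back with read.table(header = TRUE), so they must round-trip cleanly.

myData1 <- data.frame(x = 1:10)
myData2 <- data.frame(x = 2:10)
dir.create("data", showWarnings = FALSE)
save(myData1, file = "data/myData1.rda", compress = "xz")
write.table(myData2, file = "data/myData2.txt")
# In the installed package, both become available via data(myData1) / data(myData2)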
2003 Jul 17
2
i need help in cluster analyse
Hello, my name is Rodrigo. I am using R and I have a problem. I am trying to build a dendrogram from genetic information. Let me explain: the similarity matrix has already been computed, and from this matrix I want to construct a dendrogram. So the distance part is done. I need to transform this matrix (which I have) into a dendrogram; I would be very grateful if someone could help me. PS: I am sending
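A hedged sketch of the usual route: hclust() expects dissimilarities, so a similarity matrix (a made-up symmetric one here, with values in [0, 1]) is first converted, e.g. as 1 - similarity, and wrapped in as.dist().

set.seed(7)
sim <- matrix(runif(25, 0.2, 1), 5, 5)
sim <- (sim + t(sim)) / 2                  # make it symmetric
diag(sim) <- 1
rownames(sim) <- colnames(sim) <- paste0("ind", 1:5)
d  <- as.dist(1 - sim)                     # similarity -> dissimilarity
hc <- hclust(d, method = "average")
plot(hc, main = "Dendrogram from a similarity matrix")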
2012 Mar 16
1
multivariate regression and lm()
Hello, I would like to perform a multivariate regression analysis to model the relationship between m responses Y1, ... Ym and a single set of predictor variables X1, ..., Xr. Each response is assumed to follow its own regression model, and the error terms in each model can be correlated. Based on my readings of the R help archives and R documentation, the function lm() should be able to
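It can: lm() accepts a matrix response built with cbind(), fitting one regression per column against the shared predictors. A small sketch with simulated data (the names are placeholders, not the poster's):

set.seed(3)
n  <- 100
X1 <- rnorm(n); X2 <- rnorm(n)
Y1 <- 1 + 2 * X1 + rnorm(n)
Y2 <- -1 + 0.5 * X1 + 3 * X2 + rnorm(n)
fit <- lm(cbind(Y1, Y2) ~ X1 + X2)
coef(fit)        # one column of coefficients per response
summary(fit)     # a separate summary per response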
2011 Feb 13
2
creating NAs for some values only
Hello, I have a data file, say, mydata: 1,2,3,4,5,6,7 3,3,4,4,w,w,1 w,3,6,5,7,8,9 4,4,w,5,3,3,0 I want to replace some percentage of the values in "mydata" with NAs, but only for the values that are NOT "w". I know how to apply the percentage part here, but I don't know how to select the values that are not "w". So far, I was able to do it, but the result replaces the w's
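One hedged way to finish the job: restrict attention to the positions that are not "w", sample the requested percentage of those positions, and set only them to NA.

mydata <- as.matrix(read.table(text = "1,2,3,4,5,6,7
3,3,4,4,w,w,1
w,3,6,5,7,8,9
4,4,w,5,3,3,0", sep = ","))
pct  <- 0.20
idx  <- which(mydata != "w")                        # candidate cells only
kill <- sample(idx, size = round(pct * length(idx)))
mydata[kill] <- NA
mydata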
2012 Apr 05
0
Multi part problem...array manipulation and sampling
Ok, I have a new, multi-part problem that I need help figuring out. Part 1. I have a three-dimensional array (species, sites, repeat counts within sites). Sampling effort per site varies, so the array should be ragged. Maximum number of visits at any site = 22; number of species = 161; number of sites = 56. I generated the array first by:
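How the array was generated is cut off above; for what it is worth, R arrays cannot literally be ragged, so the usual workaround is to pad to the maximum number of visits with NA and keep the real effort per site in a separate vector. A sketch with the stated dimensions and simulated counts:

n_species <- 161; n_sites <- 56; max_visits <- 22
counts <- array(NA_real_, dim = c(n_species, n_sites, max_visits))
set.seed(9)
visits_per_site <- sample(3:max_visits, n_sites, replace = TRUE)
for (s in seq_len(n_sites)) {
  counts[, s, seq_len(visits_per_site[s])] <-
    rpois(n_species * visits_per_site[s], lambda = 2)
}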
2008 Aug 11
1
help on model selection - step()
Dear R-users, I'm interested in a model selection problem and have run into some issues that I would like to ask for help with. This is a very small example with 4 variables (just one variable, z, is the response) and 100 individuals. I would like to do a stepwise search for the "best" model, using the BIC criterion. I know when I have a lot of variables, let's say 120, I know,
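On the BIC point: step() uses AIC by default, and setting k = log(n) switches the penalty to BIC. A small sketch with simulated data (z as the response, as in the question; the predictors are made up):

set.seed(5)
n   <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$z <- 1 + 2 * dat$x1 - dat$x3 + rnorm(n)
full  <- lm(z ~ x1 + x2 + x3, data = dat)
best_bic <- step(full, direction = "both", k = log(n), trace = FALSE)
summary(best_bic)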