I want to eliminate certain observations in a large dataframe (21000x100).
I have written code which does this using a binary vector (0=delete obs,
1=keep), but it uses for loops, and so it's slow and in the extreme it
causes R to hang for indefinite time periods.

I'm looking for one of two things:
1. A document which discusses how to avoid for loops and situations in
which it's impossible to avoid for loops.

or

2. A function which can do the above better than mine.

My code is pasted below.

Thanks so much,

Janet

# asst is a binary vector of length= nrow(DATAFRAME).
# 1= observations you want to keep. 0= observation to get rid of.

remove.xtra.f <-function(asst, DATAFRAME) {
    n<-sum(asst, na.rm=T)
    newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
    j<-1
    for(i in 1:length(data)) {
        if (asst[i]==1) {
            newdata[j,]<-DATAFRAME[i,]
            j<-j+1
        }
    }
    newdata.f<-as.data.frame(newdata)
    names(newdata.f)<-names(DATAFRAME)
    return(newdata.f)
}
--
Janet Rosenbaum   jerosenb at fas.harvard.edu
PhD Candidate in Health Policy, Harvard GSAS
Harvard Injury Control Research Center, Harvard School of Public Health
On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:

>
> I want to eliminate certain observations in a large dataframe (21000x100).
> I have written code which does this using a binary vector (0=delete obs,
> 1=keep), but it uses for loops, and so it's slow and in the extreme it
> causes R to hang for indefinite time periods.
>
> I'm looking for one of two things:
> 1. A document which discusses how to avoid for loops and situations in
> which it's impossible to avoid for loops.
>
> or
>
> 2. A function which can do the above better than mine.

?subset

newdata <- subset(DATAFRAME, asst==1)

which will work whether DATAFRAME is a matrix or data.frame (two different
classes).

>
> My code is pasted below.
>
> Thanks so much,
>
> Janet
>
> # asst is a binary vector of length= nrow(DATAFRAME).
> # 1= observations you want to keep. 0= observation to get rid of.
>
> remove.xtra.f <-function(asst, DATAFRAME) {
>     n<-sum(asst, na.rm=T)
>     newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
>     j<-1
>     for(i in 1:length(data)) {
>         if (asst[i]==1) {
>             newdata[j,]<-DATAFRAME[i,]
>             j<-j+1
>         }
>     }
>     newdata.f<-as.data.frame(newdata)
>     names(newdata.f)<-names(DATAFRAME)
>     return(newdata.f)
> }
> --
> Janet Rosenbaum   jerosenb at fas.harvard.edu
> PhD Candidate in Health Policy, Harvard GSAS
> Harvard Injury Control Research Center, Harvard School of Public Health
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
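
A minimal sketch of the subset() suggestion above, on toy data. The small
DATAFRAME, the asst values and the name M are made up for illustration and
are not the poster's data. It also shows how subset() treats an NA in the
condition, which the original post left open.

```r
# Toy stand-ins for the poster's objects (values are invented for this sketch).
DATAFRAME <- data.frame(id = 1:6, score = c(2.5, 3.1, 4.7, 1.2, 5.0, 3.3))
asst <- c(1, 0, 1, NA, 0, 1)

# subset() keeps the rows where the condition is TRUE.
# Rows where the condition evaluates to NA are dropped, not kept.
newdata <- subset(DATAFRAME, asst == 1)
print(newdata)

# The same call works on a matrix; the result is still a matrix.
M <- as.matrix(DATAFRAME)
newmat <- subset(M, asst == 1)
print(newmat)
```

Here asst is not a column of DATAFRAME, so subset() finds it in the calling
environment. If a column with the same name existed, the column would be
used instead.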
On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:

>
> I want to eliminate certain observations in a large dataframe (21000x100).
> I have written code which does this using a binary vector (0=delete obs,
> 1=keep), but it uses for loops, and so it's slow and in the extreme it
> causes R to hang for indefinite time periods.
>
> I'm looking for one of two things:
> 1. A document which discusses how to avoid for loops and situations in
> which it's impossible to avoid for loops.

`S Programming': see the FAQ.  But at the level of the example below,
chapter 2 of MASS4 (FAQ again for details).

> or
>
> 2. A function which can do the above better than mine.
>
> My code is pasted below.
>
> Thanks so much,
>
> Janet
>
> # asst is a binary vector of length= nrow(DATAFRAME).
> # 1= observations you want to keep. 0= observation to get rid of.

How about

    DATAFRAME[asst == 1, ]

?  I am not sure if asst has NAs in, but if it has you will get an error
from

    if (asst[i]==1)

and if not, you don't need na.rm=T.

> DF <- as.data.frame(matrix(rnorm(21000*100),, 100))
> asst <- rbinom(21000, 1, 0.7)
> DF2 <- DF[asst==1,]

where the subsetting took less than a second for me.

Note that your code converts DATAFRAME to a matrix.  If that is reasonable
(e.g. it is all numeric), then matrix indexing will be faster.

> remove.xtra.f <-function(asst, DATAFRAME) {
>     n<-sum(asst, na.rm=T)
>     newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
>     j<-1
>     for(i in 1:length(data)) {
>         if (asst[i]==1) {
>             newdata[j,]<-DATAFRAME[i,]
>             j<-j+1
>         }
>     }
>     newdata.f<-as.data.frame(newdata)
>     names(newdata.f)<-names(DATAFRAME)
>     return(newdata.f)
> }
> --
> Janet Rosenbaum   jerosenb at fas.harvard.edu
> PhD Candidate in Health Policy, Harvard GSAS
> Harvard Injury Control Research Center, Harvard School of Public Health
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
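
A rough sketch of the points above about NAs and matrix indexing, assuming
a smaller simulated data set than the 21000x100 one. The names keep, M, M2
and res, the seed, and the two injected NAs are assumptions made for
illustration only.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated all-numeric data, smaller than the original 21000x100.
DF <- as.data.frame(matrix(rnorm(2000 * 10), , 10))
asst <- rbinom(2000, 1, 0.7)
asst[c(5, 17)] <- NA  # inject two missing values on purpose

# if() cannot decide on NA, so the loop version stops with an error here.
res <- tryCatch(if (asst[5] == 1) "kept" else "dropped",
                error = function(e) "error")

# DF[asst == 1, ] would return all-NA rows where asst is NA.
# which() returns only the positions where the test is TRUE.
keep <- which(asst == 1)
DF2 <- DF[keep, ]

# If every column is numeric, the same index can be applied to a matrix.
M <- as.matrix(DF)
M2 <- M[keep, , drop = FALSE]
```

Whether rows with a missing asst should be dropped or kept is a decision
for the analyst; which() simply drops them.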
Have you tried reading the manual "An Introduction to R", with special
attention to "Array Indexing" (indexing for data frames is pretty similar
to indexing for matrices).

Unless I'm misunderstanding, what you want to do is very simple.  It is
possible to use numeric vectors with 0 and 1 to indicate whether you want
to keep the row, but it's a little easier with logical vectors.  Here's an
example:

> x <- data.frame(a=1:5,b=letters[1:5])
> keep.num <- ifelse(x$a %% 2 == 1, 1, 0)
> keep.num
[1] 1 0 1 0 1
> keep.logical <- (x$a %% 2) == 1
> keep.logical
[1]  TRUE FALSE  TRUE FALSE  TRUE
> x[keep.num==1,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
> x[keep.logical,,drop=F]
  a b
1 1 a
3 3 c
5 5 e
>

At Friday 10:34 AM 11/5/2004, Janet Elise Rosenbaum wrote:

>I want to eliminate certain observations in a large dataframe (21000x100).
>I have written code which does this using a binary vector (0=delete obs,
>1=keep), but it uses for loops, and so it's slow and in the extreme it
>causes R to hang for indefinite time periods.
>
>I'm looking for one of two things:
>1. A document which discusses how to avoid for loops and situations in
>which it's impossible to avoid for loops.
>
>or
>
>2. A function which can do the above better than mine.
>
>My code is pasted below.
>
>Thanks so much,
>
>Janet
>
># asst is a binary vector of length= nrow(DATAFRAME).
># 1= observations you want to keep. 0= observation to get rid of.
>
>remove.xtra.f <-function(asst, DATAFRAME) {
>    n<-sum(asst, na.rm=T)
>    newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
>    j<-1
>    for(i in 1:length(data)) {
>        if (asst[i]==1) {
>            newdata[j,]<-DATAFRAME[i,]
>            j<-j+1
>        }
>    }
>    newdata.f<-as.data.frame(newdata)
>    names(newdata.f)<-names(DATAFRAME)
>    return(newdata.f)
>}
>--
>Janet Rosenbaum   jerosenb at fas.harvard.edu
>PhD Candidate in Health Policy, Harvard GSAS
>Harvard Injury Control Research Center, Harvard School of Public Health
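
One pitfall related to the numeric-versus-logical point above, as a small
sketch. The names wrong, right and also are hypothetical. A 0/1 vector used
directly as a row index is read as row positions, not as keep/drop flags,
so it has to be turned into a logical vector first.

```r
x <- data.frame(a = 1:5, b = letters[1:5])
keep.num <- c(1, 0, 1, 0, 1)

# Numeric index: zeros are ignored and each 1 selects row 1 again,
# so this returns row 1 three times rather than rows 1, 3 and 5.
wrong <- x[keep.num, , drop = FALSE]

# Comparing with 1 (or coercing with as.logical) gives a logical index.
right <- x[keep.num == 1, , drop = FALSE]
also  <- x[as.logical(keep.num), , drop = FALSE]

print(wrong)
print(right)
```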