Hi,

I am trying to remove a series of records from a large dataframe. The script I have written works fine but takes a long time to run. Can anyone suggest a quicker way to do this?

Here is an example of the code I've written. The end result of this bit of code would be a dataframe with any records relating to ID 1 or ID 4 removed:

#dataframe
id <- c(1,1,1,1,2,2,2,2,2, 3,3,3, 4,4)
year <- c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8)
age <- c("Adult",NA,NA,NA, "Adult",NA,NA,NA, "Adult",
         NA,"Adult",NA, NA,"Adult")
dat <- data.frame(id, year, age)
dat.id <- unique(dat$id)

#ID numbers for removal
bad <- data.frame(c(1,4))
names(bad) <- "id"
remove.value <- bad$id

good.id <- dat.id[!dat.id %in% remove.value]

#Combine all good ID numbers
if(exists("dat.2")){ rm(dat.2)}

for(i in good.id){
    lala <- dat[which(dat$id==i),]

    if(!exists("dat.2")) {
        dat.2 <- lala } else {
        dat.2 <- rbind(dat.2, lala)
    }
}

Many thanks in advance for any suggestions

--
View this message in context: http://r.789695.n4.nabble.com/Remove-records-from-a-large-dataframe-tp4646990.html
Sent from the R help mailing list archive at Nabble.com.
Hello,

If I understand correctly, this should do it:

idx <- !dat$id %in% bad$id
dat[idx, ]

Also, the way you create bad is more complicated than it needs to be. This would do:

bad <- data.frame(id = c(1,4))

Hope this helps,

Rui Barradas

On 22-10-2012 12:04, penguins wrote:
> Hi, I am trying to remove a series of records from a large dataframe. The
> script I have written works fine but takes a long time to run. Can anyone
> suggest a quicker way to do this?
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
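The logical-index approach from Rui's reply can be tried on the example data from the question. This is only a minimal sketch: the name `kept` is introduced here for illustration and does not appear in the thread.

```r
# Example data copied from the original question.
dat <- data.frame(
  id   = c(1,1,1,1, 2,2,2,2,2, 3,3,3, 4,4),
  year = c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8),
  age  = c("Adult",NA,NA,NA, "Adult",NA,NA,NA, "Adult",
           NA,"Adult",NA, NA,"Adult")
)
bad <- data.frame(id = c(1, 4))

# One logical value per row of dat: TRUE when the row's id is NOT in bad$id.
idx <- !dat$id %in% bad$id

# Keep only the TRUE rows. `kept` is a hypothetical name used for this sketch.
kept <- dat[idx, ]

nrow(kept)        # 8 rows remain (ids 2 and 3)
unique(kept$id)   # 2 3
```

The whole filter is a single vectorised comparison followed by one subsetting step, so nothing is grown row by row. The loop in the question calls rbind() once per ID, and each call copies everything accumulated so far, which is the likely reason it gets slow on a large dataframe.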
Hi,

You can also use ?merge()

dat$ind_dat <- TRUE
bad$ind_bad <- TRUE
res <- merge(dat, bad, all=TRUE)
res1 <- res[is.na(res$ind_bad),][,1:3]
res1
#   id year   age
#5   2    2 Adult
#6   2    2  <NA>
#7   2    3  <NA>
#8   2    2  <NA>
#9   2    2 Adult
#10  3    2  <NA>
#11  3    3 Adult
#12  3    4  <NA>

A.K.

----- Original Message -----
From: penguins <catrsw at bas.ac.uk>
To: r-help at r-project.org
Sent: Monday, October 22, 2012 7:04 AM
Subject: [R] Remove records from a large dataframe

Hi, I am trying to remove a series of records from a large dataframe. The script I have written works fine but takes a long time to run. Can anyone suggest a quicker way to do this?
[...]
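A variation on the merge() idea above, sketched under a few assumptions: the indicator column is added to a temporary copy of bad, so dat and bad themselves are left unmodified, and the names `flagged` and `res1` are used only for illustration. It is not necessarily how A.K. would write it.

```r
# Example data copied from the original question.
dat <- data.frame(
  id   = c(1,1,1,1, 2,2,2,2,2, 3,3,3, 4,4),
  year = c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8),
  age  = c("Adult",NA,NA,NA, "Adult",NA,NA,NA, "Adult",
           NA,"Adult",NA, NA,"Adult")
)
bad <- data.frame(id = c(1, 4))

# Left join: every row of dat is kept, and ind_bad is TRUE only where the
# id also appears in bad. transform() returns a modified copy of bad.
flagged <- merge(dat, transform(bad, ind_bad = TRUE), by = "id", all.x = TRUE)

# Rows that found no match in bad have ind_bad == NA; keep those rows and
# only the original columns of dat.
res1 <- flagged[is.na(flagged$ind_bad), names(dat)]

nrow(res1)   # 8 rows remain (ids 2 and 3)
```

Note that merge() sorts the result on the key column by default (sort = TRUE), so on real data the row order may differ from the order in dat. The %in% approach keeps the original order.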
Thanks Rui, your solution works great and is so fast!