Hi, I am trying to remove a series of records from a large dataframe. The
script I have written works fine but takes a long time to run. Can anyone
suggest a quicker way to do this?
Here is an example of the code I've written. The end result of this bit of
code would be a dataframe with any records relating to ID 1 or ID 4 removed:
#dataframe
id <- c(1,1,1,1,2,2,2,2,2, 3,3,3, 4,4)
year <- c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8)
age <- c("Adult",NA,NA,NA, "Adult",NA,NA,NA,"Adult",
         NA,"Adult",NA, NA,"Adult")
dat <- data.frame(id, year, age)
dat.id<-unique(dat$id)
#ID numbers for removal
bad<- data.frame(c(1,4))
names(bad)<-"id"
remove.value<-bad$id
good.id<- dat.id[!dat.id%in%remove.value]
#Combine all good ID numbers
if (exists("dat.2")) { rm(dat.2) }
for (i in good.id) {
  lala <- dat[which(dat$id == i), ]
  if (!exists("dat.2")) {
    dat.2 <- lala
  } else {
    dat.2 <- rbind(dat.2, lala)
  }
}
Many thanks in advance for any suggestions
--
View this message in context:
http://r.789695.n4.nabble.com/Remove-records-from-a-large-dataframe-tp4646990.html
Sent from the R help mailing list archive at Nabble.com.
Hello,
If I understand it well,
idx <- !dat$id %in% bad$id
dat[idx, ]
Also, you are overcomplicating the creation of bad; this would do:
bad <- data.frame(id = c(1,4))
Hope this helps,
Rui Barradas
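For anyone following along, here is a small self-contained sketch of the logical-index approach, using the example data from the original post. The names `keep` and `dat.clean` are made up for illustration, and `stringsAsFactors = FALSE` is an addition so that the `age` column is character on both older and newer versions of R.

```r
# Example data from the original post
dat <- data.frame(
  id   = c(1,1,1,1, 2,2,2,2,2, 3,3,3, 4,4),
  year = c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8),
  age  = c("Adult",NA,NA,NA, "Adult",NA,NA,NA,"Adult",
           NA,"Adult",NA, NA,"Adult"),
  stringsAsFactors = FALSE  # assumption: keep age as character
)
bad <- data.frame(id = c(1, 4))

# TRUE for every row whose id is NOT in the removal list.
# %in% binds tighter than !, so this reads as !(dat$id %in% bad$id).
keep <- !dat$id %in% bad$id   # 'keep' is an illustrative name

# A single vectorised subset replaces the for/rbind loop
dat.clean <- dat[keep, ]      # 'dat.clean' is an illustrative name

nrow(dat.clean)        # 8 rows remain
unique(dat.clean$id)   # 2 3
```

The original row names (5 to 12 here) are carried over from `dat`; `rownames(dat.clean) <- NULL` would reset them if that matters.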
Hi,
You can also use ?merge()
dat$ind_dat<-TRUE
bad$ind_bad<-TRUE
res<-merge(dat,bad,all=TRUE)
res1<-res[is.na(res$ind_bad),][,1:3]
res1
#   id year   age
#5   2    2 Adult
#6   2    2  <NA>
#7   2    3  <NA>
#8   2    2  <NA>
#9   2    2 Adult
#10  3    2  <NA>
#11  3    3 Adult
#12  3    4  <NA>
A.K.
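A possible variant of the merge() idea, in case it is useful: the sketch below only flags the `bad` side, uses `all.x = TRUE` so that every row of `dat` is kept in the join, and selects the result columns by name instead of by position. This is just one way to write it, not a correction of the code above. The data are the example from the original post, and `stringsAsFactors = FALSE` is an added assumption.

```r
# Example data from the original post
dat <- data.frame(
  id   = c(1,1,1,1, 2,2,2,2,2, 3,3,3, 4,4),
  year = c(1,1,1,2, 2,2,3,2,2, 2,3,4, 8,8),
  age  = c("Adult",NA,NA,NA, "Adult",NA,NA,NA,"Adult",
           NA,"Adult",NA, NA,"Adult"),
  stringsAsFactors = FALSE  # assumption: keep age as character
)
bad <- data.frame(id = c(1, 4))

# Flag the ids to be removed
bad$ind_bad <- TRUE

# Left join on id: rows of dat with no match in bad get ind_bad = NA
res <- merge(dat, bad, by = "id", all.x = TRUE)

# Keep the unmatched rows, and only the columns that dat had originally
res1 <- res[is.na(res$ind_bad), names(dat)]

nrow(res1)   # 8 rows remain
```

Note that merge() sorts the result on the `by` column by default and renumbers the rows, so the row order and row names are not guaranteed to match those of `dat`.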
----- Original Message -----
From: penguins <catrsw at bas.ac.uk>
To: r-help at r-project.org
Cc:
Sent: Monday, October 22, 2012 7:04 AM
Subject: [R] Remove records from a large dataframe
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thanks Rui, your solution works great and is so fast!
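For the record, the difference in speed can be checked with a rough timing sketch like the one below. The data are synthetic (500 ids with 10 rows each, 50 ids removed), and the function names `loop.version` and `vec.version` are made up for the comparison; actual timings will depend on the machine and on the size of the real data, so no figures are quoted here.

```r
set.seed(1)

# Synthetic data: sizes are arbitrary assumptions for the demonstration
n.id <- 500
dat <- data.frame(id   = rep(seq_len(n.id), each = 10),
                  year = sample(1:8, n.id * 10, replace = TRUE))
bad <- data.frame(id = sample(seq_len(n.id), 50))

# Loop approach: grow the result one id at a time with rbind()
loop.version <- function(dat, bad) {
  dat.id  <- unique(dat$id)
  good.id <- dat.id[!dat.id %in% bad$id]
  out <- NULL
  for (i in good.id) {
    out <- rbind(out, dat[dat$id == i, ])  # copies the growing result each time
  }
  out
}

# Vectorised approach: one logical index, one subset
vec.version <- function(dat, bad) dat[!dat$id %in% bad$id, ]

t.loop <- system.time(res.loop <- loop.version(dat, bad))
t.vec  <- system.time(res.vec  <- vec.version(dat, bad))

print(t.loop["elapsed"])
print(t.vec["elapsed"])

# Check that both approaches return the same rows in the same order
same <- identical(res.loop$id, res.vec$id) &&
        identical(res.loop$year, res.vec$year)
print(same)
```

The loop is slow mainly because each rbind() call copies everything accumulated so far, whereas the logical index touches the data once.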