Hi,

I have a dataset called "data". There is one column called "ac_name". Some names in this column appear very often, some less often. I want to filter this dataset with the following condition:

Exclude the names that appear more than five times (example: House A appears 8 times ==> exclude it; House B appears 5 times ==> include it, etc.).

In the end, I want to have the old "data" dataset without the rows excluded by the condition above, plus a separate list of all the names that have been excluded.

I think for one of the professionals amongst you this is pretty easy to solve. ;-)

Thanks dudes!

Cheerio,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454.html
Sent from the R help mailing list archive at Nabble.com.
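The two outputs asked for (the filtered data and the list of excluded names) can be sketched in base R with `table()` and `%in%`. This is a minimal sketch, assuming `ac_name` is a character or factor column; the toy data and the names `limit`, `counts`, `excluded` and `kept` are hypothetical.

```r
# Hypothetical toy data standing in for the poster's "data"
data <- data.frame(
  ac_name = rep(c("House A", "House B", "House C"), times = c(8, 5, 2)),
  val = seq_len(15),
  stringsAsFactors = FALSE
)

limit <- 5  # hypothetical threshold: names seen more often than this are excluded

# Number of rows per name
counts <- table(data$ac_name)

# Names that exceed the limit (the requested list of excluded names)
excluded <- names(counts)[counts > limit]

# All rows whose name is not in the excluded set
kept <- data[!(data$ac_name %in% excluded), , drop = FALSE]

excluded    # "House A"
nrow(kept)  # 7
```

Using `%in%` against the excluded names keeps the original row order of `data`, so `kept` is simply "data minus the frequent names".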
Thanks for the first reply. Unfortunately, my list of different ac_names is pretty long (about 1,000 different names). Is there a way to sort them, count the number of rows for each name and exclude the rows whose name exceeds a particular limit?
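With about 1,000 names, a sorted frequency table makes it easy to inspect which names exceed the limit before filtering. A minimal sketch, assuming the same `data`/`ac_name` layout; the toy data, `limit`, `over_limit` and `filtered` are hypothetical names.

```r
# Hypothetical toy data; in practice this would be the ~1,000-name data set
data <- data.frame(
  ac_name = rep(c("House A", "House B", "House C", "House D"), times = c(8, 5, 12, 1)),
  stringsAsFactors = FALSE
)

limit <- 5  # hypothetical threshold; change as needed

# Frequency of every name, most frequent first
counts <- sort(table(data$ac_name), decreasing = TRUE)

# Only the names above the limit, still sorted by frequency
over_limit <- counts[counts > limit]
names(over_limit)  # "House C" "House A"

# Keep the rows whose name occurs at most `limit` times
filtered <- data[data$ac_name %in% names(counts)[counts <= limit], , drop = FALSE]
```

Because the limit is a single variable, the same lines work for any threshold without listing the names by hand.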
One way is:

ac_name_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN = length)
data[ac_name_count <= 5, , drop = FALSE]  # rows whose ac_name entry is rare
data[ac_name_count > 5, , drop = FALSE]   # rows whose ac_name entry is common

Use

ac_name_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN = seq_along)

to assign a within-group sequence number so you can pick out the first or last n items in a group for the big groups.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fxen3k
> Sent: Wednesday, October 17, 2012 5:45 AM
> To: r-help at r-project.org
> Subject: [R] How to count rows with a condition
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
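A small self-contained run of the `ave()` approach above, on hypothetical toy data. `ac_name_count` gives every row the size of its group, and `ac_name_seqno` numbers the rows within each group, so `ac_name_seqno <= 5` keeps at most the first five rows of every name, and `ac_name_seqno > ac_name_count - 5` keeps at most the last five. The variable names `rare`, `common`, `first5` and `last5` are illustrative only.

```r
# Hypothetical toy data in place of the poster's "data"
data <- data.frame(
  ac_name = rep(c("House A", "House B"), times = c(8, 5)),
  val = 1:13,
  stringsAsFactors = FALSE
)

# Group size for every row; integer(nrow(data)) is only a placeholder,
# so the result is an integer vector rather than a character one
ac_name_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN = length)

rare   <- data[ac_name_count <= 5, , drop = FALSE]  # "House B" rows only
common <- data[ac_name_count > 5, , drop = FALSE]   # "House A" rows only

# Position of each row within its group: 1, 2, 3, ...
ac_name_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN = seq_along)

# At most the first five rows of every name
first5 <- data[ac_name_seqno <= 5, , drop = FALSE]

# At most the last five rows of every name
last5 <- data[ac_name_seqno > ac_name_count - 5, , drop = FALSE]
```

Here `first5` has 5 rows for each name, while `last5` keeps rows 4 to 8 of "House A" and all of "House B".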
On Oct 17, 2012, at 5:44 AM, fxen3k wrote:

data[ ave(data$ac_name, data$ac_name, length) <= 5, ]  # all with 5 or fewer entries

--
David Winsemius, MD
Alameda, CA, USA
Hi David,

I tried your function:

set.seed(1)
dat1 <- data.frame(ac_name = rep(c("HouseA", "HouseB", "HouseC", "HouseD", "HouseE"), times = c(8, 5, 4, 6, 3)), val = rnorm(26, 15))
# convert ac_name from factor to character (data.frame() created factors by default before R 4.0.0)
dat2 <- within(dat1, {ac_name <- as.character(ac_name)})
dat2 <- dat2[order(dat2[, 1]), ]
dat2[ave(dat2$ac_name, dat2$ac_name, length) <= 5, ]
#Error in unique.default(x) : unique() applies only to vectors

#With "FUN" added
head(dat2[ave(dat2$ac_name, dat2$ac_name, FUN = length) <= 5, ])
#   ac_name      val
#9   HouseB 15.57578
#10  HouseB 14.69461
#11  HouseB 16.51178
#12  HouseB 15.38984
#13  HouseB 14.37876
#14  HouseC 12.78530

A.K.

----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
To: fxen3k <f.sehardt at gmail.com>
Cc: r-help at r-project.org
Sent: Wednesday, October 17, 2012 4:25 PM
Subject: Re: [R] How to count rows with a condition
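Two details explain the error above, sketched here on hypothetical toy data. `ave()` has the signature `ave(x, ..., FUN = mean)`, so an unnamed `length` is swallowed by `...` and treated as a grouping variable, which is why `FUN = length` must be named. Also, when `x` is a character column, `ave()` returns character counts, so `<= 5` compares strings: in typical locales "12" sorts before "5" and a name with 12 rows slips through. In A.K.'s example all counts are single digits, so the string comparison happens to give the right answer. Passing a numeric placeholder as `x`, as in Bill's reply, avoids the problem.

```r
# Hypothetical toy data: one name with 12 rows, one with 3
ac_name <- rep(c("HouseA", "HouseB"), times = c(12, 3))

# Character x: counts come back as strings ("12", "3")
cnt_chr <- ave(ac_name, ac_name, FUN = length)

# String comparison: "12" <= "5" is TRUE, so HouseA is wrongly kept
keep_chr <- cnt_chr <= 5

# Numeric placeholder for x: counts are integers, comparison is numeric
cnt_num <- ave(seq_along(ac_name), ac_name, FUN = length)
keep_num <- cnt_num <= 5  # TRUE only for the 3 HouseB rows
```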