Hi,

I have two columns with data (both identifiers - it's an affiliation list) and I would like to delete the rows in which the observations in the second column have a frequency < 5 in the entire second column. Example:

1 a
1 b
1 c
2 a
2 b
2 d

Let's say I would like to delete the rows in which the observation in the second column has a frequency < 2 in the entire second column. This would result in:

1 a
1 b
2 a
2 b

How can I do this? Thanks in advance!

Mathijs

--
View this message in context: http://r.789695.n4.nabble.com/Delete-observations-with-a-frequency-x-tp3081226p3081226.html
Sent from the R help mailing list archive at Nabble.com.
Suppose this is your data frame:

> df = data.frame(x=c(1,1,1,2,2,2),y=c('a','b','c','a','b','d'))
> df
  x y
1 1 a
2 1 b
3 1 c
4 2 a
5 2 b
6 2 d

> df[!table(df$y)[df$y] < 2,]
  x y
1 1 a
2 1 b
4 2 a
5 2 b

Note that this will only work properly if y is a factor or character variable. If y was numeric, you would need

df[!table(df$y)[as.character(df$y)] < 2,]

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu

On Thu, 9 Dec 2010, mathijsdevaan wrote:

> Hi,
>
> I have two columns with data (both identifiers - it's an affiliation list)
> and I would like to delete the rows in which the observations in the second
> column have a frequency < 5 in the entire second column. Example:
>
> 1 a
> 1 b
> 1 c
> 2 a
> 2 b
> 2 d
>
> Let's say, I would like to delete the rows in which the observation in the
> second column has a frequency < 2 in the entire second column. This would
> result in:
>
> 1 a
> 1 b
> 2 a
> 2 b
>
> How can I do this? Thanks in advance!
>
> Mathijs
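The caveat about a numeric y can be illustrated with a small sketch. The data frame and the names `counts`, `by_name` and `kept` below are made up for illustration and are not from the thread. The idea is that a numeric index into a table is positional, so the values have to be converted to character to be looked up by name.

```r
# Hypothetical data (not from the thread): y is numeric here
df <- data.frame(x = 1:5, y = c(10, 10, 20, 30, 30))

# Frequency of each distinct value; the table's names are "10", "20", "30"
counts <- table(df$y)

# counts[df$y] would index by position (10, 20, 30), not by value,
# so convert to character to look each value up by its name instead
by_name <- counts[as.character(df$y)]

# Keep rows whose y value occurs at least twice
# (as.vector() drops the table attributes so the index is a plain logical vector)
kept <- df[as.vector(by_name) >= 2, ]
print(kept)
```

Writing the condition as `>= 2` rather than `!... < 2` is only a matter of taste; both select the same rows.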
Seeliger.Curt at epamail.epa.gov
2010-Dec-10 00:41 UTC
[R] Delete observations with a frequency < x
mathijsdevaan wrote on 12/09/2010 04:21:54 PM:

> I have two columns with data (both identifiers - it's an affiliation list)
> and I would like to delete the rows in which the observations in the second
> column have a frequency < 5 in the entire second column. Example:
>
> 1 a
> 1 b
> 1 c
> 2 a
> 2 b
> 2 d
>
> Let's say, I would like to delete the rows in which the observation in the
> second column has a frequency < 2 in the entire second column. This would
> result in:
>
> 1 a
> 1 b
> 2 a
> 2 b
>
> How can I do this? Thanks in advance!
>

It's not clear whether you want to delete rows where the value in the second column occurs fewer than 5 times or fewer than 2 times. I'll assume the latter.

foo <- data.frame(k=rep(1:2, each=3), x=letters[c(1,2,3,1,2,4)])
bar <- subset(foo, x %in% names(table(foo$x))[table(foo$x)>=2])

No doubt others can write this more succinctly.

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt@epa.gov
541/754-4638
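One possible way to make the threshold adjustable (2, 5, or anything else) is to wrap the filter in a small function. This is only a sketch: `drop_rare` and its arguments are hypothetical names, and it uses base R's `ave()` to compute per-row group sizes instead of the `table()` lookups posted in the thread, which also avoids the factor/character/numeric distinction.

```r
# Hypothetical helper (not from the thread): keep the rows whose value in
# column `col` occurs at least `min_n` times in that column.
drop_rare <- function(data, col, min_n) {
  # ave() applies FUN within each group and returns one value per row,
  # so n[i] is the number of rows sharing row i's value of `col`
  n <- ave(seq_len(nrow(data)), data[[col]], FUN = length)
  data[n >= min_n, , drop = FALSE]
}

# Same example data as in the reply above
foo <- data.frame(k = rep(1:2, each = 3), x = letters[c(1, 2, 3, 1, 2, 4)])
bar <- drop_rare(foo, "x", 2)
print(bar)
```

Calling `drop_rare(foo, "x", 5)` would apply the stricter threshold from the original question instead.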
Hi Phil,

That worked perfectly! Thanks

Mathijs