Dear all, I have a data set in which the same unit appears 2 or 3 or 4 times. I need to aggregate this data to maintain only one unit by row. But I need to do that based on a comparison between the values of such units. I can't find a function to help me on that. I appreciate any help. Below I provide an example of what I want: This is my data: Units Var1 Var2 Var3 1 B 2 2 1 C 1 3 2 D 3 3 2 C 2 1 2 A 1 2 3 C 2 1 3 A 1 3 4 B 3 3 4 C 2 2 4 A 1 1 I want to produce a new data in which I keep, for each unit, the row with max number for Var2. For instance, Units Var1 Var2 Var3 1 B 2 2 2 D 3 3 3 C 2 1 4 B 3 3 Thanks, *Felipe Nunes* CAPES/Fulbright Fellow PhD Student Political Science - UCLA Web: felipenunes.bol.ucla.edu [[alternative HTML version deleted]]
Hi, Here's one option: dat <- read.table(text = "Units Var1 Var2 Var3 1 B 2 2 1 C 1 3 2 D 3 3 2 C 2 1 2 A 1 2 3 C 2 1 3 A 1 3 4 B 3 3 4 C 2 2 4 A 1 1", header = TRUE) dat2 <- dat[as.logical(ave(dat$Var2, dat$Units, FUN = function(x) x =max(x))), ] dat2 Cheers, Josh On Mon, Jan 16, 2012 at 2:13 PM, Felipe Nunes <felipnunes at gmail.com> wrote:> Dear all, > > I have a data set in which the same unit appears 2 or 3 or 4 times. I need > to aggregate this data to maintain only one unit by row. But I need to do > that based on a comparison between the values of such units. I can't find a > function to help me on that. I appreciate any help. Below I provide an > example of what I want: > > This is my data: > > Units ?Var1 Var2 Var3 > ?1 ? ? ? ? ?B ? ? ? 2 ? ? ? ?2 > ?1 ? ? ? ? ?C ? ? ? 1 ? ? ? 3 > ?2 ? ? ? ? ?D ? ? ?3 ? ? ? ?3 > ?2 ? ? ? ? ?C ? ? ?2 ? ? ? ?1 > ?2 ? ? ? ? ?A ? ? ?1 ? ? ? ?2 > ?3 ? ? ? ? ?C ? ? ?2 ? ? ? ?1 > ?3 ? ? ? ? ?A ? ? ?1 ? ? ? ?3 > ?4 ? ? ? ? ?B ? ? ?3 ? ? ? ?3 > ?4 ? ? ? ? ?C ? ? ?2 ? ? ? ?2 > ?4 ? ? ? ? ?A ? ? ?1 ? ? ? ?1 > > I want to produce a new data in which I keep, for each unit, the row with > max number for Var2. For instance, > > Units ?Var1 Var2 Var3 > ?1 ? ? ? ? ?B ? ? ? 2 ? ? ? 2 > ?2 ? ? ? ? ?D ? ? ?3 ? ? ? ?3 > ?3 ? ? ? ? ?C ? ? ?2 ? ? ? ?1 > ?4 ? ? ? ? ?B ? ? ?3 ? ? ? ?3 > > Thanks, > > > *Felipe Nunes* > CAPES/Fulbright Fellow > PhD Student Political Science - UCLA > Web: felipenunes.bol.ucla.edu > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
Hi Felipe, Try x <- structure(list(Units = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L), Var1 = structure(c(2L, 3L, 4L, 3L, 1L, 3L, 1L, 2L, 3L, 1L ), .Label = c("A", "B", "C", "D"), class = "factor"), Var2 = c(2L, 1L, 3L, 2L, 1L, 2L, 1L, 3L, 2L, 1L), Var3 = c(2L, 3L, 3L, 1L, 2L, 1L, 3L, 3L, 2L, 1L)), .Names = c("Units", "Var1", "Var2", "Var3"), class = "data.frame", row.names = c(NA, -10L)) do.call(rbind, lapply(with(x, split(x, Units)), function(d) d[which.max(d[, 3]),])) # Units Var1 Var2 Var3 # 1 1 B 2 2 # 2 2 D 3 3 # 3 3 C 2 1 # 4 4 B 3 3 HTH, Jorge.- On Mon, Jan 16, 2012 at 5:13 PM, Felipe Nunes <> wrote:> Dear all, > > I have a data set in which the same unit appears 2 or 3 or 4 times. I need > to aggregate this data to maintain only one unit by row. But I need to do > that based on a comparison between the values of such units. I can't find a > function to help me on that. I appreciate any help. Below I provide an > example of what I want: > > This is my data: > > Units Var1 Var2 Var3 > 1 B 2 2 > 1 C 1 3 > 2 D 3 3 > 2 C 2 1 > 2 A 1 2 > 3 C 2 1 > 3 A 1 3 > 4 B 3 3 > 4 C 2 2 > 4 A 1 1 > > I want to produce a new data in which I keep, for each unit, the row with > max number for Var2. For instance, > > Units Var1 Var2 Var3 > 1 B 2 2 > 2 D 3 3 > 3 C 2 1 > 4 B 3 3 > > Thanks, > > > *Felipe Nunes* > CAPES/Fulbright Fellow > PhD Student Political Science - UCLA > Web: felipenunes.bol.ucla.edu > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
On Jan 16, 2012, at 4:13 PM, Felipe Nunes wrote:> Dear all, > > I have a data set in which the same unit appears 2 or 3 or 4 times. I need > to aggregate this data to maintain only one unit by row. But I need to do > that based on a comparison between the values of such units. I can't find a > function to help me on that. I appreciate any help. Below I provide an > example of what I want: > > This is my data: > > Units Var1 Var2 Var3 > 1 B 2 2 > 1 C 1 3 > 2 D 3 3 > 2 C 2 1 > 2 A 1 2 > 3 C 2 1 > 3 A 1 3 > 4 B 3 3 > 4 C 2 2 > 4 A 1 1 > > I want to produce a new data in which I keep, for each unit, the row with > max number for Var2. For instance, > > Units Var1 Var2 Var3 > 1 B 2 2 > 2 D 3 3 > 3 C 2 1 > 4 B 3 3 > > Thanks,There are likely to be multiple approaches, but the first that comes to mind is combining ?merge with ?aggregate.> DFUnits Var1 Var2 Var3 1 1 B 2 2 2 1 C 1 3 3 2 D 3 3 4 2 C 2 1 5 2 A 1 2 6 3 C 2 1 7 3 A 1 3 8 4 B 3 3 9 4 C 2 2 10 4 A 1 1> merge(aggregate(Var2 ~ Units, max, data = DF), DF)Units Var2 Var1 Var3 1 1 2 B 2 2 2 3 D 3 3 3 2 C 1 4 4 3 B 3 The use of aggregate() gets you the max value of Var2 for each group within Units:> aggregate(Var2 ~ Units, max, data = DF)Units Var2 1 1 2 2 2 3 3 3 2 4 4 3 merge() then allows you to add back the columns from 'DF', matching on 'Units' and 'Var2', which are the common columns between the two data frames. Note that the ordering of the columns is changed a bit here. That is easily adjusted with:> merge(aggregate(Var2 ~ Units, max, data = DF), DF)[, c(1, 3, 2, 4)]Units Var1 Var2 Var3 1 1 B 2 2 2 2 D 3 3 3 3 C 2 1 4 4 B 3 3 Note that you would want to use the 'na.rm = TRUE' argument to max() in the call to aggregate() if in fact any of your Var2 arguments are NA. HTH, Marc Schwartz