Hello R experts,

The following problem outstrips my current programming knowledge.

I have a dataframe with two fields that looks like the following:

ID  Contract
01  1
01  1
02  2
02  3
02  1
03  2
03  2
03  2
03  1
03  1
03  1
etc...

I would like to end up with a dataframe with one row per ID, where the value in the Contract field is the highest value recorded for that ID. As you can see above, the number of rows per ID varies irregularly. Given the above, the new file would look like the following:

ID  Contract
01  1
02  3
03  2

Thanks in advance for your suggestions.

Gregory L. Blevins
The Market Solustions Group, Partner
On Wed, 2004-03-03 at 21:19, Greg Blevins wrote:
> [...]
> I would like to end up with a dataframe with one row per ID where the
> value in the contract field would be the highest value recorded for a
> single ID.
> [...]

# Create the data frame
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

> df
   ID Contract
1  01        1
2  01        1
3  02        2
4  02        3
5  02        1
6  03        2
7  03        2
8  03        2
9  03        1
10 03        1
11 03        1

# Now use aggregate() to condense df by ID, using the max
# value of Contract

> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2

See ?aggregate for more information. By default, aggregate() names the function derived column as 'x'. You can of course rename it as you need.

HTH,

Marc Schwartz
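One way to do the renaming mentioned above, as a minimal sketch: it rebuilds the example data with plain character IDs (an assumption; the post wraps them in I()) and stores the result in `res`, a name chosen here purely for illustration.

```r
# Rebuild the example data from the thread (plain character IDs assumed)
df <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# Maximum Contract per ID; the derived column is named 'x' by default
res <- aggregate(df$Contract, list(ID = df$ID), max)

# Rename the derived column after the fact
names(res)[names(res) == "x"] <- "Contract"
res
```

Matching on the name "x" rather than on a column position keeps the rename correct even if more grouping columns are added later.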
Say your data frame is `dat'. Try:

tapply(dat$Contract, dat$ID, max)

(The output is not a data frame, but that shouldn't be a problem...)

HTH,
Andy

> From: Greg Blevins
>
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
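If a data frame is needed after all, the tapply() result can be converted. The following is only a sketch: `dat` is rebuilt from the example in the question, and `mx` and `out` are names made up here for illustration.

```r
# Example data from the question, under the name used in this reply
dat <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                  Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# tapply() returns a one-dimensional array named by the ID values
mx <- tapply(dat$Contract, dat$ID, max)

# Put the names and the values side by side in a data frame
out <- data.frame(ID = names(mx), Contract = as.vector(mx))
out
```

as.vector() drops the array's names, so the new data frame gets ordinary row numbers rather than the IDs as row names.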
How about something like ... (if your data frame is called the.data)

# This assumes the.data$ID is a factor; if it is a character vector,
# use levels(factor(the.data$ID)) instead.
summarized <- as.data.frame(levels(the.data$ID))
names(summarized) <- "ID"
summarized$Contract <- as.numeric(tapply(the.data$Contract, the.data$ID, max))

Andrew

On Wednesday 03 March 2004 19:19, Greg Blevins wrote:
> [...]

--
Andrew Robinson                      Ph: 208 885 7115
Department of Forest Resources       Fa: 208 885 6226
University of Idaho                  E : andrewr at uidaho.edu
PO Box 441133                        W : http://www.uidaho.edu/~andrewr
Moscow ID 83843                      Or: http://www.biometrics.uidaho.edu
No statement above necessarily represents my employer's opinion.
You can ensure the name gets set appropriately like this:

> aggregate(list(Contract=df$Contract), list(ID=df$ID), max)
  ID Contract
1 01        1
2 02        3
3 03        2

---
Date: Wed, 03 Mar 2004 21:46:27 -0600
From: Marc Schwartz <MSchwartz at medanalytics.com>
To: Greg Blevins <gblevins at mn.rr.com>
Cc: R-Help <r-help at stat.math.ethz.ch>
Subject: Re: [R] A file manipulation question

[...]

> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2

See ?aggregate for more information. By default, aggregate() names the function derived column as 'x'. You can of course rename it as you need.

HTH,

Marc Schwartz
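The column name can also be kept by using the formula interface of aggregate(), which is not used anywhere in this thread. This sketch assumes a version of R in which aggregate() has a formula method, and rebuilds the example data with plain character IDs; `res2` is an illustrative name.

```r
# Example data from the thread (plain character IDs assumed)
df <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# "Contract by ID": the result keeps both column names from the formula
res2 <- aggregate(Contract ~ ID, data = df, FUN = max)
res2
```

Unlike the list-based call, the formula method drops rows with missing values by default (na.action = na.omit), which may or may not be what is wanted.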