thr3ads.net - R help - [R] A file manipulation question [Mar 2004]


Greg Blevins

2004-Mar-04 03:19 UTC

[R] A file manipulation question

Hello R experts,

The following problem outstrips my current programming knowledge. 

I have a dataframe with two fields that looks like the following:

ID     Contract
01     1
01     1
02     2
02     3
02     1
03     2
03     2
03     2
03     1
03     1
03     1
etc...

I would like to end up with a dataframe with one row per ID where the value in
the contract field would be the highest value recorded for a single ID. As you
can see above, the number of IDs varies irregularly.  Given the above, the new
file would look like the following:

ID     Contract
01     1
02     3
03     2

Thanks in advance for your suggestions.

Gregory L. Blevins The Market Solutions Group, Partner




Marc Schwartz

2004-Mar-04 03:46 UTC


On Wed, 2004-03-03 at 21:19, Greg Blevins wrote:
> [...]

# Create the data frame
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

> df
   ID Contract
1  01        1
2  01        1
3  02        2
4  02        3
5  02        1
6  03        2
7  03        2
8  03        2
9  03        1
10 03        1
11 03        1

# Now use aggregate() to condense df by ID, using the max
# value of Contract
> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2


See ?aggregate for more information.  By default, aggregate() names the
column holding the function's result 'x'. You can of course rename it
as you need.
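
One way the renaming might be done afterwards is sketched below. The data
frame is rebuilt with plain character IDs so the snippet runs on its own,
and the object name `res` is only an illustrative choice, not something
from the post above.

```r
# Same toy data as in the thread (ID kept as character strings)
df <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1),
                 stringsAsFactors = FALSE)

# Aggregate to one row per ID, keeping the maximum Contract
res <- aggregate(df$Contract, list(ID = df$ID), max)

# Rename the default result column 'x' to 'Contract'
names(res)[names(res) == "x"] <- "Contract"
res
```

Matching on the name "x" rather than on a column position keeps the
rename valid if more grouping columns are added later.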

HTH,

Marc Schwartz

Liaw, Andy

2004-Mar-04 03:46 UTC


Say your data frame is `dat'.  Try:

    tapply(dat$Contract, dat$ID, max)

(The output is a named array with one element per ID rather than a data
frame, but that shouldn't be a problem...)
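
For readers unsure what tapply() hands back here, a small sketch follows.
It recreates the example data under the name `dat`; the name `maxima` is
just an assumption made for illustration.

```r
# Toy data matching the example in the question
dat <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                  Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1),
                  stringsAsFactors = FALSE)

# One maximum per ID; the result is a one-dimensional array
# whose names are the ID values
maxima <- tapply(dat$Contract, dat$ID, max)

names(maxima)     # the distinct IDs
maxima[["02"]]    # look up the maximum for a single ID by name
```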

HTH,
Andy

> From: Greg Blevins
>
> [...]

Andrew Robinson

2004-Mar-04 03:53 UTC


How about something like this (if your data frame is called the.data):

summarized <- as.data.frame(levels(the.data$ID))
names(summarized) <- "ID"
summarized$Contract <- as.numeric(tapply(the.data$Contract, the.data$ID,
max))
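
Note that levels() returns NULL when ID is a character vector rather than
a factor, so the code above relies on ID being a factor. A variant that
should work in either case takes the IDs from the names of the tapply()
result instead. This is only a sketch; the intermediate name `maxima` and
the rebuilt example data are assumptions for illustration.

```r
# Example data with ID stored as character, not factor
the.data <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                       Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1),
                       stringsAsFactors = FALSE)

# Maximum Contract per ID, as a named one-dimensional array
maxima <- tapply(the.data$Contract, the.data$ID, max)

# Build the summary from the names and values of that result
summarized <- data.frame(ID = names(maxima),
                         Contract = as.numeric(maxima),
                         stringsAsFactors = FALSE)
summarized
```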




Andrew

On Wednesday 03 March 2004 19:19, Greg Blevins wrote:
> [...]

-- 
Andrew Robinson                      Ph: 208 885 7115
Department of Forest Resources       Fa: 208 885 6226
University of Idaho                  E : andrewr at uidaho.edu
PO Box 441133                        W : http://www.uidaho.edu/~andrewr
Moscow ID 83843                      Or: http://www.biometrics.uidaho.edu
No statement above necessarily represents my employer's opinion.

Gabor Grothendieck

2004-Mar-04 05:12 UTC


You can ensure the result column gets named appropriately (instead of the
default 'x') by passing a named list as the first argument:

> aggregate(list(Contract=df$Contract), list(ID=df$ID), max)
  ID Contract
1 01        1
2 02        3
3 03        2
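
Versions of R released well after this thread also offer a formula
interface to aggregate(), which keeps the original column names without
any list wrapping. The sketch below assumes such a version is installed;
the data frame is rebuilt so the snippet runs on its own, and `res` is an
illustrative name.

```r
# Toy data matching the example in the question
df <- data.frame(ID = c(rep("01", 2), rep("02", 3), rep("03", 6)),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1),
                 stringsAsFactors = FALSE)

# "Contract grouped by ID", reduced with max();
# the result columns are named ID and Contract
res <- aggregate(Contract ~ ID, data = df, FUN = max)
res
```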


