thr3ads.net - R help - [R] Determining which.max() within groups [Jun 2017]


Morway, Eric

2017-Jun-07 01:30 UTC

[R] Determining which.max() within groups

Using the dataset below, I got close to what I'm after, but not quite all
the way there.  Any suggestions appreciated:

Daily <- read.table(textConnection("     Date  wyr        Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784"),header=TRUE)

aggregate(Q ~ wyr, data = Daily, which.max)

# gives:
#    wyr Q
# 1 1990 4
# 2 1991 2
# 3 1992 1
# 4 1993 3
# 5 1994 2

I can 'see' that it is returning the which.max() relative to each
grouping.  Is there a way to instead return the absolute position (row) of
the max value within each group, i.e.:

# Would instead like to have
#     wyr  Q
# 1  1990  4
# 2  1991  6
# 3  1992  9
# 4  1993  15
# 5  1994  18

The icing on the cake would be to get the Julian day corresponding to the
date on which each year's maximum occurs.

Bert Gunter

2017-Jun-07 02:15 UTC

cumsum() seems to be what you need.

This can probably be done more elegantly, but ...

out <- aggregate(Q ~ wyr, data = Daily, which.max)
tbl <- table(Daily$wyr)
out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)]))
out

## yields

   wyr  Q
1 1990  4
2 1991  6
3 1992  9
4 1993 15
5 1994 18
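The offset works because each group's relative index is shifted by the number of rows in all earlier groups, which is only correct when Daily is sorted by wyr. A minimal sketch that wraps the same arithmetic in a hypothetical helper, abs_which_max() (not part of base R), and refuses unsorted input; the data frame is rebuilt compactly with the same values as above:

```r
# Example data rebuilt compactly (same values as Daily in the thread)
Daily <- data.frame(
  Date = c(sprintf("1911-04-%02d", c(1:14, 16, 17, 19, 28, 29)), "1911-05-20"),
  wyr  = rep(1990:1994, each = 4),
  Q    = c(4.530695, 4.700596, 4.898814, 5.097032,
           5.295250, 6.569508, 5.861587, 5.153666,
           4.445745, 3.737824, 3.001586, 3.001586,
           2.350298, 2.661784, 3.001586, 2.661784,
           2.661784, 3.369705, 3.001586, 2.661784),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not in base R): absolute row of each group's maximum.
# Assumes rows are sorted by group, so each group is one contiguous run.
abs_which_max <- function(x, g) {
  stopifnot(!is.unsorted(g))              # offsets would be wrong otherwise
  rel    <- tapply(x, g, which.max)       # position within each group
  n      <- tapply(x, g, length)          # number of rows in each group
  offset <- cumsum(c(0L, n[-length(n)]))  # rows in all earlier groups
  rel + offset
}

res <- abs_which_max(Daily$Q, Daily$wyr)
res  # 4, 6, 9, 15, 18, named by wyr
```

The explicit is.unsorted() check is a design choice: silently wrong indices are harder to spot than an error.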

I leave the matter of Julian dates to you or others.

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



David L Carlson

2017-Jun-07 13:49 UTC

If you want the date of each maximum (from which the Julian day follows), you
could use Bert's index on the original data frame:

Daily[out$Q, ]
         Date  wyr        Q
4  1911-04-04 1990 5.097032
6  1911-04-06 1991 6.569508
9  1911-04-09 1992 4.445745
15 1911-04-16 1993 3.001586
18 1911-04-28 1994 3.369705

Another way to get that index would be to use by():

idx <- as.vector(by(Daily, Daily$wyr, function(x)
    rownames(x)[which.max(x$Q)]))
Daily[idx, ]
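Neither reply computes the Julian day itself. Assuming "Julian day" here means day of year (1 to 366), as is common in hydrology (the astronomical Julian Day Number is a different count), a minimal sketch using base R: POSIXlt's yday component is zero-based, so 1 is added, and format(..., "%j") gives the same number as a string.

```r
# Example data rebuilt compactly (same values as Daily in the thread)
Daily <- data.frame(
  Date = c(sprintf("1911-04-%02d", c(1:14, 16, 17, 19, 28, 29)), "1911-05-20"),
  wyr  = rep(1990:1994, each = 4),
  Q    = c(4.530695, 4.700596, 4.898814, 5.097032,
           5.295250, 6.569508, 5.861587, 5.153666,
           4.445745, 3.737824, 3.001586, 3.001586,
           2.350298, 2.661784, 3.001586, 2.661784,
           2.661784, 3.369705, 3.001586, 2.661784),
  stringsAsFactors = FALSE
)

# Bert's absolute indices (assumes Daily is sorted by wyr)
out <- aggregate(Q ~ wyr, data = Daily, FUN = which.max)
tbl <- table(Daily$wyr)
out$Q <- out$Q + cumsum(c(0, tbl[-length(tbl)]))

peaks <- Daily[out$Q, ]
d <- as.Date(peaks$Date)
# Day of year: POSIXlt's yday is zero-based, so add 1
peaks$jday <- as.POSIXlt(d)$yday + 1L
# Same value via the "%j" (day-of-year) format code
stopifnot(all(peaks$jday == as.integer(format(d, "%j"))))
peaks
```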

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Jeff Newmiller

2017-Jun-07 15:57 UTC

aggregate() can compute both which.max() and the group lengths in one call,
but the result ends up as a matrix inside the data frame, which I find
cumbersome to work with.

Daily <- read.table( text = "     Date  wyr        Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784
", header = TRUE, stringsAsFactors=FALSE)

# this algorithm only works if wyr groups are contiguous, so sort first
Daily <- Daily[ order( Daily$wyr ), ]
# generate a data frame with key column wyr and matrix Q as the second column
out <- aggregate( Q ~ wyr
                 , data = Daily
                 , FUN = function(x) {
                      c( WM = which.max(x)
                       , n=length( x )
                       )
                   }
                 )
# put the matrix columns into separate columns Q.WM and Q.n
out[ , paste( "Q", colnames( out$Q ), sep="." ) ] <-
out$Q
# drop the matrix
out$Q <- NULL
# form absolute indexes Q.maxidx
out <- within( out, {
         Q.maxidx <- cumsum( c( 0, Q.n[ -length(Q.n) ] ) ) + Q.WM
        })
result <- Daily[ with( out, Q.maxidx ), ]

# or save ourselves some effort
library(dplyr)
result2 <- (   Daily
            %>% group_by( wyr )
            %>% slice( which.max( Q ) )
            %>% as.data.frame
            )
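The cumsum() offsets need each wyr to occupy a contiguous run of rows, hence the sort. A base-R alternative that, like the dplyr version, does not depend on row order is to split the row numbers themselves by group and keep the row holding each group's maximum. A minimal sketch, checked only on this example data:

```r
# Example data rebuilt compactly (same values as Daily in the thread)
Daily <- data.frame(
  Date = c(sprintf("1911-04-%02d", c(1:14, 16, 17, 19, 28, 29)), "1911-05-20"),
  wyr  = rep(1990:1994, each = 4),
  Q    = c(4.530695, 4.700596, 4.898814, 5.097032,
           5.295250, 6.569508, 5.861587, 5.153666,
           4.445745, 3.737824, 3.001586, 3.001586,
           2.350298, 2.661784, 3.001586, 2.661784,
           2.661784, 3.369705, 3.001586, 2.661784),
  stringsAsFactors = FALSE
)

rows <- seq_len(nrow(Daily))
# For each wyr, keep the row number whose Q is largest
# (which.max() returns the first such row if there are ties)
idx <- vapply(split(rows, Daily$wyr),
              function(i) i[which.max(Daily$Q[i])],
              integer(1))
Daily[idx, ]

# Shuffling the rows does not matter: idx holds row numbers, not offsets
set.seed(1)
shuffled <- Daily[sample(rows), ]
idx2 <- vapply(split(rows, shuffled$wyr),
               function(i) i[which.max(shuffled$Q[i])],
               integer(1))
shuffled[idx2, ]
```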

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Charles C. Berry

2017-Jun-07 19:06 UTC

Like this:
> which.max.by.wyr <- with(Daily, which( ave( Q, wyr, FUN=max) == Q))
> cbind( Daily[ which.max.by.wyr, ], index=which.max.by.wyr )
         Date  wyr        Q index
4  1911-04-04 1990 5.097032     4
6  1911-04-06 1991 6.569508     6
9  1911-04-09 1992 4.445745     9
15 1911-04-16 1993 3.001586    15
18 1911-04-28 1994 3.369705    18

If there are ties in Q and you do not want more than one max value listed,
you can add a little fuzz to randomly pick one, e.g.:
> fuzz <- runif(nrow(Daily), 0, 1e-10)
> which.max.by.wyr <- with(Daily, which(ave(Q+fuzz,wyr,FUN=max)==Q+fuzz))

If you want the first tied value, sort fuzz in decreasing order before
determining which.max.by.wyr.
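Random fuzz is not reproducible without a seed. A deterministic alternative keeps the same ave() comparison and then drops all but the first (or last) matching row in each group with duplicated(). A minimal sketch, using a hypothetical copy of the data with a tie added for wyr 1990:

```r
# Example data rebuilt compactly (same values as Daily in the thread)
Daily <- data.frame(
  Date = c(sprintf("1911-04-%02d", c(1:14, 16, 17, 19, 28, 29)), "1911-05-20"),
  wyr  = rep(1990:1994, each = 4),
  Q    = c(4.530695, 4.700596, 4.898814, 5.097032,
           5.295250, 6.569508, 5.861587, 5.153666,
           4.445745, 3.737824, 3.001586, 3.001586,
           2.350298, 2.661784, 3.001586, 2.661784,
           2.661784, 3.369705, 3.001586, 2.661784),
  stringsAsFactors = FALSE
)

# Hypothetical tie for illustration: rows 3 and 4 (wyr 1990) share the max
tied <- Daily
tied$Q[3] <- tied$Q[4]

# Every row whose Q equals its group's maximum (ties give several per group)
hits <- with(tied, which(ave(Q, wyr, FUN = max) == Q))
# Keep the first hit per group; fromLast = TRUE keeps the last one instead
first <- hits[!duplicated(tied$wyr[hits])]
last  <- hits[!duplicated(tied$wyr[hits], fromLast = TRUE)]
tied[first, ]
tied[last, ]
```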

HTH,

Chuck
