Using the dataset below, I got close to what I'm after, but not quite all
the way there. Any suggestions appreciated:
Daily <- read.table(textConnection(" Date wyr Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784"),header=TRUE)
aggregate(Q ~ wyr, data = Daily, which.max)
# gives:
# wyr Q
# 1 1990 4
# 2 1991 2
# 3 1992 1
# 4 1993 3
# 5 1994 2
I can 'see' that it is returning the which.max() relative to each
grouping. Is there a way to instead return the absolute position (row) of
the max value within each group. i.e.:
# Would instead like to have
# wyr Q
# 1 1990 4
# 2 1991 6
# 3 1992 9
# 4 1993 15
# 5 1994 18
The icing on the cake would be to get the Julien Day corresponding to the
date on which each year's maximum occurs?
[[alternative HTML version deleted]]
cumsum() seems to be what you need. This can probably be done more elegantly, but ... out <- aggregate(Q ~ wyr, data = Daily, which.max) tbl <- table(Daily$wyr) out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)])) out ## yields wyr Q 1 1990 4 2 1991 6 3 1992 9 4 1993 15 5 1994 18 I leave the matter of Julian dates to you or others. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote:> Using the dataset below, I got close to what I'm after, but not quite all > the way there. Any suggestions appreciated: > > Daily <- read.table(textConnection(" Date wyr Q > 1911-04-01 1990 4.530695 > 1911-04-02 1990 4.700596 > 1911-04-03 1990 4.898814 > 1911-04-04 1990 5.097032 > 1911-04-05 1991 5.295250 > 1911-04-06 1991 6.569508 > 1911-04-07 1991 5.861587 > 1911-04-08 1991 5.153666 > 1911-04-09 1992 4.445745 > 1911-04-10 1992 3.737824 > 1911-04-11 1992 3.001586 > 1911-04-12 1992 3.001586 > 1911-04-13 1993 2.350298 > 1911-04-14 1993 2.661784 > 1911-04-16 1993 3.001586 > 1911-04-17 1993 2.661784 > 1911-04-19 1994 2.661784 > 1911-04-28 1994 3.369705 > 1911-04-29 1994 3.001586 > 1911-05-20 1994 2.661784"),header=TRUE) > > aggregate(Q ~ wyr, data = Daily, which.max) > > # gives: > # wyr Q > # 1 1990 4 > # 2 1991 2 > # 3 1992 1 > # 4 1993 3 > # 5 1994 2 > > I can 'see' that it is returning the which.max() relative to each > grouping. Is there a way to instead return the absolute position (row) of > the max value within each group. i.e.: > > # Would instead like to have > # wyr Q > # 1 1990 4 > # 2 1991 6 > # 3 1992 9 > # 4 1993 15 > # 5 1994 18 > > The icing on the cake would be to get the Julien Day corresponding to the > date on which each year's maximum occurs? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
If you want the Julian date, you could use Bert's index on the original data
frame:
Daily[out$Q, ]
Date wyr Q
4 1911-04-04 1990 5.097032
6 1911-04-06 1991 6.569508
9 1911-04-09 1992 4.445745
15 1911-04-16 1993 3.001586
18 1911-04-28 1994 3.369705
Another way to get that index would be to use by():
idx <- as.vector(by(Daily, Daily$wyr, function(x)
rownames(x)[which.max(x$Q)]))
Daily[idx, ]
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Tuesday, June 6, 2017 9:16 PM
To: Morway, Eric <emorway at usgs.gov>
Cc: R mailing list <r-help at r-project.org>
Subject: Re: [R] Determining which.max() within groups
cumsum() seems to be what you need.
This can probably be done more elegantly, but ...
out <- aggregate(Q ~ wyr, data = Daily, which.max)
tbl <- table(Daily$wyr)
out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)]))
out
## yields
wyr Q
1 1990 4
2 1991 6
3 1992 9
4 1993 15
5 1994 18
I leave the matter of Julian dates to you or others.
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov>
wrote:> Using the dataset below, I got close to what I'm after, but not quite
all
> the way there. Any suggestions appreciated:
>
> Daily <- read.table(textConnection(" Date wyr Q
> 1911-04-01 1990 4.530695
> 1911-04-02 1990 4.700596
> 1911-04-03 1990 4.898814
> 1911-04-04 1990 5.097032
> 1911-04-05 1991 5.295250
> 1911-04-06 1991 6.569508
> 1911-04-07 1991 5.861587
> 1911-04-08 1991 5.153666
> 1911-04-09 1992 4.445745
> 1911-04-10 1992 3.737824
> 1911-04-11 1992 3.001586
> 1911-04-12 1992 3.001586
> 1911-04-13 1993 2.350298
> 1911-04-14 1993 2.661784
> 1911-04-16 1993 3.001586
> 1911-04-17 1993 2.661784
> 1911-04-19 1994 2.661784
> 1911-04-28 1994 3.369705
> 1911-04-29 1994 3.001586
> 1911-05-20 1994 2.661784"),header=TRUE)
>
> aggregate(Q ~ wyr, data = Daily, which.max)
>
> # gives:
> # wyr Q
> # 1 1990 4
> # 2 1991 2
> # 3 1992 1
> # 4 1993 3
> # 5 1994 2
>
> I can 'see' that it is returning the which.max() relative to each
> grouping. Is there a way to instead return the absolute position (row) of
> the max value within each group. i.e.:
>
> # Would instead like to have
> # wyr Q
> # 1 1990 4
> # 2 1991 6
> # 3 1992 9
> # 4 1993 15
> # 5 1994 18
>
> The icing on the cake would be to get the Julien Day corresponding to the
> date on which each year's maximum occurs?
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Aggregate can do both which.max and group length calculations, but the
result ends up as a matrix inside the data frame, which I find cumbersome
to work with.
Daily <- read.table( text " Date wyr Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784
", header = TRUE, stringsAsFactors=FALSE)
# this algorithm only works if wyr groups are contiguous
out <- out[ order(out$wyr), ]
# generate a data frame with key column wyr and matrix Q as the second
column
out <- aggregate( Q ~ wyr
, data = Daily
, FUN = function(x) {
c( WM = which.max(x)
, n=length( x )
)
}
)
# put matrix into separate columns Q.WM
out[ , paste( "Q", colnames( out$Q ), sep="." ) ] <-
out$Q
# drop the matrix
out$Q <- NULL
# form absolute indexes Q.N
out <- within( out, {
Q.maxidx <- cumsum( c( 0, Q.n[ -length(Q.n) ] ) ) + Q.WM
})
result <- Daily[ with( out, Q.maxidx ), ]
# or save ourselves some effort
library(dplyr)
result2 <- ( Daily
%>% group_by( wyr )
%>% slice( which.max( Q ) )
%>% as.data.frame
)
On Tue, 6 Jun 2017, Bert Gunter wrote:
> cumsum() seems to be what you need.
>
> This can probably be done more elegantly, but ...
>
> out <- aggregate(Q ~ wyr, data = Daily, which.max)
> tbl <- table(Daily$wyr)
> out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)]))
> out
>
> ## yields
>
> wyr Q
> 1 1990 4
> 2 1991 6
> 3 1992 9
> 4 1993 15
> 5 1994 18
>
> I leave the matter of Julian dates to you or others.
>
> Cheers,
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov>
wrote:
>> Using the dataset below, I got close to what I'm after, but not
quite all
>> the way there. Any suggestions appreciated:
>>
>> Daily <- read.table(textConnection(" Date wyr Q
>> 1911-04-01 1990 4.530695
>> 1911-04-02 1990 4.700596
>> 1911-04-03 1990 4.898814
>> 1911-04-04 1990 5.097032
>> 1911-04-05 1991 5.295250
>> 1911-04-06 1991 6.569508
>> 1911-04-07 1991 5.861587
>> 1911-04-08 1991 5.153666
>> 1911-04-09 1992 4.445745
>> 1911-04-10 1992 3.737824
>> 1911-04-11 1992 3.001586
>> 1911-04-12 1992 3.001586
>> 1911-04-13 1993 2.350298
>> 1911-04-14 1993 2.661784
>> 1911-04-16 1993 3.001586
>> 1911-04-17 1993 2.661784
>> 1911-04-19 1994 2.661784
>> 1911-04-28 1994 3.369705
>> 1911-04-29 1994 3.001586
>> 1911-05-20 1994 2.661784"),header=TRUE)
>>
>> aggregate(Q ~ wyr, data = Daily, which.max)
>>
>> # gives:
>> # wyr Q
>> # 1 1990 4
>> # 2 1991 2
>> # 3 1992 1
>> # 4 1993 3
>> # 5 1994 2
>>
>> I can 'see' that it is returning the which.max() relative to
each
>> grouping. Is there a way to instead return the absolute position (row)
of
>> the max value within each group. i.e.:
>>
>> # Would instead like to have
>> # wyr Q
>> # 1 1990 4
>> # 2 1991 6
>> # 3 1992 9
>> # 4 1993 15
>> # 5 1994 18
>>
>> The icing on the cake would be to get the Julien Day corresponding to
the
>> date on which each year's maximum occurs?
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live
Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
On Tue, 6 Jun 2017, Morway, Eric wrote:> Using the dataset below, I got close to what I'm after, but not quite all > the way there. Any suggestions appreciated: > > Daily <- read.table(textConnection(" Date wyr Q > 1911-04-01 1990 4.530695 > 1911-04-02 1990 4.700596 > 1911-04-03 1990 4.898814 > 1911-04-04 1990 5.097032 > 1911-04-05 1991 5.295250 > 1911-04-06 1991 6.569508 > 1911-04-07 1991 5.861587 > 1911-04-08 1991 5.153666 > 1911-04-09 1992 4.445745 > 1911-04-10 1992 3.737824 > 1911-04-11 1992 3.001586 > 1911-04-12 1992 3.001586 > 1911-04-13 1993 2.350298 > 1911-04-14 1993 2.661784 > 1911-04-16 1993 3.001586 > 1911-04-17 1993 2.661784 > 1911-04-19 1994 2.661784 > 1911-04-28 1994 3.369705 > 1911-04-29 1994 3.001586 > 1911-05-20 1994 2.661784"),header=TRUE) > > aggregate(Q ~ wyr, data = Daily, which.max) > > # gives: > # wyr Q > # 1 1990 4 > # 2 1991 2 > # 3 1992 1 > # 4 1993 3 > # 5 1994 2 > > I can 'see' that it is returning the which.max() relative to each > grouping. Is there a way to instead return the absolute position (row) of > the max value within each group. i.e.: > > # Would instead like to have > # wyr Q > # 1 1990 4 > # 2 1991 6 > # 3 1992 9 > # 4 1993 15 > # 5 1994 18 > > The icing on the cake would be to get the Julien Day corresponding to the > date on which each year's maximum occurs? >Like this:> which.max.by.wyr <- with(Daily, which( ave( Q, wyr, FUN=max) == Q)) > cbind( Daily[ which.max.by.wyr, ], index=which.max.by.wyr )Date wyr Q index 4 1911-04-04 1990 5.097032 4 6 1911-04-06 1991 6.569508 6 9 1911-04-09 1992 4.445745 9 15 1911-04-16 1993 3.001586 15 18 1911-04-28 1994 3.369705 18 If there are ties in Q and you do not want more than one max value listed, you can add a litle fuzz to randomly pick one. i.e.> fuzz <- runif(nrow(Daily), 0, 1e-10) > which.max.by.wyr <- with(Daily, which(ave(Q+fuzz,wyr,FUN=max)==Q+fuzz))If you want the first tied value, then sort fuzz before determining which.max.by.wyr. HTH, Chuck