Using the dataset below, I got close to what I'm after, but not quite all the way there. Any suggestions appreciated: Daily <- read.table(textConnection(" Date wyr Q 1911-04-01 1990 4.530695 1911-04-02 1990 4.700596 1911-04-03 1990 4.898814 1911-04-04 1990 5.097032 1911-04-05 1991 5.295250 1911-04-06 1991 6.569508 1911-04-07 1991 5.861587 1911-04-08 1991 5.153666 1911-04-09 1992 4.445745 1911-04-10 1992 3.737824 1911-04-11 1992 3.001586 1911-04-12 1992 3.001586 1911-04-13 1993 2.350298 1911-04-14 1993 2.661784 1911-04-16 1993 3.001586 1911-04-17 1993 2.661784 1911-04-19 1994 2.661784 1911-04-28 1994 3.369705 1911-04-29 1994 3.001586 1911-05-20 1994 2.661784"),header=TRUE) aggregate(Q ~ wyr, data = Daily, which.max) # gives: # wyr Q # 1 1990 4 # 2 1991 2 # 3 1992 1 # 4 1993 3 # 5 1994 2 I can 'see' that it is returning the which.max() relative to each grouping. Is there a way to instead return the absolute position (row) of the max value within each group. i.e.: # Would instead like to have # wyr Q # 1 1990 4 # 2 1991 6 # 3 1992 9 # 4 1993 15 # 5 1994 18 The icing on the cake would be to get the Julien Day corresponding to the date on which each year's maximum occurs? [[alternative HTML version deleted]]
cumsum() seems to be what you need. This can probably be done more elegantly, but ... out <- aggregate(Q ~ wyr, data = Daily, which.max) tbl <- table(Daily$wyr) out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)])) out ## yields wyr Q 1 1990 4 2 1991 6 3 1992 9 4 1993 15 5 1994 18 I leave the matter of Julian dates to you or others. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote:> Using the dataset below, I got close to what I'm after, but not quite all > the way there. Any suggestions appreciated: > > Daily <- read.table(textConnection(" Date wyr Q > 1911-04-01 1990 4.530695 > 1911-04-02 1990 4.700596 > 1911-04-03 1990 4.898814 > 1911-04-04 1990 5.097032 > 1911-04-05 1991 5.295250 > 1911-04-06 1991 6.569508 > 1911-04-07 1991 5.861587 > 1911-04-08 1991 5.153666 > 1911-04-09 1992 4.445745 > 1911-04-10 1992 3.737824 > 1911-04-11 1992 3.001586 > 1911-04-12 1992 3.001586 > 1911-04-13 1993 2.350298 > 1911-04-14 1993 2.661784 > 1911-04-16 1993 3.001586 > 1911-04-17 1993 2.661784 > 1911-04-19 1994 2.661784 > 1911-04-28 1994 3.369705 > 1911-04-29 1994 3.001586 > 1911-05-20 1994 2.661784"),header=TRUE) > > aggregate(Q ~ wyr, data = Daily, which.max) > > # gives: > # wyr Q > # 1 1990 4 > # 2 1991 2 > # 3 1992 1 > # 4 1993 3 > # 5 1994 2 > > I can 'see' that it is returning the which.max() relative to each > grouping. Is there a way to instead return the absolute position (row) of > the max value within each group. i.e.: > > # Would instead like to have > # wyr Q > # 1 1990 4 > # 2 1991 6 > # 3 1992 9 > # 4 1993 15 > # 5 1994 18 > > The icing on the cake would be to get the Julien Day corresponding to the > date on which each year's maximum occurs? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
If you want the Julian date, you could use Bert's index on the original data frame: Daily[out$Q, ] Date wyr Q 4 1911-04-04 1990 5.097032 6 1911-04-06 1991 6.569508 9 1911-04-09 1992 4.445745 15 1911-04-16 1993 3.001586 18 1911-04-28 1994 3.369705 Another way to get that index would be to use by(): idx <- as.vector(by(Daily, Daily$wyr, function(x) rownames(x)[which.max(x$Q)])) Daily[idx, ] ------------------------------------- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -----Original Message----- From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter Sent: Tuesday, June 6, 2017 9:16 PM To: Morway, Eric <emorway at usgs.gov> Cc: R mailing list <r-help at r-project.org> Subject: Re: [R] Determining which.max() within groups cumsum() seems to be what you need. This can probably be done more elegantly, but ... out <- aggregate(Q ~ wyr, data = Daily, which.max) tbl <- table(Daily$wyr) out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)])) out ## yields wyr Q 1 1990 4 2 1991 6 3 1992 9 4 1993 15 5 1994 18 I leave the matter of Julian dates to you or others. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote:> Using the dataset below, I got close to what I'm after, but not quite all > the way there. Any suggestions appreciated: > > Daily <- read.table(textConnection(" Date wyr Q > 1911-04-01 1990 4.530695 > 1911-04-02 1990 4.700596 > 1911-04-03 1990 4.898814 > 1911-04-04 1990 5.097032 > 1911-04-05 1991 5.295250 > 1911-04-06 1991 6.569508 > 1911-04-07 1991 5.861587 > 1911-04-08 1991 5.153666 > 1911-04-09 1992 4.445745 > 1911-04-10 1992 3.737824 > 1911-04-11 1992 3.001586 > 1911-04-12 1992 3.001586 > 1911-04-13 1993 2.350298 > 1911-04-14 1993 2.661784 > 1911-04-16 1993 3.001586 > 1911-04-17 1993 2.661784 > 1911-04-19 1994 2.661784 > 1911-04-28 1994 3.369705 > 1911-04-29 1994 3.001586 > 1911-05-20 1994 2.661784"),header=TRUE) > > aggregate(Q ~ wyr, data = Daily, which.max) > > # gives: > # wyr Q > # 1 1990 4 > # 2 1991 2 > # 3 1992 1 > # 4 1993 3 > # 5 1994 2 > > I can 'see' that it is returning the which.max() relative to each > grouping. Is there a way to instead return the absolute position (row) of > the max value within each group. i.e.: > > # Would instead like to have > # wyr Q > # 1 1990 4 > # 2 1991 6 > # 3 1992 9 > # 4 1993 15 > # 5 1994 18 > > The icing on the cake would be to get the Julien Day corresponding to the > date on which each year's maximum occurs? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Aggregate can do both which.max and group length calculations, but the result ends up as a matrix inside the data frame, which I find cumbersome to work with. Daily <- read.table( text " Date wyr Q 1911-04-01 1990 4.530695 1911-04-02 1990 4.700596 1911-04-03 1990 4.898814 1911-04-04 1990 5.097032 1911-04-05 1991 5.295250 1911-04-06 1991 6.569508 1911-04-07 1991 5.861587 1911-04-08 1991 5.153666 1911-04-09 1992 4.445745 1911-04-10 1992 3.737824 1911-04-11 1992 3.001586 1911-04-12 1992 3.001586 1911-04-13 1993 2.350298 1911-04-14 1993 2.661784 1911-04-16 1993 3.001586 1911-04-17 1993 2.661784 1911-04-19 1994 2.661784 1911-04-28 1994 3.369705 1911-04-29 1994 3.001586 1911-05-20 1994 2.661784 ", header = TRUE, stringsAsFactors=FALSE) # this algorithm only works if wyr groups are contiguous out <- out[ order(out$wyr), ] # generate a data frame with key column wyr and matrix Q as the second column out <- aggregate( Q ~ wyr , data = Daily , FUN = function(x) { c( WM = which.max(x) , n=length( x ) ) } ) # put matrix into separate columns Q.WM out[ , paste( "Q", colnames( out$Q ), sep="." ) ] <- out$Q # drop the matrix out$Q <- NULL # form absolute indexes Q.N out <- within( out, { Q.maxidx <- cumsum( c( 0, Q.n[ -length(Q.n) ] ) ) + Q.WM }) result <- Daily[ with( out, Q.maxidx ), ] # or save ourselves some effort library(dplyr) result2 <- ( Daily %>% group_by( wyr ) %>% slice( which.max( Q ) ) %>% as.data.frame ) On Tue, 6 Jun 2017, Bert Gunter wrote:> cumsum() seems to be what you need. > > This can probably be done more elegantly, but ... > > out <- aggregate(Q ~ wyr, data = Daily, which.max) > tbl <- table(Daily$wyr) > out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)])) > out > > ## yields > > wyr Q > 1 1990 4 > 2 1991 6 > 3 1992 9 > 4 1993 15 > 5 1994 18 > > I leave the matter of Julian dates to you or others. > > Cheers, > Bert > > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote: >> Using the dataset below, I got close to what I'm after, but not quite all >> the way there. Any suggestions appreciated: >> >> Daily <- read.table(textConnection(" Date wyr Q >> 1911-04-01 1990 4.530695 >> 1911-04-02 1990 4.700596 >> 1911-04-03 1990 4.898814 >> 1911-04-04 1990 5.097032 >> 1911-04-05 1991 5.295250 >> 1911-04-06 1991 6.569508 >> 1911-04-07 1991 5.861587 >> 1911-04-08 1991 5.153666 >> 1911-04-09 1992 4.445745 >> 1911-04-10 1992 3.737824 >> 1911-04-11 1992 3.001586 >> 1911-04-12 1992 3.001586 >> 1911-04-13 1993 2.350298 >> 1911-04-14 1993 2.661784 >> 1911-04-16 1993 3.001586 >> 1911-04-17 1993 2.661784 >> 1911-04-19 1994 2.661784 >> 1911-04-28 1994 3.369705 >> 1911-04-29 1994 3.001586 >> 1911-05-20 1994 2.661784"),header=TRUE) >> >> aggregate(Q ~ wyr, data = Daily, which.max) >> >> # gives: >> # wyr Q >> # 1 1990 4 >> # 2 1991 2 >> # 3 1992 1 >> # 4 1993 3 >> # 5 1994 2 >> >> I can 'see' that it is returning the which.max() relative to each >> grouping. Is there a way to instead return the absolute position (row) of >> the max value within each group. i.e.: >> >> # Would instead like to have >> # wyr Q >> # 1 1990 4 >> # 2 1991 6 >> # 3 1992 9 >> # 4 1993 15 >> # 5 1994 18 >> >> The icing on the cake would be to get the Julien Day corresponding to the >> date on which each year's maximum occurs? >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >--------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
On Tue, 6 Jun 2017, Morway, Eric wrote:> Using the dataset below, I got close to what I'm after, but not quite all > the way there. Any suggestions appreciated: > > Daily <- read.table(textConnection(" Date wyr Q > 1911-04-01 1990 4.530695 > 1911-04-02 1990 4.700596 > 1911-04-03 1990 4.898814 > 1911-04-04 1990 5.097032 > 1911-04-05 1991 5.295250 > 1911-04-06 1991 6.569508 > 1911-04-07 1991 5.861587 > 1911-04-08 1991 5.153666 > 1911-04-09 1992 4.445745 > 1911-04-10 1992 3.737824 > 1911-04-11 1992 3.001586 > 1911-04-12 1992 3.001586 > 1911-04-13 1993 2.350298 > 1911-04-14 1993 2.661784 > 1911-04-16 1993 3.001586 > 1911-04-17 1993 2.661784 > 1911-04-19 1994 2.661784 > 1911-04-28 1994 3.369705 > 1911-04-29 1994 3.001586 > 1911-05-20 1994 2.661784"),header=TRUE) > > aggregate(Q ~ wyr, data = Daily, which.max) > > # gives: > # wyr Q > # 1 1990 4 > # 2 1991 2 > # 3 1992 1 > # 4 1993 3 > # 5 1994 2 > > I can 'see' that it is returning the which.max() relative to each > grouping. Is there a way to instead return the absolute position (row) of > the max value within each group. i.e.: > > # Would instead like to have > # wyr Q > # 1 1990 4 > # 2 1991 6 > # 3 1992 9 > # 4 1993 15 > # 5 1994 18 > > The icing on the cake would be to get the Julien Day corresponding to the > date on which each year's maximum occurs? >Like this:> which.max.by.wyr <- with(Daily, which( ave( Q, wyr, FUN=max) == Q)) > cbind( Daily[ which.max.by.wyr, ], index=which.max.by.wyr )Date wyr Q index 4 1911-04-04 1990 5.097032 4 6 1911-04-06 1991 6.569508 6 9 1911-04-09 1992 4.445745 9 15 1911-04-16 1993 3.001586 15 18 1911-04-28 1994 3.369705 18 If there are ties in Q and you do not want more than one max value listed, you can add a litle fuzz to randomly pick one. i.e.> fuzz <- runif(nrow(Daily), 0, 1e-10) > which.max.by.wyr <- with(Daily, which(ave(Q+fuzz,wyr,FUN=max)==Q+fuzz))If you want the first tied value, then sort fuzz before determining which.max.by.wyr. HTH, Chuck