thr3ads.net - R help - [R] Correctly applying aggregate.ts() [Sep 2018]

Rich Shepard

2018-Sep-07 21:19 UTC

[R] Correctly applying aggregate.ts()

I've read ?aggregate and several blog posts on using aggregate() yet I
still haven't applied it correctly to my dataframe. The sample data are:

structure(list(sampdate = c("2005-01-01", "2005-01-02",
"2005-01-03",
"2005-01-04", "2005-01-05", "2005-01-06",
"2005-01-07", "2005-01-08",
"2005-01-09", "2005-01-10", "2005-01-11",
"2005-01-12", "2005-01-13",
"2005-01-14", "2005-01-15", "2005-01-16",
"2005-01-17", "2005-01-18",
"2005-01-19", "2005-01-20", "2005-01-21",
"2005-01-22", "2005-01-23",
"2005-01-24", "2005-01-25", "2005-01-26",
"2005-01-27", "2005-01-28",
"2005-01-29", "2005-01-30", "2005-01-31",
"2005-02-01", "2005-02-02",
"2005-02-03", "2005-02-04", "2005-02-05",
"2005-02-06", "2005-02-07",
"2005-02-08", "2005-02-09", "2005-02-10",
"2005-02-11", "2005-02-12",
"2005-02-13", "2005-02-14", "2005-02-15",
"2005-02-16", "2005-02-17",
"2005-02-18", "2005-02-19", "2005-02-20",
"2005-02-21", "2005-02-22",
"2005-02-23", "2005-02-24", "2005-02-25",
"2005-02-26", "2005-02-27",
"2005-02-28", "2005-03-01", "2005-03-02",
"2005-03-03"), prcp = c(0.59,
0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0, 
0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13, 
1.12, 0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0, 
0.55, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.05, 
0.01, 0)), row.names = c(NA, 62L), class = "data.frame")

   What I need to learn how to do is to calculate monthly sum, median, and
maximum rainfall amounts from the full data set which has daily rainfall
amounts. My most current effort to calculate monthly sums uses this syntax:

monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'], by =
list(month = \
substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)

(entered on a single line) which produces this result:

head(monthly.rain)
[1] NA

   The sample data has 62 of the 113K rows in the dataframe. A larger set can
be provided if needed.

   An explanation of what I've missed is needed.

Regards,

Rich

Bert Gunter

2018-Sep-07 22:25 UTC

[R] Correctly applying aggregate.ts()

Well, let's see:
"monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'],
by = list(month = \
substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)"

1. x is a data frame, so why are you using the time series method?
Perhaps you need to study S3 method usage in R.

2. You have improperly subscripted the data frame: it should be dp[,
c('sampdate','prcp')] . Perhaps you need to read about how
subscripting works in R. However, in this case, no subscripting is needed
(see 3.)

3. As you should be using the data frame method, and the month is
obtained as a substring of sampdate, you should use dp[,'prcp'] as
your data frame so that sum() is not applied to the sampdate column.

4. I assume the "\" indicates <Return> ?
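
For readers following along, the indexing difference in point 2 can be seen on a small stand-in data frame. This is only a sketch: `toy` is a made-up three-row example, not the poster's `dp`, and it shows just the subscripting behaviour, not the time series method.

```r
# Toy stand-in for dp (hypothetical values, not the original data)
toy <- data.frame(sampdate = c("2005-01-01", "2005-01-02", "2005-02-01"),
                  prcp = c(0.59, 0.08, 0.35),
                  stringsAsFactors = FALSE)

# With two indices the first one selects ROWS: there is no row named
# "sampdate", so the lookup comes back as a missing value
wrong <- toy['sampdate', 'prcp']
is.na(wrong)

# Leaving the row index empty selects both columns for every row
right <- toy[, c('sampdate', 'prcp')]
dim(right)

# A single column extracted this way is a plain numeric vector
class(toy[, 'prcp'])
```

A missing value going in is consistent with the lone NA reported in the original post.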

Anyway, once you have corrected all that, here's the call:
> monthly.rain <- aggregate(dp[, 'prcp'],
+                           list(substr(dp$sampdate,1,7)),
+                           FUN = sum, na.rm = TRUE)
> ## yielding
> monthly.rain
  Group.1    x
1 2005-01 4.88
2 2005-02 2.27
3 2005-03 0.06

It's perhaps also worth noting that the formula method (for data
frames) is somewhat more convenient, especially with several grouping
factors in the list:
> monthly.rain <- aggregate(prcp ~ substr(sampdate,1,7), data = dp,
+                           FUN = sum, na.rm = TRUE)
> ##yielding
> monthly.rain
  substr(sampdate, 1, 7) prcp
1                2005-01 4.88
2                2005-02 2.27
3                2005-03 0.06
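
The remark about several grouping factors might look like the sketch below. The `station` column is an assumption added purely for illustration (the posted data has only `sampdate` and `prcp`), and the values are invented.

```r
# Toy data with a hypothetical 'station' column (not in the original dp)
toy <- data.frame(
  sampdate = c("2005-01-01", "2005-01-02", "2005-02-01",
               "2005-01-01", "2005-01-02", "2005-02-01"),
  station  = c("A", "A", "A", "B", "B", "B"),
  prcp     = c(0.5, 0.25, 1, 0, 0.75, 0.5),
  stringsAsFactors = FALSE
)

# Extra grouping terms are simply added on the right-hand side of the
# formula; the result has one row per month/station combination
by.both <- aggregate(prcp ~ substr(sampdate, 1, 7) + station,
                     data = toy, FUN = sum)
by.both
```

With the list interface the same grouping would need a two-element `by` list, which is where the formula form starts to save typing.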

Cheers,

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

Bert Gunter

2018-Sep-07 22:34 UTC

[R] Correctly applying aggregate.ts()

Clarification: When using the formula interface, no subscripting is needed.

Bert Gunter


Rich Shepard

2018-Sep-07 22:39 UTC

[R] Correctly applying aggregate.ts() [RESOLVED]

On Fri, 7 Sep 2018, Bert Gunter wrote:
> Well, let's see:
> "monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'], by = list(month = \
> substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)"
>
> 1. x is a data frame, so why are you using the time series method?
> Perhaps you need to study S3 method usage in R.
Bert,

   I saw the four varieties of aggregate and thought the time series
appropriate for the data frame of sequential dates. As I wrote, I had
difficulties understanding which flavor to use.
> 2. You have improperly subscripted the data frame: it should be dp[,
> c('sampdate','prcp')] . Perhaps you need to read about how
> subscripting works in R. However, in this case, no subscripting is needed
> (see 3.)
   Ah so. All the examples I saw used single column data frames.
> 3. As you should be using the data frame method, and the month is
> obtained as a substring of sampdate, you should use dp[,'prcp'] as
> your data frame so that sum() is not applied to the sampdate column.
>
> 4. I assume the "\" indicates <Return> ?
   Yes. Alpine broke the line so I added a newline to the first part.
> Anyway, once you have corrected all that, here's the call:
>
>> monthly.rain <- aggregate(dp[, 'prcp'],
> +                           list(substr(dp$sampdate,1,7)),
> +                           FUN = sum, na.rm = TRUE)
   Thanks for making the syntax so clear.
> It's perhaps also worth noting that the formula method (for data
> frames) is somewhat more convenient, especially with several grouping
> factors in the list:
>
>> monthly.rain <- aggregate(prcp ~ substr(sampdate,1,7), data = dp, FUN = sum, na.rm = TRUE)
>> ##yielding
>> monthly.rain
>  substr(sampdate, 1, 7) prcp
> 1                2005-01 4.88
> 2                2005-02 2.27
> 3                2005-03 0.06
   I looked at the formula method without appreciating how to apply it.

   Now I can work with the multiple daily data sets I have and properly
condense them for presentation to readers of the report. And I'm much better
armed to understand how to apply aggregate() to various data sets.

Very much appreciated,

Rich

Rui Barradas

2018-Sep-08 10:51 UTC

[R] Correctly applying aggregate.ts()

Hello,

Like Bert said, your data is a data.frame so there is no need to call
aggregate.ts. Besides, R will call the right method, so unless you want
to change the standard behaviour, it is enough to call aggregate
and let the method dispatch code do its job.
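
A minimal sketch of that dispatch, assuming a made-up data frame `toy` with columns `g` and `v` (neither is from the original post): calling the generic and calling the data frame method fetched by hand should agree.

```r
# Hypothetical toy data, only to illustrate S3 dispatch
toy <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))
class(toy)   # the class attribute is what drives method selection

# Calling the generic ...
via.generic <- aggregate(toy["v"], by = list(g = toy$g), FUN = sum)

# ... and calling the data frame method looked up explicitly
df.method  <- getS3method("aggregate", "data.frame")
via.method <- df.method(toy["v"], by = list(g = toy$g), FUN = sum)

# Compare the two results
all.equal(via.generic, via.method)
```

Naming a method explicitly, as with aggregate.ts in the original call, bypasses this lookup, which is why it is rarely what you want.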

As for the problem, first an example of the formula interface, which I 
almost always prefer.


aggregate(prcp ~ substr(sampdate, 1, 7), data = dp, FUN = sum, na.rm = TRUE)
#  substr(sampdate, 1, 7) prcp
#1                2005-01 4.88
#2                2005-02 2.27
#3                2005-03 0.06


Now, you would have to change the name of the month column, but it
worked as expected; there were no NA issues.
And there is no need to subset the data.frame: R will find the columns
by their names, as long as you pass the argument data = dp to aggregate.
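
Two ways of ending up with a readable name for the grouping column, sketched on invented data (`toy` and its values are assumptions, not the poster's `dp`):

```r
# Hypothetical toy data
toy <- data.frame(sampdate = c("2005-01-01", "2005-01-02", "2005-02-01"),
                  prcp = c(0.5, 0.25, 1),
                  stringsAsFactors = FALSE)

# Option 1: add a Month column first, then aggregate on it by name
toy$Month <- substr(toy$sampdate, 1, 7)
m1 <- aggregate(prcp ~ Month, data = toy, FUN = sum)

# Option 2: aggregate on the expression, then rename the first column
m2 <- aggregate(prcp ~ substr(sampdate, 1, 7), data = toy, FUN = sum)
names(m2)[1] <- "Month"

names(m1)
names(m2)
```

Both routes give the same monthly totals; the first keeps the formula shorter when the month is reused in several calls.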


If you want several statistics at the same time, it's a bit trickier, 
but with practice it becomes intuitive. (So to speak.)

Define a custom summary function. I haven't changed the default na.rm 
setting but it would make the rest of the code simpler to set na.rm = 
TRUE right now.

customSmry <- function(x, na.rm = FALSE){
   c(Sum = sum(x, na.rm = na.rm),
     Median = median(x, na.rm = na.rm),
     Max = max(x, na.rm = na.rm)
   )
}


#Now call aggregate:

agg <- aggregate(prcp ~ substr(sampdate, 1, 7), dp, FUN = customSmry, 
na.rm = TRUE)


But be VERY careful: the result is not a df with 4 columns, it's a df
with only two columns, the second being a matrix, as you can see in the
output of str.


str(agg)
#'data.frame':	3 obs. of  2 variables:
# $ substr(sampdate, 1, 7): chr  "2005-01" "2005-02" "2005-03"
# $ prcp                  : num [1:3, 1:3] 4.88 2.27 0.06 0.05 0 0.01 1.12 0.65 0.05
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr  "Sum" "Median" "Max"


So the final step is to cbind those two "columns" into a df.
"columns" is in quotes because I am not cbinding the first column,
I'm cbinding the sub-df agg[1]. That way the cbind method that gets
called is cbind.data.frame and the result is a df.
Also, since df's are lists, the second column is an actual column, though
not a vector: it is an object of class matrix. This column is a list member,
like all df columns, so I will subset the df 'agg' as a list, agg[[2]].

As a bonus, the colnames of the matrix are immediately right, no prcp 
prefix. The first column's name comes from the function substr, and is 
not part of this story, just rename it when it's all done.


agg <- cbind(agg[1], agg[[2]])
str(agg)
#'data.frame':	3 obs. of  4 variables:
# $ substr(sampdate, 1, 7): chr  "2005-01" "2005-02" "2005-03"
# $ Sum                   : num  4.88 2.27 0.06
# $ Median                : num  0.05 0 0.01
# $ Max                   : num  1.12 0.65 0.05

names(agg)[1] <- "Month"
agg
#    Month  Sum Median  Max
#1 2005-01 4.88   0.05 1.12
#2 2005-02 2.27   0.00 0.65
#3 2005-03 0.06   0.01 0.05
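

For comparison, a commonly seen alternative to the cbind step is to pass the aggregated result through data.frame(), which expands the matrix but prefixes each statistic with the variable name. The sketch below uses invented data and a hypothetical helper `smry`; it is not the code from this post.

```r
# Hypothetical toy data with the month already in its own column
toy <- data.frame(Month = c("2005-01", "2005-01", "2005-01",
                            "2005-02", "2005-02"),
                  prcp  = c(0.5, 0, 0.25, 1, 0),
                  stringsAsFactors = FALSE)

# Same shape of summary function as customSmry, without na.rm handling
smry <- function(x) c(Sum = sum(x), Median = median(x), Max = max(x))

agg <- aggregate(prcp ~ Month, data = toy, FUN = smry)
ncol(agg)   # the matrix still counts as a single column

# data.frame() turns each matrix column into its own column and
# prefixes it with the argument name, here "prcp."
flat <- do.call(data.frame, agg)
names(flat)
```

Which form is preferable depends on whether the prefix is wanted; with several aggregated variables it keeps their statistics apart.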


Finally, try to get some practice with the formula interface, you will 
see that it pays in code simplicity and readability.


Hope this helps,

Rui Barradas


Rich Shepard

2018-Sep-08 14:43 UTC

[R] Correctly applying aggregate.ts()

On Sat, 8 Sep 2018, Rui Barradas wrote:
> Like Bert said, your data is a data.frame so there is no need to call
> aggregate.ts. Besides, R will call the right method so unless you want to
> change the standard behaviour, it would be enough to call aggregate and let
> the methods dispatch code do its job.
Rui,

   I have no excuse (and certainly no valid reason) for not calling aggregate
itself.

   <Exceptionally well written tutorial deleted.>
> Hope this helps,
   Very much so. It has also strengthened my abilities to learn how to use
other functions new to me.

Best regards,

Rich
