Kevin Zembower
2023-Sep-12 20:50 UTC
[R] Help with plotting and date-times for climate data
Hello,

I'm trying to calculate the mean temperature max from a file of climate
data, and plot it over a range of days in the year. I've downloaded the
data, and cleaned it up the way I think it should be. However, when I
plot it, the geom_smooth line doesn't show up. I think that's because
my x axis is characters or factors. Here's what I have so far:

=======================================
library(tidyverse)

data <- read_csv("Ely_MN_Weather.csv")

start_day = yday(as_date("2023-09-22"))
end_day = yday(as_date("2023-10-15"))

d <- as_tibble(data) %>%
    select(DATE,TMAX,TMIN) %>%
    mutate(DATE = as_date(DATE),
           yday = yday(DATE),
           md = sprintf("%02d-%02d", month(DATE), mday(DATE))
           ) %>%
    filter(yday >= start_day & yday <= end_day) %>%
    mutate(md = as.factor(md))

d_sum <- d %>%
    group_by(md) %>%
    summarize(tmax_mean = mean(TMAX, na.rm=TRUE))

## Here's the filtered data:
dput(d_sum)
> structure(list(md = structure(1:25, levels = c("09-21", "09-22",
"09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
"09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
"10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
"10-14", "10-15"), class = "factor"), tmax_mean = c(65,
62.2222222222222,
61.3, 63.8888888888889, 64.3, 60.1111111111111, 62.3, 60.5, 61.9,
61.2, 63.6666666666667, 59.5, 59.5555555555556, 61.5555555555556,
59.4444444444444, 58.7777777777778, 55.8888888888889, 58.125,
58, 55.6666666666667, 57, 55.4444444444444, 49.7777777777778,
48.75, 43.6666666666667)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -25L))
>

ggplot(data = d_sum, aes(x = md)) +
    geom_point(aes(y = tmax_mean, color = "blue")) +
    geom_smooth(aes(y = tmax_mean, color = "blue"))
====================================

My questions are:
1. Why isn't my geom_smooth plotting? How can I fix it?
2. I don't think I'm handling the month and day combination correctly.
   Is there a way to encode month and day (but not year) as a date?
3. (Minor point) Why does my graph of tmax_mean come out red when I
   specify "blue"?

Thanks for any advice or guidance you can offer. I really appreciate
the expertise of this group.

-Kevin
At 21:50 on 12/09/2023, Kevin Zembower via R-help wrote:
> My questions are:
> 1. Why isn't my geom_smooth plotting? How can I fix it?
> 2. I don't think I'm handling the month and day combination correctly.
>    Is there a way to encode month and day (but not year) as a date?
> 3. (Minor point) Why does my graph of tmax_mean come out red when I
>    specify "blue"?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

The problem is that the dates are factors, not real dates, and
geom_smooth does not interpolate along a discrete x axis (with a factor
on x, each level forms its own group of a single point, so there is
nothing to smooth). Paste a fake year onto md, coerce to Date, and plot.

I have simplified the aes() calls and added a date scale to make the
x axis more readable. Without the formula and method arguments,
geom_smooth prints a message, so they are now made explicit.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

d_sum %>%
  mutate(md = paste("2023", md, sep = "-"),
         md = as.Date(md)) %>%
  ggplot(aes(x = md, y = tmax_mean)) +
  geom_point(color = "blue") +
  geom_smooth(
    formula = y ~ x,
    method = loess,
    color = "blue"
  ) +
  scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")

Hope this helps,

Rui Barradas
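On the second question (month and day without a year), the following is a minimal base-R sketch rather than anything posted in the thread. It illustrates one likely reason the level "09-21" shows up in d_sum even though the range starts on 09-22: the day of year shifts by one in leap years. It then filters on a fake-year date instead. The four sample dates and the variable names (dates, fake_year, md_date) are made up for illustration; they are not rows from Ely_MN_Weather.csv.

```r
# Illustrative dates only: two from a leap year (2020), two from 2023.
dates <- as.Date(c("2020-09-21", "2020-09-22", "2023-09-21", "2023-09-22"))

# Day of year, matching lubridate::yday(); POSIXlt's yday field is 0-based.
doy <- as.POSIXlt(dates)$yday + 1
start_day <- as.POSIXlt(as.Date("2023-09-22"))$yday + 1

# In the leap year 2020, 21 September has the same day of year as
# 22 September 2023, so a day-of-year filter lets it through.
keep_by_yday <- dates[doy >= start_day]

# Alternative: rebuild every date in one fixed, arbitrary year and
# filter on that. Assumption: a non-leap fake year is acceptable because
# 29 February is outside the range of interest.
fake_year <- 2023
md_date <- as.Date(paste(fake_year, format(dates, "%m-%d"), sep = "-"))
keep_by_md <- dates[md_date >= as.Date("2023-09-22") &
                    md_date <= as.Date("2023-10-15")]

print(keep_by_yday)
print(keep_by_md)
```

The same md_date column can be used directly as the x aesthetic, since it is a real Date and therefore continuous.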
Richard O'Keefe
2023-Sep-13 13:58 UTC
[R] Help with plotting and date-times for climate data
Off-topic, but what is a "mean temperature max", and what good would it
do you to know it?

I've been looking at a lot of weather station data, and for no question
I've ever had (except "would the newspapers get excited about this")
was "max" (or min) the answer. Considering the way that temperature can
change by several degrees in a few minutes, or a few metres -- I meant
horizontally when I wrote that, but as you know your head and feet
don't experience the same temperature, again by more than one degree --
I am at something of a loss to ascribe much practical significance to
TMAX.

Are you sure this is the analysis you want to do? Is this the most
informative data you can get?
Martin Møller Skarbiniks Pedersen
2023-Sep-15 20:09 UTC
[R] Help with plotting and date-times for climate data
Change

geom_point(aes(y = tmax_mean, color = "blue"))

to

geom_point(aes(y = tmax_mean), color = "blue")

if you want blue points.

aes(color = ) does not set the color of the points. It maps a column
(best if it is a factor) to colors. Inside aes(), the string "blue" is
treated as a data value with a single level, and that level gets the
first color of the default palette, which is why the points come out
red.

/Martin
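The difference between mapping a colour inside aes() and setting it outside can be checked without looking at the plot. This is a small sketch, assuming ggplot2 is installed; the data frame toy and its values are invented stand-ins for d_sum, and layer_data() is used only to inspect the colours ggplot2 computes for each layer.

```r
library(ggplot2)

# Toy data standing in for d_sum; the values are illustrative only.
toy <- data.frame(day = 1:5, tmax_mean = c(65, 62, 61, 64, 60))

# Mapped: inside aes(), "blue" is a data value with a single level,
# and the colour scale chooses a colour for that level.
p_mapped <- ggplot(toy, aes(day, tmax_mean)) +
  geom_point(aes(color = "blue"))

# Set: outside aes(), the string is used as the literal colour.
p_set <- ggplot(toy, aes(day, tmax_mean)) +
  geom_point(color = "blue")

# layer_data() returns the computed aesthetics for a layer.
print(unique(layer_data(p_mapped)$colour))  # a palette colour, not "blue"
print(unique(layer_data(p_set)$colour))     # the literal "blue"
```

The mapped version also adds a legend with a single entry labelled "blue", which is another sign that the string was treated as data.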