Hi,

I am fairly new to R and would like advice on the following. I want to calculate the average weekly number of reports (e.g. of flu or norovirus), based on the same weeks over the last five years. I will then plot this as a line of 52 points, one per week, and add a second line for the current year, so that current weekly counts can be compared against the five-year average for the same week. I would like some advice on how this can be done in R. My data are disaggregated, with dates in the format 01/01/2018.

Thanks

Shakeel Suleman

**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of Public Health England, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. http://www.gov.uk/PHE
**************************************************************************
Hi Shakeel,

One approach would be to look at the dplyr package and its functions group_by() and summarise(), which should be useful for preparing the data. (Alternatively, if you know SQL, you might look at dbplyr.)

On the plotting side, you can use plot(...) for the first line and then lines(...) for the second. Or you can use the ggplot2 package for the charts, though that might take a bit more time to get up to speed.

Good luck,
Eric

On Wed, May 9, 2018 at 9:37 AM, Shakeel Suleman <Shakeel.Suleman at phe.gov.uk> wrote:
> I want to calculate a weekly average number of reports (e.g. of flu, norovirus) based
> on the same weeks for the last five years. [...]
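The group_by() and summarise() suggestion might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame `reports`, its column `report_date`, and the simulated dates are hypothetical stand-ins for the real data, and the dates are assumed to be dd/mm/yyyy. Because weeks without reports produce no rows, the average divides the five-year total by 5 instead of taking mean() over the rows that happen to exist; ISO week 53 is left out to match the 52 points asked for.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per report,
# with the date stored as dd/mm/yyyy text (an assumption).
set.seed(1)
all_days <- seq(as.Date("2013-01-01"), as.Date("2018-05-06"), by = "day")
reports <- data.frame(
  report_date = format(sample(all_days, 2000, replace = TRUE), "%d/%m/%Y"),
  stringsAsFactors = FALSE
)

# Count reports per ISO year and ISO week.
weekly <- reports %>%
  mutate(
    date     = as.Date(report_date, format = "%d/%m/%Y"),
    iso_year = as.integer(format(date, "%G")),
    iso_week = as.integer(format(date, "%V"))
  ) %>%
  group_by(iso_year, iso_week) %>%
  summarise(n_reports = n()) %>%
  ungroup()

# Five-year average for weeks 1 to 52: total over 2013-2017 divided by 5,
# so that a week with no reports in some year still counts as a zero.
five_year_avg <- weekly %>%
  filter(iso_year %in% 2013:2017, iso_week <= 52) %>%
  group_by(iso_week) %>%
  summarise(avg_reports = sum(n_reports) / 5)

# Current-year weekly counts for comparison (2018 is assumed here).
current_year <- weekly %>%
  filter(iso_year == 2018)
```

The two results can then be drawn as suggested above, with plot() for five_year_avg and lines() for current_year.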
Hi Shakeel,

Assuming that you are starting with a bunch of dates:

# make a vector of character strings that can be converted to dates
# (impossible dates such as 30/2 become NA and are dropped by table())
rep_dates<-paste(sample(1:30,500,TRUE),sample(1:12,500,TRUE),
 sample(2013:2017,500,TRUE),sep="/")
# if this isn't your format, change it
date_format<-"%d/%m/%Y"
# create a data frame with a column of dates
rep_df<-data.frame(rep_dates=as.Date(rep_dates,format=date_format))
# add the ISO 8601 week of the year
rep_df$rep_week<-format(rep_df$rep_dates,"%V")
# add the year ("%G" is the week-based year that goes with "%V")
rep_df$rep_year<-format(rep_df$rep_dates,"%G")
# get a table of the weekly counts by year
rep_tab<-table(rep_df$rep_week,rep_df$rep_year)
# get the row means (5 year averages)
rep5<-apply(rep_tab,1,mean)
# plot the 5 year weekly averages, leaving room for the largest weekly count
plot(rep5,type="b",ylim=c(0,max(rep_tab)),xlab="Week",ylab="Reports per week")
# add the 2017 weekly counts
points(rep_tab[,"2017"],type="b",col="red")
legend(1,max(rep_tab),c("5 yr average","2017"),pch=1,lty=1,col=c("black","red"))

Jim
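When this is applied to real data, some weeks may have no reports at all. Since the week labels are plain character strings, table() then omits those weeks and the plotted positions no longer line up with week numbers. One way around that, sketched here with a few made-up dates, is to give the week and the year explicit factor levels so that every week appears, with zeros where nothing was reported. The vector `report_dates` and the choice of 2018 as the current year are assumptions for illustration only.

```r
# Hypothetical report dates in dd/mm/yyyy format (made up for illustration)
report_dates <- c("01/01/2018", "03/01/2018", "10/01/2018", "02/01/2017",
                  "09/01/2017", "05/01/2016", "06/01/2015", "07/01/2014",
                  "08/01/2013")
d <- as.Date(report_dates, format = "%d/%m/%Y")

# Fixed levels: weeks "01" to "52" and years 2013 to 2018, so that empty
# weeks show up as zero counts (dates in ISO week 53 would become NA here)
week <- factor(format(d, "%V"), levels = sprintf("%02d", 1:52))
year <- factor(format(d, "%G"), levels = 2013:2018)
counts <- table(week, year)

# Five-year average per week, and the current year's weekly counts
avg5 <- rowMeans(counts[, as.character(2013:2017)])
current <- counts[, "2018"]

avg5[c("01", "02")]     # 0.4 and 0.8 for these made-up dates
current[c("01", "02")]  # 2 and 1
```

Both vectors have one entry per week, so they can be drawn against 1:52 in the same way as above, for example plot(1:52, avg5, type="b") followed by points(1:52, current, type="b", col="red").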
I would just add: see ?strptime for information about those date format specifications ("%V", for example) and for an introduction to R's handling of date and date-time values.

And a few quick examples, to see that %V works as advertised:

> format( Sys.Date() , '%V')
[1] "19"
> format( as.Date('2018-1-1') , '%V')
[1] "01"
> format( as.Date('2018-1-8') , '%V')
[1] "02"

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
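One detail worth checking alongside the %V examples: "%V" is the ISO 8601 week number, and the first days of January can belong to the last week of the previous week-based year. Pairing "%V" with the calendar year "%Y" would file those days under the wrong year, whereas "%G" (also documented in ?strptime) gives the year that goes with "%V". A small check, using dates picked purely for illustration:

```r
# Three dates around a year boundary (chosen for illustration)
d <- as.Date(c("2016-12-31", "2017-01-01", "2017-01-02"))

boundary <- data.frame(
  date     = d,
  iso_week = format(d, "%V"),  # ISO 8601 week number
  cal_year = format(d, "%Y"),  # calendar year
  iso_year = format(d, "%G"),  # week-based year that goes with %V
  stringsAsFactors = FALSE
)
print(boundary)
```

Here 2017-01-01 falls in week 52, which "%G" assigns to 2016 while "%Y" reports 2017, so grouping by "%V" and "%Y" together would count it as week 52 of 2017.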