Tariq Khasiri
2022-Oct-02 23:04 UTC
[R] Creating a year-month indicator and groupby with category
Hello,

I have the following data. I want to show in a line plot how each different company is earning over the timeline of my data sample.

I'm not sure how I can create the *year-month indicator* to plot it nicely in my horizontal axis out of my dataset.

After creating the *year-month* indicator (which will be in my x axis) I want to create a dataframe where I can groupby companies over the year-month indicator by putting *share* in the y axis as variables.

### data is like the following

dput(dat)
structure(list(year = c(2018, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019), month = c(12,
1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5), company = c("ABC",
"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH"
), share = c(20, 16.5, 15, 15.5, 15.5, 16, 17, 16.5, 61, 55,
53, 53, 54, 53, 58, 54, 50, 47, 55, 50, 52, 51, 51.5, 52, 53,
54, 55, 53, 54, 50, 42, 48, 41, 40, 39, 36.5, 35), com_name = c(1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), row.names = c(NA,
-37L), spec = structure(list(cols = list(year = structure(list(), class c("collector_double",
"collector")), month = structure(list(), class = c("collector_double",
"collector")), company = structure(list(), class = c("collector_character",
"collector")), share = structure(list(), class = c("collector_double",
"collector")), com_name = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x7fd732028680>,
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
Rui Barradas
2022-Oct-03 05:47 UTC
[R] Creating a year-month indicator and groupby with category
Hello,

First of all, I'll repost the data at the end because the OP posted it with a pointer reference:

problems = <pointer: 0x7fd732028680>

and this must be removed for the dput output to run. Suggestion: coerce to class "data.frame" and post the output of

dput(as.data.frame(dat))

Now the plot. Here are two plots of share by date, grouped by company. One uses base R graphics and the other uses package ggplot2.

Create a date/time column to be used by both plots.

dat$date <- with(dat, ISOdate(year, month, 1))

1) Base R plot.

ylim <- range(dat$share) + c(0, 2)  # make room for the legend on top
comp <- unique(dat$company)

# open a blank plot with all the data,
# setting the ylim as explained above
plot(share ~ date, dat, type = "n", ylim = ylim)
# draw each line in a loop on companies
for(i in seq_along(comp)) {
  lines(share ~ date, subset(dat, company == comp[i]), col = i)
}
legend("top", legend = comp, col = seq_along(comp), lty = "solid", horiz = TRUE)

2) ggplot2 plot.

library(ggplot2)

ggplot(dat, aes(date, share, color = company)) +
  geom_line() +
  scale_x_datetime(date_labels = "%Y-%m") +
  scale_color_manual(values = c(ABC = "black", FGH = "red")) +
  theme_bw()

3) The data, slightly edited and reposted with the new pipe operator introduced in R 4.1.0 to make it look modern.
dat |> as.data.frame() |> dput()

dat <-
structure(list(year = c(2018, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019), month = c(12,
1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5), company = c("ABC",
"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH"
), share = c(20, 16.5, 15, 15.5, 15.5, 16, 17, 16.5, 61, 55,
53, 53, 54, 53, 58, 54, 50, 47, 55, 50, 52, 51, 51.5, 52, 53,
54, 55, 53, 54, 50, 42, 48, 41, 40, 39, 36.5, 35), com_name = c(1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
date = structure(c(1543665600, 1546344000, 1549022400, 1551441600,
1554120000, 1556712000, 1559390400, 1561982400, 1483272000, 1485950400,
1488369600, 1491048000, 1493640000, 1496318400, 1498910400, 1501588800,
1504267200, 1506859200, 1509537600, 1512129600, 1514808000, 1517486400,
1519905600, 1522584000, 1525176000, 1527854400, 1530446400, 1533124800,
1535803200, 1538395200, 1541073600, 1543665600, 1546344000, 1549022400,
1551441600, 1554120000, 1556712000), class = c("POSIXct", "POSIXt"),
tzone = "GMT")), row.names = c(NA, -37L), class = "data.frame")

Hope this helps,

Rui Barradas
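On the "year-month indicator" and "groupby" parts of the question, here is a minimal sketch that is not taken from the thread. It assumes a zero-padded "YYYY-MM" text label is acceptable as the indicator and uses base R's aggregate() for the grouping. The column name ym and the object by_month are made up for illustration, and the data are a six-row subset of the posted values.

```r
# Small illustrative subset of the thread's data (values copied from the post)
dat <- data.frame(
  year    = c(2018, 2019, 2019, 2018, 2019, 2019),
  month   = c(12, 1, 2, 12, 1, 2),
  company = c("ABC", "ABC", "ABC", "FGH", "FGH", "FGH"),
  share   = c(20, 16.5, 15, 48, 41, 40)
)

# Hypothetical column name: zero-padded "YYYY-MM" label, which sorts
# correctly as plain text
dat$ym <- sprintf("%d-%02d", as.integer(dat$year), as.integer(dat$month))

# The same information as a Date (first day of the month), handy for plotting
dat$date <- as.Date(paste0(dat$ym, "-01"))

# One row per company and year-month; mean() changes nothing here because
# each company has a single value per month
by_month <- aggregate(share ~ company + ym, data = dat, FUN = mean)
by_month <- by_month[order(by_month$company, by_month$ym), ]
print(by_month)
```

If the real data had several rows per company and month, FUN would decide how they are combined (mean, sum, last value, and so on), so that choice is worth making deliberately.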
Jim Lemon
2022-Oct-03 07:45 UTC
[R] Creating a year-month indicator and groupby with category
Hi Tariq,

There were a couple of glitches in your data structure. Here's an example of a simple plot:

dat<-structure(list(year = c(2018, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019), month = c(12,
1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5), company = c("ABC",
"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH",
"FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH", "FGH"
), share = c(20, 16.5, 15, 15.5, 15.5, 16, 17, 16.5, 61, 55,
53, 53, 54, 53, 58, 54, 50, 47, 55, 50, 52, 51, 51.5, 52, 53,
54, 55, 53, 54, 50, 42, 48, 41, 40, 39, 36.5, 35), com_name = c(1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), row.names = c(NA,
-37L), spec = structure(list(cols = list(year = structure(list(), class = c("collector_double",
"collector")), month = structure(list(), class = c("collector_double",
"collector")), company = structure(list(), class = c("collector_character",
"collector")), share = structure(list(), class = c("collector_double",
"collector")), com_name = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"),
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))

# convert year and month fields to dates about the middle of each month
dat$date<-as.Date(paste(dat$year,dat$month,15,sep="-"),"%Y-%m-%d")
# plot the values for one company
plot(dat$date[dat$company=="ABC"],dat$share[dat$company=="ABC"],
 main="Plot of dat",xlab="Year",ylab="Share",
 xlim=range(dat$date),ylim=range(dat$share),
 type="l",col="red")
# add a line for the other one
lines(dat$date[dat$company=="FGH"],dat$share[dat$company=="FGH"],col="green")
# get the x plot limits as they are date values
xspan<-par("usr")[1:2]
# place a legend about in the middle of the plot
legend(xspan[1]+diff(xspan)*0.3,35,c("ABC","FGH"),lty=1,col=c("red","green"))

There are many more elegant ways to plot something like this.

Jim
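As one possible alternative to drawing each company's line by hand, the sketch below (not from the thread) reshapes the data to a wide year-month by company table with tapply() and draws all columns at once with matplot(). It assumes the categories are evenly spaced months, so the x axis is just the row index with year-month labels. The object names wide and cols are made up for illustration, and the data are a five-row subset of the posted values.

```r
# Illustrative subset of the thread's data (values copied from the posts)
dat <- data.frame(
  year    = c(2018, 2018, 2019, 2018, 2019),
  month   = c(11, 12, 1, 12, 1),
  company = c("FGH", "FGH", "FGH", "ABC", "ABC"),
  share   = c(42, 48, 41, 20, 16.5)
)

# Mid-month dates, built the same way as in the post above
dat$date <- as.Date(paste(dat$year, dat$month, 15, sep = "-"), "%Y-%m-%d")

# Rows are year-months, columns are companies; a month with no
# observation for a company (ABC in 2018-11) becomes NA
wide <- tapply(dat$share, list(format(dat$date, "%Y-%m"), dat$company), mean)

# One line per column; the x axis is the row index, relabelled by year-month
cols <- c("red", "green")
matplot(seq_len(nrow(wide)), wide, type = "l", lty = 1, col = cols,
        xaxt = "n", xlab = "Year-month", ylab = "Share")
axis(1, at = seq_len(nrow(wide)), labels = rownames(wide))
legend("topright", legend = colnames(wide), lty = 1, col = cols)
```

The wide table also doubles as a quick check of which months are missing for which company, since those cells show up as NA.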