Dear All,
I am taking my first steps with the tidyverse purrr package and I am
stuck on some probably trivial tasks.
Consider the following data set

library(dplyr)
library(purrr)

zz<-list(structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(22393349.081,
23000574.372, 21682040.898, 21671102.853, 34361300.338, 35297814.942,
34745691.204, 35878883.117, 11967951.257, 12297240.57, 13063650.306,
14207780.264), relation = c("EU28-Algeria", "EU28-Algeria",
"EU28-Algeria", "EU28-Algeria", "World-Algeria", "World-Algeria",
"World-Algeria", "World-Algeria", "Extra EU28-Algeria",
"Extra EU28-Algeria", "Extra EU28-Algeria", "Extra EU28-Algeria"),
g_rate = c(0.736046372770467, 0.0271163231905857, -0.0573261107603093,
-0.000504474880914325, 0.614846575418334, 0.0272549232650638,
-0.0156418673197543, 0.0326138831530727, 0.428272657063707,
0.0275142592018328, 0.0623237165799383, 0.0875811837579971
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(9233346.648, 7869288.171,
7271485.687, 6395999.102, 21393949.287, 19851236.26, 19449339.887,
16055014.309, 12160602.639, 11981948.089, 12177854.2, 9659015.207
), relation = c("EU28-Egypt", "EU28-Egypt", "EU28-Egypt", "EU28-Egypt",
"World-Egypt", "World-Egypt", "World-Egypt", "World-Egypt",
"Extra EU28-Egypt", "Extra EU28-Egypt", "Extra EU28-Egypt",
"Extra EU28-Egypt"), g_rate = c(0.0970653722744164, -0.147731751985664,
-0.0759665259436081, -0.120399959882366, 0.124744629514854,
-0.0721097823643728, -0.0202454077789513, -0.174521376957825,
0.146712116047648, -0.0146912579338002, 0.0163501051368976,
-0.206837670383671
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)))

I can do very simple things with map(), for instance iteratively
taking the mean of a certain column

map(zz, function(x) mean(x$tot_i))

or filtering on the values of the years

map(zz, function(x) filter(x, year==2000))

However, I bang my head against the wall as soon as I want to add a bit of
complexity. For instance

1) I want to iteratively group the data in zz by relation and summarise
them by taking the average of tot_i and

2) Given a list of years

ll<-list(c(2000, 2001), c(2001, 2003))

I would like to filter the two elements of the zz list according to the
years listed in ll.

I would then have plenty of other operations to carry out on the data, but
already understanding 1 and 2 would take me a long way from where I am
stuck now.

Any suggestion is welcome.
Cheers

Lorenzo
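As a side note on the simple case above: map() always returns a list, so a per-element mean comes back as a list of length-one numbers. A minimal sketch, assuming a made-up toy list (the name toy and its values are hypothetical, only the column names follow the post), shows how map_dbl() returns a plain numeric vector instead:

```r
library(dplyr)
library(purrr)

# Hypothetical toy stand-in for zz: made-up numbers, same column names
toy <- list(
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(10, 20, 30, 40),
         relation = c("A-X", "A-X", "B-X", "B-X")),
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(1, 3, 5, 7),
         relation = c("A-Y", "A-Y", "B-Y", "B-Y"))
)

# map() returns a list with one element per data frame
means_list <- map(toy, function(x) mean(x$tot_i))

# map_dbl() returns a numeric vector with one value per data frame
means_vec <- map_dbl(toy, function(x) mean(x$tot_i))
```

The typed variants (map_dbl(), map_chr(), map_int()) fail with an error if a result is not a single value of the expected type, which can be a useful check when each element should yield exactly one number.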
Does this answer the first question?

> rel <- map(zz, function(x){
+   group_by(x, relation) %>% summarise(tot = mean(tot_i))
+ })
> rel
[[1]]
# A tibble: 3 x 2
  relation                 tot
  <chr>                  <dbl>
1 EU28-Algeria       22186767.
2 Extra EU28-Algeria 12884156.
3 World-Algeria      35070922.

[[2]]
# A tibble: 3 x 2
  relation               tot
  <chr>                <dbl>
1 EU28-Egypt        7692530.
2 Extra EU28-Egypt 11494855.
3 World-Egypt      19187385.

>

Jim Holtman
*Data Munger Guru*

*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*

On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

> 1) I want to iteratively group the data in zz by relation and summarise
> them by taking the average of tot_i
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try this for the second question:

> years <- map2(zz,
+               list(c(2000, 2001), c(2001, 2003)),
+               ~ filter(.x, year %in% .y)
+ )
> years
[[1]]
# A tibble: 6 x 4
   year     tot_i relation           g_rate
  <dbl>     <dbl> <chr>               <dbl>
1  2000 22393349. EU28-Algeria       0.736
2  2001 23000574. EU28-Algeria       0.0271
3  2000 34361300. World-Algeria      0.615
4  2001 35297815. World-Algeria      0.0273
5  2000 11967951. Extra EU28-Algeria 0.428
6  2001 12297241. Extra EU28-Algeria 0.0275

[[2]]
# A tibble: 6 x 4
   year     tot_i relation          g_rate
  <dbl>     <dbl> <chr>              <dbl>
1  2001  7869288. EU28-Egypt       -0.148
2  2003  6395999. EU28-Egypt       -0.120
3  2001 19851236. World-Egypt      -0.0721
4  2003 16055014. World-Egypt      -0.175
5  2001 11981948. Extra EU28-Egypt -0.0147
6  2003  9659015. Extra EU28-Egypt -0.207

>

Jim Holtman
*Data Munger Guru*

On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

> 2) Given a list of years
>
> ll<-list(c(2000, 2001), c(2001, 2003))
>
> I would like to filter the two elements of the zz list according to the
> years listed in ll.
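The map2() call above walks the two lists in parallel: on each step .x is one data frame and .y is the matching vector of years. A minimal sketch of that pairing, assuming a made-up toy list (toy, df and yrs are hypothetical names; the values are invented) and a named function in place of the formula shorthand:

```r
library(dplyr)
library(purrr)

# Hypothetical toy stand-in for zz: made-up numbers, same column names
toy <- list(
  tibble(year = c(2000, 2001, 2002, 2003),
         tot_i = c(10, 20, 30, 40),
         relation = "A-X"),
  tibble(year = c(2000, 2001, 2002, 2003),
         tot_i = c(1, 3, 5, 7),
         relation = "A-Y")
)

# One vector of years per data frame, in the same order as toy
ll <- list(c(2000, 2001), c(2001, 2003))

# map2() passes toy[[i]] and ll[[i]] together to the function
filtered <- map2(toy, ll, function(df, yrs) filter(df, year %in% yrs))
```

Because the pairing is purely positional, the two lists need the same length (or one of them length one); the formula version with .x and .y is equivalent to the explicit two-argument function used here.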
Dear Jim,
Thanks a lot for your stellar replies! They address my questions perfectly.
Cheers

Lorenzo

On Fri, Jan 25, 2019 at 07:46:50AM -0800, jim holtman wrote:
>Try this for the second question:
>
>> years <- map2(zz,
>+               list(c(2000, 2001), c(2001, 2003)),
>+               ~ filter(.x, year %in% .y)
>+ )
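The two answers in this thread can also be chained, since each step takes a data frame and returns a data frame. A minimal sketch, assuming made-up toy data (toy, yrs, df and keep are hypothetical names) and assuming a single combined table is wanted at the end, filters each element by its own years, summarises by relation, and stacks the results with bind_rows():

```r
library(dplyr)
library(purrr)

# Hypothetical toy stand-in for zz: made-up numbers, same column names
toy <- list(
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(10, 20, 30, 40),
         relation = c("A-X", "A-X", "B-X", "B-X")),
  tibble(year = c(2000, 2001, 2000, 2001),
         tot_i = c(1, 3, 5, 7),
         relation = c("A-Y", "A-Y", "B-Y", "B-Y"))
)

# Years to keep for each element of toy, matched by position
yrs <- list(c(2000, 2001), 2001)

# Filter, then group and summarise, one data frame at a time
summaries <- map2(toy, yrs, function(df, keep) {
  df %>%
    filter(year %in% keep) %>%
    group_by(relation) %>%
    summarise(tot = mean(tot_i))
})

# Stack the per-element summaries; "element" records the list position
combined <- bind_rows(summaries, .id = "element")
```

Keeping the results as a list (as in the replies above) is just as valid; stacking is only convenient when the per-element tables share the same columns and will be compared or plotted together.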