Hi all,

I have this data frame (created as a reproducible example):

mydf <- structure(list(
  date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000,
                          1508238000, 1508238000, 1508238000),
                        class = c("POSIXct", "POSIXt"), tzone = ""),
  direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
                        .Label = c("A", "B"), class = "factor"),
  type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
                   .Label = c("car", "light_duty", "heavy_duty", "motorcycle"),
                   class = "factor"),
  speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
            36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)),
  .Names = c("date_time", "direction", "type", "speed", "n_vehicles"),
  row.names = c(NA, -7L), class = "data.frame")

mydf

and I need to get to this final result:

mydf_final <- structure(list(
  date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000),
                        class = c("POSIXct", "POSIXt"), tzone = ""),
  type = structure(c(1L, 2L, 3L, 4L),
                   .Label = c("car", "light_duty", "heavy_duty", "motorcycle"),
                   class = "factor"),
  weighted_avg_speed = c(36.39029, 38.56521, 37.53333, 36.08696),
  n_vehicles = c(1153L, 69L, 45L, 23L)),
  .Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"),
  row.names = c(NA, -4L), class = "data.frame")

mydf_final

My question: how do I compute a weighted mean, i.e. "weighted_avg_speed", from "speed" (the values whose weighted mean is to be computed) and "n_vehicles" (the weights), grouped by "date_time" and "type"?

Note the complication of the "motorcycle" case, which is not present in both directions.

Any help with that?

Thank you,
max
Hello,

An update on my question: I worked out the following solution (with the package "dplyr"):

library(dplyr)

mydf %>%
  mutate(speed_vehicles = n_vehicles * speed) %>%
  group_by(date_time, type) %>%
  summarise(
    sum_n_times_speed = sum(speed_vehicles),
    n_vehicles = sum(n_vehicles),
    vel = sum(speed_vehicles) / sum(n_vehicles)
  )

In fact, I was hoping to manage everything in one go, i.e. without creating the intermediate variable "speed_vehicles" and by using the function weighted.mean().

Any hints for a different approach are much appreciated.

Thanks,
max

In reply to:
From: "Massimo Bressan" <massimo.bressan at arpa.veneto.it>
To: "r-help" <r-help at r-project.org>
Sent: Thursday, 9 November 2017 12:20:52
Subject: weighted average grouped by variables

--
------------------------------------------------------------
Massimo Bressan

ARPAV
Agenzia Regionale per la Prevenzione e
Protezione Ambientale del Veneto

Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy

tel: +39 0422 558545
fax: +39 0422 558516
e-mail: massimo.bressan at arpa.veneto.it
------------------------------------------------------------
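As a quick sanity check on the formula used in that pipeline, the sketch below compares the manual sum(n * speed) / sum(n) with weighted.mean() for the "car" rows, and shows that a group with a single row (the "motorcycle" case) simply returns its own speed. The variable names manual, builtin and single are made up for illustration; the numbers are copied from the example data.

```r
# "car" rows from the example: direction A and direction B
speed <- c(41.1029082774049, 33.4065155807365)
n     <- c(447L, 706L)

# Manual weighted mean: sum of (weight * value) over sum of weights
manual <- sum(speed * n) / sum(n)

# Same quantity via the built-in function (second argument is the weights)
builtin <- weighted.mean(speed, n)

# "motorcycle" appears in one direction only, so the group has one row;
# the weighted mean of a single value is that value
single <- weighted.mean(36.0869565217391, 23L)

print(c(manual = manual, builtin = builtin, single = single))
```

So the single-direction case should need no special handling: grouping by date_time and type just yields a one-row group.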
Hello,

Using base R only, the following seems to do what you want. Note that ave() treats every extra argument as a grouping variable, so the weights cannot be handed over as w = n_vehicles; passing row indices and subsetting both vectors inside FUN gets around that. The result has one value per row of mydf, repeated within each group.

with(mydf, ave(seq_along(speed), date_time, type,
               FUN = function(i) weighted.mean(speed[i], n_vehicles[i])))

Hope this helps,

Rui Barradas

On 09-11-2017 13:16, Massimo Bressan wrote:
> In fact I was hoping to manage everything in a "one-go": i.e. without the need to create the "intermediate" variable called "speed_vehicles" and with the use of the function weighted.mean()
> [...]
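Since ave() returns one value per input row, a per-group table needs one more step. The following is only a sketch under some assumptions: the data frame is rebuilt with data.frame() rather than structure(), the time zone is set to UTC for reproducibility, and the column name wavg and the object name tab are invented for this example.

```r
# Rebuild the example data (values copied from the original post;
# tz = "UTC" is an assumption, the post leaves the time zone empty)
mydf <- data.frame(
  date_time  = as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# ave() over row indices, so FUN can subset values and weights together
mydf$wavg <- with(mydf, ave(seq_along(speed), date_time, type,
                            FUN = function(i) weighted.mean(speed[i], n_vehicles[i])))

# Collapse the repeated per-row values to one row per group
tab <- unique(mydf[c("date_time", "type", "wavg")])
print(tab)
```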
Hi,

Thanks for the working example.

You could use a split/lapply approach; however, it is probably not much better than the dplyr method.

sapply(split(mydf, mydf$type),
       function(d) sum(d$speed * d$n_vehicles) / sum(d$n_vehicles))

gives you the averages, and

aggregate(mydf$n_vehicles, list(mydf$type), sum)$x

gives you the sums.

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Massimo Bressan
> Sent: Thursday, November 9, 2017 2:17 PM
> To: r-help <r-help at r-project.org>
> Subject: Re: [R] weighted average grouped by variables
>
> In fact I was hoping to manage everything in a "one-go": i.e. without the need
> to create the "intermediate" variable called "speed_vehicles" and with the use
> of the function weighted.mean()
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
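To get the averages and the sums into a single table shaped like mydf_final, one possible base R sketch splits on both date_time and type and builds one row per piece. This is an illustration, not the only way to do it: the object names pieces and res are invented here, the data frame is rebuilt with data.frame(), and tz = "UTC" is an assumption.

```r
# Rebuild the example data (values copied from the original post;
# tz = "UTC" is an assumption, the post leaves the time zone empty)
mydf <- data.frame(
  date_time  = as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One data frame per (date_time, type) combination; drop = TRUE skips
# combinations that do not occur in the data
pieces <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)

# Summarise each piece using its own columns (d), not the full mydf
res <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(
    date_time          = d$date_time[1],
    type               = d$type[1],
    weighted_avg_speed = weighted.mean(d$speed, d$n_vehicles),
    n_vehicles         = sum(d$n_vehicles)
  )
}))
rownames(res) <- NULL
print(res)
```

The important detail is that the anonymous function works on its argument d; referring to mydf inside it would return the overall average for every group.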
Dear Massimo,

It seems straightforward to use weighted.mean() in a dplyr context:

library(dplyr)

mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

2017-11-09 14:16 GMT+01:00 Massimo Bressan <massimo.bressan at arpa.veneto.it>:
> In fact I was hoping to manage everything in a "one-go": i.e. without the
> need to create the "intermediate" variable called "speed_vehicles" and with
> the use of the function weighted.mean()
> [...]
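To reproduce both columns of the requested mydf_final in the same pipeline, the summarise() call could also total the vehicles. The sketch below assumes dplyr 1.0.0 or later (for the .groups argument), rebuilds the data with data.frame(), and uses tz = "UTC" as an assumption; the column names follow the target data frame from the original post.

```r
library(dplyr)

# Rebuild the example data (values copied from the original post;
# tz = "UTC" is an assumption, the post leaves the time zone empty)
mydf <- data.frame(
  date_time  = as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

mydf_final <- mydf %>%
  group_by(date_time, type) %>%
  summarise(
    # computed first, while n_vehicles still holds the per-row weights
    weighted_avg_speed = weighted.mean(speed, n_vehicles),
    # then replaced by the group total
    n_vehicles         = sum(n_vehicles),
    .groups = "drop"
  )

print(mydf_final)
```

The order inside summarise() matters here: later expressions see columns created earlier in the same call, so the weighted mean has to come before n_vehicles is overwritten with its sum.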