thr3ads.net - R help - [R] weighted average grouped by variables [Nov 2017]


Massimo Bressan

2017-Nov-09 11:20 UTC

[R] weighted average grouped by variables

hi all 

I have this dataframe (created as a reproducible example) 

mydf<-structure(list(date_time = structure(c(1508238000, 1508238000,
1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class =
c("POSIXct", "POSIXt"), tzone = ""),
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"),
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car",
"light_duty", "heavy_duty", "motorcycle"), class =
"factor"),
avg_speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)), 
.Names = c("date_time", "direction", "type",
"speed", "n_vehicles"),
row.names = c(NA, -7L), 
class = "data.frame") 

mydf 

and I need to get to this final result 

mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000,
1508238000, 1508238000), class = c("POSIXct", "POSIXt"),
tzone = ""),
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car",
"light_duty", "heavy_duty", "motorcycle"), class =
"factor"),
weighted_avg_speed = c(36.39029, 38.56522, 37.53333, 36.08696), 
n_vehicles = c(1153L,69L,45L,23L)), 
.Names = c("date_time", "type",
"weighted_avg_speed", "n_vehicles"),
row.names = c(NA, -4L), 
class = "data.frame") 

mydf_final 


my question: 
how to compute a weighted mean i.e. "weighted_avg_speed" 
from "speed" (the values whose weighted mean is to be computed) and
"n_vehicles" (the weights)
grouped by "date_time" and "type"? 

Note one complication: the type "motorcycle" is not present in both
directions.
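
As a quick check of the arithmetic being asked for, here is a minimal sketch for the "car" group only, using the two car rows of mydf. The variable names (car_speed, car_n, moto) are made up for illustration and are not part of the original post. The result should agree with the 36.39029 listed for "car" in mydf_final.

```r
# Per-direction average speeds and vehicle counts for "car" (rows 1 and 5 of mydf)
car_speed <- c(41.1029082774049, 33.4065155807365)
car_n     <- c(447L, 706L)

# Weighted mean by hand: sum(value * weight) / sum(weight)
car_manual <- sum(car_speed * car_n) / sum(car_n)

# The same computation with the built-in helper
car_builtin <- weighted.mean(car_speed, car_n)

# "motorcycle" occurs in one direction only, so its weighted mean
# is simply its own average speed
moto <- weighted.mean(36.0869565217391, 23L)

print(c(car = car_manual, car_builtin = car_builtin, motorcycle = moto))
```

A group with a single row therefore needs no special handling: the weight cancels out.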

any help for that? 

thank you 

max 




Massimo Bressan

2017-Nov-09 13:16 UTC


[R] weighted average grouped by variables

Hello 

an update about my question: I worked out the following solution (with the
package "dplyr")

library(dplyr) 

mydf%>% 
mutate(speed_vehicles=n_vehicles*speed) %>% 
group_by(date_time,type) %>% 
summarise( 
sum_n_times_speed=sum(speed_vehicles), 
n_vehicles=sum(n_vehicles), 
vel=sum(speed_vehicles)/sum(n_vehicles) 
) 


In fact I was hoping to manage everything in a "one-go": i.e. without
the need to create the "intermediate" variable called
"speed_vehicles" and with the use of the function weighted.mean()

any hints for a different approach much appreciated 

thanks 



-- 

------------------------------------------------------------ 
Massimo Bressan 

ARPAV 
Agenzia Regionale per la Prevenzione e 
Protezione Ambientale del Veneto 

Dipartimento Provinciale di Treviso 
Via Santa Barbara, 5/a 
31100 Treviso, Italy 

tel: +39 0422 558545 
fax: +39 0422 558516 
e-mail: massimo.bressan at arpa.veneto.it 
------------------------------------------------------------ 



Rui Barradas

2017-Nov-09 13:27 UTC


[R] weighted average grouped by variables

Hello,

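Using base R only, the following seems to do what you want. It returns one
value per row of mydf: the weighted mean of that row's group.

# ave() treats every argument other than x and FUN as a grouping variable,
# so the weights are subset by row index inside FUN instead of passed as w =
with(mydf, ave(seq_along(speed), date_time, type,
FUN = function(i) weighted.mean(speed[i], n_vehicles[i])))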


Hope this helps,

Rui Barradas


PIKAL Petr

2017-Nov-09 13:58 UTC


[R] weighted average grouped by variables

Hi

Thanks for the working example.

You could use a split/lapply approach, although it is probably not much better
than the dplyr method.

sapply(split(mydf, mydf$type), function(d)
sum(d$speed*d$n_vehicles)/sum(d$n_vehicles))
gives you the weighted averages

aggregate(mydf$n_vehicles, list(mydf$type), sum)$x
gives you the sums
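
The two pieces above can be combined into a single data frame shaped like mydf_final. What follows is only a sketch under a few assumptions: mydf is rebuilt here in a compact form (UTC is assumed for the time zone), the grouping uses both date_time and type as in the original question, and the names groups and summary_df are illustrative.

```r
# Compact rebuild of the example data (time zone assumed to be UTC)
mydf <- data.frame(
  date_time  = rep(as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"), 7),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One sub-data-frame per (date_time, type) combination; drop = TRUE
# discards combinations that do not occur in the data
groups <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)

# Summarise each group to a single row, then stack the rows
summary_df <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    date_time          = d$date_time[1],
    type               = d$type[1],
    weighted_avg_speed = weighted.mean(d$speed, d$n_vehicles),
    n_vehicles         = sum(d$n_vehicles)
  )
}))
rownames(summary_df) <- NULL

print(summary_df)
```

Each anonymous function call sees only its own group d, which is what makes the per-group weights line up with the per-group speeds.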

Cheers
Petr

Thierry Onkelinx

2017-Nov-09 14:17 UTC


[R] weighted average grouped by variables

Dear Massimo,

It seems straightforward to use weighted.mean() in a dplyr context:

library(dplyr)
mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))
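
To also get the summed n_vehicles column of mydf_final, the same pipeline can be extended as sketched below. This is an illustration, not part of the original reply: mydf is rebuilt compactly (UTC assumed), the name result is made up, and the .groups argument assumes dplyr 1.0.0 or later. The order of the two summaries matters, because summarise() lets later expressions see columns created by earlier ones; computing sum(n_vehicles) first would replace the weights before weighted.mean() uses them.

```r
library(dplyr)

# Compact rebuild of the example data (time zone assumed to be UTC)
mydf <- data.frame(
  date_time  = rep(as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"), 7),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

result <- mydf %>%
  group_by(date_time, type) %>%
  summarise(
    # weighted mean first, while n_vehicles still holds the per-row weights
    weighted_avg_speed = weighted.mean(speed, n_vehicles),
    # only then overwrite n_vehicles with the group total
    n_vehicles = sum(n_vehicles),
    .groups = "drop"
  )

print(result)
```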

Best regards,



ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

