Hello,

I would like to build a Lorenz curve and calculate a Gini coefficient in order to find out how many parasites the 20% most infected hosts carry.

Here is my data set:

Number of parasites per host:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)

Number of hosts associated with each number of parasites given above:
hosts = c(18,20,28,19,16,10,3,1,0,0,0)

To represent the Lorenz curve, I manually calculated the cumulative percentage of parasites and hosts:

cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type = "l")

From this Lorenz curve, how can I calculate the Gini coefficient with the function "gini" in R (package reldist), given that the vector "hosts" is not a vector of weights?

Thank you very much for your help.
Have a nice day
Marine
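The end goal of the question, the share of all parasites carried by the 20% most infected hosts, can also be read directly from the data once `hosts` is treated as the number of hosts in each parasite class (the interpretation taken in the first reply). A minimal base-R sketch under that assumption; `top_share` is a hypothetical helper, and it rounds 20% of the hosts to a whole number of hosts:

```r
# Frequency table from the question: hosts[i] hosts carry parasites[i] parasites each
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Hypothetical helper: share of all parasites carried by the top 'prop'
# fraction of hosts, treating 'freq' as host counts per parasite class
top_share <- function(x, freq, prop = 0.2) {
  per_host <- rep(x, freq)                        # one entry per individual host
  k <- round(prop * length(per_host))             # size of the top group, in hosts
  top <- head(sort(per_host, decreasing = TRUE), k)
  sum(top) / sum(per_host)
}

share <- top_share(parasites, hosts, prop = 0.2)
print(share)
```

With these data, 20% of the 115 hosts is 23 hosts, and they carry 111 of the 272 parasites (about 41%).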
> On 30 Mar 2016, at 02:53, Marine Regis <marine.regis at hotmail.fr> wrote:
>
> [...]
> cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
> cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
> plot(cumul_hosts, cumul_parasites, type = "l")

Your values in hosts are frequencies, so (with parasites already in increasing order) you need to calculate

cumul_hosts = cumsum(hosts)/sum(hosts)
cumul_parasites = cumsum(hosts*parasites)/sum(hosts*parasites)

The Lorenz curve starts at (0,0), so to draw it, you need to extend these vectors:

cumul_hosts = c(0,cumul_hosts)
cumul_parasites = c(0,cumul_parasites)

plot(cumul_hosts, cumul_parasites, type = "l")

The Gini coefficient can be calculated as

library(reldist)
gini(parasites, hosts)

If you want to check, you can "recreate" the original data (the number of parasites for each host) with

num_parasites = rep(parasites, hosts)

and

gini(num_parasites)

will also give you the Gini coefficient you want.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
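The original question asked how to get the Gini coefficient from the Lorenz curve itself. The Gini coefficient is 1 minus twice the area under the Lorenz curve, and with grouped data that curve is piecewise linear between the cumulative points from the reply above, so the trapezoid rule gives the area exactly. The base-R sketch below uses a hypothetical helper, `gini_from_lorenz` (not a reldist function), and cross-checks it against the standard sorted-data formula applied to the expanded per-host vector. Both should agree with `gini(parasites, hosts)`, assuming reldist returns the uncorrected (population) Gini coefficient.

```r
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Lorenz curve points with hosts as frequencies and (0,0) prepended;
# this relies on 'parasites' already being in increasing order
cumul_hosts     <- c(0, cumsum(hosts) / sum(hosts))
cumul_parasites <- c(0, cumsum(hosts * parasites) / sum(hosts * parasites))

# Hypothetical helper: Gini = 1 - 2 * (area under the Lorenz curve),
# with the area computed by the trapezoid rule over consecutive points
gini_from_lorenz <- function(p, L) {
  dp <- diff(p)
  area <- sum(dp * (head(L, -1) + tail(L, -1)) / 2)
  1 - 2 * area
}

g_lorenz <- gini_from_lorenz(cumul_hosts, cumul_parasites)

# Cross-check on the per-host data:
# G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
num_parasites <- sort(rep(parasites, hosts))
n <- length(num_parasites)
g_direct <- 2 * sum(seq_len(n) * num_parasites) / (n * sum(num_parasites)) - (n + 1) / n

print(c(lorenz = g_lorenz, direct = g_direct))
```

For these data both values come out at 3107/7820, about 0.397. Zero-frequency classes (8, 9 and 10 parasites) add flat segments of zero width and so do not change the result.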
On Wed, 30 Mar 2016, Erich Neuwirth wrote:

> Your values in hosts are frequencies.

That's what I thought as well, but Marine explicitly said that 'hosts' is _not_ a vector of weights. Hence I was confused about what this would actually mean.

Using the "ineq" package you can also do

library(ineq)
plot(Lc(parasites, hosts))
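The ineq suggestion `plot(Lc(parasites, hosts))` passes `hosts` as the second argument of `Lc()`, which its documentation describes as a vector of frequencies; `Lc()` also sorts the values before accumulating them. For readers without ineq installed, a rough base-R equivalent is sketched below. `lorenz_points` is a hypothetical helper, and only its `p` and `L` components are meant to mirror what `Lc()` returns. Because it sorts internally, the classes need not be supplied in increasing order, unlike the plain `cumsum()` approach.

```r
# Hypothetical helper approximating ineq::Lc(x, n) for frequency data:
# sort classes by x, weight each class by its frequency n, prepend (0, 0)
lorenz_points <- function(x, n = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]
  n <- n[o]
  list(p = c(0, cumsum(n) / sum(n)),          # cumulative share of hosts
       L = c(0, cumsum(n * x) / sum(n * x)))  # cumulative share of parasites
}

parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

lc <- lorenz_points(parasites, hosts)

# Same classes in a scrambled order give the same curve, since the helper sorts
shuffled <- c(11, 3, 7, 1, 9, 5, 2, 10, 4, 8, 6)
lc_shuffled <- lorenz_points(parasites[shuffled], hosts[shuffled])

plot(lc$p, lc$L, type = "l",
     xlab = "cumulative share of hosts", ylab = "cumulative share of parasites")
abline(0, 1, lty = 2)  # line of perfect equality
```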