Hello,

I've got a problem I don't know how to solve. I have a dataset that contains age intervals (age groups) of people and the number of persons in each age group for each year (y1994-y1996). The number of persons varies each year. I only have access to the age intervals, not the age of each person, which would have made things easier.

I want to know the median age interval (not the median number) for each year. Let's say that in y1994 the count 23 corresponds to the median age interval "45-54"; I then want "45-54" as the result. How is that done?

This is the sample dataset:

agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74","75-84","84-")
y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)

I look forward to your response.

Best regards,
Erik Svensson
Dear Erik,

There may be more elegant solutions, but try this:

a. Create a data frame with your data, for example:

data <- data.frame(agegrp, y1994, y1995, y1996)

b. Then use the which function on that data frame:

as.character(data$agegrp[which(data$y1994 == 23)])

Hope it helps,
José

Prof. José Iparraguirre
Chief Economist
Age UK

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Erik B Svensson
Sent: 12 January 2015 10:33
To: r-help at r-project.org
Subject: [R] Calculate the median age interval

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
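A note on the `which` lookup suggested above: it finds the group whose count equals a value you already know, so on its own it does not locate the median, and it returns more than one group when a count is repeated. A minimal sketch with the sample data from the question (the data frame name `ages` is chosen here only for illustration):

```r
agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74","75-84","84-")
y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)

# stringsAsFactors = FALSE keeps agegrp as plain character strings
ages <- data.frame(agegrp, y1994, y1995, y1996, stringsAsFactors = FALSE)

# A count that occurs once gives a single group
hit_1994 <- ages$agegrp[which(ages$y1994 == 23)]   # "45-54"

# The count 24 occurs twice in y1996, so two groups come back
hit_1996 <- ages$agegrp[which(ages$y1996 == 24)]   # "55-64" "65-74"
```

So the lookup is only useful once the position of the median group has been worked out some other way.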
On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
> I want to know the median age interval (not the median number) for each
> year. Let's say that in y1994 23 corresponds to the median age interval
> "45-54", I want to "45-54" as a result. How is that done?

In principle, this is straightforward. But in practice you may need to be careful about how to deal with borderline cases -- and about what you mean by "median age interval".

The underlying idea is based on:

  cumsum(y1994)/sum(y1994)
  # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
  # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000

Thus age intervals 1-7 ("<1" - "45-54") contain less than 50% (0.49344978...), though "45-54" almost gets there. However, age groups 1-8 ("<1" - 55-64" contain more than 50%. Hence the median age is within "49-64".

Implementing the above as a procedure:

  agegrp[max(which(cumsum(y1994)/sum(y1994) < 0.5) + 1)]
  # [1] "55-64"

Note that the "obvious solution":

  agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
  # [1] "45-54"

gives an incorrect answer, since with these data it returns a group whose maximum age is below the median. This is because the "<=" is satisfied by "<" also.

Hoping this helps!
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Jan-2015  Time: 11:12:39
This message was sent by XFMail
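The cumulative-proportion procedure above can be wrapped in a small function and applied to every year at once. This is only a sketch: `median_group` is a hypothetical helper name, not something from the thread, and it picks the first group whose cumulative count reaches half the total. That is the same group as `max(which(... < 0.5)) + 1` whenever at least one cumulative proportion is below 0.5.

```r
agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64","65-74","75-84","84-")
y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)

# Hypothetical helper: first group whose cumulative count reaches half of
# the total. Comparing doubled counts with the total avoids the division
# and any floating-point rounding at exactly 0.5.
median_group <- function(counts, groups) {
  groups[which(2 * cumsum(counts) >= sum(counts))[1]]
}

# Apply to each year; the result is a named character vector
res <- sapply(list(y1994 = y1994, y1995 = y1995, y1996 = y1996),
              median_group, groups = agegrp)
print(res)
# y1994: "55-64", y1995: "45-54", y1996: "45-54"
```

Taking the first index that satisfies the condition also covers the case where the very first group already holds half the total, in which case `which(... < 0.5)` would be empty.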
Sorry, a typo in my reply below. See at "#######".

On 12-Jan-2015 11:12:43 Ted Harding wrote:
> [...] However, age groups 1-8 ("<1" - 55-64" contain more than 50%.
> Hence the median age is within "49-64".

#######
Should be: age groups 1-8 ("<1" - "55-64") contain more than 50%.
Hence the median age is within "55-64".

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Jan-2015  Time: 11:21:11
This message was sent by XFMail
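The caveat about borderline cases raised in the thread shows up when the cumulative count hits exactly half of the total: with an even number of persons, the two middle persons then fall into different groups. A small sketch with made-up counts (the groups "A", "B", "C" and their counts are invented for illustration and are not from the thread):

```r
# Made-up data: the first two groups hold exactly half of the 20 persons
groups <- c("A", "B", "C")
counts <- c(5, 5, 10)

cum <- cumsum(counts)   # 5 10 20
n   <- sum(counts)      # 20

# Group containing the 10th person (cumulative count reaches n/2)
lower <- groups[which(2 * cum >= n)[1]]   # "B"

# Group containing the 11th person (cumulative count exceeds n/2)
upper <- groups[which(2 * cum > n)[1]]    # "C"
```

When `lower` and `upper` differ, the median sits on the boundary between two intervals and you have to decide which one to report. For the three years in the sample data no cumulative count equals exactly half the total, so both rules return the same group.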