Newbie here. Many apologies in advance for using the incorrect lingo. I'm new to statistics and VERY new to R.

I'm attempting to "group" or "bin" data together in order to analyze them as a combined group rather than as a discrete set. I'll provide a simple example of the data for illustrative purposes.

Patient ID | Charges | Age | Race
1          | 100     | 0   | Black
2          | 500     | 3   | White
3          | 200     | 5   | Hispanic
4          | 90      | 7   | Asian
5          | 400     | 10  | Hispanic
6          | 500     | 16  | Black

I'm trying to create three age categories ("0 to 4", "5 to 11" and "12 to 17") and analyze their "Charges" by their "Race." How do I go about doing this?

Thanks for any assistance!

Sam

--
View this message in context: http://www.nabble.com/Binning-or-grouping-data-tp23864555p23864555.html
Sent from the R help mailing list archive at Nabble.com.
> I'm trying to create three age categories--"0 to 4", "5 to 11" and "12 to
> 17"--and analyze their "Charges" by their "Race." How do I go abouts to
> doing this?

cut() or split() are probably the functions you are looking for.

cu
    Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
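As a rough illustration of that suggestion, here is a minimal sketch of how cut() and split() might be combined on the example data. The object name `patients`, the column names and the label strings are assumptions made for this example, and the mean is only one possible summary.

```r
# Example data from the original post (object and column names are assumed)
patients <- data.frame(
  id      = 1:6,
  charges = c(100, 500, 200, 90, 400, 500),
  age     = c(0, 3, 5, 7, 10, 16),
  race    = c("Black", "White", "Hispanic", "Asian", "Hispanic", "Black")
)

# cut() turns the numeric ages into a factor. With right = FALSE the
# intervals are [0,5), [5,12) and [12,18), i.e. ages 0-4, 5-11 and 12-17.
patients$age_group <- cut(patients$age,
                          breaks = c(0, 5, 12, 18),
                          right  = FALSE,
                          labels = c("0 to 4", "5 to 11", "12 to 17"))

# split() breaks the charges into one vector per age group
charges_by_group <- split(patients$charges, patients$age_group)

# Summarise each group, here with the mean
group_means <- sapply(charges_by_group, mean)
print(group_means)
```

Splitting on list(patients$age_group, patients$race) instead would give one piece per age group and race combination.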
You want cut and tapply, perhaps along these lines:

## Your data frame:
a <- data.frame(patient=1:6,
                charges=c(100,500,200,90,400,500),
                age=c(0,3,5,7,10,16),
                race=c("black","white","hispanic","asian","hispanic","black"))

## Add an age category:
a <- cbind(a, age_category=cut(a$age, breaks=c(-Inf,4,11,17)))

## Calculate average charges per age category and race
with(a, tapply(charges, list(age_category, race), mean))
#          asian black hispanic white
# (-Inf,4]    NA   100       NA   500
# (4,11]      90    NA      300    NA
# (11,17]     NA   500       NA    NA

Hope this helps.

Allan.
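A possible variation on the approach above, offered only as a sketch: the labels argument of cut() can replace the interval names such as "(-Inf,4]" with the category names from the question, and aggregate() gives a long table instead of a matrix. The label strings and the name `avg_charges` are assumptions for this example.

```r
# Same data frame as in the reply above
a <- data.frame(patient = 1:6,
                charges = c(100, 500, 200, 90, 400, 500),
                age     = c(0, 3, 5, 7, 10, 16),
                race    = c("black", "white", "hispanic",
                            "asian", "hispanic", "black"))

# Same breaks, but with readable labels instead of "(-Inf,4]" etc.
a$age_category <- cut(a$age,
                      breaks = c(-Inf, 4, 11, 17),
                      labels = c("0 to 4", "5 to 11", "12 to 17"))

# aggregate() returns one row per observed age/race combination,
# so empty combinations are dropped rather than shown as NA
avg_charges <- aggregate(charges ~ age_category + race, data = a, FUN = mean)
print(avg_charges)
```

The tapply() matrix is easier to read at a glance, while the aggregate() result is a data frame that can be passed on to further analysis.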
Sam,

In addition to functions mentioned by other respondents, you may wish to investigate findInterval(), which returns indices of bins. The resulting indices are very useful for subscripting as well as grouping.

    > id
    [1] 1 2 3 4 5 6
    > age
    [1]  0  3  5  7 10 16
    > group <- findInterval(age,breaks)
    > group
    [1] 1 1 3 3 3 5
    > data.frame(id,age,group)
      id age group
    1  1   0     1
    2  2   3     1
    3  3   5     3
    4  4   7     3
    5  5  10     3
    6  6  16     5

Glen
Oops! My use of bins other than the ones you described was not part of some obscure strategy. It should have been as shown below:

    > id
    [1] 1 2 3 4 5 6
    > age
    [1]  0  3  5  7 10 16
    > breaks
    [1]  0  5 12 17
    > group <- findInterval(age,breaks)
    > data.frame(id,age,group)
      id age group
    1  1   0     1
    2  2   3     1
    3  3   5     2
    4  4   7     2
    5  5  10     2
    6  6  16     3
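To show how the indices from findInterval() might be used for both subscripting and grouping, here is a minimal sketch. The `charges` and `race` vectors come from the original post; the label strings and the names `age_group` and `mean_charges` are assumptions. The sketch passes only the lower edges of the bins, because with breaks of 0 5 12 17 an age of exactly 17 would get index 4 rather than 3.

```r
id      <- 1:6
age     <- c(0, 3, 5, 7, 10, 16)
charges <- c(100, 500, 200, 90, 400, 500)
race    <- c("Black", "White", "Hispanic", "Asian", "Hispanic", "Black")

# Lower edges of the bins; every age at or above 12 falls in the last bin
breaks <- c(0, 5, 12)
group  <- findInterval(age, breaks)

# Subscripting: use the indices to pick a label for each patient
# (the label text is assumed)
labels    <- c("0 to 4", "5 to 11", "12 to 17")
age_group <- factor(labels[group], levels = labels)

# Grouping: mean charges for each age group and race; NA marks empty cells
mean_charges <- tapply(charges, list(age_group, race), mean)
print(mean_charges)
```

An age below the first break gets index 0, which would silently drop that element when subscripting `labels`, so it is worth checking the range of the data first.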