citadel
2012-Jan-26 08:23 UTC
[R] How do I use the cut function to assign specific cut points?
I am new to R and am trying to cut a continuous variable, BMI, into categories, but I can't figure out how to use the cut function. I would like four groups: <20, 20-25, 25-30 and >=30. I am having difficulty writing the code for the <20 and >=30 groups. Please help. Thank you.

--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-use-the-cut-function-to-assign-specific-cut-points-tp4329788p4329788.html
Sent from the R help mailing list archive at Nabble.com.
Jim Lemon
2012-Jan-26 09:25 UTC
[R] How do I use the cut function to assign specific cut points?
On 01/26/2012 07:23 PM, citadel wrote:
> I am new to R, and I am trying to cut a continuous variable BMI into
> different categories and can't figure out how to use it. I would like to cut
> it into four groups: <20, 20-25, 25-30 and >= 30. I am having difficulty
> figuring the code for <20 and >=30? Please help. Thank you.

Hi citadel,
Assuming that you only have positive numbers and they are all less than 100:

BMIcut<-cut(BMI,breaks=c(0,20,25,30,100),
 include.lowest=TRUE,right=FALSE)

This will give you four groups: >=0 to <20, >=20 to <25, >=25 to <30, and >=30 to <=100. Because right=FALSE makes each interval closed on the left, a BMI of exactly 20, 25 or 30 falls in the higher group; include.lowest=TRUE then closes the last interval on the right, so a value of exactly 100 is still classified.

Jim
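For readers who would rather not assume a range for BMI, here is a minimal sketch of the same call with open-ended outer breaks. The BMI values and the label strings are invented for illustration; only cut() and its breaks, labels and right arguments come from base R.

```r
# Hypothetical BMI values, invented for illustration only
BMI <- c(17.5, 20, 24.9, 25, 29.9, 30, 42)

# -Inf and Inf as the outer breaks mean no lower or upper bound is assumed.
# right = FALSE makes each interval closed on the left: [a, b)
BMIcut <- cut(BMI,
              breaks = c(-Inf, 20, 25, 30, Inf),
              labels = c("<20", "20-<25", "25-<30", ">=30"),
              right = FALSE)

# Count how many values fall in each group
table(BMIcut)
```

With right = FALSE, values of exactly 20, 25 and 30 go to the higher group, which matches the ">= 30" requirement in the question. Missing BMI values stay NA.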
Frank Harrell
2012-Jan-26 14:02 UTC
[R] How do I use the cut function to assign specific cut points?
It is not valid to categorize BMI. This will result in major loss of information and residual confounding. Plus there is huge heterogeneity in the BMI >= 30 group. Details are at http://biostat.mc.vanderbilt.edu/CatContinuous and see these articles:

@Article{fil07cat,
  author = {Filardo, Giovanni and Hamilton, Cody and Hamman, Baron and Ng, Hon K. T. and Grayburn, Paul},
  title = {Categorizing {BMI} may lead to biased results in studies investigating in-hospital mortality after isolated {CABG}},
  journal = {J Clin Epi},
  year = 2007,
  volume = 60,
  pages = {1132-1139},
  annote = {BMI;CABG;surgical adverse events;hospital mortality;epidemiology;smoothing methods;categorization;categorizing continuous variables;investigators should waive categorization entirely and use smoothed functions for continuous variables;examples of non-monotonic relationships}
}

@Article{roy06dic,
  author = {Royston, Patrick and Altman, Douglas G. and Sauerbrei, Willi},
  title = {Dichotomizing continuous predictors in multiple regression: a bad idea},
  journal = {Stat in Med},
  year = 2006,
  volume = 25,
  pages = {127-141},
  annote = {continuous covariates;dichotomization;categorization;regression;efficiency;clinical research;residual confounding;destruction of statistical inference when cutpoints are chosen using the response variable;varying effect estimates when change cutpoints;difficult to interpret effects when dichotomize;nice plot showing effect of categorization;PBC data}
}

If you work with colleagues who tell you "this is the way it's done" don't go down without a fight.

In general, good statistical practice dictates that categorization is only done for producing certain tables (for which case you might use the cut2 function in the Hmisc package). Even that will change as we incorporate more micrographics (think of loess plots with BMI on the x-axis) within table cells, as is now done in the Hmisc summary.formula function for purely categorical variables.
Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
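As one possible way to act on the advice above to keep BMI continuous, here is a minimal sketch that models BMI with a natural spline instead of categories. Everything in it is an assumption made for illustration: the data are simulated, the outcome y and its U-shaped relationship are hypothetical, and the choice of ns() with df = 4 is just one smoothing option among many, not the method recommended in the cited articles.

```r
library(splines)  # ships with base R; provides ns()

set.seed(1)

# Simulated data, invented for illustration only
n   <- 200
BMI <- runif(n, 16, 45)
y   <- 0.02 * (BMI - 25)^2 + rnorm(n)  # hypothetical outcome, U-shaped in BMI

# Model the outcome as a smooth function of continuous BMI
# (natural cubic spline with 4 degrees of freedom; df = 4 is an arbitrary choice)
fit_spline <- lm(y ~ ns(BMI, df = 4))

# Predicted outcome over a grid spanning the observed BMI range
grid <- data.frame(BMI = seq(min(BMI), max(BMI), length.out = 50))
pred <- predict(fit_spline, newdata = grid)

# Plot the data with the fitted curve on top
plot(BMI, y, col = "grey")
lines(grid$BMI, pred, lwd = 2)
```

The fitted curve can change shape anywhere along the BMI axis, whereas a model on the four cut() groups can only step at 20, 25 and 30 and treats everyone at or above 30 alike.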