Manli Yan
2009-Aug-20 07:40 UTC
[R] how to categorize continuous variable when useing regression
Assume a continuous dependent variable y and a continuous independent variable x. I am trying to categorize x into intervals such that the intervals have the most significantly different effects on y. Does anyone know which method I should apply? I really need some hints. Thanks so much.
Frank E Harrell Jr
2009-Aug-20 12:50 UTC
[R] how to categorize continuous variable when useing regression
Manli Yan wrote:
> Assume a continuous dependent variable y and a continuous independent
> variable x. I am trying to categorize x into intervals such that the
> intervals have the most significantly different effects on y.
> Does anyone know which method I should apply? I really need some hints.
> Thanks so much.

This is a dangerous practice. Unless you have mastered the bootstrap so that you can accurately estimate the damage to the statistical significance caused by such categorization, it is best to avoid this altogether.

Frank
--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
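The damage to statistical significance mentioned above can be illustrated with a small simulation. This is a minimal sketch and not the bootstrap itself: it assumes x and y are independent standard normal draws (so any "effect" is spurious), a sample size of 50, at least 10 observations per group, and a pooled two-sample t-test with an approximate 5% critical value of 2.01. The function names `t_stat` and `simulate` are made up for this illustration.

```python
import math
import random


def t_stat(left, right):
    """Pooled two-sample t statistic for the difference in group means."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    ss = sum((v - m1) ** 2 for v in left) + sum((v - m2) ** 2 for v in right)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def simulate(n_sims=300, n=50, min_group=10, t_crit=2.01, seed=1):
    """Return (false-positive rate with a fixed cut, rate with a data-chosen cut).

    x and y are generated independently, so every rejection is a false positive.
    """
    rng = random.Random(seed)
    hits_fixed = hits_best = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        # Order the y values by their x values so a cut is just a split index.
        y_by_x = [yi for _, yi in sorted(zip(x, y))]
        # Cut fixed in advance at the median of x.
        t_fixed = abs(t_stat(y_by_x[: n // 2], y_by_x[n // 2:]))
        # Cut chosen after looking at the data: the split with the largest |t|.
        t_best = max(
            abs(t_stat(y_by_x[:k], y_by_x[k:]))
            for k in range(min_group, n - min_group + 1)
        )
        hits_fixed += t_fixed > t_crit
        hits_best += t_best > t_crit
    return hits_fixed / n_sims, hits_best / n_sims


fixed_rate, best_rate = simulate()
print(f"False-positive rate, cut fixed in advance at the median: {fixed_rate:.2f}")
print(f"False-positive rate, cut chosen to maximize |t|: {best_rate:.2f}")
```

Because the median split is one of the candidate cuts, the data-chosen rate can never be lower than the fixed-cut rate. The gap between the two is the inflation that a naive significance test ignores when the intervals are picked to look as different as possible.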
Manli Yan
2009-Sep-17 01:41 UTC
[R] how to categorize continuous variable when useing regression
Assume a continuous dependent variable y and a continuous independent variable x. I am trying to categorize x into intervals such that the intervals have the most significantly different effects on y. Does anyone know which method I should apply? I know it will cause a loss of information, but can I really do that? Or which method would keep the loss minimal? All I want is some keywords. Thanks in advance.
Frank E Harrell Jr
2009-Sep-17 02:26 UTC
[R] how to categorize continuous variable when useing regression
Manli Yan wrote:
> Assume a continuous dependent variable y and a continuous independent
> variable x. I am trying to categorize x into intervals such that the
> intervals have the most significantly different effects on y.
> Does anyone know which method I should apply? I know it will cause a loss
> of information, but can I really do that? Or which method would keep
> the loss minimal? All I want is some keywords. Thanks in advance.

This is bad statistical practice and should be avoided. Use modern methods such as regression splines, penalized splines, loess, etc.

Howard Wainer provided an algorithm by which, for any set of x-y pairs in which there is no correlation, one can find a set of 5 intervals such that the mean y is increasing in x and another set of intervals in which the mean y is decreasing in x.

Frank
--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
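To make the regression spline suggestion concrete, here is a minimal sketch comparing a five-interval categorization of x with a linear regression spline that uses the same number of parameters. Everything specific in it is an assumption chosen for illustration: the true relationship (a sine curve with Gaussian noise), the equal-width intervals, the three knot locations, and the helper names `solve` and `least_squares_rss`. In practice a restricted cubic spline or loess from a statistics package would be the usual choice; plain Python is used here only to keep the example self-contained.

```python
import math
import random


def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def least_squares_rss(design, y):
    """Fit y on the columns of `design` by OLS; return the residual sum of squares."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Simulated data (assumption): smooth nonlinear signal plus noise.
rng = random.Random(0)
n = 200
x = sorted(rng.uniform(0, 2 * math.pi) for _ in range(n))
y = [math.sin(xi) + rng.gauss(0, 0.2) for xi in x]

# Categorized x: five equal-width intervals, one indicator each (5 parameters).
width = 2 * math.pi / 5


def bin_row(xi):
    k = min(int(xi / width), 4)
    return [1.0 if j == k else 0.0 for j in range(5)]


# Linear regression spline: intercept, slope, and 3 hinge terms (5 parameters).
knots = [math.pi / 2, math.pi, 3 * math.pi / 2]


def spline_row(xi):
    return [1.0, xi] + [max(0.0, xi - k) for k in knots]


rss_bins = least_squares_rss([bin_row(xi) for xi in x], y)
rss_spline = least_squares_rss([spline_row(xi) for xi in x], y)
print(f"Residual sum of squares, 5 intervals:   {rss_bins:.2f}")
print(f"Residual sum of squares, linear spline: {rss_spline:.2f}")
```

Both models spend five parameters on x. The categorized fit forces the predicted y to be flat within each interval and to jump at the boundaries, while the spline stays continuous and follows the trend within each segment, which is where the information lost by categorization goes.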