Saumya Gupta
2013-Sep-14 20:12 UTC
[R] Regression model for predicting ranks of the dependent variable
I have a dataset which has several predictor variables and a dependent variable, "score" (which is numeric). The score for each row is calculated using a formula which uses some of the predictor variables. But the "score" values are not given explicitly in the dataset. The scores have only been sorted, and their ranks are given (1, 2, 3, 4, etc.; rank 1 means that the row had the highest score, rank 2 the second highest, and so on). So, if the data has 100 rows, the output has ranks from 1 to 100.

I don't think it would be proper to treat the output column as numeric, since it is an ordinal variable, and the distance (difference in scores) between ranks 1 and 2 may not be the same as that between ranks 2 and 3. However, most R models for ordinal regression are made for outputs such as (high, medium, low), where each level of the output does not necessarily correspond to a unique row. In my case, each output (rank) corresponds to a unique row.

Please suggest which models I could use for this problem. Will treating the output as numeric instead of ordinal be a reasonable approximation? Or will the usual models for ordinal regression work on this dataset as well?
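The situation described above can be sketched with simulated data. This is only an illustration: the predictors `x1` and `x2` and the score formula are hypothetical stand-ins, since the real formula is not given in the post. It shows that ranks keep the ordering of the scores but discard the spacing between them.

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical score formula (an assumption, not from the post)
score <- exp(x1) + 0.5 * x2

# Rank 1 = highest score, as in the post
rnk <- rank(-score)

# Gaps between successive scores, from highest to lowest:
# they are unequal, although successive ranks always differ by 1
gaps <- -diff(sort(score, decreasing = TRUE))
range(gaps)

# Ranks are unchanged by any increasing transformation of the score,
# so the original scale cannot be recovered from the ranks alone
same_ranks <- identical(rank(-score), rank(-exp(score)))
same_ranks
```

Because only the ordering survives, a model that uses just the order of the response is a natural fit for this kind of data.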
Frank Harrell
2013-Sep-15 08:52 UTC
[R] Regression model for predicting ranks of the dependent variable
require(rms)
?orm   # ordinal regression model

For a case study, see Handouts at http://biostat.mc.vanderbilt.edu/CourseBios330

Since you have lost the original values, one part of the case study will not apply: the use of Mean().

Frank
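A minimal sketch of how the suggestion above might be applied, assuming the rms package is installed. The data are simulated: the data frame `d`, the predictors `x1` and `x2`, and the hidden score formula are hypothetical and not taken from the original post. `orm()` is designed for ordinal responses with many distinct values, so it can be fitted directly to ranks where every row has its own level.

```r
library(rms)

set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Hypothetical hidden score; in the real problem only the ranks are observed
score  <- 2 * d$x1 - d$x2 + rnorm(n)
d$rank <- rank(-score)   # rank 1 = highest score

# Ordinal regression on the ranks (default logistic link).
# One intercept is estimated per distinct rank beyond the first.
fit <- orm(rank ~ x1 + x2, data = d)
print(fit)

# Because rank 1 is the highest score, a predictor that raises the score
# lowers the rank number, so its coefficient is expected to be negative.
coef(fit)[c("x1", "x2")]

# The linear predictor can be used to order rows by predicted rank
lp <- predict(fit, type = "lp")
cor(lp, d$rank, method = "spearman")
```

If a "higher is better" interpretation is preferred, the response could be reversed first (for example `n + 1 - rank`), which flips the signs of the coefficients without changing the fit.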
Greg Snow
2013-Sep-16 16:20 UTC
[R] Regression model for predicting ranks of the dependent variable
What question (or questions) are you trying to answer? Any advice we may give will depend on what you are trying to accomplish.

--
Gregory (Greg) L. Snow Ph.D.
538280@gmail.com