thr3ads.net - R devel - [Rd] Factors [Oct 2025]


Therneau, Terry M., Ph.D.

2025-Oct-25 15:15 UTC

[Rd] Factors

Peter wrote:
A related issue that has bugged me "forever" is that we treat ordered
factors as if their levels are equidistant even when that is patently untrue.
The use of polynomial contrasts for ordered factors reflects this - it would
really be more sensible to use e.g. successive differences or (ick!) Helmert
contrasts for the ordered case and reserve poly() for factors with actual
numerical levels. To me, this effectively makes ordered factors conceptually
useless.
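
For reference, a minimal sketch of the contrast codings named above. It assumes a fresh R session with default options and a made-up four-level grade factor; MASS::contr.sdif is used for successive differences only if the MASS package is installed.

```r
# In a default session, ordered factors are assigned polynomial contrasts.
default_contrasts <- getOption("contrasts")
print(default_contrasts)

# Hypothetical ordered factor, for illustration only.
grade <- factor(c("I", "II", "III", "IV"),
                levels = c("I", "II", "III", "IV"), ordered = TRUE)
k <- nlevels(grade)

# Polynomial contrasts: with no scores given, levels are treated as 1..k,
# i.e. equally spaced.
poly_equal <- contr.poly(k)
print(poly_equal)

# Helmert contrasts: each level compared with the mean of the earlier ones.
helmert <- contr.helmert(k)
print(helmert)

# Successive differences between adjacent levels (needs the MASS package).
if (requireNamespace("MASS", quietly = TRUE)) {
  sdif <- MASS::contr.sdif(k)
  print(sdif)
}
```

The linear column of poly_equal rises in equal steps from one level to the next, which is the equidistance assumption being criticized.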

I actually fall into the opposite camp. We teach students that they *must* code
factors as having non-equal increments but that continuous variables are okay
as is (the old interval vs. ordinal dichotomy). In my medical work, a lot of
the categorical variables actually are close to evenly spaced, a disease grade
for instance, since that is what the original authors of the scale were trying
to do. It is the continuous variables that violate equal spacing most
violently. A cardiac ejection fraction drop from 70 to 60 is "meh",
a drop from 30 to 20 is "I hope your affairs are in order". We
don't check this nearly often enough.

Also, I use as.integer(factor) quite often when creating an analysis data set
from input data. It's just another tool for creating new variables.
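
A small sketch of that usage, with made-up data (the variable names are hypothetical). It also shows the usual caveat: as.integer() returns the position of each level, not the number printed in the label.

```r
# Hypothetical disease grade recorded as text in the input data.
grade <- factor(c("II", "I", "III", "II"), levels = c("I", "II", "III"))

# Level positions 1, 2, 3 become an integer score for the analysis data set.
grade_code <- as.integer(grade)
print(grade_code)

# Caveat: for labels that look like numbers, the codes are still positions.
dose <- factor(c("10", "50", "100", "50"), levels = c("10", "50", "100"))
dose_code  <- as.integer(dose)                 # positions in the level set
dose_value <- as.numeric(as.character(dose))   # the numbers in the labels
print(dose_code)
print(dose_value)
```

Whether the positions or the label values are the right new variable depends on the spacing one is willing to assume.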

Terry T.


Peter Dalgaard

2025-Oct-27 15:09 UTC


[Rd] Factors

I don't think these are actually opposite camps, just two related but different
issues. Sure, functions can be nonlinear, but we shouldn't create arbitrary
nonlinearities due to the selection of x values.

What I was thinking of was measurements at baseline and 3, 6, 9, 12, 18, 24 mo
into treatment (as in ISwR::alkfos). Fitting a linear trend with the factor codes
will de facto assume a kink at 12 months. The jump from time 0 to 3 mo may well
deviate from a linear trend (AFAIR, it does), but that is another issue.
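
The kink can be seen in a small sketch using only the visit times from the example. The response y is invented and exactly linear in the factor codes; it is not the alkfos data.

```r
# Visit times in months: baseline, then 3, 6, 9, 12, 18, 24.
months <- c(0, 3, 6, 9, 12, 18, 24)
visit  <- factor(months, levels = months, ordered = TRUE)
codes  <- as.integer(visit)            # 1, 2, ..., 7

# Hypothetical response that is exactly linear in the factor codes.
y <- 100 - 5 * codes

# The same response expressed as change per month between visits:
# the slope halves once the visits become 6 months apart.
slope_per_month <- diff(y) / diff(months)
print(slope_per_month)

# contr.poly() takes a scores argument, so the actual times can be
# supplied instead of the default equally spaced 1..7.
poly_codes  <- contr.poly(nlevels(visit))
poly_months <- contr.poly(nlevels(visit), scores = months)
print(round(cbind(codes = poly_codes[, ".L"],
                  months = poly_months[, ".L"]), 3))
```

With scores = months the linear contrast follows the real time scale, so a straight line in time stays a straight line.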

- Peter
