> can anyone tell me why an encoding of 1/2 for a dummy variable for
> two groups (e.g. gender) seems to be preferred over 0/1?
> It's been bugging me for a while, 0/1 seems more natural, but I have
> been told (without explanation) that 1/2 is better. Why?
The best encoding depends upon which language you would like to manipulate
the variable in. In R, genders are most naturally represented as factors.
That means that in an external data source (like a spreadsheet of data),
you should ideally have the gender recorded as human-understandable text
("male" and "female", or "M" and "F").
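Once the data is read into R, the strings can be stored as a factor, which
keeps the human-readable labels. (Before R 4.0.0, read.table and read.csv
did this conversion by default; from 4.0.0 onwards you need to pass
stringsAsFactors = TRUE or call factor() yourself.) This way you avoid
having to remember that 1 means male (or whatever).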
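
As a rough sketch of what that looks like in practice (the data below is made
up, and the inline CSV text merely stands in for a real spreadsheet export):

```r
# Hypothetical data standing in for a spreadsheet saved as CSV.
csv_text <- "id,gender
1,male
2,female
3,female"

# stringsAsFactors = TRUE is spelled out so the result does not depend
# on the R version (the default became FALSE in R 4.0.0).
people <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# The human-readable labels are kept as the factor levels,
# sorted alphabetically by default.
gender_levels <- levels(people$gender)

# Underneath, R stores integer codes starting at 1, one per level.
gender_codes <- as.integer(people$gender)

print(gender_levels)
print(gender_codes)
```

Note that the integer codes R keeps internally for a two-level factor are 1 and
2, but you never need to type them yourself; you work with the labels.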
If you were manipulating the data in a different language that didn't have
factors, then it might be more appropriate to use integers. Which integers
you use doesn't matter: whatever you choose, you need a look-up table to
know what each number refers to.
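
If integer-coded data like that later ends up in R, the look-up table can be
applied with factor(). The following is only an illustration: the codings
(0/1 and 1/2) and the assignment of numbers to "male" and "female" are
assumptions made up for the example.

```r
# The same four people, coded two different (hypothetical) ways.
coded_01 <- c(0, 1, 1, 0)
coded_12 <- c(1, 2, 2, 1)

# The look-up table is given by pairing 'levels' with 'labels'.
gender_from_01 <- factor(coded_01, levels = c(0, 1),
                         labels = c("male", "female"))
gender_from_12 <- factor(coded_12, levels = c(1, 2),
                         labels = c("male", "female"))

# Once the labels are attached, the original choice of integers
# makes no difference to the resulting factor.
same_result <- identical(gender_from_01, gender_from_12)
print(same_result)
```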
Regards,
Richie.
Mathematical Sciences Unit
HSL