Conradsb
2012-Sep-05 19:14 UTC
[R] Recoding categorical gender variable into numeric factors
I currently have a data set in which gender is entered as "Male" and "Female", and I'm trying to convert this into "1" and "0".

I found a website which recommended using two commands:

data$scode[data$sex=="M"] <- "1"
data$scode[data$sex=="F"] <- "2"

to convert to numbers, and:

data$scode <- factor(data$scode)

to convert this variable to a factor.

My issue is that, after I use the first command, *only* the female values get converted to a number. I am left with a column filled with 2's and blank spaces. Instead of typing both lines of the first command, I copied and pasted the first line and changed the letter representing gender. I also made sure that both letters were exactly as they appear in the dataset.

My questions are: is there any visible issue with my syntax, and are there any other methods to accomplish this?

I'm also very new to R, so complex syntax is beyond me.

Conrad Baldner

--
View this message in context: http://r.789695.n4.nabble.com/Recoding-categorical-gender-variable-into-numeric-factors-tp4642316.html
Sent from the R help mailing list archive at Nabble.com.
Ista Zahn
2012-Sep-05 20:14 UTC
[R] Recoding categorical gender variable into numeric factors
Hi Conrad,

On Wed, Sep 5, 2012 at 3:14 PM, Conradsb <csbaldne at vt.edu> wrote:
> I currently have a data set in which gender is entered as "Male" and "Female",
> and I'm trying to convert this into "1" and "0".

This is usually not necessary, and makes things more confusing. "Male" and "Female" are clear and self-explanatory; "0" and "1" are not.

> I found a website which recommended using two commands:
>
> data$scode[data$sex=="M"] <- "1"
> data$scode[data$sex=="F"] <- "2"

Nope, "1" is the character 1, not the number 1 in R. Also, you said the values were "Male" and "Female", not "F" and "M". To convert "Male" to 1 and "Female" to 2 you can use

data$scode[data$sex=="Male"] <- 1
data$scode[data$sex=="Female"] <- 2

Notice "Male" and "Female" instead of "M" and "F", and 1 and 2 instead of "1" and "2".

> to convert to numbers, and:
>
> data$scode <- factor(data$scode)
>
> to convert this variable to a factor.

No need to recode to numbers before converting to a factor. Just use

data$sex <- factor(data$sex)

> My issue is that, after I use the first command, *only* the female values
> get converted to a number. I am left with a column filled with 2's and blank
> spaces.

Strange, especially if sex is actually "Male" and "Female", in which case scode should be all NA. If you want to follow up on this, please post the result of

dput(data["sex"])

> Instead of typing both lines of the first command, I copied and pasted
> the first line and changed the letter representing gender. I also made sure
> that both letters were exactly as they appear in the dataset.
>
> My questions are: is there any visible issue with my syntax, and are there
> any other methods to accomplish this?

In this case you don't actually need to convert to numeric. Just use

data$scode <- factor(data$sex)

If you really need to convert characters to numbers, it is often convenient to use factors as an intermediate step, like this:

dat <- data.frame(sex=sample(c("Male", "Female"), 10, replace=TRUE))
dat$sex.n <- as.numeric(
  as.character(
    factor(dat$sex,
           levels = c("Female", "Male"),
           labels = c("0", "1"))))

Best,
Ista

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
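The points in this reply can be put side by side in one small sketch. The data frame below is made-up toy data (the column names follow the thread, the values and the extra columns `wrong` and `male` are assumptions for illustration), and the new columns are initialised with NA first so the sketch behaves the same regardless of which rows match.

```r
# Hypothetical toy data; only the column name `sex` comes from the thread.
dat <- data.frame(sex = c("Male", "Female", "Female", "Male", "Female"),
                  stringsAsFactors = FALSE)

# Comparing against labels that do not occur in the data matches no rows,
# so nothing is assigned and the column stays NA everywhere.
dat$wrong <- NA
dat$wrong[dat$sex == "M"] <- 1
dat$wrong[dat$sex == "F"] <- 2

# Matching the full labels and assigning unquoted numbers gives a numeric column.
dat$scode <- NA
dat$scode[dat$sex == "Male"] <- 1
dat$scode[dat$sex == "Female"] <- 2

# Factor as an intermediate step: the labels are strings, so go through
# as.character() before as.numeric() to get 0/1 rather than the level codes.
dat$sex.n <- as.numeric(as.character(
  factor(dat$sex, levels = c("Female", "Male"), labels = c("0", "1"))))

# A shorter route for a two-valued variable: coerce the logical comparison
# to a number (TRUE becomes 1, FALSE becomes 0).
dat$male <- as.numeric(dat$sex == "Male")

print(dat)
```

Calling as.numeric() directly on the factor would return the internal level codes (1 and 2) instead of the labels 0 and 1, which is why the as.character() step is there.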
David L Carlson
2012-Sep-05 20:20 UTC
[R] Recoding categorical gender variable into numeric factors
I can't replicate your problem. I created a data set with "Male" and "Female" since that is what you indicate, but your commands use "M" and "F", which is different. When I use "Male" and "Female" the recoding is just as expected, but you don't even need to do this. You probably already have a factor since R routinely turns character fields into factors:

> data <- data.frame(sex=c(rep("Male", 5), rep("Female", 5)))
> data
      sex
1    Male
2    Male
3    Male
4    Male
5    Male
6  Female
7  Female
8  Female
9  Female
10 Female
> str(data)
'data.frame': 10 obs. of 1 variable:
 $ sex: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 1 1 1 1 1

So data$sex is a Factor with two levels, Female=1 and Male=2. If the result of str(data) looks like this, you have a character vector (chr):

> str(data)
'data.frame': 10 obs. of 1 variable:
 $ sex: chr "Male" "Male" "Male" "Male" ...

If you want to convert a character vector to a factor just use the command:

data$sex <- factor(data$sex)

By default, R orders the character strings alphabetically before converting to factors, so "Female" becomes 1 and "Male" becomes 2.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
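As a rough sketch of the check described in this reply, the code below inspects the column type, converts it to a factor, and looks at the underlying integer codes. The data are the same made-up ten rows; `codes` and `sex2` are names introduced here for illustration. Note that since R 4.0.0 data.frame() and read.table() default to stringsAsFactors = FALSE, so on a current installation the column is more likely to arrive as character; the argument is set explicitly here so the sketch behaves the same on any version.

```r
# Hypothetical data mirroring the example in the message above.
data <- data.frame(sex = c(rep("Male", 5), rep("Female", 5)),
                   stringsAsFactors = FALSE)   # keep sex as character on any R version

class(data$sex)                 # "character"

data$sex <- factor(data$sex)    # levels are sorted alphabetically by default
levels(data$sex)                # "Female" "Male"
codes <- as.integer(data$sex)   # underlying codes: Female = 1, Male = 2
table(data$sex, codes)

# To control which label gets which code, give the level order explicitly.
sex2 <- factor(c(rep("Male", 5), rep("Female", 5)),
               levels = c("Male", "Female"))
as.integer(sex2)                # Male = 1, Female = 2
```

Setting `levels` explicitly is the simplest way to avoid depending on alphabetical order when a particular coding is needed.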