Hi dear experts,

I have a general question about categorical variables in R, such as Gender (Male or Female). If I have such a column in my data and want to fit a regression model or feed the data to the seqmeta package (singlesnp, skat meta), should I code the values myself first (male=0 and female=1), or does R do it for me?

When I didn't code them, R still ran the analysis without any error, but I'm not sure whether the result is correct.

Thanks
It's correct.

You need to spend some time with an R tutorial -- there are many on the web -- and learn about how R handles factors in modeling.

?factor

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Fri, Sep 11, 2015 at 7:45 AM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> [...] would you please let me know should I code them first ( male=0 and
> female=1) or R programming do it for me?
> Because when I didn't code them, R still can do the analysis without any
> error but I'm not sure it's correct or not?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
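To make "how R handles factors in modeling" concrete, here is a minimal sketch using lm() on invented toy data (the data frame, column names, and values are hypothetical). It assumes R's default treatment contrasts. Whether seqmeta builds its design matrix the same way is an assumption that should be checked against that package's documentation.

```r
# Hypothetical toy data; names and values are invented for illustration.
d <- data.frame(
  gender = factor(c("male", "female", "female", "male", "female", "male")),
  y      = c(5.1, 4.2, 4.0, 5.3, 4.4, 5.0)
)

# model.matrix() shows the design matrix that lm() builds from the formula.
# With the default treatment contrasts, the first level ("female") is the
# reference and the factor becomes a single 0/1 column named "gendermale".
X <- model.matrix(y ~ gender, data = d)
print(X)

# Fit once with the factor and once with a hand-made 0/1 column that uses
# the same coding (female = 0, male = 1).
fit_factor <- lm(y ~ gender, data = d)
d$male01   <- as.integer(d$gender == "male")
fit_manual <- lm(y ~ male01, data = d)

# The estimates agree; only the coefficient name differs.
print(coef(fit_factor))
print(coef(fit_manual))
```

Note that this default coding is the reverse of the male=0, female=1 scheme proposed in the question. The fitted model is the same either way, but the sign of the gender coefficient flips.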
You need to read the Introduction to R, paying particular attention to the factor data type, which is designed for this problem.

You should also be aware that on this list, failing to include a small example of your problem in R, sent as plain-text email (a setting in your email program), often leads to no response at all. Conversely, if you do provide an example, the response will often include modifications to your example code that you can study.[1] Also, you really ought to read the Posting Guide linked at the bottom of every R-help posting.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On September 11, 2015 7:45:42 AM PDT, Lida Zeighami <lid.zigh at gmail.com> wrote:
>[...] would you please let me know should I code them first ( male=0 and
>female=1) or R programming do it for me?
>Because when I didn't code them, R still can do the analysis without any
>error but I'm not sure it's correct or not?
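As an illustration of the kind of small, self-contained example the reply asks for, the following sketch shows one possible shape for such a post. The data frame and its values are hypothetical stand-ins for the real data. dput() is a base R function that prints code which recreates an object, so it survives plain-text email.

```r
# Hypothetical toy data standing in for the real data set.
d <- data.frame(
  gender = c("male", "female", "female", "male"),
  y      = c(5.1, 4.2, 4.0, 5.3),
  stringsAsFactors = TRUE
)

# dput() prints R code that recreates the object, ready to paste into
# a plain-text message.
dput(d)

# The call being asked about, plus the structure and output that raised
# the question.
str(d)
fit <- lm(y ~ gender, data = d)
print(coef(fit))
```

A reader can paste the dput() output into their own session, rerun the call, and reply with a modified version of the same code.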
Hi Lida,

Given that this is such a common question and the R FAQ doesn't really answer it, perhaps a brief explanation will help. In R, a factor combines the literal values of the data (the levels) with a sequence of integer codes beginning at 1. By default the levels are sorted alphabetically and the codes follow that order. For example, suppose you read in what you think is a set of numbers like this:

    x<-read.table(text="1 2 3
    4 5 6
    7 . 9")
    x
      V1 V2 V3
    1  1  2  3
    2  4  5  6
    3  7  .  9

Now look at the classes of the columns:

    sapply(x,class)
           V1        V2        V3
    "integer"  "factor" "integer"

Somehow that second column has become a factor. This is because "." cannot be represented as a number and I didn't tell R that it should be regarded as a missing value (na.strings="."). R has taken the literal values in that column

    levels(x$V2)
    [1] "." "2" "5"

and attached numbers to those values in their alphabetic order.

    as.numeric(x$V2)
    [1] 2 3 1

You can get the original numbers back like this:

    as.numeric(as.character(x$V2))
    [1]  2  5 NA
    Warning message:
    NAs introduced by coercion

and R helpfully tells you that it couldn't coerce "." to a number. In your example, the factor is created for you

    mf<-factor(c("male","female"))
    mf
    [1] male   female
    Levels: female male

but as you can see, the default order of the levels may not be what you expect:

    as.numeric(mf)
    [1] 2 1

For a more complete account of factors, see "An Introduction to R" section 4 "Ordered and unordered factors".

Jim

On Sat, Sep 12, 2015 at 12:45 AM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> [...] would you please let me know should I code them first ( male=0 and
> female=1) or R programming do it for me?
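The two points in the explanation above (declaring "." as missing, and not relying on the default alphabetical level order) can be sketched as follows. The gender vector is invented for illustration. Note also that in R 4.0.0 and later, read.table() no longer converts strings to factors by default, so the V2 column above would come back as character unless stringsAsFactors=TRUE is passed.

```r
# Same toy input as in the message, but "." is declared as a missing value.
x <- read.table(text = "1 2 3
4 5 6
7 . 9", na.strings = ".")
print(sapply(x, class))   # all three columns are integer
print(x$V2)               # 2 5 NA

# Hypothetical gender values, invented for illustration.
gender <- c("male", "female", "female", "male")

# Set the level order explicitly instead of relying on alphabetical order.
g <- factor(gender, levels = c("male", "female"))
print(levels(g))          # "male" "female"
print(as.integer(g))      # 1 2 2 1

# An explicit 0/1 coding (male = 0, female = 1), if numbers are really wanted.
female01 <- as.integer(g == "female")
print(female01)           # 0 1 1 0

# relevel() changes only the reference (first) level of an existing factor.
g2 <- relevel(factor(gender), ref = "male")
print(levels(g2))         # "male" "female"
```

Setting the levels explicitly makes "male" the reference level in a model, which corresponds to the male=0, female=1 coding asked about in the original question.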