I'm trying to understand factors, but the online documentation is rather
sparse. As far as I can tell, factors are intended for grouping the members
of an array into categories, which R calls "Levels". So we have:

  * state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
               "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
               "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
               "sa",  "act", "nsw", "vic", "vic", "act")
  * statef <- factor(state)
  * statef
   [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  tas sa  nt  wa
  [20] vic qld nsw nsw wa  sa  act nsw vic vic act
  Levels: act nsw nt qld sa tas vic wa

With this I can see what the categories (Levels) are just by looking.
Two questions arise, though: how can I access the names of these categories
computationally rather than visually, and how do I get the indexes of the
original array elements that belong to a particular category, say "act"?
I would use this, for instance, to select the corresponding elements from
another "parallel" array, say

  * incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
                 61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
                 59, 46, 58, 43)

so that selecting the elements corresponding to "act" gives:

  46 43

Do you have any comments on this?

Thanks,

--Sergio.
Hi Julio,

You can compare a factor against a level and use the result to index
another object, just as you'd use any other logical index:

> incomes[statef == "act"]
[1] 46 43

It looks like you're using the R intro guide, but there's a lot of other
material available. Try this one for starters:
http://www.stat.berkeley.edu/classes/s133/factors.html

Is there something specific you're trying to accomplish?

Sarah

On Fri, Mar 30, 2012 at 12:50 PM, Julio Sergio <juliosergio at gmail.com> wrote:
> [...] how do I get the indexes of the original array elements that belong
> to a particular category, say, "act"? [...]

-- 
Sarah Goslee
http://www.functionaldiversity.org
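To make the indexing step above concrete, here is a minimal sketch using the data from the original post. The names is_act and act_incomes are only illustrative; they do not appear anywhere in the thread.

```r
# Data copied from the original question.
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")
statef  <- factor(state)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)

# Comparing a factor with a level name gives a logical vector,
# TRUE at every position whose state is "act".
is_act <- statef == "act"          # illustrative name
sum(is_act)                        # how many elements are "act"

# A logical vector of the same length selects the matching elements.
act_incomes <- incomes[is_act]     # illustrative name
act_incomes

# %in% selects several categories at once.
incomes[statef %in% c("act", "nt")]
```

The comparison is done on the level labels, not on the underlying integer codes, so it reads the same as it would for the plain character vector state.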
On Mar 30, 2012, at 12:50 PM, Julio Sergio wrote:

> [...] how do I get the indexes of the original array elements that
> belong to a particular category, say, "act"?
> This is, for instance, to select from another "parallel" array, the
> corresponding elements, say
> [...]
> So to select, the corresponding elements to "act":
>
>   46 43

I think you need to understand indexing more than you need to
understand factors.

  incomes[ which(statef == "act") ]

> Do you have any comments on this?

If you want to understand how to programmatically access levels, then
you only need to follow the "See also" links on the ?factor page.

-- 
David Winsemius, MD
West Hartford, CT
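As a rough sketch of both halves of the question (the category names and the positions), the following uses levels(), nlevels() and which() from base R, which are, as far as I can tell, among the functions the ?factor help page points to. The names act_idx and by_state are illustrative, and tapply() and split() are an extra suggestion that goes beyond what was posted above.

```r
# Data copied from the original question.
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")
statef  <- factor(state)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)

# Category names as a character vector, and how many there are.
levels(statef)
nlevels(statef)

# Positions of the elements that belong to "act".
act_idx <- which(statef == "act")   # illustrative name
act_idx
incomes[act_idx]

# Working on every category at once instead of one at a time:
tapply(incomes, statef, mean)        # mean income per level
by_state <- split(incomes, statef)   # list with one element per level
by_state[["act"]]
```

One practical difference between the two indexing styles: which() returns integer positions and silently drops NA comparisons, whereas a logical index containing NA yields NA entries in the result. With this data there are no missing values, so both give 46 43.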
David Winsemius <dwinsemius <at> comcast.net> writes:

> I think you need to understand indexing more than you need to
> understand factors.
>
>   incomes [ which(statef == "act") ]
>
> If you want to understand how to programmatically access levels, then
> you only need to follow the "See also" links on the ?factor page.

Thanks David! Your information has been very useful.

--Sergio.
I'd like to make the distinction between the purpose of factors, i.e.,
what they are intended for, and how that purpose is accomplished.

Their purpose is for use in statistical models. The simplest example is
analysis of variance, where predictors are commonly referred to as
factors. Factors in R are intended to be used as factors in statistical
models.

Similarly, in the anova literature, the different values of the predictor
are often referred to as levels. So R creates factors by grouping the
array's values into levels, as you described.

Underlying the levels are numeric codes that the modeling functions use.
Try
   as.numeric(statef)
and compare with
   as.numeric(state)
(the second gives only NA, with a warning, because a character vector has
no underlying codes).

Because of this, I personally don't make anything into a factor unless I
intend to use it in a model. Or, occasionally, because of a useful "side
effect." For example:
(the following needs to be viewed using a monospaced font)

> set.seed(21)
> mns <- sample(month.abb, 100, replace=TRUE)
> table(mns)
mns
Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
  3  12  18   8   8  14   2   9   4   6   8   8

## same:
> mnsf1 <- factor(mns)
> table(mnsf1)
mnsf1
Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
  3  12  18   8   8  14   2   9   4   6   8   8

## now the months are in the "correct" order
> mnsf2 <- factor(mns, levels=month.abb)
> table(mnsf2)
mnsf2
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  8   8   9   3   4   2  14  12   8   8   6  18

Compare

> sort(mnsf1)
> sort(mnsf2)

and compare how the underlying numeric codes are assigned to the
categories.

So, I know this wasn't about your main question, but I hope you find it
useful anyway.

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 3/30/12 9:50 AM, "Julio Sergio" <juliosergio at gmail.com> wrote:

>I'm trying to figure out about factors, however the on-line documentation
>is rather sparse. I guess, factors are intended for grouping arrays members
>into categories, which R names "Levels".
>[...]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
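Don's point about the underlying numeric codes and the order of the levels can be seen on a very small example. This is only a sketch: it uses a short hand-written vector rather than sample(), since random draws under set.seed() can differ between R versions, and the names x, f1 and f2 are illustrative.

```r
x <- c("Mar", "Jan", "Feb", "Jan")

# Default: levels are the sorted unique values (Feb, Jan, Mar).
f1 <- factor(x)
# Explicit levels: calendar order, taken from the built-in month.abb.
f2 <- factor(x, levels = month.abb)

levels(f1)
levels(f2)

# The integer codes are positions in levels(), so they depend on level order.
as.integer(f1)   # 3 2 1 2
as.integer(f2)   # 3 1 2 1

# Codes and levels together recover the original labels.
levels(f2)[as.integer(f2)]

# sort() orders by code, i.e. by level order, not alphabetically.
as.character(sort(f1))   # "Feb" "Jan" "Jan" "Mar"
as.character(sort(f2))   # "Jan" "Jan" "Feb" "Mar"
```

Note that f2 keeps all twelve months as levels even though only three occur in x, which is why table() on such a factor also reports the months with a count of zero.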