Dear List,

When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik

> m <- data.frame(gender = c("M", "M","F"), ht = c(172, 186.5, 165), wt = c(91,99, 74))
> dim(m)
[1] 3 3
> levels(m$gender)
[1] "F" "M"
> s <- subset(m, m$gender == "M")
> dim(s)
[1] 2 3
> levels(s$gender)
[1] "F" "M"
> cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
> dim(s)
[1] 2 3
> levels(s$gender)
[1] "M"

Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> Dear List,
>
> When I subset a data.frame, the levels are not re-adjusted (see
> example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list is archived and searchable. Try

RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))

-Ista

>
> Thanks
> Ulrik
>
>> m <- data.frame(gender = c("M", "M","F"), ht = c(172, 186.5, 165), wt = c(91,99, 74))
>> dim(m)
> [1] 3 3
>
>> levels(m$gender)
> [1] "F" "M"
>
>> s <- subset(m, m$gender == "M")
>> dim(s)
> [1] 2 3
>
>> levels(s$gender)
> [1] "F" "M"
>
>> cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
>> dim(s)
> [1] 2 3
>
>> levels(s$gender)
> [1] "M"
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

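For readers who land on this thread from the archive, here is a minimal sketch of two common ways to drop unused levels after a subset. It is an illustration, not what the archived answers necessarily recommend. Two assumptions are worth flagging: `stringsAsFactors = TRUE` is added because R 4.0.0 and later no longer convert strings to factors by default (the 2010 example relied on the old default), and `droplevels()` requires R 2.12.0 or later. The names `s1`, `s2` and `is_fac` are made up for this example.

```r
# Rebuild the example data. stringsAsFactors = TRUE is an assumption
# needed on R >= 4.0.0 so that 'gender' is a factor, as it was in 2010.
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)

# subset() keeps the full set of levels, including the now-unused "F".
s <- subset(m, gender == "M")
levels(s$gender)
# [1] "F" "M"

# Option 1: droplevels() drops unused levels from every factor column.
s1 <- droplevels(s)
levels(s1$gender)
# [1] "M"

# Option 2: re-apply factor() to each factor column (the approach in the
# original post, with a name that does not reuse that of base::cat).
is_fac <- sapply(s, is.factor)
s2 <- s
s2[is_fac] <- lapply(s2[is_fac], factor)
levels(s2$gender)
# [1] "M"
```

Both options leave the rows untouched and only change the level set, so which one to use is mostly a matter of taste and R version.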
The advantage of computers is that they do exactly what they are told. The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a combination of ones from the original programmers and ones from you. Who should make important decisions about the structure of your data? A group of (admittedly brilliant) programmers who have never seen your data nor know what questions you are trying to answer, or you (who hopefully knows more about your data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R, but I am grateful that they have/had sufficient humility to allow for the possibility that I may actually know something about my data and questions that they don't (or maybe they are just too lazy to do my job for me, but that is also appropriate).

In your example below, why do you care what the levels of gender are after the subset? Why waste time/effort dropping the levels for a column that by definition only has one value?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ulrik Stervbo
> Sent: Saturday, September 04, 2010 6:53 AM
> To: r-help at r-project.org
> Subject: [R] Levels in returned data.frame after subset
>
> Dear List,
>
> When I subset a data.frame, the levels are not re-adjusted (see
> example). Why is this? Am I missing out on some basic stuff here?
Thanks for the replies! Obviously I must have used the wrong search terms - sorry.

@greg: I care about the levels after the subset, because if they are not dropped, then they still appear in the subsequent heatmap I make with ggplot (with my real data set of course).

Admittedly I am quite green, and may do things in a rather silly way - but it works (at least I think it does).

On 4 September 2010 15:41, Ista Zahn <izahn at psych.rochester.edu> wrote:
> Hi Ulrik
>
> On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>> Dear List,
>>
>> When I subset a data.frame, the levels are not re-adjusted (see
>> example). Why is this? Am I missing out on some basic stuff here?
>
> Only that this issue has come up many times before, and that this list
> is archived and searchable. Try
>
> RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))
>
> -Ista

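The point about unused levels showing up downstream can be seen without any plotting package. The sketch below uses base R's `table()` as a stand-in for the ggplot heatmap, which is not reproduced here. The variable names `counts_kept` and `counts_dropped` are made up for this example, and `stringsAsFactors = TRUE` is an assumption needed on R 4.0.0 and later.

```r
# Same toy data as in the original post; stringsAsFactors = TRUE is an
# assumption so that 'gender' is a factor on current versions of R.
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)
s <- subset(m, gender == "M")

# With the unused level kept, "F" still gets a cell, with a count of zero.
counts_kept <- table(s$gender)
names(counts_kept)
# [1] "F" "M"

# After droplevels() (R >= 2.12.0) only the observed level remains.
counts_dropped <- table(droplevels(s)$gender)
names(counts_dropped)
# [1] "M"
```

Any function that iterates over `levels()` rather than over the observed values will behave like `table()` here, which is one reason an empty category can end up in a summary or on a plot axis.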