Dimitri Liakhovitski
2009-Feb-13 17:09 UTC
[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed
Hello! I have encountered a really weird problem. Maybe you've encountered it before?

I have a large data frame "importances". It has one factor ($A) with 3 levels: 3, 9, and 15. $B is a regular numeric variable. Below I am picking a really small sub-frame (just 3 rows) based on "indices". "indices" were chosen so that all 3 levels of A are present:

indices=c(14329,14209,14353)
test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])

Here is what the new data frame "test" looks like:

            yy xx
1 -0.009984006  9
2 -2.339904131  3
3 -0.008427385 15

Here is the structure of "test":

> str(test)
'data.frame':   3 obs. of  2 variables:
 $ yy: num  -0.00998 -2.3399 -0.00843
 $ xx: Factor w/ 3 levels "3","9","15": 2 1 3

Notice: the order of factor levels for xx is not 1 2 3 as it should be, but 2 1 3. How come?

Or also look at this:

> test$xx
[1] 9  3  15
Levels: 3 9 15

Same thing. Do you know what might be the reason?

Thank you very much!

--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
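Since "importances" is not included in the post, here is a minimal, self-contained sketch that rebuilds only the three sampled rows from the values printed above (hypothetical stand-in data, not the poster's actual frame). It reproduces the same str() result and, for contrast, shows that codes of 1 2 3 only appear when the levels are listed in order of first appearance instead of the default sorted order.

```r
# Hypothetical stand-in for the three rows of "importances" shown above
yy <- c(-0.009984006, -2.339904131, -0.008427385)
xx <- factor(c(9, 3, 15))   # default levels are the sorted unique values
test <- data.frame(yy = yy, xx = xx)

str(test)               # xx shows: Factor w/ 3 levels "3","9","15": 2 1 3
levels(test$xx)         # "3"  "9"  "15"
as.integer(test$xx)     # 2 1 3

# Codes come out as 1 2 3 only if the levels follow order of appearance
xx_seen <- factor(c(9, 3, 15), levels = unique(c(9, 3, 15)))
levels(xx_seen)         # "9"  "3"  "15"
as.integer(xx_seen)     # 1 2 3
```

Both factors print the same values (9, 3, 15); only the level table, and therefore the integer codes, differ.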
jim holtman
2009-Feb-13 17:23 UTC
[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed
Think of the levels as a table you are going to index into. The numbers that you see (2, 1, 3) are the indices into the levels, so you get 9, 3, 15 as the result. What were you expecting? It is working as it is supposed to.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
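Jim's "table" picture can be checked directly: the level vector is the lookup table and the integer codes are positions in it. The sketch below uses the same three values (hypothetical stand-in data) and also shows a closely related, well-known pitfall covered in R FAQ 7.10: as.numeric() on a factor returns the codes, not the labels.

```r
xx <- factor(c(9, 3, 15))

codes <- as.integer(xx)    # positions into the level table: 2 1 3
tbl   <- levels(xx)        # the lookup table: "3" "9" "15"

# Indexing the table with the codes recovers the labels in row order
tbl[codes]                 # "9" "3" "15"

# Pitfall: as.numeric() on a factor returns the codes, not the labels
as.numeric(xx)                 # 2 1 3
as.numeric(as.character(xx))   # 9 3 15
as.numeric(levels(xx))[xx]     # 9 3 15 (converts each level only once)
```

The last form works because indexing with a factor uses its integer codes, so each row picks its own converted level.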
Marc Schwartz
2009-Feb-13 17:24 UTC
[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed
The output of str() is showing you the factor levels of test$xx, followed by the internal integer codes for the three actual values of test$xx (9, 3, and 15):

> str(test$xx)
 Factor w/ 3 levels "3","9","15": 2 1 3

> levels(test$xx)
[1] "3"  "9"  "15"

> as.integer(test$xx)
[1] 2 1 3

9 is the second level, hence the 2.
3 is the first level, hence the 1.
15 is the third level, hence the 3.

No problems, just clarification needed on what you are seeing.

Note that you do not reference anything above regarding tapply(), as per your subject line, though I suspect that I have an idea as to why you did...

HTH,

Marc Schwartz
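The thread never shows the tapply() call, so the following is only a guess at what Marc may have had in mind: tapply() returns one result per level, ordered by the levels ("3", "9", "15") rather than by row order, which can look "intermixed" next to the data. This sketch uses the same hypothetical three-row data; indexing the result by the factor maps each group's result back onto its rows.

```r
# Hypothetical stand-in data, same three rows as in the question
yy <- c(-0.009984006, -2.339904131, -0.008427385)
xx <- factor(c(9, 3, 15))

# tapply() groups yy by the levels of xx; the result is ordered by
# those levels ("3", "9", "15"), not by the order of the rows
res <- tapply(yy, xx, mean)
names(res)        # "3" "9" "15"

# Indexing by the factor uses its integer codes (2 1 3), so this
# lines the per-group results back up with the original rows
res[xx]
```

With one row per group, each group mean is just that row's value, so res[xx] returns the yy values in their original order.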