Tymek W
2009-Jul-09 22:04 UTC
[R] Strange t-test error: "grouping factor must have exactly 2 levels" while it does...
Hi,

Could anyone tell me what is wrong:

> length(unique(mydata$myvariable))
[1] 2

and in the t-test:

(...)
Error in t.test.formula(othervariable ~ myvariable, mydata) :
  grouping factor must have exactly 2 levels

I re-checked the code and still don't get what is wrong.

Moreover, there is some strange behavior:

/1 It seems that the error is sensitive to NAs, because it affects some variables in the data set with NAs and doesn't affect the same ones in the data set with NAs removed.

/2 It seems to work differently depending on how the variables are passed to t.test: e.g. it happens here: t.test(x~y, dataset) and does not happen here: t.test(dataset[['x']]~dataset[['y']])

Does anyone have any ideas?

Greetz,
Timo
Marc Schwartz
2009-Jul-10 00:11 UTC
[R] Strange t-test error: "grouping factor must have exactly 2 levels" while it does...
On Jul 9, 2009, at 5:04 PM, Tymek W wrote:

> Hi,
>
> Could anyone tell me what is wrong:
>
> > length(unique(mydata$myvariable))
> [1] 2
>
> and in the t-test:
>
> (...)
> Error in t.test.formula(othervariable ~ myvariable, mydata) :
>   grouping factor must have exactly 2 levels
>
> I re-checked the code and still don't get what is wrong.
>
> Moreover, there is some strange behavior:
>
> /1 It seems that the error is sensitive to NAs, because it affects
> some variables in the data set with NAs and doesn't affect the same ones
> in the data set with NAs removed.
>
> /2 It seems to work differently depending on how the variables are
> passed to t.test:
>
> e.g. it happens here: t.test(x~y, dataset) and does not happen here:
> t.test(dataset[['x']]~dataset[['y']])
>
> Does anyone have any ideas?
>
> Greetz,
> Timo

Check the output of:

na.omit(cbind(mydata$othervariable, mydata$myvariable))

which will give you some insight into what data is actually available to be used in the t test. This will remove any rows that have missing data.

Your first test above, checking the number of levels, is run before missing data is removed. The likelihood is that once missing values have been removed, you are left with only one unique grouping value in mydata$myvariable.

For your note number 2, the result should be the same for both examples, as the same basic approach is used in both cases.

For example:

> DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))

> DF
   x y
1  1 1
2  2 1
3  3 1
4 NA 2
5 NA 2
6 NA 2

# Remove missing data
> na.omit(DF)
  x y
1 1 1
2 2 1
3 3 1

> t.test(x ~ y, data = DF)
Error in t.test.formula(x ~ y, data = DF) :
  grouping factor must have exactly 2 levels

> t.test(DF$x ~ DF$y)
Error in t.test.formula(DF$x ~ DF$y) :
  grouping factor must have exactly 2 levels

If you have a small reproducible example where the two function calls behave differently, please post back with it.

HTH,

Marc Schwartz
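
As a rough sketch of the check described above, the snippet below counts the grouping values before and after incomplete rows are dropped. It reuses the toy DF from the example. The names ok, n_before and n_after are made up for illustration, and it assumes the default na.action (na.omit) is in effect, so that rows with a missing value in either variable are discarded before the test is run.

```r
# Toy data from the example above: group 2 has no observed x values
DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))

# Number of distinct grouping values before any rows are dropped
n_before <- length(unique(DF$y))

# Rows where both the response and the grouping variable are observed
# (assumption: this mirrors what the default na.action = na.omit keeps)
ok <- complete.cases(DF$x, DF$y)

# Number of distinct grouping values among the rows that remain
n_after <- length(unique(DF$y[ok]))

# Observations left per group; a group with no complete rows does not appear
table(DF$y[ok])

# Separate caveat: unique() treats NA as a value of its own, so a grouping
# variable with one real level plus NA also gives a count of 2
n_with_na <- length(unique(c("a", "a", NA)))
```

Here n_before is 2 while n_after is 1, which is the situation that produces the "grouping factor must have exactly 2 levels" message. Running the same check on mydata$othervariable and mydata$myvariable should show whether that is what is happening in the original data.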