I have a data set similar to the following:

Color  Score
RED    10
RED    13
RED    12
WHITE  22
WHITE  27
WHITE  25
BLUE   18
BLUE   17
BLUE   16

and I am trying to select just the values of Color that are equal to RED or WHITE, excluding the BLUE.

I've tried the following:

    myComp1<-subset(dataset, Color =="RED" | Color == "WHITE")
    myComp1<-subset(dataset, Color != "BLUE")
    myComp1<-dataset[which(dataset$Color != "BLUE"),]

Each of the above lines successfully excludes the BLUE subjects, but the "BLUE" category is still present in my data set; that is, if I try table(Color) I get

      RED WHITE  BLUE
       82   151     0

If I try to do a t-test (since I've presumably gone from three groups to two groups), I get:

    Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    In mean.default(y) : argument is not numeric or logical: returning NA

and describe.by(score,Color) gives me descriptives for RED and WHITE, and BLUE also shows up as NULL.

How can I eliminate the BLUE category completely so I can do a t-test using Color (with just the RED and WHITE subjects)?

Many thanks in advance!!

John

--
View this message in context: http://www.nabble.com/Selecting-groups-with-R-tp25088073p25088073.html
Sent from the R help mailing list archive at Nabble.com.

    dataset[dataset$Color != "BLUE",]

On 21-Aug-09, at 3:08 PM, jlwoodard wrote:
> and I am trying to select just the values of Color that are equal to RED
> or WHITE, excluding the BLUE.

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
dmck at u.washington.edu
donaldmckenzie at fs.fed.us

Hi John,

I would guess that your Color column is a factor, with three levels ("RED","BLUE","WHITE"), which means that they will all be included in the output of a table() call, even if they are empty. Try

    dataset <- transform(dataset, Color=as.character(Color))

or something similar and then create the table.

/Fredrik

On Fri, Aug 21, 2009 at 11:08 PM, jlwoodard<john.woodard at wayne.edu> wrote:
> Each of the above lines successfully excludes the BLUE subjects, but the
> "BLUE" category is still present in my data set; that is, if I try
> table(Color) I get
>
>   RED WHITE  BLUE
>    82   151     0

--
"Life is like a trumpet - if you don't put anything into it, you don't get anything out of it."

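A minimal sketch of the factor behaviour described in this reply. It assumes a small data frame named `dataset` rebuilt from the nine sample rows in the question (the real data set is larger), and it makes Color a factor explicitly, because recent versions of R no longer convert strings to factors by default.

```r
# Rebuild the sample data from the question; Color is a factor on purpose.
dataset <- data.frame(
  Color = factor(rep(c("RED", "WHITE", "BLUE"), each = 3)),
  Score = c(10, 13, 12, 22, 27, 25, 18, 17, 16)
)

# Subsetting removes the BLUE rows but keeps BLUE as a level.
myComp1 <- subset(dataset, Color != "BLUE")
levels_after_subset <- levels(myComp1$Color)  # still contains "BLUE"
table(myComp1$Color)                          # BLUE is listed with a count of 0

# Converting to character discards the level information.
myComp1 <- transform(myComp1, Color = as.character(Color))
table(myComp1$Color)                          # only RED and WHITE remain
```

The conversion to character is the simplest fix for table(); functions that need a factor will rebuild one from the values that are actually present.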
On Aug 21, 2009, at 6:08 PM, jlwoodard wrote:

> Each of the above lines successfully excludes the BLUE subjects, but the
> "BLUE" category is still present in my data set; that is, if I try
> table(Color) I get
>
>   RED WHITE  BLUE
>    82   151     0

You are being bitten by the behavior of factors.

> If I try to do a t-test (since I've presumably gone from three groups to two
> groups), I get:

How.... did you do the "t-test"?

> Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my)))
> stop("data are essentially constant") :
>   missing value where TRUE/FALSE needed
> In addition: Warning message:
> In mean.default(y) : argument is not numeric or logical: returning NA
>
> and describe.by(score,Color) gives me descriptives for RED and WHITE, and
> BLUE also shows up as NULL.
>
> How can I eliminate the BLUE category completely so I can do a t-test using
> Color (with just the RED and WHITE subjects)?

    dataset$Color <- as.character(dataset$Color)

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Thank you David!

David Winsemius wrote:
> How.... did you do the "t-test"?

    t.test(Score,Color)

John

On Aug 21, 2009, at 6:35 PM, jlwoodard wrote:

> David Winsemius wrote:
>> How.... did you do the "t-test"?
>
> t.test(Score,Color)

?t.test

Called this way, t.test expects two numeric vectors, not a numeric vector and a grouping indicator.

> t.test(dataset[dataset$Color=="RED", "Score"], dataset[dataset$Color=="WHITE", "Score"] )

        Welch Two Sample t-test

data:  dataset[dataset$Color == "RED", "Score"] and dataset[dataset$Color == "WHITE", "Score"]
t = -7.6485, df = 3.298, p-value = 0.003305
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.143205  -7.856795
sample estimates:
mean of x mean of y
 11.66667  24.66667

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

David Winsemius wrote:
> t.test expects two numeric vectors, not a numeric vector and a
> grouping indicator.
>
> > t.test(dataset[dataset$Color=="RED", "Score"], dataset[dataset$Color=="WHITE", "Score"] )

Thank you again, David! I also just realized I could have replaced the comma with a tilde, as in t.test(Score~Color). What a difference a character makes!

John

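The tilde form mentioned above can be sketched as follows. This assumes the same nine sample rows from the question in a data frame named `dataset`; the names `two_groups` and `res` are illustrative, and the `data =` argument is used here instead of attaching the data frame.

```r
# Sample data from the question, with Color stored as a factor.
dataset <- data.frame(
  Color = factor(rep(c("RED", "WHITE", "BLUE"), each = 3)),
  Score = c(10, 13, 12, 22, 27, 25, 18, 17, 16)
)

# Keep RED and WHITE only, then rebuild the factor so BLUE is no longer a level.
two_groups <- subset(dataset, Color != "BLUE")
two_groups$Color <- factor(two_groups$Color)

# Formula interface: numeric response on the left, two-level grouping on the right.
res <- t.test(Score ~ Color, data = two_groups)
res$estimate   # group means for RED and WHITE
res$statistic  # Welch t statistic, comparable to the two-vector call above
```

On the sample rows this should agree with the two-vector call shown earlier in the thread, since both run the default Welch test on the same two groups.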
To drop empty factor levels from a subset, I use the following:

    a.subset <- subset(dataset, Color!='BLUE')
    ifac <- sapply(a.subset,is.factor)
    a.subset[ifac] <- lapply(a.subset[ifac],factor)

Mike

> dataset
  Color Score
1   RED    10
2   RED    13
3   RED    12
4 WHITE    22
5 WHITE    27
6 WHITE    25
7  BLUE    18
8  BLUE    17
9  BLUE    16
> table(dataset)
       Score
Color   10 12 13 16 17 18 22 25 27
  BLUE   0  0  0  1  1  1  0  0  0
  RED    1  1  1  0  0  0  0  0  0
  WHITE  0  0  0  0  0  0  1  1  1
>
> a.subset <- subset(dataset, Color!='BLUE')
> a.subset
  Color Score
1   RED    10
2   RED    13
3   RED    12
4 WHITE    22
5 WHITE    27
6 WHITE    25
>
> table(a.subset)
       Score
Color   10 12 13 22 25 27
  BLUE   0  0  0  0  0  0
  RED    1  1  1  0  0  0
  WHITE  0  0  0  1  1  1
>
> ifac <- sapply(a.subset,is.factor)
> a.subset[ifac] <- lapply(a.subset[ifac],factor)
>
> table(a.subset)
       Score
Color   10 12 13 22 25 27
  RED    1  1  1  0  0  0
  WHITE  0  0  0  1  1  1

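In versions of R newer than the one this thread was written for, the sapply/lapply idiom above can usually be replaced by droplevels(), which has a data frame method that drops unused levels from every factor column. A short sketch, assuming the same sample `dataset` with Color stored as a factor:

```r
# Sample data from the question, with Color stored as a factor.
dataset <- data.frame(
  Color = factor(rep(c("RED", "WHITE", "BLUE"), each = 3)),
  Score = c(10, 13, 12, 22, 27, 25, 18, 17, 16)
)

# Subset, then drop unused levels from all factor columns in one step.
a.subset <- droplevels(subset(dataset, Color != "BLUE"))

levels(a.subset$Color)  # BLUE is no longer a level
table(a.subset$Color)   # counts for RED and WHITE only
```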
jlwoodard wrote:
> Each of the above lines successfully excludes the BLUE subjects, but the
> "BLUE" category is still present in my data set; that is, if I try
> table(Color) I get
>
>   RED WHITE  BLUE
>    82   151     0
>
> How can I eliminate the BLUE category completely so I can do a t-test
> using Color (with just the RED and WHITE subjects)?

A simpler example. See "details" in the help file for factor() for an explanation.

> #Factor with 3 levels
> x <- rep(c("blue","red","white"),c(1,1,2))
>
> x <- factor(x)
>
> table(x)
x
 blue   red white
    1     1     2
>
> #Subset is still a factor with 3 levels
> y <- x[x!="blue"]
>
> table(y)
y
 blue   red white
    0     1     2
>
> #Drops unused levels; result a factor with 2 levels
> table(factor(y))

  red white
    1     2
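For a single factor vector like the one in this example, the unused level can also be dropped at the moment of subsetting, using the `drop` argument of the factor indexing method. A small sketch reusing the same `x`:

```r
# Same three-level factor as in the example above.
x <- factor(rep(c("blue", "red", "white"), c(1, 1, 2)))

# drop = TRUE removes levels that no longer occur in the result.
y <- x[x != "blue", drop = TRUE]

levels(y)  # "red" "white"
table(y)   # no zero-count entry for blue
```

This has the same effect as wrapping the subset in factor(); which form to use is mostly a matter of taste.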