Hi all,

I have got a seemingly simple problem (I am an R starter) with subsetting my data set, but cannot figure out the solution: I want to subset a data set from six to two levels, so that all analyses are done only with these two remaining levels.

I tried

    TOTAL<-read.delim('total.csv',header=T)
    SUBSET.OF.TOTAL<-subset(TOTAL, FactorX %in% c("Level1","Level2"))
    attach(SUBSET.OF.TOTAL)

but R does not eliminate the remaining levels of FactorX, just assigns 'not available' to the data. Like this, the other levels still show up in plots etc., but without data entries. Anybody got a solution how to subset the data so that I eliminate the other levels completely?

Thanks a lot for the help,

--
View this message in context: http://www.nabble.com/subset-problem-%28reducing-from-six-to-two-levels%29-tp21861044p21861044.html
Sent from the R help mailing list archive at Nabble.com.
On Thu, Feb 05, 2009 at 01:01:59PM -0800, Ine wrote:
>
> Hi all,
> I have got a seemingly simple problem (I am an R starter) with subsetting my
> data set, but cannot figure out the solution: I want to subset a data set
> from six to two levels, so that all analyses are done only with these two
> remaining levels.
> I tried
>
> TOTAL<-read.delim('total.csv',header=T)
> SUBSET.OF.TOTAL<-subset(TOTAL, FactorX %in% c("Level1","Level2"))
> attach(SUBSET.OF.TOTAL)
>
> but R does not eliminate the remaining levels of FactorX,

One solution is to have the factors re-built after subsetting:

    foo = factor(c('a','a','b','c','c'))

    # unused levels persistent:
    > foo[foo=='a']
    [1] a a
    Levels: a b c

    # but:
    > factor(foo[foo=='a'])
    [1] a a
    Levels: a

cu
	Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
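Applied to the data frame from the original post, the same idea might look like the sketch below. The contents of `TOTAL` are made up here (the real `total.csv` is not shown in the thread), and the `value` column is purely illustrative; only `FactorX` and the level names come from the question.

```r
# Toy stand-in for total.csv (hypothetical data, six levels of FactorX)
TOTAL <- data.frame(
  FactorX = factor(paste0("Level", rep(1:6, each = 2))),
  value   = 1:12
)

SUBSET.OF.TOTAL <- subset(TOTAL, FactorX %in% c("Level1", "Level2"))
nlevels(SUBSET.OF.TOTAL$FactorX)   # still 6: the unused levels are kept

# Re-build the factor so that only the levels actually present remain
SUBSET.OF.TOTAL$FactorX <- factor(SUBSET.OF.TOTAL$FactorX)
levels(SUBSET.OF.TOTAL$FactorX)    # now only "Level1" and "Level2"
table(SUBSET.OF.TOTAL$FactorX)     # no empty cells for Level3..Level6
```

Re-building the column inside the data frame (rather than on an attached copy) means every later plot or table that uses `SUBSET.OF.TOTAL$FactorX` sees only the two remaining levels.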
Stephan Kolassa
2009-Feb-05 21:53 UTC
[R] subset problem (reducing from six to two levels)
Hi,

does this help?

http://www.nabble.com/factor-question-to18638814.html#a18638814

HTH,
Stephan

Ine wrote:
> Hi all,
> I have got a seemingly simple problem (I am an R starter) with subsetting my
> data set, but cannot figure out the solution: I want to subset a data set
> from six to two levels, so that all analyses are done only with these two
> remaining levels.
> I tried
>
> TOTAL<-read.delim('total.csv',header=T)
> SUBSET.OF.TOTAL<-subset(TOTAL, FactorX %in% c("Level1","Level2"))
> attach(SUBSET.OF.TOTAL)
>
> but R does not eliminate the remaining levels of FactorX, just assigns 'not
> available' to the data. Like this, the other levels still show up in plots
> etc., but without data entries. Anybody got a solution how to subset the
> data so that I eliminate the other levels completely?
>
> Thanks a lot for the help,
>
Ine wrote:
> Hi all,
> I have got a seemingly simple problem (I am an R starter) with subsetting my
> data set, but cannot figure out the solution: I want to subset a data set
> from six to two levels, so that all analyses are done only with these two
> remaining levels.
> I tried
>
> TOTAL<-read.delim('total.csv',header=T)
> SUBSET.OF.TOTAL<-subset(TOTAL, FactorX %in% c("Level1","Level2"))
> attach(SUBSET.OF.TOTAL)
>
> but R does not eliminate the remaining levels of FactorX, just assigns 'not
> available' to the data. Like this, the other levels still show up in plots
> etc., but without data entries. Anybody got a solution how to subset the
> data so that I eliminate the other levels completely?
>
> Thanks a lot for the help,
>

R does not "assign 'not available'" (look at the subsetted data). However, factors do not lose levels just because they are not present in a subset of data. There are good reasons for that, but let's not go there this time (look in the list archives if you care).

To get rid of unwanted levels, use

    FactorXX <- factor(FactorX, levels=c("Level1","Level2"))

or just factor(FactorX) if you know that both levels are present (or don't care).

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
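The difference between the two calls Peter mentions shows up when one of the wanted levels happens to be absent from the subset. The sketch below uses a made-up factor `x` and a made-up data frame `dat` to illustrate this. It also shows `droplevels()`, which is not part of this thread: it was added to base R in version 2.12.0, after these messages were written, so treat it as an alternative for newer R versions rather than something the posters suggested.

```r
# Hypothetical factor with six declared levels; "Level2" never occurs in the data
x <- factor(c("Level1", "Level1", "Level3"),
            levels = paste0("Level", 1:6))
kept <- x[x %in% c("Level1", "Level2")]

# Explicit levels: both levels are kept, even the empty one
explicit <- factor(kept, levels = c("Level1", "Level2"))
levels(explicit)

# Plain factor(): only the levels that actually occur survive
rebuilt <- factor(kept)
levels(rebuilt)

# droplevels() (base R >= 2.12.0) drops unused levels from every
# factor column of a data frame in one call
dat  <- data.frame(FactorX = x, y = 1:3)
dat2 <- droplevels(subset(dat, FactorX %in% c("Level1", "Level2")))
levels(dat2$FactorX)
```

Giving the levels explicitly is the safer choice when later code expects both groups to exist (for example, a table with a fixed layout), while `factor(kept)` or `droplevels()` is convenient when only the levels actually observed should remain.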