Daniel Malter
2009-Nov-12 02:00 UTC
[R] redundant factor levels after subsetting a dataset
#I have a data frame with a numeric and a character variable.

x=c(1,2,3,2,0,2,-1,-2,-4)
md=c(rep("Miller",3), rep("Richard",3),rep("Smith",3))
data1=data.frame(x,md)

#I subset this data.frame in a way such that one level of the character variable does not appear in the new dataset.

data2=data1[x>0,]
data3=subset(data1,x>0)

#However, when I check the levels of the factor variable in the subset data frame, it still shows the levels that are now unused.

unique(data2$md)
unique(data3$md)

#This leads to complications in table and tapply that I want to avoid.

table(data2$md)
tapply(data2$x,data2$md,mean)

table(data3$md)
tapply(data3$x,data3$md,mean)

#Basically, I want to completely remove "Smith" from data frame data2 or data3 so that it would not show up in table or tapply operations.

Thanks for any pointers,
Daniel

-----------------------------------------------
"Who has visions, should see a doctor,"
Helmut Schmidt, German Chancellor (1974-1982).
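For readers on R 4.0.0 or later, where data.frame() no longer converts strings to factors by default, the behaviour described in this post can be reproduced with the sketch below. The stringsAsFactors = TRUE argument, the data1$x indexing, and the names counts and means are additions for illustration and are not part of the original post.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption added
# here so that md becomes a factor on current R versions as well.
data1 <- data.frame(
  x  = c(1, 2, 3, 2, 0, 2, -1, -2, -4),
  md = c(rep("Miller", 3), rep("Richard", 3), rep("Smith", 3)),
  stringsAsFactors = TRUE
)

# Keep only the rows with positive x; no "Smith" row survives.
data2 <- data1[data1$x > 0, ]

# The factor still carries all three levels after subsetting.
levels(data2$md)

# table() lists the unused level "Smith" with a count of 0,
# and tapply() returns NA for it.
counts <- table(data2$md)
means  <- tapply(data2$x, data2$md, mean)
print(counts)
print(means)
```

Subsetting rows only removes observations; the set of levels is stored as an attribute of the factor and is left untouched unless it is dropped explicitly.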
David Winsemius
2009-Nov-12 02:20 UTC
[R] redundant factor levels after subsetting a dataset
On Nov 11, 2009, at 9:00 PM, Daniel Malter wrote:

> #I have a data frame with a numeric and a character variable.
>
> x=c(1,2,3,2,0,2,-1,-2,-4)
> md=c(rep("Miller",3), rep("Richard",3),rep("Smith",3))
> data1=data.frame(x,md)
>
> #I subset this data.frame in a way such that one level of the
> character variable does not appear in the new dataset.
>
> data2=data1[x>0,]
> data3=subset(data1,x>0)

I thought this was asked and answered yesterday ((???)):

> data2 <- as.data.frame(lapply(data2, function(x) x[,drop=TRUE]))
> data2
  x      md
1 1  Miller
2 2  Miller
3 3  Miller
4 2 Richard
5 2 Richard
> data3 <- as.data.frame(lapply(data3, function(x) x[,drop=TRUE]))
> data3
  x      md
1 1  Miller
2 2  Miller
3 3  Miller
4 2 Richard
5 2 Richard
> unique(data2$md)
[1] Miller  Richard
Levels: Miller Richard
> unique(data3$md)
[1] Miller  Richard
Levels: Miller Richard

--
David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
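As a complement to the lapply() idiom in this reply, two other base R routes to the same result are sketched below. Neither comes from the thread itself: factor() rebuilds a single column from the values actually present, and droplevels() handles a whole data frame but was only added in R 2.12.0, after this exchange. The names data2a and data2b and the stringsAsFactors = TRUE argument are illustrative assumptions.

```r
# Same example data as in the thread; stringsAsFactors = TRUE is an
# assumption so that md is a factor on R >= 4.0.0 too.
data1 <- data.frame(
  x  = c(1, 2, 3, 2, 0, 2, -1, -2, -4),
  md = c(rep("Miller", 3), rep("Richard", 3), rep("Smith", 3)),
  stringsAsFactors = TRUE
)
data2 <- data1[data1$x > 0, ]

# Option 1: re-create one factor column; factor() keeps only the
# levels that actually occur in the data.
data2a <- data2
data2a$md <- factor(data2a$md)

# Option 2: drop unused levels from every factor column at once
# (requires R >= 2.12.0).
data2b <- droplevels(data2)

# Both results now carry only the levels "Miller" and "Richard",
# so table() and tapply() no longer report "Smith".
levels(data2a$md)
levels(data2b$md)
print(table(data2b$md))
print(tapply(data2b$x, data2b$md, mean))
```

Unlike the as.data.frame(lapply(...)) approach, both options leave the original row names of the subset in place, which may or may not matter depending on what is done with the data afterwards.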
Daniel Malter
2009-Nov-12 05:50 UTC
[R] redundant factor levels after subsetting a dataset
Thanks, works like a charm. I was not aware that it had been answered just yesterday. The solution previously suggested in this thread did not work for me.

Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Wednesday, November 11, 2009 9:21 PM
To: Daniel Malter
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] redundant factor levels after subsetting a dataset