Lopez, Dan
2012-Aug-30 15:38 UTC
[R] Identifying and Removing NA Columns and factor Columns with more than x Levels
Hi,

How do you subset a dataframe so that you only have columns:

1. that contain one or more NAs?
2. that contain factors with greater than or equal to 32 levels?

How do you remove from a dataframe columns**

3. with one or more NAs?
4. that contain factors with greater than or equal to 32 levels?

** I know how to remove columns at a basic level, but I am trying to figure out a more efficient way of performing these particular tasks (my data set has 60 columns).

For NAs I essentially used summary(mtcars) and manually made a note of where NAs appeared, then used:

mtcars1<-mtcars1[,!(names(mtcars1)%in% c("hp","wt","vs"))]

I did something similar for factors with more than x levels, only I used str(mtcars) to help me identify them.

BTW I know mtcars doesn't have any of these issues. I just used it as a quick reference.

Dan
Bert Gunter
2012-Aug-30 15:54 UTC
[R] Identifying and Removing NA Columns and factor Columns with more than x Levels
If d is your data frame

i1 <- sapply(d,function(x)is.factor(x)&&length(levels(x))>31)
## a vector of length ncol(d) that is TRUE only for factor columns with >31 levels

i2 <- sapply(d,function(x)any(is.na(x)))
## You can figure it out.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
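The two logical vectors above can be used directly as column indices, both to keep the flagged columns and to drop them. A minimal sketch, assuming a small made-up data frame `d` (its column names and values are hypothetical and not from the thread):

```r
# Hypothetical data frame for illustration only; not from the original post.
d <- data.frame(
  id  = factor(sprintf("id%02d", 1:40)),       # factor with 40 levels
  grp = factor(rep(c("a", "b"), times = 20)),  # factor with 2 levels
  x   = c(NA, seq_len(39)),                    # numeric with one NA
  y   = seq_len(40)                            # numeric, no NA
)

# TRUE only for factor columns with more than 31 levels
i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
# TRUE only for columns containing at least one NA
i2 <- sapply(d, function(x) any(is.na(x)))

# Keep only the flagged columns; drop = FALSE keeps the result a data
# frame even when a single column is selected.
big_factors <- d[, i1, drop = FALSE]
with_na     <- d[, i2, drop = FALSE]

# Remove the flagged columns instead (either condition).
cleaned <- d[, !(i1 | i2), drop = FALSE]
names(cleaned)
```

Negating the index (`!i1`, `!i2`) gives the "remove" variants of questions 3 and 4 separately.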
arun
2012-Aug-30 18:25 UTC
[R] Identifying and Removing NA Columns and factor Columns with more than x Levels
Hi,

For the first part in the two questions, do this:

dat1<-data.frame(Temp=c(5,10,9,15,NA,14,25,21,24,23,21,24,35,35,36,34,32,33),Temp2=c(5,10,9,15,15,14,25,21,24,23,21,24,35,35,36,34,32,33),Month=rep(c("January","February","March","April","May","June"),each=3),Roof=as.factor(rep(1:6,times=3)))

#columns with one or more NAs
dat1[,colMeans(is.na(dat1))!=0]
#columns without NAs
dat1[,colMeans(is.na(dat1))==0]
#or
dat1[,complete.cases(t(dat1))]

#Second part of the two questions (this example uses 4 levels; in your case, it is 32):
dat1[unlist(lapply(dat1,function(x) length(levels(x))>=4))]
#or
dat1[sapply(dat1,function(x) length(levels(x))>=4)]
#and
dat1[sapply(dat1,function(x) length(levels(x))<4)]

I guess you wanted these as separate solutions.

A.K.
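The selections in this message can also be sketched with the threshold held in a variable. This is only an illustration built on the `dat1` example; `max_levels`, `has_na`, `many_levels` and the result names are made up here. Two assumptions are worth flagging: `Month` is created with `factor()` explicitly, because since R 4.0.0 `data.frame()` no longer converts strings to factors by default, and `drop = FALSE` is used so that a single selected column still comes back as a data frame rather than a vector.

```r
dat1 <- data.frame(
  Temp  = c(5, 10, 9, 15, NA, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  Temp2 = c(5, 10, 9, 15, 15, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  # Explicit factor(): an assumption so the example behaves the same
  # regardless of the stringsAsFactors default.
  Month = factor(rep(c("January", "February", "March", "April", "May", "June"), each = 3)),
  Roof  = factor(rep(1:6, times = 3))
)

max_levels <- 4  # hypothetical threshold; the original question uses 32

# One logical flag per column
has_na      <- colSums(is.na(dat1)) > 0
many_levels <- sapply(dat1, function(x) is.factor(x) && nlevels(x) >= max_levels)

only_na        <- dat1[, has_na, drop = FALSE]        # question 1
only_many      <- dat1[, many_levels, drop = FALSE]   # question 2
without_na     <- dat1[, !has_na, drop = FALSE]       # question 3
without_many   <- dat1[, !many_levels, drop = FALSE]  # question 4
without_either <- dat1[, !(has_na | many_levels), drop = FALSE]
```

Setting `max_levels <- 32` gives the cut-off asked about in the original question.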