thr3ads.net - R help - [R] Identifying and Removing NA Columns and factor Columns with more than x Levels [Aug 2012]


Lopez, Dan

2012-Aug-30 15:38 UTC

[R] Identifying and Removing NA Columns and factor Columns with more than x Levels

Hi,

How do you subset a data frame so that you keep only the columns:

1. that contain one or more NAs?

2. that are factors with 32 or more levels?

How do you remove from a data frame the columns**

3. with one or more NAs?

4. that are factors with 32 or more levels?

** I know how to remove columns at a basic level, but I am trying to figure out a
more efficient way of performing these particular tasks (my data set has 60
columns).
For NAs, I essentially used summary(mtcars) and manually made a note of
where NAs appeared, then used:
mtcars1 <- mtcars1[, !(names(mtcars1) %in% c("hp", "wt", "vs"))]
I did something similar for factors with more than x levels, only I used
str(mtcars) to help me identify them.
BTW, I know mtcars doesn't have any of these issues. I just used it as a
quick reference.


Dan



Bert Gunter

2012-Aug-30 15:54 UTC


If d is your data frame

i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
## a logical vector of length ncol(d) that is TRUE only for factor columns
## with >31 levels

i2 <- sapply(d, function(x) any(is.na(x)))
## a logical vector that is TRUE for columns containing at least one NA.
## You can figure it out from here.

-- Bert
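
As an illustration of how the two logical vectors above might be used for subsetting, here is a minimal sketch. The data frame d and its column names are made up for this example and are not from the original question.

```r
# Hypothetical example data: 40 rows, so one factor can have 40 levels
d <- data.frame(
  num_ok   = seq_len(40),
  num_na   = c(NA, seq_len(39)),                 # contains one NA
  fac_few  = factor(rep(c("a", "b"), times = 20)),
  fac_many = factor(sprintf("id%02d", 1:40))     # 40 distinct levels
)

# TRUE for factor columns with more than 31 levels
i1 <- sapply(d, function(x) is.factor(x) && length(levels(x)) > 31)
# TRUE for columns with at least one NA
i2 <- sapply(d, function(x) any(is.na(x)))

only_many <- d[, i1, drop = FALSE]         # keep only the high-level factors
only_na   <- d[, i2, drop = FALSE]         # keep only columns with NAs
cleaned   <- d[, !(i1 | i2), drop = FALSE] # remove both kinds of column
```

drop = FALSE is used so that the result stays a data frame even when only one column is selected.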



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

arun

2012-Aug-30 18:25 UTC


Hi,
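For the first part of each of the two questions, do this:
dat1 <- data.frame(
  Temp  = c(5, 10, 9, 15, NA, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  Temp2 = c(5, 10, 9, 15, 15, 14, 25, 21, 24, 23, 21, 24, 35, 35, 36, 34, 32, 33),
  Month = rep(c("January", "February", "March", "April", "May", "June"), each = 3),
  Roof  = as.factor(rep(1:6, times = 3)),
  stringsAsFactors = TRUE  # keeps Month a factor, the default before R 4.0.0
)

# columns that contain at least one NA
# (drop = FALSE keeps a data frame even if only one column matches)
dat1[, colMeans(is.na(dat1)) != 0, drop = FALSE]
# columns that contain no NAs
dat1[, colMeans(is.na(dat1)) == 0]
#or
dat1[, complete.cases(t(dat1))]

# Second part of the two questions: this example uses 4 as the threshold;
# in your case, it is 32.
dat1[unlist(lapply(dat1, function(x) length(levels(x)) >= 4))]
#or
dat1[sapply(dat1, function(x) length(levels(x)) >= 4)]

#and
dat1[sapply(dat1, function(x) length(levels(x)) < 4)]

I guess you wanted these as separate solutions.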
A.K.
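
If both removals are wanted in one step, the separate solutions above could be combined along these lines. This is only a sketch: the function name drop_problem_columns and its max_levels argument are hypothetical, not something proposed in the thread.

```r
# Hypothetical helper: drop columns that contain any NA, and factor columns
# with max_levels or more levels (32 in the original question).
drop_problem_columns <- function(df, max_levels = 32) {
  has_na <- vapply(df, function(x) any(is.na(x)), logical(1))
  too_many <- vapply(
    df,
    function(x) is.factor(x) && nlevels(x) >= max_levels,
    logical(1)
  )
  df[, !(has_na | too_many), drop = FALSE]
}

# Small made-up example, using a threshold of 3 instead of 32
small <- data.frame(
  Temp  = c(5, NA, 9),
  Temp2 = c(5, 10, 9),
  Roof  = factor(c("a", "b", "c"))
)
res <- drop_problem_columns(small, max_levels = 3)
```

Here Temp is dropped because it has an NA and Roof because it has 3 levels, so only Temp2 remains. vapply is used instead of sapply so that the result is guaranteed to be a logical vector.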

