thr3ads.net - R help - [R] Simple Missing cases Function [Apr 2011]


Tim Elwell-Sutton

2011-Apr-19 07:29 UTC

[R] Simple Missing cases Function

Dear all

 

I have written a function to perform a very simple but useful task which I
do regularly. It is designed to show how many values are missing from each
variable in a data.frame. In its current form it works but is slow because I
have used several loops to achieve this simple task. 

 

Can anyone see a more efficient way to get the same results? Or is there
an existing function which does this?

 

Thanks for your help

Tim

 

Function:

miss <- function (data)
{
    miss.list <- list(NA)
    for (i in 1:length(data)) {
        miss.list[[i]] <- table(is.na(data[i]))
    }
    for (i in 1:length(miss.list)) {
        if (length(miss.list[[i]]) == 2) {
            miss.list[[i]] <- miss.list[[i]][2]
        }
    }
    for (i in 1:length(miss.list)) {
        if (names(miss.list[[i]]) == "FALSE") {
            miss.list[[i]] <- 0
        }
    }
    data.frame(names(data), as.numeric(miss.list))
}

 

Example:

data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
miss(data.m)



Petr PIKAL

2011-Apr-19 07:37 UTC


[R] Odp: Simple Missing cases Function

Hi
try

colSums(is.na(data.m))

It is not in data frame but you can easily transform it if you want.

Regards
Petr
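
For anyone who wants the counts back in a data frame, as the original miss() function returned them, a minimal sketch follows. The column names variable and n.missing are chosen here purely for illustration and do not come from the thread, and set.seed() is added only so that the random sample is repeatable.

```r
# Recreate the example data from the original post
data.m <- ToothGrowth
set.seed(1)  # assumption: added only to make the sample reproducible
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA

# Count missing values per column (a named numeric vector)
na.counts <- colSums(is.na(data.m))

# Reshape the named vector into a data frame (column names are illustrative)
miss.df <- data.frame(variable = names(na.counts),
                      n.missing = as.numeric(na.counts),
                      row.names = NULL)
miss.df
```

This works because is.na() on a data frame returns a logical matrix, so colSums() can add up the TRUE values column by column without an explicit loop.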



Philipp Pagel

2011-Apr-19 07:42 UTC


[R] Simple Missing cases Function

On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:
> Dear all
>
> I have written a function to perform a very simple but useful task which I
> do regularly. It is designed to show how many values are missing from each
> variable in a data.frame. In its current form it works but is slow because I
> have used several loops to achieve this simple task.

Why not use summary?

> foo <- data.frame(a=c(1,3,4,NA), b=c(NA,4,NA,8), c=factor(c('A', NA, 'A', 'B')))
> summary(foo)
       a               b        c    
 Min.   :1.000   Min.   :4   A   :2  
 1st Qu.:2.000   1st Qu.:5   B   :1  
 Median :3.000   Median :6   NA's:1  
 Mean   :2.667   Mean   :6           
 3rd Qu.:3.500   3rd Qu.:7           
 Max.   :4.000   Max.   :8           
 NA's   :1.000   NA's   :2      

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

Philipp Pagel

2011-Apr-19 07:44 UTC


[R] Simple Missing cases Function

Oh - and in case you ONLY want the number of NAs in each column, this
should be pretty efficient:

lapply(foo, function(x){sum(is.na(x))})

cu
	Philipp
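
lapply() returns a list. If a plain named vector is preferred, vapply() (or sapply()) can be swapped in. The following is a small sketch of that variation, not something posted in the thread, using the foo data frame from the summary() example:

```r
foo <- data.frame(a = c(1, 3, 4, NA),
                  b = c(NA, 4, NA, 8),
                  c = factor(c('A', NA, 'A', 'B')))

# vapply() checks that every result is a single integer
na.counts <- vapply(foo, function(x) sum(is.na(x)), integer(1))
na.counts

# Should agree with the colSums() approach suggested earlier in the thread
all(na.counts == colSums(is.na(foo)))
```

vapply() is used here rather than sapply() only because it fails loudly if a column ever yields something other than one integer; either should give the same counts on this data.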

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

Tyler Rinker

2011-Apr-19 07:52 UTC


[R] Simple Missing cases Function

I use the following function, which gives some quick descriptives for each
variable (i.e. number of missing values, proportion missing, case numbers of
the missing values, etc.). It is fairly quick and maybe not pretty, but it is
effective on either single variables or entire data sets.
 
NAhunter <- function(dataset)
{
    # Describe one variable: size, missing count, and which cases are missing
    find.NA <- function(variable)
    {
        if (is.numeric(variable)) {
            n <- length(variable)
            mean <- mean(variable, na.rm=TRUE)
            median <- median(variable, na.rm=TRUE)
            sd <- sd(variable, na.rm=TRUE)
            NAs <- is.na(variable)
            total.NA <- sum(NAs)
            percent.missing <- total.NA/n   # stored as a proportion (0 to 1)
            descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
            rownames(descriptives) <- c(" ")
            Case.Number <- 1:n
            Missing.Values <- ifelse(NAs>0, "Missing Value", " ")
            missing.value <- data.frame(Case.Number, Missing.Values)
            missing.values <- missing.value[which(Missing.Values=='Missing Value'),]
            list("NUMERIC DATA", "DESCRIPTIVES"=t(descriptives),
                 "CASE # OF MISSING VALUES"=missing.values[,1])
        }
        else {
            n <- length(variable)
            NAs <- is.na(variable)
            total.NA <- sum(NAs)
            percent.missing <- total.NA/n   # stored as a proportion (0 to 1)
            descriptives <- data.frame(n, total.NA, percent.missing)
            rownames(descriptives) <- c(" ")
            Case.Number <- 1:n
            Missing.Values <- ifelse(NAs>0, "Missing Value", " ")
            missing.value <- data.frame(Case.Number, Missing.Values)
            missing.values <- missing.value[which(Missing.Values=='Missing Value'),]
            list("CATEGORICAL DATA", "DESCRIPTIVES"=t(descriptives),
                 "CASE # OF MISSING VALUES"=missing.values[,1])
        }
    }
    dataset <- data.frame(dataset)
    # Note: these two calls change the session-wide print options
    options(scipen=100)
    options(digits=2)
    lapply(dataset, find.NA)
}
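
If only the count and share of missing values per column are needed, a more compact tabular summary is possible. The sketch below is not part of the thread: the function name na.summary and its column names are made up for illustration, and the NAs are placed in the first 25 rows (rather than sampled) so the result is deterministic.

```r
# Hypothetical helper: one row per column with NA count and percentage
na.summary <- function(dataset) {
    dataset <- as.data.frame(dataset)
    n.missing <- colSums(is.na(dataset))
    data.frame(variable = names(dataset),
               n = nrow(dataset),
               n.missing = as.numeric(n.missing),
               pct.missing = 100 * as.numeric(n.missing) / nrow(dataset),
               row.names = NULL)
}

data.m <- ToothGrowth
data.m$supp[1:25] <- NA   # deterministic variant of the thread's example
na.summary(data.m)
```

Unlike NAhunter, this does not report which cases are missing or touch any global options; it trades that detail for a single data frame that is easy to sort or filter.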


Tim Elwell-Sutton

2011-Apr-19 08:18 UTC


[R] Simple Missing cases Function

Dear Petr
Thanks so much. That is a LOT more efficient.
Tim

