Dear all

I have written a function to perform a very simple but useful task which I
do regularly. It is designed to show how many values are missing from each
variable in a data.frame. In its current form it works but is slow because I
have used several loops to achieve this simple task.

Can anyone see a more efficient way to get the same results? Or is there an
existing function which does this?

Thanks for your help

Tim

Function:

    miss <- function (data)
    {
        miss.list <- list(NA)
        for (i in 1:length(data)) {
            miss.list[[i]] <- table(is.na(data[i]))
        }
        for (i in 1:length(miss.list)) {
            if (length(miss.list[[i]]) == 2) {
                miss.list[[i]] <- miss.list[[i]][2]
            }
        }
        for (i in 1:length(miss.list)) {
            if (names(miss.list[[i]]) == "FALSE") {
                miss.list[[i]] <- 0
            }
        }
        data.frame(names(data), as.numeric(miss.list))
    }

Example:

    data(ToothGrowth)
    data.m <- ToothGrowth
    data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
    miss(data.m)

Hi,

try

    colSums(is.na(data.m))

The result is not a data frame, but you can easily convert it to one if you
want.

Regards
Petr

r-help-bounces at r-project.org wrote on 19.04.2011 09:29:08:

> Can anyone see a more efficient way to get the same results? Or is there
> existing function which does this?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

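As a rough sketch of the conversion mentioned above, the named vector returned by colSums(is.na(...)) can be wrapped in a two-column data frame similar to what the original miss() produced. The function name miss2 and the column names variable and n.missing are made up here for illustration; they do not come from the thread.

```r
# Hypothetical replacement for miss(): count NAs per column with
# colSums(is.na(...)) and wrap the named vector in a data frame.
miss2 <- function(data) {
  n.na <- colSums(is.na(data))   # named numeric vector, one entry per column
  data.frame(variable  = names(n.na),
             n.missing = as.vector(n.na),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Same example data as in the original post
data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA
miss2(data.m)
```

Because sample() draws 25 distinct rows, the supp row should report 25 missing values and the other two columns none.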
On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:
> I have written a function to perform a very simple but useful task which I
> do regularly. It is designed to show how many values are missing from each
> variable in a data.frame. In its current form it works but is slow because I
> have used several loops to achieve this simple task.

Why not use summary?

    > foo <- data.frame(a=c(1,3,4,NA), b=c(NA,4,NA,8), c=factor(c('A', NA, 'A', 'B')))
    > summary(foo)
           a               b          c
     Min.   :1.000   Min.   :4   A   :2
     1st Qu.:2.000   1st Qu.:5   B   :1
     Median :3.000   Median :6   NA's:1
     Mean   :2.667   Mean   :6
     3rd Qu.:3.500   3rd Qu.:7
     Max.   :4.000   Max.   :8
     NA's   :1.000   NA's   :2

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:
> In its current form it works but is slow because I
> have used several loops to achieve this simple task.

Oh, and in case you ONLY want the number of NAs in each column, this should
be pretty efficient:

    lapply(foo, function(x){sum(is.na(x))})

cu
	Philipp

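One detail worth spelling out: lapply() returns a list with one element per column. If a plain named vector is more convenient, vapply() is a possible alternative; the sketch below reuses the foo example from the earlier summary() reply, and the names na.list and na.vec are only illustrative.

```r
foo <- data.frame(a = c(1, 3, 4, NA),
                  b = c(NA, 4, NA, 8),
                  c = factor(c("A", NA, "A", "B")))

# lapply() returns a list with one NA count per column
na.list <- lapply(foo, function(x) sum(is.na(x)))

# vapply() returns a named integer vector instead, and checks that
# every column yields exactly one integer
na.vec <- vapply(foo, function(x) sum(is.na(x)), integer(1))
na.vec
```

For this foo, the counts are 1 for a, 2 for b and 1 for c.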
I use the following function, which gives me some quick descriptives about
each variable (i.e. n of missing values, % missing, case #'s missing, etc.).
It is fairly quick; maybe not pretty, but effective on either single
variables or entire data sets.

    NAhunter <- function(dataset)
    {
        # Describe one variable: basic descriptives plus the case numbers
        # (positions) of its missing values
        find.NA <- function(variable)
        {
            if (is.numeric(variable)) {
                n <- length(variable)
                mean <- mean(variable, na.rm=TRUE)
                median <- median(variable, na.rm=TRUE)
                sd <- sd(variable, na.rm=TRUE)
                NAs <- is.na(variable)
                total.NA <- sum(NAs)
                percent.missing <- total.NA/n   # stored as a proportion (0 to 1)
                descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
                rownames(descriptives) <- c(" ")
                Case.Number <- 1:n
                Missing.Values <- ifelse(NAs>0, "Missing Value", " ")
                missing.value <- data.frame(Case.Number, Missing.Values)
                missing.values <- missing.value[which(Missing.Values=='Missing Value'),]
                list("NUMERIC DATA", "DESCRIPTIVES"=t(descriptives),
                     "CASE # OF MISSING VALUES"=missing.values[,1])
            } else {
                n <- length(variable)
                NAs <- is.na(variable)
                total.NA <- sum(NAs)
                percent.missing <- total.NA/n   # stored as a proportion (0 to 1)
                descriptives <- data.frame(n, total.NA, percent.missing)
                rownames(descriptives) <- c(" ")
                Case.Number <- 1:n
                Missing.Values <- ifelse(NAs>0, "Missing Value", " ")
                missing.value <- data.frame(Case.Number, Missing.Values)
                missing.values <- missing.value[which(Missing.Values=='Missing Value'),]
                list("CATEGORICAL DATA", "DESCRIPTIVES"=t(descriptives),
                     "CASE # OF MISSING VALUES"=missing.values[,1])
            }
        }
        # Accept a single variable as well as a whole data set
        dataset <- data.frame(dataset)
        # Note: these calls change the global print options
        options(scipen=100)
        options(digits=2)
        lapply(dataset, find.NA)
    }

> From: tesutton@hku.hk
> To: r-help@r-project.org
> Date: Tue, 19 Apr 2011 15:29:08 +0800
> Subject: [R] Simple Missing cases Function
>
> Can anyone see a more efficient way to get the same results? Or is there
> existing function which does this?

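For comparison, the missing-value part of the function above (count, proportion, and case numbers) can be sketched more compactly with is.na(), mean() and which(). This is only an illustration under my own naming: na.report and the element name cases are hypothetical and not part of NAhunter, and the mean/median/sd descriptives are left out.

```r
# Hypothetical helper: for each column, report the number of NAs,
# the proportion missing, and the row numbers where they occur.
na.report <- function(dataset) {
  dataset <- as.data.frame(dataset)   # also accepts a single vector
  lapply(dataset, function(x) {
    na <- is.na(x)
    list(n               = length(x),
         total.NA        = sum(na),
         percent.missing = mean(na),    # a proportion between 0 and 1
         cases           = which(na))   # row numbers of the missing values
  })
}

foo <- data.frame(a = c(1, 3, 4, NA), b = c(NA, 4, NA, 8))
str(na.report(foo))
```

Unlike the function above, this sketch leaves the global options (scipen, digits) untouched, so rounding is left to the caller.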
Dear Petr

Thanks so much. That is a LOT more efficient.

Tim

-----Original Message-----
From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
Sent: Tuesday, April 19, 2011 3:37 PM
To: tesutton
Cc: r-help at r-project.org
Subject: Odp: [R] Simple Missing cases Function

Hi,

try

    colSums(is.na(data.m))

The result is not a data frame, but you can easily convert it to one if you
want.

Regards
Petr