Shreyasee
2009-Mar-24 06:58 UTC
[R] Calculating percentage Missing value for variables using one object
Hi,

I have a dataset with 250 variables in all, and for each variable the data is entered month by month. I need to calculate the percentage of missing values for each variable in each month and then plot a graph of it. I am running the following code to do this:

ds <- read.csv(file="filepath", header=TRUE)
attach(ds)
may <- length(variable1[variable1==""]) / length(dos[dos=="May-06"]) * 100
jun <- length(variable1[variable1==""]) / length(dos[dos=="June-06"]) * 100
.
.
.
var1 <- c(may, jun, ...........)
x <- seq(as.Date("2006-01-01"), as.Date("2007-03-31"), by="months")
plot(var1~x)

In the same way, I calculate the percentage of missing values for each variable and each month, store each value in its own variable, and then combine those variables in one object to plot the graph.

I would like to know whether I can combine all the variables from the dataset in one object and calculate the missing-value percentages over the months together, instead of creating a separate variable for each month and then combining them. After that, I also need to plot the graph for each variable and combine the plots in a single PDF file.

I highly appreciate all your help.

Thanks,
Shreyasee
David Winsemius
2009-Mar-24 12:53 UTC
[R] Calculating percentage Missing value for variables using one object
It looks to me that you should be using the table or the xtabs function. You have apparently already decided not to use NA for missing values, so those functions should give you counts of the instances in which variable1 == "":

dft <- data.frame(var1 = sample(c("", "this", "that", "and"), 120, replace=TRUE),
                  dt = sample(seq(as.Date("2006-01-01"), as.Date("2007-12-31"), by="months"),
                              120, replace=TRUE))
mo.tbl <- xtabs( ~var1+ dt, data=dft)
# the =="" entry is the first row
> mo.tbl[1,]
2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01 2006-06-01 2006-07-01
         2          1          1          2          2          3          1
2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01 2007-01-01 2007-02-01
         0          1          1          1          2          1          2
2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01 2007-08-01 2007-09-01
         2          2          2          0          1          3          4
2007-10-01 2007-11-01 2007-12-01
         1          3          2

# take the dates from the table itself so that x has one entry per column
x <- as.Date(colnames(mo.tbl))
plot(mo.tbl[1,]~x)

Because the example data are drawn with sample(), the counts will differ from run to run.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
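The table approach yields counts, while the original question asks for percentages within each month. A minimal sketch of one way to get there, assuming the same convention that "" marks a missing entry: prop.table() with margin = 2 divides each column of the table by its month total. The data frame ds and its values are made up for illustration; only the column names dos and variable1 come from the original post.

```r
# Hypothetical data: four May records (one missing), two June records (one missing).
ds <- data.frame(
  dos       = c("May-06", "May-06", "May-06", "May-06", "June-06", "June-06"),
  variable1 = c("", "a", "b", "c", "", "a"),
  stringsAsFactors = FALSE
)

# Counts of every value of variable1 within each month.
tbl <- xtabs(~ variable1 + dos, data = ds)

# Column-wise percentages: each month's column sums to 100.
pct <- 100 * prop.table(tbl, margin = 2)

# Locate the "" row by comparing row names; an empty string cannot be
# used directly as a character index in R.
miss.row <- which(rownames(pct) == "")
pct.missing <- pct[miss.row, ]
print(pct.missing)
```

Dividing within each month also avoids a pitfall in the code from the original post, where the numerator counts missing entries across all months rather than only those in the month being examined.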
Gabor Grothendieck
2009-Mar-24 13:21 UTC
[R] Calculating percentage Missing value for variables using one object
Read in the data, aggregate it by month and then turn it into a monthly zoo object and plot using a custom X axis:

Lines <- 'dos,variable1,variable2
May-06,1,""
May-06,2,""
June-06,"",2
June-06,1,4
July-06,1,4
July-06,1,4
August-06,1,4
August-06,1,4'
DF <- read.table(textConnection(Lines), header = TRUE, sep = ",")

library(zoo)
# fraction of NA values per month, for every variable at once
DF.na <- aggregate(DF[-1], DF["dos"], function(x) mean(is.na(x)))
# monthly series indexed by year and month
z <- zoo(as.matrix(DF.na[-1]), as.yearmon(DF.na$dos, "%B-%y"))

# i selects which variable to plot
i <- 1
plot(z[,i], xaxt = "n", ylab = "Fraction Missing", main = names(DF)[i+1])
axis(1, time(z), format(time(z), "%m/%y"), cex.axis = .7)
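The original question also asks for one graph per variable, collected in a single PDF file. A minimal sketch of how that could look, using only base R rather than zoo so that it runs without extra packages: aggregate() computes the percentage for every variable in one call, and a loop between pdf() and dev.off() writes one page per variable. The data frame ds, its values, the helper column month and the output file name are all assumptions made for illustration.

```r
# Hypothetical data in the layout described in the thread: a month column
# `dos` plus one column per variable, with "" marking a missing entry.
ds <- data.frame(
  dos       = c("May-06", "May-06", "June-06", "June-06", "July-06", "July-06"),
  variable1 = c("1", "2", "", "1", "1", "1"),
  variable2 = c("", "", "2", "4", "4", ""),
  stringsAsFactors = FALSE
)

# Turn labels such as "May-06" into the first day of that month without
# depending on the locale (month.name always holds English month names).
mon <- match(sub("-.*$", "", ds$dos), month.name)
yr  <- 2000L + as.integer(sub("^.*-", "", ds$dos))
ds$month <- as.Date(sprintf("%d-%02d-01", yr, mon))

vars <- setdiff(names(ds), c("dos", "month"))

# Percentage of "" or NA entries per month, for all variables at once.
pct.missing <- aggregate(
  ds[vars], ds["month"],
  function(x) 100 * mean(is.na(x) | x == "")
)
pct.missing <- pct.missing[order(pct.missing$month), ]

# One page per variable in a single PDF (the file name is an assumption).
out.file <- file.path(tempdir(), "missing_by_month.pdf")
pdf(out.file)
for (v in vars) {
  plot(pct.missing$month, pct.missing[[v]], type = "b", xaxt = "n",
       ylim = c(0, 100), xlab = "Month", ylab = "Percent missing", main = v)
  axis(1, at = as.numeric(pct.missing$month),
       labels = format(pct.missing$month, "%m/%y"), cex.axis = 0.7)
}
dev.off()
```

With 250 variables the same loop would produce a 250-page PDF; the vars vector can be narrowed to a subset if only some variables are of interest.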