Christopher W. Ryan
2010-Mar-10 15:30 UTC
[R] trouble calculating rates--sometimes the denominator is missing
Every day I get a csv file containing the names of the 64 schools in our county, the number of students sent home ill, and the number of students absent (plus lots of other variables). The file is cumulative since fall of 2009. It is in "long" format: one line per school per day. Each line is also supposed to contain the total number of students enrolled in the school. That number doesn't change often or much, so the same value is usually repeated on each line for each school. Thus calculating the proportion of students absent or sent home ill is easy (see the lines between the ####). Here is the beginning of my code (my apologies for the word-wrapping, I use some long variable names):

    setwd("C:/data/bchd/schoolsurveillance")
    library(ggplot2)
    library(doBy)
    library(reshape)
    data <- read.csv("C:/DATA/BCHD/schoolsurveillance/Broome_02MAR10.csv",
                     header=TRUE, sep=",", fill=TRUE)
    data$date <- as.character(data$ReportingDate)
    data$date <- as.Date(data$ReportingDate, format="%d%b%y")
    ####
    data$PercentStudentsAbsent <- data$StudentsAbsentTotal/data$TotalStudentsEnrolled
    data$PercentSentHome <- data$SentHomeTotal/data$TotalStudentsEnrolled
    ####
    attach(data)

The problem is that sometimes, in some of the daily files, the TotalStudentsEnrolled field is left entirely blank--in every record. Unfortunately the data collection system is out of my hands, and still a little rough around the edges. The powers-that-be can put those numbers back in on the subsequent day, and then my code runs fine. But if possible, I want to make my code less susceptible to this external "threat."

What would be a good way to "store up" the names of the 64 schools and their total enrollments (which are basically static), and then use those values as the denominators for the rates calculated above (####), rather than relying on always having a complete, rectangular data file with every line containing the necessary denominator?

Thanks.

--
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY 13904
cryanatbinghamtondotedu

"If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea." [Antoine de St. Exupery]
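One possible way to "store up" the enrollments, offered only as a sketch: keep a small reference file on disk, refresh it whenever a daily file does contain enrollment numbers, and read it back when it does not. The function name update_enrollment_ref and the reference file path are hypothetical; the column names School and TotalStudentsEnrolled are taken from the post. It assumes a blank field is read in as NA, and that later rows in the cumulative file are the more recent ones.

```r
# Hypothetical helper: maintain a one-row-per-school enrollment table on disk.
update_enrollment_ref <- function(daily, ref_file) {
  # Start from the previously saved table, if there is one.
  ref <- if (file.exists(ref_file)) {
    read.csv(ref_file, stringsAsFactors = FALSE)
  } else {
    data.frame(School = character(0),
               TotalStudentsEnrolled = numeric(0),
               stringsAsFactors = FALSE)
  }

  # Rows of today's file that actually carry an enrollment figure.
  ok <- !is.na(daily$TotalStudentsEnrolled)
  new <- data.frame(School = as.character(daily$School[ok]),
                    TotalStudentsEnrolled = daily$TotalStudentsEnrolled[ok],
                    stringsAsFactors = FALSE)

  # Append, then keep the last-listed value for each school.
  ref <- rbind(ref, new)
  ref <- ref[!duplicated(ref$School, fromLast = TRUE), ]

  write.csv(ref, ref_file, row.names = FALSE)
  ref
}
```

On a day when the field is blank in every record, nothing new is appended and the function simply returns the table saved on an earlier day, so the denominators remain available.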
Christopher W. Ryan
2010-Mar-10 15:44 UTC
[R] trouble calculating rates--sometimes the denominator is missing
One more bit: I got as far as this, thinking it might help. Using a data file that I know has all the necessary denominators, I created a dataframe of school names (as factor) and TotalStudentsEnrolled.

    data.frame(data$School[!duplicated(data$School)],
               data$TotalStudentsEnrolled[!duplicated(data$School)])

    1 BENJAMIN FRANKLIN ES   465
    2 CALVIN COOLIDGE SCHL   379
    3 EAST MS                590
    4 HORACE MANN SCHL       374
    5 MAC ARTHUR SCHL        481
    6 THEO ROOSEVELT SCHL    377
    [truncated]

I thought I might be able to "look up" the necessary value for each school from this dataframe. But I can't get my head around using indices to do it.

Thanks again.

--Chris
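The index-based lookup described here can be sketched with match(), which returns, for each school name in the daily data, its row position in the reference table. This is a minimal illustration, not necessarily the approach the thread settled on. The data frame names enroll and daily are hypothetical, the enrollment figures are copied from the listing in the post, and the absence counts are made up purely for the example.

```r
# Reference table: one row per school (three rows from the listing in the post).
enroll <- data.frame(
  School = c("BENJAMIN FRANKLIN ES", "CALVIN COOLIDGE SCHL", "EAST MS"),
  TotalStudentsEnrolled = c(465, 379, 590),
  stringsAsFactors = FALSE
)

# Toy daily data with the enrollment field blank; absence counts are invented.
daily <- data.frame(
  School = c("EAST MS", "BENJAMIN FRANKLIN ES", "EAST MS"),
  StudentsAbsentTotal = c(59, 31, 40),
  TotalStudentsEnrolled = NA,
  stringsAsFactors = FALSE
)

# For each daily row, find the matching row number in the reference table.
idx <- match(as.character(daily$School), enroll$School)

# Fill in only the denominators that are missing.
missing_denom <- is.na(daily$TotalStudentsEnrolled)
daily$TotalStudentsEnrolled[missing_denom] <-
  enroll$TotalStudentsEnrolled[idx[missing_denom]]

# The rate calculation from the original code then works unchanged.
daily$PercentStudentsAbsent <-
  daily$StudentsAbsentTotal / daily$TotalStudentsEnrolled
```

A school that is absent from the reference table gets NA from match(), so its rate stays NA rather than being computed from a wrong denominator. merge(daily, enroll, by = "School") is an alternative that does the same join, though it may reorder the rows.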