Hi Rui,
Thanks for your reply to my post. My code still has various shortcomings but at
least now it is fully functional.
It may be that, as I transition to using R, I'll have to live with some less
than ideal code, at least at the outset. I'll just have to write and
re-write my code as I improve.
Appreciate your help.
Paul
Message: 66
Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
From: Rui Barradas <ruipbarradas at sapo.pt>
To: r-help at r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs improvement
Message-ID: <1327427697928-4324533.post at n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hello,
Point 3 is very simple: instead of 'print', use 'cat'.
Unlike 'print', it accepts several arguments and allows (very) simple
formatting.
{ cat("Error: Invalid date values in", DateNames[[i]], "\n",
      TestDates[DateNames][[i]][TestDates$Invalid == 1], "\n")
}
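As a small illustration of the difference, here is a sketch with made-up stand-ins: report_invalid, var_name and bad are hypothetical names, standing in for DateNames[[i]] and the invalid values selected in the loop. It prints the header with cat and then one quoted value per line with writeLines, which is closer to the layout asked for in the original post.

```r
# Hypothetical helper, not part of the original code.
# var_name stands in for DateNames[[i]];
# bad stands in for TestDates[DateNames][[i]][TestDates$Invalid == 1].
report_invalid <- function(var_name, bad) {
  # cat() accepts several arguments and joins them with sep
  cat("Error: Invalid date values in ", var_name, "\n", sep = "")
  # writeLines() prints one element per line; add the quotes by hand
  writeLines(paste0('"', bad, '"'))
}

report_invalid("birthDT", c("21931-11-23", "1933-06-31"))
```

Passing the values straight to cat, as in the snippet above, puts them all on one line separated by spaces; the writeLines call is only one way to get a line per value.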
Rui Barradas
Message: 53
Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
From: Paul Miller <pjmiller_57 at yahoo.com>
To: r-help at r-project.org
Subject: [R] Checking for invalid dates: Code works but needs improvement
Message-ID: <1327424089.1149.YahooMailClassic at web161604.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Hello Everyone,
Still new to R. Wrote some code that finds and prints invalid dates (see below).
This code works but I suspect it's not very good. If someone could show me a
better way, I'd greatly appreciate it.
Here is some information about what I'm trying to accomplish. My sense is
that the R date functions are best at identifying invalid dates when fed
character data in their default format. So my code converts the input dates to
character, breaks them apart using strsplit, and then reformats them. It then
identifies which dates are "missing" in the sense that the month or
year is unknown, and prints out any remaining invalid date values.
As I see it, the code has at least 4 shortcomings.
1. It's too long. My understanding is that skilled programmers can often
complete tasks like this in a few lines.
2. It's not vectorized. I started out trying to do something that was
vectorized but ran into problems with the strsplit function. I looked at the
help file and it appears this function will only accept a single character
vector.
3. It prints out the incorrect dates but doesn't indicate which date
variable they belong to. I tried various things with paste but never came up
with anything that worked. Ideally, I'd like to get something that looks
roughly like:
Error: Invalid date values in birthDT
"21931-11-23"
"1933-06-31"
Error: Invalid date values in diagnosisDT
"2010-02-30"
4. There's no way to specify names for input and output data. I imagine it
would be fairly easy to specify these as arguments to a function, but I am not
sure how to incorporate that into a for loop.
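On point 2, it may help to note that strsplit does take a whole character vector and returns a list with one element per input string, so the split itself needs no loop. A minimal sketch, assuming the dates are already character and every value has exactly three "/"-separated parts (the names dates, pieces and parts are made up for illustration):

```r
# Hypothetical sample values in the same mm/dd/yyyy layout as the test data
dates <- c("11/23/21931", "02/30/2010", "07/un/2011")

# strsplit() is vectorized: the result is a list, one element per string
pieces <- strsplit(dates, "/", fixed = TRUE)

# Stack the pieces into a character matrix, one row per date.
# This assumes every date splits into exactly three parts.
parts <- do.call(rbind, pieces)
colnames(parts) <- c("Month", "Day", "Year")
parts
```

Each column of parts can then be treated as an ordinary vector, for example parts[, "Day"] == "un".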
Thanks,
Paul
##########################################
#### Code for detecting invalid dates ####
##########################################
#### Test Data ####
connection <- textConnection("
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940 02/30/2010 03/17/2011
3 06/17/1935 12/20/2008 07/un/2011
4 05/31/1937 01/18/2007 04/30/2011
5 06/31/1933 05/16/2009 11/20/un
")
TestDates <- data.frame(scan(connection,
    list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
close(connection)
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)
#### List of Date Variables ####
DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")
#### Read Dates ####
for (i in seq_along(DateNames)) {
  # Split each mm/dd/yyyy string into month, day, and year
  TestDates[DateNames][[i]] <- as.character(TestDates[DateNames][[i]])
  TestDates$ParsedDT <- strsplit(TestDates[DateNames][[i]], "/")
  TestDates$Month <- sapply(TestDates$ParsedDT, function(x) x[1])
  TestDates$Day <- sapply(TestDates$ParsedDT, function(x) x[2])
  TestDates$Year <- sapply(TestDates$ParsedDT, function(x) x[3])
  # Unknown day: assume the 15th; unknown month or year: set to NA
  TestDates$Day[TestDates$Day == "un"] <- "15"
  TestDates[DateNames][[i]] <- with(TestDates, paste(Year, Month, Day, sep = "-"))
  is.na(TestDates[DateNames][[i]][TestDates$Month == "un"]) <- TRUE
  is.na(TestDates[DateNames][[i]][TestDates$Year == "un"]) <- TRUE
  # A non-missing value that as.Date cannot parse is invalid
  TestDates$Date <- as.Date(TestDates[DateNames][[i]], format = "%Y-%m-%d")
  TestDates$Invalid <- ifelse(is.na(TestDates$Date) &
                              !is.na(TestDates[DateNames][[i]]), 1, 0)
  # Convert the column only if every value parsed; otherwise print the bad ones
  if (sum(TestDates$Invalid) == 0) {
    TestDates[DateNames][[i]] <- TestDates$Date
  } else {
    print(TestDates[DateNames][[i]][TestDates$Invalid == 1])
  }
  TestDates <- subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, Invalid))
}
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)
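
One possible way to address points 1 and 4 together is to wrap the same steps in a function that takes the data frame and the column names as arguments and returns the converted copy. The sketch below is not from the thread: check_dates and its argument names are hypothetical, it assumes every value has exactly three "/"-separated parts, and, unlike the loop above, it leaves a column completely untouched when any of its values is invalid.

```r
# Hypothetical helper (an illustration, not the poster's code).
# data:       a data frame holding mm/dd/yyyy strings, with "un" for unknown
# date_names: character vector naming the date columns to check
check_dates <- function(data, date_names) {
  for (nm in date_names) {
    # Vectorized split; assumes exactly three parts per value
    p <- do.call(rbind, strsplit(as.character(data[[nm]]), "/", fixed = TRUE))
    month <- p[, 1]
    day   <- p[, 2]
    year  <- p[, 3]

    day[day == "un"] <- "15"                  # unknown day: assume the 15th
    iso <- paste(year, month, day, sep = "-")
    iso[month == "un" | year == "un"] <- NA   # unknown month or year: missing

    parsed  <- as.Date(iso, format = "%Y-%m-%d")
    invalid <- is.na(parsed) & !is.na(iso)    # present but not parseable

    if (any(invalid)) {
      cat("Error: Invalid date values in ", nm, "\n", sep = "")
      writeLines(paste0('"', iso[invalid], '"'))
    } else {
      data[[nm]] <- parsed                    # convert only fully valid columns
    }
  }
  data
}
```

It would be called as Checked <- check_dates(TestDates, DateNames), so the input and output names are chosen by the caller. With the test data above, this should report the birthDT and diagnosisDT values listed under point 3 and convert only metastaticDT to class Date.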