On May 18, 2005, at 5:39 PM, sms13+ at pitt.edu wrote:
> I was wondering if someone can help me figure out the following:
> I have two patient datasets, ds1 and ds2. ds1 has fields "patid",
> "date", and "lab1". ds2 has "patid", "date", and "lab2". I want to
> find all the patids that have at least 2 dated records for each lab.
> I started by splitting each dataset by patid, to create ds1.list and
> ds2.list. Then I did some processing (with sapply) to each list to
> get the lengths of each patient list item. Then I kind of lost my way
> and things got messy as I tried to extract just the patids of those
> with lengths >= 2, convert them to dataframes (which I didn't have
> much success with), and then merge the two dataframes to get a vector
> of the desired patids. Any help would be much appreciated.
>
> Thanks,
> Steven
Steven,
I might not exactly understand your problem, but for
what it's worth, you could try to identify the patients
in ds1 who appear at least twice and identify the patients
in ds2 who appear at least twice via
ptid1 <- c("A", "A", "B", "C", "D", "D")
keep1 <- names(table(ptid1))[table(ptid1) >= 2]
keep1
or if ptid is numeric
ptid1 <- c(1, 1, 2, 3, 4, 4)
keep1 <- as.numeric(names(table(ptid1))[table(ptid1) >= 2])
keep1
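As a hedged aside, the same ids can be pulled out without going through names(), which always returns character strings. A minimal sketch, relying on the fact that an id occurring at least twice has at least one duplicated entry; ids_twice is a made-up helper name, not something from base R:

```r
# Hypothetical helper: ids that occur at least twice, kept in their original type.
# duplicated() flags the second and later occurrences of each value,
# so unique() of those flagged values is the set of repeated ids.
ids_twice <- function(ids) unique(ids[duplicated(ids)])

ids_twice(c("A", "A", "B", "C", "D", "D"))  # character ids stay character
ids_twice(c(1, 1, 2, 3, 4, 4))              # numeric ids stay numeric
```

This avoids the as.numeric() round trip, at the cost of being tied to a threshold of exactly "two or more".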
then, with keep2 built from ds2 in the same way, subset the respective data sets via
ds1.keep <- subset(ds1, patid %in% intersect(keep1, keep2))
ds2.keep <- subset(ds2, patid %in% intersect(keep1, keep2))
then use merge().
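Put together, the steps might look like the sketch below. The toy data values are invented for illustration, and merging on both "patid" and "date" with all = TRUE is only an assumption about how the two labs should be lined up; the column names come from the original question.

```r
# Toy data; the values are made up purely for illustration.
ds1 <- data.frame(
  patid = c(1, 1, 2, 3, 3, 3),
  date  = as.Date(c("2005-01-03", "2005-02-01", "2005-01-10",
                    "2005-01-05", "2005-02-07", "2005-03-02")),
  lab1  = c(5.1, 4.8, 6.0, 5.5, 5.2, 5.0)
)
ds2 <- data.frame(
  patid = c(1, 2, 2, 3, 3),
  date  = as.Date(c("2005-01-03", "2005-01-10", "2005-02-11",
                    "2005-01-05", "2005-02-07")),
  lab2  = c(140, 138, 141, 139, 142)
)

# Number of records per patient in each data set
n1 <- table(ds1$patid)
n2 <- table(ds2$patid)

# Patients with at least 2 records in each data set (as character strings)
keep1 <- names(n1)[n1 >= 2]
keep2 <- names(n2)[n2 >= 2]
keep  <- intersect(keep1, keep2)

# Compare as character so the ids match the names() output above
ds1.keep <- subset(ds1, as.character(patid) %in% keep)
ds2.keep <- subset(ds2, as.character(patid) %in% keep)

# Assumption: align the labs by patient and date, keeping unmatched rows
both <- merge(ds1.keep, ds2.keep, by = c("patid", "date"), all = TRUE)
both
```

If only the vector of patids is needed, keep already holds it and the merge() step can be skipped.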
Good luck!
Stephen