Jorgen Harmse
2025-Sep-19 14:55 UTC
[R] Matching when each subject has multiple records, but each subject should be used only once in the match
Your propensity model (mentioned in your follow-up message) is presumably at the
user level, so you need a function that accepts all the records for a user and
produces a single record with features for the propensity model. (Maybe the
function averages values over individual records or computes slopes or uses a
sequence model to produce an embedding.) Then something like plyr::ddply can
produce a new data frame with one row per user. Use that for your matching or
weighting or other propensity-score method, and if necessary use something like
base::merge to broadcast the result back to the original data frame.
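
To make the steps above concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the simulated data, the column names (id, treated, visit, bp), the helper subject_features, and the greedy nearest-neighbour matching, which merely stands in for whatever matching or weighting method you actually use. It also assumes that treatment status is fixed per subject. The split/lapply/rbind line is a base-R stand-in for plyr::ddply(records, "id", subject_features).

```r
# Simulated longitudinal data (hypothetical): 3 records per subject.
set.seed(1)
n <- 40
treated <- rbinom(n, 1, 0.4)
records <- data.frame(
  id      = rep(seq_len(n), each = 3),
  treated = rep(treated, each = 3),
  visit   = rep(1:3, times = n),
  bp      = round(rnorm(3 * n, mean = 130 + 5 * rep(treated, each = 3), sd = 10))
)

# Feature extraction: all records for one subject in, one row out.
subject_features <- function(d) {
  data.frame(
    id       = d$id[1],
    treated  = d$treated[1],
    bp_mean  = mean(d$bp),
    bp_slope = unname(coef(lm(bp ~ visit, data = d))[2])
  )
}

# One row per subject.
subjects <- do.call(rbind, lapply(split(records, records$id), subject_features))

# Subject-level propensity model.
ps_model <- glm(treated ~ bp_mean + bp_slope, data = subjects, family = binomial)
subjects$pscore <- fitted(ps_model)

# Greedy 1:1 nearest-neighbour matching on the propensity score,
# without replacement: a matched control leaves the pool.
treated_ids <- subjects$id[subjects$treated == 1]
pool <- subjects[subjects$treated == 0, ]
subjects$pair <- NA_integer_
for (i in seq_along(treated_ids)) {
  if (nrow(pool) == 0) break
  ps_i <- subjects$pscore[subjects$id == treated_ids[i]]
  j <- which.min(abs(pool$pscore - ps_i))
  subjects$pair[subjects$id %in% c(treated_ids[i], pool$id[j])] <- i
  pool <- pool[-j, ]
}

# Broadcast the subject-level results back to the record-level data.
records <- merge(records, subjects[, c("id", "pscore", "pair")], by = "id")
```

Because the matching is done on the one-row-per-subject data frame, no subject can be matched more than once, and the merge simply copies the pair label onto every record for that subject. Unmatched subjects keep pair = NA.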
Other answers refer to new entrants to the study & similar complications.
Presumably the first step in your feature-extraction function for propensity
scores is to discard all information from after the treatment selection. You
want confounders that might have influenced the treatment, not
pseudo-confounders that may have been influenced by the treatment.
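
A small sketch of that first step, again in base R and again under assumptions of my own: the column names date and treat_date are hypothetical, and the sketch presumes every subject has a known treatment-selection (or index) date, which for controls you would have to assign somehow.

```r
# Hypothetical records: two subjects, each with a treatment-selection date.
records <- data.frame(
  id         = c(1, 1, 1, 2, 2, 2),
  date       = as.Date(c("2020-01-15", "2020-06-10", "2021-02-01",
                         "2020-03-05", "2020-09-20", "2021-05-12")),
  treat_date = as.Date(c(rep("2020-07-01", 3), rep("2021-01-01", 3))),
  bp         = c(128, 135, 120, 140, 142, 131)
)

# Keep only records from strictly before the treatment selection.
pre <- records[records$date < records$treat_date, ]

# Build the subject-level feature from pre-treatment records only.
baseline <- aggregate(bp ~ id, data = pre, FUN = mean)
names(baseline)[2] <- "bp_pre_mean"
```

Whether records dated exactly on the selection date count as pre-treatment depends on how your data were collected; the strict inequality here is just the conservative choice.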
Regards,
Jorgen Harmse.
Date: Thu, 18 Sep 2025 11:08:23 +0000
From: "Sorkin, John" <jsorkin at som.umaryland.edu>
To: Leo Mada via R-help <r-help at r-project.org>
Subject: [R] Matching when each subject has multiple records, but each
subject should be used only once in the match
I have a file that contains longitudinal data for each subject. As a result,
each subject can have multiple records. For example a given subject might have a
record in Jan 2020, another in June 2020, another in Feb 2021, another in May
2021, another in Sept 2022, etc. At each time for which a subject has a record
the subject is identified as a case or a control.
Over the course of the longitudinal data, I want to match a given case to a
given control. Once a subject is matched, I don't want the subject to be
eligible for being matched again.
If each subject had a single record, matching could easily be accomplished. How
can I accomplish the match in my file having repeated measures for each subject?
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine
Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382