Jorgen Harmse
2025-Sep-19 14:55 UTC
[R] Matching when each subject has multiple records, but each subject should be used only once in the match
Your propensity model (mentioned in your follow-up message) is presumably at the user level, so you need a function that accepts all the records for a user and produces a single record with features for the propensity model. (Maybe the function averages values over individual records or computes slopes or uses a sequence model to produce an embedding.) Then something like plyr::ddply can produce a new data frame with one row per user. Use that for your matching or weighting or other propensity-score method, and if necessary use something like base::merge to broadcast the result back to the original data frame.

Other answers refer to new entrants to the study and similar complications. Presumably the first step in your feature-extraction function for propensity scores is to discard all information from after the treatment selection. You want confounders that might have influenced the treatment, not pseudo-confounders that may have been influenced by the treatment.

Regards,
Jorgen Harmse.

Message: 1
Date: Thu, 18 Sep 2025 11:08:23 +0000
From: "Sorkin, John" <jsorkin at som.umaryland.edu>
To: Leo Mada via R-help <r-help at r-project.org>
Subject: [R] Matching when each subject has multiple records, but each subject should be used only once in the match
Message-ID: <DM6PR03MB50492D24D394635A0BE114F2E216A at DM6PR03MB5049.namprd03.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"

I have a file that contains longitudinal data for each subject. As a result, each subject can have multiple records. For example a given subject might have a record in Jan 2020, another in June 2020, another in Feb 2021, another in May 2021, another in Sept 2022, etc. At each time for which a subject has a record the subject is identified as a case or a control. Over the course of the longitudinal data, I want to match a given case to a given control. Once a subject is matched, I don't want the subject to be eligible for being matched again.
If each subject had a single record, matching could easily be accomplished. How can I accomplish the match in my file having repeated measures for each subject?

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician, University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
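
As a rough illustration of the reply at the top of this message (collapse to one row per subject using only pre-treatment records, match on a propensity score, then use base::merge to carry the result back to the records), here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the toy data, the column names (id, visit, case, x), the rule that a subject counts as treated once any record is a case, the single feature mean_x, and the greedy nearest-neighbour matching, which stands in for whatever matching method you actually use. The split/lapply step plays the role that plyr::ddply would play.

```r
# Toy longitudinal data; all column names here are hypothetical.
set.seed(1)
n_subj <- 40
long <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  n_rec <- 1L + sample.int(4L, 1L)            # 2 to 5 records per subject
  ever_case <- i <= 15                        # first 15 subjects become cases
  # cases switch from control to case at record 2 or later
  onset <- if (ever_case) 1L + sample.int(n_rec - 1L, 1L) else NA_integer_
  data.frame(id = i,
             visit = seq_len(n_rec),
             case = as.integer(ever_case & seq_len(n_rec) >= onset),
             x = rnorm(n_rec, mean = if (ever_case) 1 else 0))
}))

# One row per subject, built only from records before the first case record.
# (A subject whose first record is already a case would have no such
# records and would need separate handling.)
subject_features <- function(d) {
  d <- d[order(d$visit), ]
  ever_case <- any(d$case == 1)
  pre <- if (ever_case) d[d$visit < min(d$visit[d$case == 1]), ] else d
  data.frame(id = d$id[1],
             treated = as.integer(ever_case),
             mean_x = mean(pre$x))
}
subj <- do.call(rbind, lapply(split(long, long$id), subject_features))

# Subject-level propensity score.
ps_fit <- glm(treated ~ mean_x, data = subj, family = binomial)
subj$ps <- predict(ps_fit, type = "response")

# Greedy 1:1 nearest-neighbour matching without replacement:
# each control can be used at most once.
cases <- subj[subj$treated == 1, ]
controls <- subj[subj$treated == 0, ]
control_id <- rep(NA_integer_, nrow(cases))
available <- rep(TRUE, nrow(controls))
for (i in seq_len(nrow(cases))) {
  if (!any(available)) break
  d <- abs(controls$ps - cases$ps[i])
  d[!available] <- Inf                        # exclude controls already used
  j <- which.min(d)
  control_id[i] <- controls$id[j]
  available[j] <- FALSE
}
pairs <- data.frame(case_id = cases$id, control_id = control_id)
pairs <- pairs[!is.na(pairs$control_id), ]
pairs$pair <- seq_len(nrow(pairs))

# Broadcast the pair number back to every record of the matched subjects.
key <- rbind(data.frame(id = pairs$case_id, pair = pairs$pair),
             data.frame(id = pairs$control_id, pair = pairs$pair))
long_matched <- merge(long, key, by = "id")
```

Because the matching is done on subj, which has one row per subject, no subject can appear in more than one pair; long_matched then holds all records of the matched subjects, each labelled with its pair number. Greedy matching depends on the order of the cases, so a dedicated matching package may be preferable for real analyses.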