Pam Dopart
2015-Feb-17 21:02 UTC
[R] multiple imputation of longitudinal, time-unstructured data
Hello!

I have a longitudinal dataset of radiation exposures of an occupational cohort. A percentage of the exposure values are missing and I would like to multiply impute the missing values (it is one option of several we are comparing). The data are recorded in long format (one row for each exposure entry) and there are multiple exposure measurements per worker. However, the data are time-unstructured (different data collection schedules for each worker) and unbalanced.

I want to account for the correlation between repeated measurements on the same worker. However, because of the time-unstructured nature of the dataset, I am unable to convert my dataset into wide format and impute that way. I have begun reading about using multilevel imputation for such a scenario, but I am rather unfamiliar with this approach, including within R. Is this an appropriate method to investigate? Any advice on how to get started would be greatly appreciated!

Thank you!
Pam
John Sorkin
2015-Feb-18 00:18 UTC
[R] multiple imputation of longitudinal, time-unstructured data
Pam,

Please let me know what you discover. I just started looking at a similar problem. I understand that a Kalman filter can sometimes be applied to this problem, but at this time I don't know how to accomplish this.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
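For readers looking for a starting point on the multilevel imputation mentioned in the original question, here is a minimal sketch of one possible approach, not a recommendation from either poster. It assumes the mice and pan packages are installed and uses the "2l.pan" method in mice, which imputes directly in long format with a random intercept per cluster. The simulated data and the column names (worker, time, exposure) are hypothetical stand-ins for the real cohort data, and a linear time trend is assumed purely for illustration.

```r
library(mice)  # the "2l.pan" method additionally requires the pan package

set.seed(1)

# Hypothetical long-format data: 30 workers, each with 3 to 8 measurements
# taken at irregular (worker-specific) times, so the data are unbalanced.
n_workers <- 30
n_obs <- sample(3:8, n_workers, replace = TRUE)
dat <- data.frame(
  worker = rep(seq_len(n_workers), times = n_obs),  # integer cluster id
  time   = unlist(lapply(n_obs, function(k) sort(runif(k, 0, 10))))
)
u <- rnorm(n_workers, sd = 0.5)  # worker-level random intercepts
dat$exposure <- 1 + 0.1 * dat$time + u[dat$worker] +
  rnorm(nrow(dat), sd = 0.3)

# Set roughly 20% of the exposure values to missing.
dat$exposure[sample(nrow(dat), round(0.2 * nrow(dat)))] <- NA

# Dry run to obtain the default method vector and predictor matrix.
ini  <- mice(dat, maxit = 0)
meth <- ini$method
pred <- ini$predictorMatrix

# Impute exposure with a two-level model:
#   -2 marks the cluster (class) variable, 1 marks a fixed-effect predictor.
meth["exposure"] <- "2l.pan"
pred["exposure", ] <- 0
pred["exposure", "worker"] <- -2
pred["exposure", "time"]   <- 1

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 5, printFlag = FALSE)

# First completed data set, still in long format.
head(complete(imp, 1))
```

Because the cluster variable carries the within-worker correlation, no conversion to wide format is needed. Whether a random intercept alone is adequate for real exposure data (as opposed to, say, random slopes in time or a transformed exposure scale) is something to check against the data.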