Muhuri, Pradip (SAMHSA/CBHSQ)
2015-Jan-03 13:20 UTC
[R] R function to convert person-level observations to person-period observations
Hello,

I was trying to convert person-level observations to person-period observations using a custom R function obtained from the UCLA web site (http://www.ats.ucla.edu/stat/r/faq/person_period.htm). Please see my reproducible example below. The function (PLPP) in the R script takes five arguments:

1) data (i.e., the data set to be converted)
2) id (i.e., the identifier for each observation)
3) period (i.e., the number of periods the person or observation was followed up)
4) event (i.e., the variable that indicates whether the event occurred or whether the observation was censored, depending on which direction you are converting)
5) direction, which "indicates whether the function should go from person-level to person-period or from person-period to person-level"

On my example data set, the R script ran successfully. Based on 3 person-level observations (A died in year 2, B is censored in year 5, C died in year 3), I get 10 person-period observations, which is the correct number of rows. But the issue is that the value of the "dead" indicator variable is incorrect. I have a gut feeling that the function needs to be tweaked a bit to get the desired results.

Correct results (person-level)
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

Incorrect results - the "dead" column
   ID dead studyyrs
1   A    0        1
2   A    0        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    1        5
8   C    0        1
9   C    0        2
10  C    0        3

Desired results
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3

I would appreciate receiving your help or hints for resolving the issue.
Thanks,

## My reproducible code is shown below

## Below is my data frame (3 observations)
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))
df

## Person-Level Person-Period Converter Function - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
PLPP <- function(data, id, period, event, direction = c("period", "level")) {
  ## Data Checking and Verification Steps
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))

  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }

  ## Do the conversion - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
  switch(match.arg(direction),
    period = {
      index <- rep(1:nrow(data), data[, period])
      idmax <- cumsum(data[, period])
      reve <- !data[, event]
      dat <- data[index, ]
      dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
      dat[, event] <- 0
      dat[idmax, event] <- reve},
    level = {
      tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
      index <- as.vector(by(tmp, tmp[, id],
        FUN = function(x) x[which.max(x[, period]), "i"]))
      dat <- data[index, ]
      dat[, event] <- as.integer(!dat[, event])
    })

  rownames(dat) <- NULL
  return(dat)
}

tpp <- PLPP(data = df, id = "ID", period = "studyyrs",
            event = "dead", direction = "period")
tpp

Pradip K. Muhuri,
SAMHSA/CBHSQ
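To see where the 1 on B's last row may come from, the intermediate values of the "period" branch can be traced by hand. This is only a minimal sketch: it copies three lines from PLPP as posted and applies them to the example data, using `$` column access and `seq_len()` instead of the function's arguments for brevity.

```r
# Example person-level data from the post.
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# One row index per person, repeated once per period of follow-up.
index <- rep(seq_len(nrow(df)), df$studyyrs)

# Position of each person's last row in the expanded data.
idmax <- cumsum(df$studyyrs)

# PLPP negates the input column before writing it to those last rows.
reve <- !df$dead

print(index)  # 1 1 2 2 2 2 2 3 3 3
print(idmax)  # 2 7 10
print(reve)   # FALSE TRUE FALSE
```

Because of the negation, rows 2, 7 and 10 receive 0, 1 and 0, which matches the "incorrect" table: the function as posted appears to treat its input column as a censoring indicator rather than an event indicator.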
David Barron
2015-Jan-03 15:18 UTC
[R] R function to convert person-level observations to person-period observations
Your data are coded the wrong way round for this function. As written, it negates the 'event' variable (dead in your example), so that variable needs to be 0 for cases that end in an event and 1 for spells that are censored: yours is the other way around. If you change the 'dead' variable to c(0,1,0) you will get the desired result.

If you really need to reverse the behaviour of the function, change the line

reve <- !data[, event]

to

reve <- data[, event]

David
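The two routes discussed in this reply (recoding the data, or removing the negation) can be compared side by side. The sketch below is an assumption-laden stand-in, not the UCLA function: `expand_periods` and its `negate` argument are hypothetical names introduced here, and the helper covers only the "period" direction for data with one row per person.

```r
# Hypothetical helper: a stripped-down version of PLPP's "period" branch.
# negate = TRUE mimics `reve <- !data[, event]`; FALSE mimics `reve <- data[, event]`.
expand_periods <- function(data, period, event, negate = TRUE) {
  index <- rep(seq_len(nrow(data)), data[[period]])   # repeat each person
  idmax <- cumsum(data[[period]])                     # last row per person
  last_value <- if (negate) !data[[event]] else data[[event]]
  dat <- data[index, , drop = FALSE]
  dat[[period]] <- sequence(data[[period]])           # 1..n within person
  dat[[event]] <- 0                                   # no event by default
  dat[[event]][idmax] <- as.numeric(last_value)       # fill the last rows
  rownames(dat) <- NULL
  dat
}

df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# Route 1: keep the negation, but pass a censoring indicator (1 = censored).
df_cens <- transform(df, dead = 1 - dead)
opt1 <- expand_periods(df_cens, "studyyrs", "dead", negate = TRUE)

# Route 2: keep the data as an event indicator and drop the negation.
opt2 <- expand_periods(df, "studyyrs", "dead", negate = FALSE)

print(identical(opt1$dead, opt2$dead))  # TRUE
print(opt2)
```

Under these assumptions both routes give the same "dead" column, with a 1 only on the last row of A and of C.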
Muhuri, Pradip (SAMHSA/CBHSQ)
2015-Jan-03 18:22 UTC
[R] R function to convert person-level observations to person-period observations
Hello David,

Thank you so much for your advice. Revising the code to "reve <- data[, event]" in the function (without changing the example data) seems to provide the desired results (shown below). These 3 subjects are followed for up to 5 years. Subject A experienced the event in year 2 and subject C experienced the event in year 3, while subject B was censored at the end of the follow-up period (i.e., year 5). The person-period observations now seem to be consistent with the person-level observations. Do you see any issues?

Regards,
Pradip

## person-level observations
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

## person-period observations
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
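One possible consistency check is a round trip: collapse the person-period rows back to one row per person and compare with the original data. The sketch below does this in base R rather than with PLPP, and assumes the rows are ordered by period within each ID. It also shows what a remaining negation would do on the way back, since the "level" branch of the function as posted still applies `!` to the event column; whether that matters depends on whether the reverse direction is ever used.

```r
# Original person-level data.
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# Person-period data as reported after the one-line change.
pp <- data.frame(
  ID = rep(df$ID, df$studyyrs),
  dead = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1),
  studyyrs = sequence(df$studyyrs)
)

# Keep each person's last row (assumes rows are sorted by period within ID).
last <- pp[!duplicated(pp$ID, fromLast = TRUE), ]
rownames(last) <- NULL

# Does the collapsed data match the original person-level data?
round_trip_ok <- all(as.character(last$ID) == as.character(df$ID)) &&
  all(last$dead == df$dead) &&
  all(last$studyyrs == df$studyyrs)
print(round_trip_ok)  # TRUE

# What an unmodified negation, as in the "level" branch, would return instead.
negated <- as.integer(!last$dead)
print(negated)  # 0 1 0
```

If the reverse conversion is needed, it may be worth changing the negation in the "level" branch as well so that both directions use the same coding.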