thr3ads.net - R help - [R] Manipulation of longitudinal data by row [Dec 2012]

marcel curlin

2012-Dec-14 16:37 UTC

[R] Manipulation of longitudinal data by row

I have a dataset of the form below, consisting of one unique ID per
row, followed by a series of visit dates. At each visit there are
values for 3 dichotomous variables. Of the 8 possible
combinations of the three variables, 4 are "abnormal" and the
remaining 4 are "normal". Everyone starts out abnormal, and then
either continues to be abnormal at subsequent visits or resolves to a
normal pattern at a later visit. (I ignore reversion back to abnormal:
once they are normal, they are normal.)

I have to end up with 4 new columns indicating 1) the date of the last
completed visit (regardless of intervening NAs), 2) whether an ID
resolved or stayed abnormal, 3) if resolved, what the resolution
pattern was, and 4) what the date of resolution was. NAs always come in
groups of 4 (i.e., no visit date and no value for any of the 3 variables)
and are ignored.

Eventually I have to determine mean time to resolution, mean follow-up
time, etc., and I think I can do that, but the first part is a bit
beyond my coding skill. Suggestions appreciated.

tC <- textConnection("
ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes
")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
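
The date of the last completed visit and the mean follow-up time described above can be sketched directly on the wide data. This is only a rough illustration: it assumes follow-up runs from V1Date to the last non-missing visit date, that two-digit years mean 20xx, and the names date_cols, last_visit and followup_days are made up for the example.

```r
# Example data from the post; text= avoids managing a connection by hand
data1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes
")

# Find the visit-date columns (V1Date, V2date, V3date) and parse them
date_cols <- grep("date$", names(data1), ignore.case = TRUE, value = TRUE)
dates <- lapply(data1[date_cols], as.Date, format = "%m/%d/%y")

# One row per ID, one column per visit, dates stored as day counts
date_num <- sapply(dates, as.numeric)

# Last completed visit = latest non-missing date in each row
last_visit <- as.Date(apply(date_num, 1, max, na.rm = TRUE),
                      origin = "1970-01-01")

# ASSUMPTION: follow-up is measured from the first visit (V1Date)
followup_days <- as.numeric(last_visit - dates[[1]])

data.frame(ID = data1$ID, last_visit, followup_days)
mean(followup_days)
```

Skipped visits (such as visit 2 for ID 005) do not matter here, because the row-wise maximum ignores NA.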

Adams, Jean

2012-Dec-17 22:29 UTC

[R] Manipulation of longitudinal data by row

I had some difficulty getting the data read in using the code you included
in your email, although I'm not sure why.  I'm pasting in the code that
worked for me, below.

I think that the calculations you want to make would be easier if you
rearranged your data first. I used your example data to do just that.
Once the data are rearranged, it is very easy to look at information on
the last visit from each ID (see code, below). This includes much of the
information you describe in your query: 1) date of last completed visit, 2)
whether an ID resolved, and 3) what the final pattern was.

Jean

tC <- textConnection("ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes")
# keep text columns as character (not factor) so that c() and as.Date()
# below behave the same in any version of R
data1 <- read.table(header=TRUE, tC, stringsAsFactors=FALSE)
close.connection(tC)
rm(tC)

# rearrange the data
data2 <- data.frame(
  id = rep(data1$ID, 3),
  visit = rep(1:3, rep(dim(data1)[1], 3)),
  date = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
  dva = c(data1$V1a, data1$V2a, data1$V3a),
  dvb = c(data1$V1a, data1$V2a, data1$V3a),
  dvc = c(data1$V1a, data1$V2a, data1$V3a))
# define a new variable that is a combination of the three dichotomous variables
data2$abc <- paste0(substring(data2$dva, 1, 1), substring(data2$dvb, 1, 1),
  substring(data2$dvb, 1, 1))
# define a new variable that indicates whether the combination is "normal"
data2$normal <- data2$abc %in% c("YYN", "YNY", "YYN", "NNY")

# eliminate rows without visit information
data3 <- data2[!is.na(data2$date), ]
# split the data into lists according to id
list4 <- split(data3, data3$id)

# show the last visit from each id
do.call(rbind, lapply(list4, function(df) df[dim(df)[1], ]))
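
The last-visit table does not give the date of first resolution. One possible extension of the long-format approach is sketched below, under several assumptions. It reads dvb and dvc from the b and c columns. The set of normal patterns was never stated in the thread, so normal_patterns is a guess chosen so that every first-visit pattern in the example counts as abnormal. The helper summarise_id and the output column names are hypothetical.

```r
# Example data from the thread; text= avoids handling a connection by hand
data1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes
")

# Long format: one row per ID per visit
long <- data.frame(
  id    = rep(data1$ID, 3),
  visit = rep(1:3, each = nrow(data1)),
  date  = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
  dva   = c(data1$V1a, data1$V2a, data1$V3a),
  dvb   = c(data1$V1b, data1$V2b, data1$V3b),
  dvc   = c(data1$V1c, data1$V2c, data1$V3c),
  stringsAsFactors = FALSE)

# Drop visits that did not happen, then build the three-letter pattern
long <- long[!is.na(long$date), ]
long$abc <- paste0(substring(long$dva, 1, 1),
                   substring(long$dvb, 1, 1),
                   substring(long$dvc, 1, 1))

# ASSUMPTION: these four patterns are the "normal" ones (not given in the thread)
normal_patterns <- c("YNY", "NYY", "NYN", "NNN")
long$normal <- long$abc %in% normal_patterns

# Hypothetical helper: summarise the visits of a single ID
summarise_id <- function(df) {
  df <- df[order(df$date), ]
  hit <- which(df$normal)        # visits with a normal pattern
  resolved <- length(hit) > 0    # later reversion is ignored: first hit counts
  data.frame(
    id = df$id[1],
    last_visit = max(df$date),
    resolved = resolved,
    resolution_pattern = if (resolved) df$abc[hit[1]] else NA_character_,
    resolution_date = if (resolved) df$date[hit[1]] else as.Date(NA),
    days_to_resolution =
      if (resolved) as.numeric(df$date[hit[1]] - min(df$date)) else NA_real_,
    stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(split(long, long$id), summarise_id))
result
mean(result$days_to_resolution, na.rm = TRUE)  # mean time to resolution, in days
```

Taking the first normal visit in date order matches the rule that once an ID is normal it stays normal. With a different set of normal patterns, only normal_patterns needs to change.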




arun

2012-Dec-17 23:08 UTC

[R] Manipulation of longitudinal data by row

Hi Jean,

Just to check whether this is a typo or not: dvb and dvc are built from
the same 'a' columns as dva.
data2 <- data.frame(
  id = rep(data1$ID, 3),
  visit = rep(1:3, rep(dim(data1)[1], 3)),
  date = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
  dva = c(data1$V1a, data1$V2a, data1$V3a),
  dvb = c(data1$V1a, data1$V2a, data1$V3a),  # ? should these be the 'b' columns
  dvc = c(data1$V1a, data1$V2a, data1$V3a))  # ? and these the 'c' columns


A.K.
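
To see why the question matters, here is a small sketch comparing the visit 1 pattern built only from the 'a' columns, as in the lines quoted above, with the pattern built from a, b and c. It assumes that b and c were the intended columns, which the thread does not confirm. The names first_letter, abc_posted and abc_intended are made up for the example.

```r
# Example data from the thread
data1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes
")

# Hypothetical helper: first letter of "Yes"/"No"
first_letter <- function(x) substring(x, 1, 1)

# As quoted: all three letters come from the 'a' column of visit 1
abc_posted <- paste0(first_letter(data1$V1a),
                     first_letter(data1$V1a),
                     first_letter(data1$V1a))

# ASSUMPTION: the intended pattern uses the a, b and c columns
abc_intended <- paste0(first_letter(data1$V1a),
                       first_letter(data1$V1b),
                       first_letter(data1$V1c))

data.frame(ID = data1$ID, abc_posted, abc_intended)
```

With only the 'a' columns, every pattern is "YYY" or "NNN", so the three variables can no longer be told apart.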



______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
