thr3ads.net - R help - [R] Selecting one row or multiple rows per ID [Mar 2009]

Vedula, Satyanarayana

2009-Mar-04 06:09 UTC

[R] Selecting one row or multiple rows per ID

Hi,

Could someone help with coding this in R?

I need to select one row per patient i in clinic j. The data are organized
similarly to the sample shown below.

Two columns, patient and clinic, together identify each unique patient
(patient i in clinic j). There are two outcome columns. Some patients have
multiple rows, each representing one visit, as coded in the visit column.
Other patients have just one row, holding data from a single visit.

I need to select one row per patient i in clinic j using the following
algorithm:

If patient has outcome recorded at visit 2, then outcome = outcome columns at
visit 2
If patient does not have visit 2, then outcome = outcome at visit 5
If patient does not have visit 2 and visit 5, then outcome = outcome at visit 4
If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at visit
1
If patient does not have any of the visits, outcome = missing


Patient    Clinic    Visit    Outcome_left  Outcome_right
patient 1  clinic 1  visit 2  22            21
patient 1  clinic 3  visit 1  21            21
patient 1  clinic 3  visit 2  21            22
patient 1  clinic 3  visit 3  20            22
patient 3  clinic 5  visit 1  24            21
patient 3  clinic 5  visit 3  21            22
patient 3  clinic 5  visit 4  22            23
patient 3  clinic 5  visit 5  22            22

I need to select just the first row for patient 1/clinic 1; the second row
(visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient
3/clinic 5.

How does one code for that?

Thank you,
SV

Dieter Menne

2009-Mar-04 09:09 UTC

Vedula, Satyanarayana <svedula <at> jhsph.edu> writes:
> 
> 
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
> 
> ...
> If patient has outcome recorded at visit 2, then outcome = outcome
> columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at
> visit ... other rules

I prefer a table-driven approach here, because one can easily get lost in
all these ifs, and medical research requires well-defined documentation of
the outcome you choose.

So I first convert the data to the wide format; you could alternatively
use the function cast in package reshape for this, but I can never get the
syntax right. I also prefer to do most of this preparatory work at the
database level, e.g. with PIVOT.

Create a translation table mapping each of the 2^5 = 32 possible
combinations of visits to the column you want to select, and you can be
sure you have not forgotten a combination.

Dieter



outc = data.frame(
  patclin = as.factor(
         paste(c(1,1,1,1,3,3,3,3),
               c(1,3,3,3,5,5,5,5),sep=".")), 
  vis  = as.factor(c(2,1,2,3,1,3,4,5)),
  outcom = c(22,21,21,20,24,21,22,22))

outw =
reshape(outc,v.names="outcom",idvar="patclin",timevar="vis",
  direction="wide")
outw = outw[,order(names(outw))]
# I am sure there is a more elegant way to do this
# I prefer to do this type of work on the database level 
outw$code= as.factor(
  apply(sapply(outw[,1:5],function(x){as.integer(!is.na(x))}),1,paste,
  collapse=""))

# Note: the values here are not exactly what you requested;
# use your own logic to select the columns here
usevisit = data.frame(code=levels(outw$code),visit=c(2,3,4))
outw = merge(usevisit,outw)
outw

# you get a documented table of the columns you selected and
# can use visit to select the column
#   code visit outcom.1 outcom.2 outcom.3 outcom.4 outcom.5 patclin
#1 01000     2       NA       22       NA       NA       NA     1.1
#2 10111     3       24       NA       21       22       22     3.5
#3 11100     4       21       21       20       NA       NA     1.3
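
To make the translation-table idea concrete, here is a minimal sketch that
enumerates all 2^5 = 32 presence/absence patterns for visits 1 to 5 and
assigns each one the visit that the rules in the question would pick
(2 first, then 5, 4, 3, 1). The names preferred, patterns, pick_visit and
usevisit_full are made up for this illustration and are not part of the
code above; treat it as one possible way to fill in the table, not as the
only one.

```r
# Preference order taken from the question: visit 2, then 5, 4, 3, 1
preferred <- c(2, 5, 4, 3, 1)

# All 2^5 = 32 presence (1) / absence (0) patterns for visits 1..5
patterns <- expand.grid(v1 = 0:1, v2 = 0:1, v3 = 0:1, v4 = 0:1, v5 = 0:1)

# For one pattern, return the first preferred visit that is present,
# or NA when the patient has none of the five visits
pick_visit <- function(present) {
  hit <- preferred[present[preferred] == 1]
  if (length(hit) == 0) NA_real_ else hit[1]
}

# Same "code" format as above: one 0/1 digit per visit, visits 1..5
usevisit_full <- data.frame(
  code  = apply(patterns, 1, paste, collapse = ""),
  visit = apply(patterns, 1, pick_visit),
  stringsAsFactors = FALSE
)
```

Merging this table with the wide data on code, as done above with usevisit,
should then document the chosen visit for every pattern that can occur.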

hadley wickham

2009-Mar-04 14:55 UTC

On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana
<svedula at jhsph.edu> wrote:
> Hi,
>
> Could someone help with coding this in R?
>
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
>
> Two columns - patient i in clinic j identify each unique patient. There are
> two columns on outcome. Some patients have multiple rows with each row
> representing one visit, coded for in the column, visit. Some patients have
> just one row indicating data from a single visit.
>
> I need to select one row per patient i in clinic j using the following
> algorithm:
>
> If patient has outcome recorded at visit 2, then outcome = outcome columns
> at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at
> visit 4
> If patient does not have visits 2, 5, and 4, then outcome = outcome at
> visit 3
> If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at
> visit 1
> If patient does not have any of the visits, outcome = missing
>
>
> Patient    Clinic    Visit    Outcome_left  Outcome_right
> patient 1  clinic 1  visit 2  22            21
> patient 1  clinic 3  visit 1  21            21
> patient 1  clinic 3  visit 2  21            22
> patient 1  clinic 3  visit 3  20            22
> patient 3  clinic 5  visit 1  24            21
> patient 3  clinic 5  visit 3  21            22
> patient 3  clinic 5  visit 4  22            23
> patient 3  clinic 5  visit 5  22            22
>
> I need to select just the first row for patient 1/clinic 1; the second row
> (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient
> 3/clinic 5.

I'd approach this problem in the following way:

# Read the sample data straight from a string; the text argument avoids
# having to open and close a connection by hand
df <- read.csv(text = "
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
", header = TRUE)


# With a single patient it's pretty easy to find the preferred visit
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]
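
It may help to see what match() returns at this step. The small sketch
below repeats the lookup on a plain character vector holding the visits
recorded for patient 3 / clinic 5; the names visits and pos are only for
illustration.

```r
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Visits recorded for patient 3 / clinic 5, in row order
visits <- paste("visit", c(1, 3, 4, 5))

# Row position of each preferred visit, NA where that visit is absent
pos <- match(preferred_visit, visits)
pos               # NA 4 3 2 1

# The first non-missing entry is the row of the most preferred visit
na.omit(pos)[1]   # 4, i.e. the "visit 5" row
```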

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
library(plyr)
ddply(df, .(Patient, Clinic), find_best_visit)

# You can learn more about plyr at http://had.co.nz/plyr
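
If plyr is not available, much the same selection can be sketched in base
R by ranking each row against the preference order, sorting, and keeping
the first row per patient/clinic pair. This is an alternative technique,
not the plyr approach above, and the names pref_rank, kept, sorted and
best are made up for the example.

```r
df <- read.csv(text = "
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
", header = TRUE, stringsAsFactors = FALSE)

preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Rank of each row's visit in the preference order (1 = most preferred)
pref_rank <- match(df$Visit, preferred_visit)

# Drop rows whose visit is not in the preference list at all
kept <- df[!is.na(pref_rank), ]
kept_rank <- pref_rank[!is.na(pref_rank)]

# Sort by patient, clinic, then rank, and keep the first row of each pair
sorted <- kept[order(kept$Patient, kept$Clinic, kept_rank), ]
best <- sorted[!duplicated(sorted[c("Patient", "Clinic")]), ]
best
```

Note that a patient/clinic pair with none of the listed visits simply
drops out of the result here, rather than appearing as a row of missing
values.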


Hadley

-- 
http://had.co.nz/
