On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote:
> Dear people from the R help list,
>
> I have a question that I can't get my head around to start
> answering, that is why I am writing to the list.
>
> I have data in a format like this (tabs might look weird):
>
> John A1 1 0 1
> John A2 1 1 1
> John A3 1 0 0
> Mary A1 1 0 1
> Mary A2 0 0 1
> Mary A3 1 1 0
> Peter A1 1 0 0
> Peter A2 0 0 1
> Peter A3 1 1 1
> Josh A1 1 0 0
> Josh A2
> Josh A3 0 0 0
>
> I want to convert it into a format where variable rows from a single
> subject are placed behind each other, but with the different scores
> still matching up (i.e., it needs to be able to cope with missing
> data, as for Josh's A2 score).
>
> John A1 1 0 1 A2 1 1 1 A3 1 0 0
> Mary A1 1 0 1 A2 0 0 1 A3 1 1 0
> Peter A1 1 0 0 A2 0 0 1 A3 1 1 1
> Josh A1 1 0 0 A2 A3 0 0 0
>
> Preferably, the row identification would become the header of the
> new table, something like this:
>
> A11 A12 A13 A21 A22 A23 A31 A32 A33
> John 1 0 1 1 1 1 1 0 0
> Mary 1 0 1 0 0 1 1 1 0
> Peter 1 0 0 0 0 1 1 1 1
> Josh 1 0 0 0 0 0
>
> Probably, this has been addressed before - I just don't know how to
> search for the answer with the right search terms.
>
> Any help is appreciated, even just a link to a page where this is
> addressed!
There is a reshape function in the stats package that nobody except
Phil Spector seems to understand, and then there are the reshape and
reshape2 packages that everybody seems to get. (I don't understand why
the classification variables are on the left-hand side of the formula,
though. Positionally it makes some sense, but logically it does not
connect with how I understand the process.)
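For comparison, here is a minimal sketch of the base-R route with
stats::reshape(). It assumes the data are entered as a data frame with
the default names V1 to V5 and that Josh's empty A2 row is stored as NA;
the object name wide_base is only illustrative.

```r
# Data as in the question, with Josh's missing A2 scores entered as NA
nam123 <- data.frame(
  V1 = rep(c("John", "Mary", "Peter", "Josh"), each = 3),
  V2 = rep(c("A1", "A2", "A3"), times = 4),
  V3 = c(1, 1, 1,  1, 0, 1,  1, 0, 1,  1, NA, 0),
  V4 = c(0, 1, 0,  0, 0, 1,  0, 0, 1,  0, NA, 0),
  V5 = c(1, 1, 0,  1, 1, 0,  0, 1, 1,  0, NA, 0)
)

# One row per subject (idvar); the V2 codes (timevar) are spread into
# columns, giving names of the form V3.A1, V4.A1, V5.A1, V3.A2, ...
wide_base <- reshape(nam123, idvar = "V1", timevar = "V2",
                     direction = "wide")

dim(wide_base)
```

The missing A2 scores for Josh stay NA in the wide result, so the
columns still line up across subjects.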
require(reshape2)
# entered your data with default names V1 V2 V3 V4 V5
> nam123
V1 V2 V3 V4 V5
1 John A1 1 0 1
2 John A2 1 1 1
3 John A3 1 0 0
4 Mary A1 1 0 1
5 Mary A2 0 0 1
6 Mary A3 1 1 0
7 Peter A1 1 0 0
8 Peter A2 0 0 1
9 Peter A3 1 1 1
10 Josh A1 1 0 0
11 Josh A2 NA NA NA
12 Josh A3 0 0 0
> nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))
> str(nams.mlt)
'data.frame': 36 obs. of 4 variables:
 $ V1      : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
 $ V2      : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
 $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : int  1 1 1 1 0 1 1 0 1 1 ...
> dcast(nams.mlt, V1+V2 ~ variable)
V1 V2 V3 V4 V5
1 John A1 1 0 1
2 John A2 1 1 1
3 John A3 1 0 0
4 Josh A1 1 0 0
5 Josh A2 NA NA NA
6 Josh A3 0 0 0
7 Mary A1 1 0 1
8 Mary A2 0 0 1
9 Mary A3 1 1 0
10 Peter A1 1 0 0
11 Peter A2 0 0 1
12 Peter A3 1 1 1
> dcast(nams.mlt, V1 ~ V2+variable)
V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5
1 John 1 0 1 1 1 1 1 0 0
2 Josh 1 0 0 NA NA NA 0 0 0
3 Mary 1 0 1 0 0 1 1 1 0
4 Peter 1 0 0 0 0 1 1 1 1
You can always change the column names of the data frame if you want,
and in this case it would be a simple sub() operation on names().
Personally I would substitute "." rather than "".
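A minimal sketch of that renaming, assuming the "_" separator that
dcast() put between the V2 code and the variable name is what gets
replaced. The small data frame wide is only a stand-in for the first
few columns of the dcast() result.

```r
# Stand-in for the first columns of dcast(nams.mlt, V1 ~ V2 + variable)
wide <- data.frame(V1    = c("John", "Josh"),
                   A1_V3 = c(1, 1),
                   A1_V4 = c(0, 0),
                   A1_V5 = c(1, 0))

# Replace the first "_" in each column name with "."; names without
# an underscore (here V1) are left unchanged
names(wide) <- sub("_", ".", names(wide), fixed = TRUE)

names(wide)
# [1] "V1"    "A1.V3" "A1.V4" "A1.V5"
```

Using "" as the replacement instead would give A1V3, A1V4 and so on.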
--
David Winsemius, MD
West Hartford, CT