thr3ads.net - R help - [R] Restructuring data - unstack, reshape? [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Jen M

2011-Sep-26 01:47 UTC

[R] Restructuring data - unstack, reshape?

Hi all,

I'm having a problem restructuring my data the way I'd like it.  I have
data
that look like this:

  Candidate.ID        Specialty       Office         Score
       110002         C                  London           47
       110002         C                  East              48
       110003         RM               West              45
       110003         RM               Southwest      39
       110003         C                  Southwest      38
       110004         H                  South             42
       110006         G                  East               47
       110006         G                  London           45

Candidates can apply for the same job specialty in up to 2 offices (never
more).  They can apply for different specialties in further centres.  I
would like to look at score differences when candidates apply for the same
specialty in two different offices.  With the help of the archives I have
tried various stack/unstack and reshape/melt/cast combinations, and I've
managed to get a huge matrix where the columns are all possible combinations
of Specialties & Offices - and there are many.  This leaves a very sparse
matrix  with mainly null values, and this is not what I want.  I'd like the
scores from the two attempts in two columns so I can do scatterplots,
calculate differences by specialty etc. In SPSS I'd use
'restructure' to get
what I want.  I'm working to order with specific requests here so I have to
do it this way (as opposed to a modelling approach).

I would like it restructured to look something like this:

  Candidate.ID        Specialty       Office.1       Score.1      Office.2
Score.2
       110002         C                  London           47
 East          48
       110003         RM               West              45
 Southwest  39
       110003         C                  Southwest      38
       110004         H                  South             42
       110006         G                  East               47
London      45
       110006         G                  London           45

So one row per candidate/specialty combination, with 2 sets of
offices/scores and null values in the second set if they've only applied
once for that specialty.  Can anyone help me out with this? Is it possible
using stack or reshape?

Many thanks for reading,
Jan

PS Closest I've come to what I need is the sparse matrix produced by this:

recast(spec.scores, Candidate.ID ~ Specialty + Office,
measure.var="Score")

	[[alternative HTML version deleted]]

Dennis Murphy

2011-Sep-26 08:05 UTC

head link

[R] Restructuring data - unstack, reshape?

Hi:

Here's one approach using the reshape() function in base R:

# Read in your data:
d <- read.table(textConnection("
Candidate.ID        Specialty       Office         Score
      110002         C                  London           47
      110002         C                  East              48
      110003         RM               West              45
      110003         RM               Southwest      39
      110003         C                  Southwest      38
      110004         H                  South             42
      110006         G                  East               47
      110006         G                  London           45"), header =
TRUE)
closeAllConnections()

# Create a variable to distinguish positions
d$Position <- c(1, 2, 1, 2, 1, 1, 1, 2)

reshape(d, idvar = 'Candidate.ID', timevar = 'Position',
           v.names = c('Specialty', 'Office', 'Score'),
           direction = 'wide')

HTH,
Dennis

On Sun, Sep 25, 2011 at 6:47 PM, Jen M <jmstatshelp at gmail.com>
wrote:> Hi all,
>
> I'm having a problem restructuring my data the way I'd like it. ?I
have data
> that look like this:
>
> ?Candidate.ID ? ? ? ?Specialty ? ? ? Office ? ? ? ? Score
> ? ? ? 110002 ? ? ? ? C ? ? ? ? ? ? ? ? ?London ? ? ? ? ? 47
> ? ? ? 110002 ? ? ? ? C ? ? ? ? ? ? ? ? ?East ? ? ? ? ? ? ?48
> ? ? ? 110003 ? ? ? ? RM ? ? ? ? ? ? ? West ? ? ? ? ? ? ?45
> ? ? ? 110003 ? ? ? ? RM ? ? ? ? ? ? ? Southwest ? ? ?39
> ? ? ? 110003 ? ? ? ? C ? ? ? ? ? ? ? ? ?Southwest ? ? ?38
> ? ? ? 110004 ? ? ? ? H ? ? ? ? ? ? ? ? ?South ? ? ? ? ? ? 42
> ? ? ? 110006 ? ? ? ? G ? ? ? ? ? ? ? ? ?East ? ? ? ? ? ? ? 47
> ? ? ? 110006 ? ? ? ? G ? ? ? ? ? ? ? ? ?London ? ? ? ? ? 45
>
> Candidates can apply for the same job specialty in up to 2 offices (never
> more). ?They can apply for different specialties in further centres. ?I
> would like to look at score differences when candidates apply for the same
> specialty in two different offices. ?With the help of the archives I have
> tried various stack/unstack and reshape/melt/cast combinations, and
I've
> managed to get a huge matrix where the columns are all possible
combinations
> of Specialties & Offices - and there are many. ?This leaves a very
sparse
> matrix ?with mainly null values, and this is not what I want. ?I'd like
the
> scores from the two attempts in two columns so I can do scatterplots,
> calculate differences by specialty etc. In SPSS I'd use
'restructure' to get
> what I want. ?I'm working to order with specific requests here so I
have to
> do it this way (as opposed to a modelling approach).
>
> I would like it restructured to look something like this:
>
> ?Candidate.ID ? ? ? ?Specialty ? ? ? Office.1 ? ? ? Score.1 ? ? ?Office.2
> Score.2
> ? ? ? 110002 ? ? ? ? C ? ? ? ? ? ? ? ? ?London ? ? ? ? ? 47
> ?East ? ? ? ? ?48
> ? ? ? 110003 ? ? ? ? RM ? ? ? ? ? ? ? West ? ? ? ? ? ? ?45
> ?Southwest ?39
> ? ? ? 110003 ? ? ? ? C ? ? ? ? ? ? ? ? ?Southwest ? ? ?38
> ? ? ? 110004 ? ? ? ? H ? ? ? ? ? ? ? ? ?South ? ? ? ? ? ? 42
> ? ? ? 110006 ? ? ? ? G ? ? ? ? ? ? ? ? ?East ? ? ? ? ? ? ? 47
> London ? ? ?45
> ? ? ? 110006 ? ? ? ? G ? ? ? ? ? ? ? ? ?London ? ? ? ? ? 45
>
> So one row per candidate/specialty combination, with 2 sets of
> offices/scores and null values in the second set if they've only
applied
> once for that specialty. ?Can anyone help me out with this? Is it possible
> using stack or reshape?
>
> Many thanks for reading,
> Jan
>
> PS Closest I've come to what I need is the sparse matrix produced by
this:
>
> recast(spec.scores, Candidate.ID ~ Specialty + Office,
measure.var="Score")
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Maybe Matching Threads

Search for more apparently analagous threads

R help - Sep 2011 - Restructuring data - unstack, reshape?

[R] Restructuring data - unstack, reshape?

[R] Restructuring data - unstack, reshape?

Maybe Matching Threads