thr3ads.net - R help - [R] Help with recast() syntax [Nov 2011]

Chris Conner

2011-Nov-29 05:32 UTC

[R] Help with recast() syntax

Dear Help-Rs,
 
I have data similar to the following:
 
DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
), class = "data.frame", row.names = c(NA, -22L))
 
Currently there are 2 observations for each month (one for negative and one for
positive test results).  What I need is to create a data set that looks like the
following, with positive and negative test results in the same row, organized by
month:
 
DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
    YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
    201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
    98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
    ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
    383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
"POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
-11L))
 
As this is something that I understand Hadley Wickham's Reshape package is
ideally suited for, I tried using the following reshape command:
 
ReshapeDF <- recast(DF, YR_MO~variable)
 
I get the following error message:
 
Using RESULT as id variables
Error: Casting formula contains variables not found in molten data: YR_MO
 
I have a workaround that gets me to my desired endpoint: splitting the
data.frame into two (by test result), then using YR_MO as the by.x/by.y in a
merge. But I think this task would be handled more efficiently using reshape.
Can anyone help me see where I'm going wrong?  Thanks in advance!
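
For readers following along, the split-and-merge workaround described above might look roughly like the following. This is only a sketch in base R on a three-month subset of the posted values, not the poster's actual code; the names DF_small, pos, neg and wide are made up for illustration, and it assumes exactly one POS and one NEG row per month.

```r
# Toy subset of the posted data: first three months only.
# Assumption: one POS row and one NEG row per month, as the post describes.
DF_small <- data.frame(
  RESULT    = factor(c("POS", "POS", "POS", "NEG", "NEG", "NEG")),
  YR_MO     = c(201011L, 201012L, 201101L, 201011L, 201012L, 201101L),
  TOT_TESTS = c(66L, 98L, 109L, 349L, 393L, 376L)
)

# Split by test result, keeping only the month key and the count.
pos <- DF_small[DF_small$RESULT == "POS", c("YR_MO", "TOT_TESTS")]
neg <- DF_small[DF_small$RESULT == "NEG", c("YR_MO", "TOT_TESTS")]
names(pos)[2] <- "POS_TESTS"
names(neg)[2] <- "NEG_TESTS"

# Merge the two halves back together on the month key.
wide <- merge(pos, neg, by = "YR_MO")
print(wide)
```

Note that merge() keeps only months present in both halves unless all = TRUE is given, so a month missing one of its two rows would silently drop out.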

	[[alternative HTML version deleted]]

David Winsemius

2011-Nov-29 06:25 UTC

On Nov 29, 2011, at 12:32 AM, Chris Conner wrote:
> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

This section of the structure has two NEGs for 201109 and none for POS.

> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative  
> and one for positive test results).  What I need to create a data  
> set that looks like the following, with positive and negative test  
> results in the same row organized by month:
After fixing the POS/NEG discrepancy, this works:

 > dcast(DF, YR_MO ~ RESULT, value.var="TOT_TESTS")
     YR_MO NEG POS
1  201011 349  66
2  201012 393  98
3  201101 376 109
4  201102 371 122
5  201103 396 113
6  201104 367 111
7  201105 406 113
8  201106 383 146
9  201107 394 124
10 201108 412 130
11 201109 379 120
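
The reply does not say how the discrepancy was found or fixed. One plausible way, sketched here in base R, is to cross-tabulate month by result and then relabel the stray row. Treating row 11 (201109, 120 tests) as the one that should be POS is an assumption, chosen because it is consistent with the output shown above; the names yr_mo, counts and wide are illustrative.

```r
# The data as posted: 10 POS rows followed by 12 NEG rows.
yr_mo <- c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
           201105L, 201106L, 201107L, 201108L, 201109L)
DF <- data.frame(
  X         = 1:22,
  RESULT    = factor(rep(c("POS", "NEG"), times = c(10, 12)),
                     levels = c("NEG", "POS")),
  YR_MO     = rep(yr_mo, 2),
  TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L,
                120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L, 394L,
                412L, 379L)
)

# Cross-tabulate month by result; every cell should be 1.
counts <- table(DF$YR_MO, DF$RESULT)
print(counts["201109", ])   # the odd month: two NEG rows, no POS row

# Assumed fix: row 11 (201109, 120 tests) was meant to be POS.
DF$RESULT[11] <- "POS"

# Base-R cross-check of the wide layout, without reshape2.
wide <- xtabs(TOT_TESTS ~ YR_MO + RESULT, data = DF)
print(wide)
```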

-- 
David.

> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
>     YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
>     201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
>     98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
>     ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
>     383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))
>
> As this is something that I understand Hadley Wickham's Reshape  
> package is ideally suited for, I tried using the following reshape  
> command:
>
> ReshapeDF <- recast(DF, YR_MO~variable)
>
> I get the following error message:
>
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data:  
> YR_MO
>
> I have a work around that allows me to get to my desired endpoint  
> that involves splitting the data.frame into two (by test result),  
> then using the YR_MO as the by.x/by.y in a merge, but I think this  
> task would be handled more efficiently using reshape?  Can anyone  
> help me to see where I'm going wrong?  Thanks in advance!
>
> 	[[alternative HTML version deleted]]
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

jdnewmil

2011-Nov-29 06:25 UTC

Inline below...

 On Mon, 28 Nov 2011 21:32:21 -0800 (PST), Chris Conner 
 <connerpharmd at yahoo.com> wrote:
> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative
> and one for positive test results). What I need to create a data set
> that looks like the following, with positive and negative test results
> in the same row organized by month:
>
> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
>     YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
>     201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
>     98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
>     ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
>     383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))
 Thanks for the sample data.
> As this is something that I understand Hadley Wickham's Reshape
> package is ideally suited for, I tried using the following reshape
> command:
>
> ReshapeDF <- recast(DF, YR_MO~variable)
>
> I get the following error message:
>
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data: 
> YR_MO
 I don't think you need to melt the data first, so you don't need the 
 recast function.
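
 As background on the error itself: recast() melts before it casts, and when no id variables are named, melt() falls back to using the factor columns as ids. The sketch below, on a three-month subset of the posted values, shows that default and one way the recast() call could be made to work by naming the id and measure variables. The object names DF_small, molten_default and wide are illustrative only, and the behaviour described is that of the reshape2 versions I am aware of.

```r
library(reshape2)

# Toy subset of the posted data (first three months), keeping the X column
# so the melt step sees the same kinds of columns as the original.
DF_small <- data.frame(
  X         = 1:6,
  RESULT    = factor(c("POS", "POS", "POS", "NEG", "NEG", "NEG")),
  YR_MO     = c(201011L, 201012L, 201101L, 201011L, 201012L, 201101L),
  TOT_TESTS = c(66L, 98L, 109L, 349L, 393L, 376L)
)

# With no id variables given, melt() uses the factor column RESULT as the
# only id and stacks X, YR_MO and TOT_TESTS into variable/value pairs, so
# YR_MO is no longer a column the casting formula can refer to.
molten_default <- melt(DF_small)
print(names(molten_default))

# Naming the id and measure variables keeps YR_MO as a column.
wide <- recast(DF_small, YR_MO ~ RESULT,
               id.var = c("YR_MO", "RESULT"), measure.var = "TOT_TESTS")
print(wide)
```

 Since the data are already in long form with a single measured column, calling dcast() directly does the same job with less ceremony.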

 # reshape2 is faster than reshape, but slightly syntactically different
 library(reshape2)
 # rename the RESULT levels
 DF0 <- DF
 levels( DF0$RESULT ) <- c( "NEG_TOTAL", "POS_TOTAL" )
 # cast to data frame, use sum if more than one row for a given YR_MO
 DF0 <- dcast( DF0, YR_MO~RESULT, sum, value.var="TOT_TESTS" )
 # The rest of this is to make the data frame look like your result, which
 # seems unnecessary to me, but perhaps there is a good reason for keeping
 # X and RESULT
 DF1 <- merge( DF[ DF$RESULT=="POS", c( "X", "RESULT", "YR_MO" ) ], DF0 )
 DF2 <- DF1[,c("X", "RESULT", "YR_MO", "POS_TOTAL", "NEG_TOTAL" ) ]
> I have a work around that allows me to get to my desired endpoint
> that involves splitting the data.frame into two (by test result),
> then using the YR_MO as the by.x/by.y in a merge, but I think this task
> would be handled more efficiently using reshape? Can anyone help me
> to see where I'm going wrong? Thanks in advance!
>
> 	[[alternative HTML version deleted]]
 (Please remember that this is a plain text email list.)

 ---------------------------------------------------------------------------
 Jeff Newmiller                        The     .....       .....  Go Live...
 DCN:<jdnewmil_at_dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
 /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
