Dear Help-Rs,

I have data similar to the following:

DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
), class = "data.frame", row.names = c(NA, -22L))

Currently there are 2 observations for each month (one for negative
and one for positive test results). What I need is a data set that
looks like the following, with positive and negative test results in
the same row, organized by month:

DF2 <- structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
    YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
    201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
    98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
    ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
    383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
"POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
-11L))

As I understand it, this is the kind of task Hadley Wickham's Reshape
package is ideally suited for, so I tried the following reshape
command:

ReshapeDF <- recast(DF, YR_MO~variable)

I get the following error message:

Using RESULT as id variables
Error: Casting formula contains variables not found in molten data: YR_MO

I have a workaround that gets me to my desired endpoint: split the
data.frame into two (by test result), then merge the two halves using
YR_MO as the by.x/by.y. But I think this task would be handled more
efficiently using reshape? Can anyone help me to see where I'm going
wrong? Thanks in advance!

[[alternative HTML version deleted]]
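
For readers following along, the split-and-merge workaround described in the post might look roughly like the sketch below. It is not the poster's actual code, which was not shown. It uses base R only and, as an assumption made for brevity, just the first three months of the posted data; the names `pos`, `neg`, and `wide` are illustrative.

```r
# Illustrative subset: the first three months of the posted data.
DF <- data.frame(
  RESULT    = factor(rep(c("POS", "NEG"), each = 3), levels = c("NEG", "POS")),
  YR_MO     = rep(c(201011L, 201012L, 201101L), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 349L, 393L, 376L)
)

# Split by test result, keeping only the month key and the count.
pos <- DF[DF$RESULT == "POS", c("YR_MO", "TOT_TESTS")]
neg <- DF[DF$RESULT == "NEG", c("YR_MO", "TOT_TESTS")]

# Rename the count column so the two halves stay distinguishable after merging.
names(pos)[names(pos) == "TOT_TESTS"] <- "POS_TESTS"
names(neg)[names(neg) == "TOT_TESTS"] <- "NEG_TESTS"

# Join on month; by.x/by.y are spelled out even though both keys share a name.
wide <- merge(pos, neg, by.x = "YR_MO", by.y = "YR_MO")
print(wide)
```

Note that `merge()` performs an inner join by default, so a month that lacks either a POS or a NEG row is silently dropped from the result.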
On Nov 29, 2011, at 12:32 AM, Chris Conner wrote:

> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

This section of the structure has two NEGs for 201109 and none for POS.

> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative
> and one for positive test results). What I need to create a data
> set that looks like the following, with positive and negative test
> results in the same row organized by month:

After fixing the POS/NEG discrepancy, this works:

> dcast(DF, YR_MO ~ RESULT, value.var = "TOT_TESTS")
    YR_MO NEG POS
1  201011 349  66
2  201012 393  98
3  201101 376 109
4  201102 371 122
5  201103 396 113
6  201104 367 111
7  201105 406 113
8  201106 383 146
9  201107 394 124
10 201108 412 130
11 201109 379 120

-- 
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
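
The reply mentions fixing the POS/NEG discrepancy without showing how. One way to locate and repair it in base R is sketched here. The data frame is rebuilt compactly but holds the same values as the posted `structure()` call. The repair itself is an assumption: it relabels the 201109 row with 120 tests as POS, which is what the poster's desired DF2 implies.

```r
# Same values as the posted DF, built compactly: 10 POS rows, then 12 NEG rows.
DF <- data.frame(
  X         = 1:22,
  RESULT    = factor(rep(c("POS", "NEG"), times = c(10, 12)),
                     levels = c("NEG", "POS")),
  YR_MO     = rep(c(201011L, 201012L, 201101:201109), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L,
                349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L, 394L, 412L, 379L)
)

# Cross-tabulate month by result; every cell should be exactly 1.
counts <- table(DF$YR_MO, DF$RESULT)
bad <- rownames(counts)[rowSums(counts != 1) > 0]
print(bad)  # months where the one-POS/one-NEG assumption fails

# Assumed repair: the 201109 row with 120 tests was meant to be POS,
# matching POS_TESTS = 120 for 201109 in the desired DF2.
DF$RESULT[DF$YR_MO == 201109L & DF$TOT_TESTS == 120L] <- "POS"

# Re-check: each month now has one NEG and one POS row.
fixed_counts <- table(DF$YR_MO, DF$RESULT)
print(all(fixed_counts == 1))
```

Running a check like this before casting is worthwhile because a duplicated month/result pair otherwise forces the cast to aggregate.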
Inline below...

On Mon, 28 Nov 2011 21:32:21 -0800 (PST), Chris Conner
<connerpharmd at yahoo.com> wrote:

> Dear Help-Rs,
>
> I have data similar to the following:
>
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>
> Currently there are 2 observations for each month (one for negative
> and one for positive test results). What I need to create a data set
> that looks like the following, with positive and negative test
> results in the same row organized by month:
>
> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
>     YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
>     201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
>     98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
>     ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
>     383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))

Thanks for the sample data.

> As this is something that I understand Hadley Wickham's Reshape
> package is ideally suited for, I tried using the following reshape
> command:
>
> ReshapeDF <- recast(DF, YR_MO~variable)
>
> I get the following error message:
>
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data:
> YR_MO

I don't think you need to melt the data first, so you don't need the
recast function.

# reshape2 is faster than reshape, but slightly syntactically different
library(reshape2)
# rename the RESULT levels
DF0 <- DF
levels( DF0$RESULT ) <- c( "NEG_TOTAL", "POS_TOTAL" )
# cast to data frame, use sum if more than one row for a given YR_MO
DF0 <- dcast( DF0, YR_MO~RESULT, sum, value.var="TOT_TESTS" )
# The rest of this is to make the data frame look like your result, which seems
# unnecessary to me, but perhaps there is a good reason for keeping X and RESULT
DF1 <- merge( DF[ DF$RESULT=="POS", c( "X", "RESULT", "YR_MO" ) ], DF0 )
DF2 <- DF1[,c("X", "RESULT", "YR_MO", "POS_TOTAL", "NEG_TOTAL" ) ]

> I have a work around that allows me to get to my desired endpoint
> that involves splitting the data.frame into two (by test result),
> then using the YR_MO as the by.x/by.y in a merge, but I think this
> task would be handled more efficiently using reshape? Can anyone
> help me to see where I'm going wrong? Thanks in advance!
>
> [[alternative HTML version deleted]]

(Please remember that this is a plain text email list.)

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil_at_dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
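
On the "where am I going wrong" part of the question: when `melt()` is given no id or measure variables, it treats only the factor and character columns as ids, so the integer column YR_MO is stacked into `variable`/`value` along with X and TOT_TESTS, and the casting formula can no longer find it. The sketch below illustrates this and shows one way `recast()` could be made to work. It assumes reshape2 is installed and that the 201109 row with 120 tests (X = 11) is relabeled POS; neither reply in the thread used `recast()` this way.

```r
library(reshape2)

# Same values as the posted DF, built compactly: 10 POS rows, then 12 NEG rows.
DF <- data.frame(
  X         = 1:22,
  RESULT    = factor(rep(c("POS", "NEG"), times = c(10, 12)),
                     levels = c("NEG", "POS")),
  YR_MO     = rep(c(201011L, 201012L, 201101:201109), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L,
                349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L, 394L, 412L, 379L)
)
DF$RESULT[DF$X == 11L] <- "POS"  # assumed fix for the mislabeled 201109 row

# Default melt: only the factor column RESULT is kept as an id variable,
# so YR_MO no longer exists as a column in the molten data.
molten <- melt(DF)
print(names(molten))

# Naming the id and measured columns explicitly keeps YR_MO available
# to the casting formula.
wide <- recast(DF, YR_MO ~ RESULT,
               id.var = c("YR_MO", "RESULT"), measure.var = "TOT_TESTS")
print(wide)
```

Since the data here are already in long form with a single value column, calling `dcast()` directly, as in the replies, is the shorter route; `recast()` mainly pays off when a melt step is genuinely needed.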