jim holtman
2011-Mar-03 22:04 UTC
[R] 'merge' function creating duplicate columns names in the output
The "merge" command is creating duplicate column names in a dataframe that is the result of the merge. The following is the 'merge' command: x <- merge(invType , allocSlots , by.x = 'index' , by.y = 'indx' , all.x = TRUE ) The 'invType' dataframe was the result of a previous merge and has the following column names that are probably causing the problem: height.x height.y height> str(invType)'data.frame': 2219 obs. of 30 variables: $ loc : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ... $ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001 3565139 3555886 3565796 3556647 ... $ comm : int 451 57 560 40 560 39 560 40 560 46 ... $ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ... $ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ... $ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ... $ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ... $ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ... $ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ... $ high : int 5 2 3 3 4 3 3 3 3 3 ... $ pqty : int 65 26 18 45 20 45 21 39 24 30 ... $ boh : int 4372 58 1199 51 836 116 64 312 371 389 ... $ awm : num 694 44.3 53.8 35 0.8 ... $ cubes : num 0.586 1.06 1.821 0.563 1.328 ... $ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ... $ adm : num 99.143 6.329 7.686 5 0.114 ... $ tie : num 13 13 6 15 5 15 7 13 8 10 ... $ origComm: int 457 57 547 40 541 39 552 40 552 46 ... $ days : num 0.656 6.162 2.342 11.998 216.853 ... $ class : chr "single" "double" "single" "double" ... $ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ... $ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8 23 15 23 16 ... $ type.x : int 2 2 2 2 2 2 2 2 2 2 ... $ height.y: num 47 47 47 47 47 47 47 47 47 47 ... $ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ... $ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ... $ type.y : int 2 2 2 2 2 2 2 2 2 2 ... $ depth : num 48 48 48 48 48 48 48 48 48 48 ... $ height : num 47 47 47 47 47 47 47 47 47 47 ... $ index : int 1 2 3 4 5 6 7 8 9 10 ... Now the "allocSlots" dataframe also has a column name 'height'> str(allocSlots)'data.frame': 2462 obs. of 6 variables: $ loc : chr "F1AA02" "F1AA12" "F1AA22" "F1AA32" ... $ height: num 72 72 72 72 72 72 72 72 72 72 ... $ depth : num 48 48 48 48 48 48 48 48 48 48 ... $ bay : chr "F1AA0" "F1AA0" "F1AA2" "F1AA2" ... $ indx : int 1675 1617 1386 1096 1077 963 816 471 275 259 ... $ type : int 1 1 1 1 1 1 1 1 1 1 ... Here is the result of the 'merge': (notice that there are now two 'height.x' and 'height.y' columns in the dataframe:> str(x)'data.frame': 2219 obs. of 35 variables: $ index : int 1 2 3 4 5 6 7 8 9 10 ... $ loc.x : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ... $ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001 3565139 3555886 3565796 3556647 ... $ comm : int 451 57 560 40 560 39 560 40 560 46 ... $ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ... $ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ... $ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ... $ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ... $ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ... $ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ... $ high : int 5 2 3 3 4 3 3 3 3 3 ... $ pqty : int 65 26 18 45 20 45 21 39 24 30 ... $ boh : int 4372 58 1199 51 836 116 64 312 371 389 ... $ awm : num 694 44.3 53.8 35 0.8 ... $ cubes : num 0.586 1.06 1.821 0.563 1.328 ... $ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ... $ adm : num 99.143 6.329 7.686 5 0.114 ... $ tie : num 13 13 6 15 5 15 7 13 8 10 ... $ origComm: int 457 57 547 40 541 39 552 40 552 46 ... $ days : num 0.656 6.162 2.342 11.998 216.853 ... $ class : chr "single" "double" "single" "double" ... $ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ... $ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8 23 15 23 16 ... $ type.x : int 2 2 2 2 2 2 2 2 2 2 ... $ height.y: num 47 47 47 47 47 47 47 47 47 47 ... $ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ... $ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ... $ type.y : int 2 2 2 2 2 2 2 2 2 2 ... $ depth.x : num 48 48 48 48 48 48 48 48 48 48 ... $ height.x: num 47 47 47 47 47 47 47 47 47 47 ... $ loc.y : chr "F1KC22" "F1BM34" "F1HC73" "F1FJ65" ... $ height.y: num 72 44 72 44 72 44 72 44 72 72 ... $ depth.y : num 48 48 48 48 48 48 48 48 48 48 ... $ bay : chr "F1KC2" "F1BM2" "F1HC7" "F1FJ5" ... $ type : int 1 2 1 2 1 2 1 2 1 1 ... My workaround is to change one of the "height" to something else to avoid the problem, but someone else might stumble on the same error. Should we expect 'merge' to ensure that the column names are unique in the result?> sessionInfo()R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?