Hi all.
I have unexpected reshape results on datasets with certain variable
names. Here is a reproducible example:

d <- matrix(seq_len(7*7), 1, 7*7)
vnames <- c('acc','ppeGross','CF','ROA','DeltaSales','invTA','DeltaRevDeltaRec')
varying <- unlist(lapply(vnames, paste, 1:7, sep='.'))
d <- data.frame(d)
names(d) <- varying
d1 <- reshape(d, varying=varying, direction="long")
d[,'ppeGross.2'] == d1[d1$time==2,'ppeGross'] #This is FALSE!
##Try to compare d and d1: values are wrong from the 2nd column

##Changing variable names makes things go right:
vnames <- letters[1:7]
varying <- unlist(lapply(vnames, paste, 1:7, sep='.'))
names(d) <- varying
d1 <- reshape(d, varying=varying, direction="long")
d[,'b.2'] == d1[d1$time==2,'b'] #This is TRUE, as expected
##Try to compare d and d1 now: they look right

Any hint on what's wrong here? For now, my workaround is to change the
variable names before reshaping, then re-assign the old variable names
after the reshape.

Best regards,
Antonio, Fabio Di Narzo.

> R.version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          6.0
year           2007
month          10
day            03
svn rev        43063
language       R
version.string R version 2.6.0 (2007-10-03)

I got the problem: data frame columns are re-ordered alphabetically, but
the variable names aren't re-ordered accordingly in the resulting data
frame.

The problem disappears when the 'varying' argument is specified as a
named list:

d <- matrix(seq_len(7*7), 1, 7*7)
vnames <- c('acc','ppeGross','CF','ROA','DeltaSales','invTA','DeltaRevDeltaRec')
vnames.all <- unlist(lapply(vnames, paste, 1:7, sep='.'))
varying <- split(vnames.all, rep(vnames, each=7))
d <- data.frame(d)
names(d) <- vnames.all
d1 <- reshape(d, varying=varying, direction="long")
d
d1 #Now it is OK

2007/11/24, Antonio, Fabio Di Narzo <antonio.fabio at gmail.com>:
> Hi all.
> I have unexpected reshape results on datasets with certain variable
> names. [...]

--
Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy

Antonio, Fabio Di Narzo wrote:
> Hi all.
> I have unexpected reshape results on datasets with certain variable
> names. [...]

Ouch. This was dumb (*): The problem is the guess() function using
split(nms, nn[,1]), which implicitly runs factor(nn[,1]) and so gives
out the groups in the order of sort(unique(nn[,1])), but later on we
just use unique(nn[,1]).

Fortunately, this is wrong enough and trivial enough to fix that it can
make it into 2.6.1.

        -pd

(*) I think I wrote it, so I can say so.

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
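
The ordering mismatch Peter describes can be seen in isolation, without reshape() at all. What follows is only a minimal sketch on made-up toy data: the names labels, values, groups, mislabelled and groups_fixed are hypothetical and are not taken from the reshape() source. The last step shows one possible way to keep the two orders consistent; it is not necessarily the change that went into 2.6.1.

```r
# Toy group labels, listed in order of first appearance ("b" comes first).
labels <- c("b", "a", "b", "a")
values <- c("b.1", "a.1", "b.2", "a.2")

# split() coerces a character grouping vector with factor(), so the
# groups come back in sorted level order, not in order of appearance.
groups <- split(values, labels)
split_order  <- names(groups)    # "a" "b"
unique_order <- unique(labels)   # "b" "a"

print(split_order)
print(unique_order)

# Pairing the groups with names taken from unique() mislabels them:
# the entry called "b" now holds the "a" columns.
mislabelled <- setNames(groups, unique_order)
print(mislabelled[["b"]])        # "a.1" "a.2"

# One possible remedy (an assumption, not the actual patch): fix the
# factor levels to the order of first appearance before splitting.
groups_fixed <- split(values, factor(labels, levels = unique(labels)))
print(names(groups_fixed))       # "b" "a"
```

With single lower-case letters as variable names, as in the second half of the original example, the sorted order and the order of appearance coincide, which would explain why renaming the variables hid the problem.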