Marco Barbàra
2008-Oct-08 17:30 UTC
[R] Observed responses in 'augPred' data frame - Wrong order ?
Dea-R community. I'd like to draw your attention to an issue I have recently encountered while doing my current data analysis. I've got an unexpected (to me) result from the command:> augPred(lmList(my.object)),'my.object' being a grouped data frame of class:> class(my.object)[1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame" The problem is more easily explained by showing these two to Trellis plot:> plot(my.object,some_options) > plot(augPred(lmList(my.object)),some_options)[link to output: http://www.palug.net/Members/jabba ] Clearly the problem is that the predictions are correct, but the responses in the ``original'' part of the augPred data frame are in the wrong order, i.e. the response order does not match the grouping factor order (not sure this is in correct English, sorry). Debugging the function 'augPred.lmList' I've managed to understand that the problem resides in the following function calls:> debug(nlme:::augPred.lmList) > fm1.lis <- lmList(my.object) > augPred(fm1.lis)debugging in: augPred.lmList(fm1.lis) debug: { ..... primary <- getCovariate(data) ..... groups <- getGroups(object) ..... orig <- data.frame(primary,groups, getResponse(object)) ..... } This approach works well on other datasets, such us the Orthodont data, but not for mine because, i guess, my rows are not first ordered by subject (subject is my grouping factor). In fact:> getGroups(fm1.lis)[1:5][1] CBT.1 CBT.2 CBT.3 CBT.4 CBT.5 34 Levels: CBT.14 < CBT.2 < CBT.11 < CBT.13 < CBT.12 < CBT.5 < ... < V.6> getResponse(fm1.lis)[1:5]CBT.1 CBT.1 CBT.1 CBT.1 CBT.2 34 32 29 27 28 My question: is there a simple way to get the proper result (modify the dataset or modify the function)? RTFM responses are welcomed. My (humble?) opinion: Neither Pinheiro-Bates (2000) nor online documentation, report the need for a particular row order in the costruction of grouped data objects, and i think it should be better do not rely on it. Or, at least, to document it somewhere. I want to thank the R-core team and the entire R community for this amazingly efficient and versatile software, and i am very proud that the (not only) statistical state-of-the-art software have a ``gpl distribution'' (which have many desirable properties ;-) ). Cheers. +---------------------------+ |Relevant system information| +---------------------------+> R.version_ platform i486-pc-linux-gnu arch i486 os linux-gnu system i486, linux-gnu status major 2 minor 7.2 year 2008 month 08 day 25 svn rev 46428 language R version.string R version 2.7.2 (2008-08-25)> sessionInfo()R version 2.7.2 (2008-08-25) i486-pc-linux-gnu locale: LC_CTYPE=it_IT;LC_NUMERIC=C;LC_TIME=it_IT;LC_COLLATE=it_IT;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=it_IT;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] MASS_7.2-44 lattice_0.17-15 nlme_3.1-89 loaded via a namespace (and not attached): [1] grid_2.7.2 tools_2.7.2 -- Marco Barb?ra - Undergraduate student in Statistics at the University of Palermo (Italy) - Only free-as-in-freedom software user - Emacs lover
Petr PIKAL
2008-Oct-09 13:11 UTC
[R] Odp: Observed responses in 'augPred' data frame - Wrong order ?
Hi I had similar problem before and I ended with some ordering of groups before making grouped data object but it was data specific so I can not give you definite clue. Regards Petr r-help-bounces at r-project.org napsal dne 08.10.2008 19:30:07:> Dea-R community. > > I'd like to draw your attention to an issue I have recently > encountered while doing my current data analysis. > > I've got an unexpected (to me) result from the command: > > > augPred(lmList(my.object)), > > 'my.object' being a grouped data frame of class: > > > class(my.object) > [1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame" > > The problem is more easily explained by showing these two to Trellis > plot: > > > plot(my.object,some_options) > > plot(augPred(lmList(my.object)),some_options) > > [link to output: http://www.palug.net/Members/jabba ] > > Clearly the problem is that the predictions are correct, but the > responses in the ``original'' part of the augPred data frame are in the > wrong order, i.e. the response order does not match the grouping > factor order (not sure this is in correct English, sorry). > > Debugging the function 'augPred.lmList' I've managed to understand that > the problem resides in the following function calls: > > > debug(nlme:::augPred.lmList) > > fm1.lis <- lmList(my.object) > > augPred(fm1.lis) > debugging in: augPred.lmList(fm1.lis) > debug: { > ..... > primary <- getCovariate(data) > ..... > groups <- getGroups(object) > ..... > orig <- data.frame(primary,groups, getResponse(object)) > ..... > } > > This approach works well on other datasets, such us the Orthodont data, > but not for mine because, i guess, my rows are not first ordered by > subject (subject is my grouping factor). In fact: > > > getGroups(fm1.lis)[1:5] > [1] CBT.1 CBT.2 CBT.3 CBT.4 CBT.5 > 34 Levels: CBT.14 < CBT.2 < CBT.11 < CBT.13 < CBT.12 < CBT.5 < ... < V.6 > > getResponse(fm1.lis)[1:5] > CBT.1 CBT.1 CBT.1 CBT.1 CBT.2 > 34 32 29 27 28 > > My question: > is there a simple way to get the proper result (modify the dataset or > modify the function)? RTFM responses are welcomed. > > My (humble?) opinion: > Neither Pinheiro-Bates (2000) nor online documentation, report the > need for a particular row order in the costruction of grouped data > objects, and i think it should be better do not rely on it. Or, at > least, to document it somewhere. > > > I want to thank the R-core team and the entire R community for this > amazingly efficient and versatile software, and i am very proud that the > (not only) statistical state-of-the-art software have a ``gpl > distribution'' (which have many desirable properties ;-) ). > > Cheers. > > +---------------------------+ > |Relevant system information| > +---------------------------+ > > > R.version > _ > platform i486-pc-linux-gnu > arch i486 > os linux-gnu > system i486, linux-gnu > status > major 2 > minor 7.2 > year 2008 > month 08 > day 25 > svn rev 46428 > language R > version.string R version 2.7.2 (2008-08-25) > > > sessionInfo() > R version 2.7.2 (2008-08-25) > i486-pc-linux-gnu > > locale: >LC_CTYPE=it_IT;LC_NUMERIC=C;LC_TIME=it_IT;LC_COLLATE=it_IT;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=it_IT;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT;LC_IDENTIFICATION=C> > attached base packages: > [1] stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] MASS_7.2-44 lattice_0.17-15 nlme_3.1-89 > > loaded via a namespace (and not attached): > [1] grid_2.7.2 tools_2.7.2 > > > -- > Marco Barb?ra > > - Undergraduate student in Statistics > at the University of Palermo (Italy) > > - Only free-as-in-freedom software user > > - Emacs lover > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.