Jeff Newmiller
2019-Jul-03 16:42 UTC
[R] Control the variable order after multiple declarations using within
Dummy columns do have some drawbacks though, if you find yourself working with large data frames. The dummy columns waste memory and time compared to either reorganizing columns after the `within` or using separate sequential `with` expressions as I previously suggested. I think mutate avoids this overhead also.

On July 3, 2019 8:25:32 AM PDT, Eric Berger <ericjberger at gmail.com> wrote:
> Nice suggestion, Richard.
>
> On Wed, Jul 3, 2019 at 4:28 PM Richard O'Keefe <raoknz at gmail.com> wrote:
>
>> Why not set all the new columns to dummy values to get the order you
>> want and then set them to their final values in the order that works
>> for that?
>>
>> On Thu, 4 Jul 2019 at 00:12, Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
>>
>>>> On Jul 3, 2019, at 3:15 AM, Sebastien Bihorel <sebastien.bihorel at cognigencorp.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> The within function can be used to modify data.frames (among other
>>>> objects). One can even provide multiple expressions to modify the
>>>> data.frame by more than one expression. However, when new variables
>>>> are created, they seem to be inserted in the data.frame in the
>>>> opposite order they were declared:
>>>>
>>>>> df <- data.frame(a=1)
>>>>> within(df, {b<-a*2; c<-b*3})
>>>>   a c b
>>>> 1 1 6 2
>>>>
>>>> Is there a way to insert the variables in an order consistent with
>>>> the order of declaration (ie, a, b, c)?
>>>
>>> One way is to use mutate() from the dplyr package.
>>>
>>>> Thanks
>>>>
>>>> Sebastien
>>>
>>> --
>>> Kevin E. Thorpe
>>> Head of Biostatistics, Applied Health Research Centre (AHRC)
>>> Li Ka Shing Knowledge Institute of St. Michael's
>>> Assistant Professor, Dalla Lana School of Public Health
>>> University of Toronto
>>> email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016

--
Sent from my phone. Please excuse my brevity.
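For readers following along, here is a minimal sketch of what the "separate sequential `with` expressions" approach might look like. The earlier message that proposed it is not part of this thread, so the exact form below is an assumption; the name `res` is purely illustrative.

```r
df <- data.frame(a = 1)

# Each assignment appends one column to the data frame, so the columns
# appear in the order they are created and no placeholder columns are needed.
res <- df
res$b <- with(res, a * 2)   # b is computed from the existing column a
res$c <- with(res, b * 3)   # c is computed from the b added just above

print(names(res))
```

Because every new column is appended on assignment, the result should come out in declaration order (a, b, c) without a reordering step afterwards.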
Duncan Murdoch
2019-Jul-03 16:52 UTC
[R] Control the variable order after multiple declarations using within
On 03/07/2019 12:42 p.m., Jeff Newmiller wrote:
> Dummy columns do have some drawbacks though, if you find yourself working with large data frames. The dummy columns waste memory and time as compared to either reorganizing columns after the `within` or using separate sequential `with` expressions as I previously suggested. I think mutate avoids this overhead also.

I think mutate() has only a very small advantage over within(). Neither one of them is flexible about the order of columns in the final result. In the OP's example, mutate creates the variables in the desired order, but it would be no better if the desired order had been a, c, b, because b is needed for the calculation of c, so it would be created first.

Eric's suggestion

    within(df, {b<-a*2; c<-b*3})[c("a","b","c")]

is the best so far, though I'd probably write it as

    within(df, {b<-a*2; c<-b*3})[, c("a","b","c")]

just to avoid confusing my future self and make clear that I'm talking about specifying an order for the columns.

And if you really, really want everything to happen within the call, just create the variables in the reverse order to what you want, e.g.

    within(df, {c <- a; b<-a*2; c<-b*3})

but to me that is a lot less clear than Eric's solution.

Duncan Murdoch
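The column-indexing idea generalizes to data frames with many existing columns, where typing out every name would be tedious. The sketch below is only an illustration: `append_in_order` is a hypothetical helper, not a function from base R or dplyr, and it assumes every name in `new_order` exists in the result of `within`.

```r
# Hypothetical helper (not part of base R or dplyr): keep the original
# columns first, then append the new columns in a caller-chosen order.
append_in_order <- function(before, after, new_order) {
  # drop = FALSE keeps the result a data frame even for a single column
  after[, c(names(before), new_order), drop = FALSE]
}

df  <- data.frame(a = 1)
out <- within(df, { b <- a * 2; c <- b * 3 })

abc <- append_in_order(df, out, c("b", "c"))  # declaration order
acb <- append_in_order(df, out, c("c", "b"))  # any other order works as well

print(names(abc))
print(names(acb))
```

Since the order is imposed after the computation, it does not matter that b must be calculated before c; either final arrangement is available.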
Sebastien Bihorel
2019-Jul-04 10:14 UTC
[R] Control the variable order after multiple declarations using within
Thanks all for your inputs.