What the OP is doing looks fine to me.

The environment holding the data vectors is not necessary, but it helps
organize things - you know where to look for this sort of data vector.

I would avoid the *.rda file, since it is not text, hence not readily
editable or trackable with most source control systems.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jul 13, 2018 at 6:17 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> a) There is a mailing list for package development questions:
> R-package-devel.
>
> b) This seems like a job for the sysdata.rda file... no explicit
> environments needed. See the Writing R Extensions manual.
>
> On July 13, 2018 5:51:06 PM PDT, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:
> > Greetings. I'm putting together a small package in which I use
> > `readr::read_csv()` to read CSV files from several different sources.
> > I do this in several different files, but with various kinds of
> > subsequent processing, depending on the file.
> >
> > I find it useful to specify column types, as the apparent data type of
> > a given column sometimes changes unexpectedly deep into the file. I.e.,
> > a field that consistently looks like an integer suddenly becomes a
> > fraction:
> >
> >     1, 1, ..., 1, 1/2, 1, ...
> >
> > Hence, the column type has to be treated as a character, rather than as
> > an integer (with the possibility of later conversion to double, if
> > necessary). (This is just an example.)
> >
> > Therefore I use the `col_types` argument in all of the calls to
> > `read_csv()`.
> >
> > These calls are spread over several files, but I want to keep all of
> > the column types in a single place, yet have them available in each of
> > the several files. This is just for the sake of maintainability.
> >
> > At the moment I do this by putting the column-type definitions into a
> > single file:
> >
> >     000_define_data_attributes.R
> >
> > that:
> >
> >     (1) is named so that it's parsed first by `devtools::build()`
> >     (2) sets up an environment and stuffs the column types into it:
> >
> >     data_env <- new.env(parent=emptyenv())
> >     data_env$col_types_alpha <- list(
> >         Date = col_date(),
> >         var1 = col_double(),
> >         ...
> >     )
> >
> > There are a few other things that go into the file as well.
> >
> > Then I pick off the appropriate stuff from the environment in the other
> > files:
> >
> >     foo_alpha <- read_csv("alpha.csv", col_types = data_env$col_types_alpha)
> >
> > This seems to work, but it doesn't "feel" right to me. (If this were
> > Python, people would accuse me of being "non-pythonic").
> >
> > Hence, I'm seeking suggestions for the best practice for this kind of
> > thing.
> >
> > BTW, I note that both the sources of data ("alpha", etc.) and the
> > column types are more or less guaranteed to be static for the
> > foreseeable future. Hence, there really isn't much danger in just
> > replicating the column-type definitions in each of the various files,
> > which would obviate the need for the "000..." file. In other words,
> > this is mostly a style thing.
> >
> > Thanks for any advice you can provide.
> >
> > -- Mike
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
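
The OP mentions reading a column as character and possibly converting it to double later. A minimal base-R sketch of that later conversion follows; the helper name `frac_to_double` is hypothetical and not from the thread, and it assumes each value is either a plain number or a single `a/b` fraction.

```r
# Hypothetical helper: convert strings such as "1" or "1/2" to double
# after the column has been read as character.
frac_to_double <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) {
    nums <- as.numeric(p)
    # Two pieces means numerator/denominator; otherwise take the value as is.
    if (length(nums) == 2L) nums[1] / nums[2] else nums[1]
  }, numeric(1))
}

# What a character column like the one in the OP's example might hold.
raw <- c("1", "1", "1/2", "1")
converted <- frac_to_double(raw)
print(converted)
```

Keeping the column as character at read time and converting in a separate step means an unexpected value deep in the file is not silently coerced during parsing.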
Avoiding rda files because they don't track well with version control seems weak to me, since you should be creating the rda with an R file in the tools directory.

On July 13, 2018 6:50:31 PM PDT, William Dunlap <wdunlap at tibco.com> wrote:
> I would avoid the *.rda file, since it is not text, hence not readily
> editable or trackable with most source control systems.

--
Sent from my phone. Please excuse my brevity.
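
A rough sketch of the kind of script described above, which regenerates the binary file from tracked text. The script name `tools/make_sysdata.R`, the function `make_sysdata()`, and the use of readr's compact string spec ("D" for date, "d" for double) in place of the OP's list are all assumptions made for illustration; the demonstration writes into a temporary directory rather than a real package.

```r
# Hypothetical contents of tools/make_sysdata.R
make_sysdata <- function(pkg_root = ".") {
  # Compact readr column spec: "D" = date, "d" = double.
  # Stands in for the OP's list(Date = col_date(), var1 = col_double(), ...).
  col_types_alpha <- "Dd"
  out <- file.path(pkg_root, "R", "sysdata.rda")
  dir.create(dirname(out), recursive = TRUE, showWarnings = FALSE)
  save(col_types_alpha, file = out)
  invisible(out)
}

# Demonstration against a throwaway directory, not a real package.
path <- make_sysdata(tempfile("pkg"))
e <- new.env()
load(path, envir = e)
print(ls(e))
```

With this arrangement the script is the editable, diffable source, and the .rda file is a build product derived from it.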
Thanks to all for your replies. So far as I can see, there was nothing wrong with my original approach, but I've decided to stuff all the relevant definitions into a function (or functions), as this seems to make "devtools::check()" happier.

-- Mike

On Fri, Jul 13, 2018 at 6:54 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> Avoiding rda files because they don't track well with version control seems weak to me, since you should be creating the rda with an R file in the tools directory.
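
The function-based layout might look something like the sketch below. It assumes the readr package is installed; the function name `col_types_alpha()` is hypothetical, echoing the OP's `data_env$col_types_alpha`, and the temporary CSV stands in for "alpha.csv".

```r
# Assumes readr is installed. col_types_alpha() is a hypothetical name.
col_types_alpha <- function() {
  # The readr calls run when the function is called, not when the
  # file defining it is sourced.
  readr::cols(
    Date = readr::col_date(),
    var1 = readr::col_double()
  )
}

# Throwaway CSV standing in for the OP's "alpha.csv".
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Date,var1", "2018-07-13,1.5", "2018-07-14,2"), csv_path)

foo_alpha <- readr::read_csv(csv_path, col_types = col_types_alpha())
str(foo_alpha)
```

Because the definitions live in an ordinary function, the file that holds them no longer needs a name chosen to sort first, and each caller simply asks for the spec it needs.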