I believe Georg's pronouncements are wrong. See inline below.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

"...

> Within R there are some limitations for storing the information about what a variable or a value within a variable means.

That is FALSE. There are no limitations. For example, just attach a
"doc" attribute to your data that says whatever you wish to about
them, e.g.

> somedata <- runif(10)
> attr(somedata,"doc") <- "Anything you want to say about the data"

> attr(somedata,"doc")
[1] "Anything you want to say about the data"

You can go as crazy as you want to with this, e.g. creating an (S3 or
S4) class "documented" with appropriate methods for printing it from
classes that inherit from data frames, lists, etc. See also the
roxygen2 package for data documentation and R's ?promptData function
for producing a data documentation file in Rd format.

R is Turing complete -- so it can do anything any other programming
language can do. You could program SAS in R if you wanted. The
difference is that SAS has pre-programmed some capabilities that R
leaves for users, including contributed packages -- like Sweave,
knitr, etc. You may or may not like this extra flexibility (and extra
work, depending on whether someone else has already done the work for
you), and efficiency may or may not be an issue; but to say that R has
"limitations" is a gross misrepresentation, imho.

> Other software packages such as SAS or SPSS implement ways to store this information much more broadly.
> In R you can work with meaningful variable names and the data type/class factor, which can store mappings between values and value descriptions.
>
> Example
> -- cut --
> var1 <- c(rep(1:5, 3))
> ds_example <- data.frame(var1)
>
> var1_labels <- c("1 = Strongly Agree",
>                  "2 = Agree",
>                  "3 = Neither agree/nor disagree",
>                  "4 = Disagree",
>                  "5 = Strongly disagree")
>
> ds_example[["var1"]] <- factor(ds_example[["var1"]],
>                                levels = c(1, 2, 3, 4, 5),
>                                labels = var1_labels)
>
> summary(ds_example["var1"])
> -- cut --
>
> In addition, you will find methods to work with variable labels and value labels in the packages Hmisc and memisc. They can also produce a so-called codebook, which contains all variable names, variable labels, values, value labels and summaries of the distribution of values within the variables.
>
> 3. In addition to this, you could structure your script in a modular way according to the analysis process, e.g.
> importing, cleaning, preparation for analysis, analysis, reporting. Another structure may be more suitable in your case. These modules could have a number in the file name indicating the sequence in which the scripts should be run.
>
> 4. I find it valuable to use a software repository like GitHub, SourceForge or others to keep the revisions safe and secure in case you would like to go back to a version with code you deleted before and later find that you need again. The RStudio IDE has an interface to git if you would like to go with that. Good commit messages can help you track what has changed. Commits also help you to prepare precise steps when developing your scripts.
>
> 5. I have no experience with Sweave or knitr, but you could also compile simple documentation by copying comments to an Excel sheet using R-to-Excel libraries like excel.link or others.
>
> Example
> install.packages("excel.link")
> library(excel.link)
> xlc["A1"] <- "Project Documentation"
> xlc["A2"] <- "Step XY"
> xlc["A3"] <- "Some explanation about step xy"
>
> This way you have the documentation in your code and in an external source.
>
> Which approach you choose depends on your experience with R and its libraries as well as the size of your project and the need for documentation.
>
> 6. It can be helpful to store interim results in a format that can be read by non-R users, e.g. Excel.
>
> 7. Documenting code can be done using roxygen2.
>
> If you disagree with any of my suggestions, please say so.
>
> Kind regards
>
> Georg
>
>
>> Sent: Thursday, 30 June 2016, 16:51
>> From: "Pito Salas" <pitosalas at brandeis.edu>
>> To: r-help at r-project.org
>> Subject: [R] Documenting data
>>
>> I am studying statistics and using R in doing it. I come from software development where we document everything we do.
>>
>> As I "massage" my data, adding columns to a frame, computing on other data, perhaps cleaning, I feel the need to document in detail what the meaning, or background, or calculations, or whatever of the data is. After all it is now derived from my raw data (which may have been well documented) but it is "new."
>>
>> Is this a real problem? Is there a "best practice" to address this?
>>
>> Thanks!
>>
>> Pito Salas
>> Brandeis Computer Science
>> Feldberg 131
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
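
Bert's suggestion of a class "documented" with its own print method could be sketched roughly as follows. This is only a minimal illustration in base R under stated assumptions: the helper names documented() and doc(), the example data and the wording of the doc string are all made up here and do not come from the thread or from any package.

```r
# Hypothetical constructor: attach a free-text "doc" attribute to any object
# and mark it with an extra S3 class, keeping its existing classes.
documented <- function(x, doc) {
  stopifnot(is.character(doc), length(doc) == 1L)
  attr(x, "doc") <- doc
  oldClass(x) <- c("documented", oldClass(x))
  x
}

# Hypothetical accessor for the documentation string.
doc <- function(x) attr(x, "doc", exact = TRUE)

# Print the documentation first, then print the underlying object as usual.
print.documented <- function(x, ...) {
  cat("Doc:", doc(x), "\n")
  y <- x
  attr(y, "doc") <- NULL
  cls <- setdiff(oldClass(x), "documented")
  oldClass(y) <- if (length(cls)) cls else NULL
  print(y, ...)
  invisible(x)
}

# Made-up example data.
scores <- documented(
  data.frame(id = 1:3, score = c(0.2, 0.5, 0.9)),
  "Scores rescaled to [0, 1] from the raw 0-100 scale"
)
print(scores)
```

Because the original classes are kept, the object above still behaves as a data frame; only printing changes.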
Hi Bert,
Hi Readers,

I did not know much about attributes in R and how to use them. If it is that flexible, you are right and I have learnt something.

Kind regards

Georg

> Sent: Thursday, 30 June 2016, 20:06
> From: "Bert Gunter" <bgunter.4567 at gmail.com>
> To: G.Maubach at gmx.de
> Cc: "Pito Salas" <pitosalas at brandeis.edu>, "R Help" <r-help at r-project.org>
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
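
The codebook idea from Georg's earlier message can be combined with attributes in a few lines of base R. The sketch below is an assumption-laden illustration, not the Hmisc or memisc approach: the attribute name "label", the helper make_codebook() and the survey data are all invented for this example.

```r
# Made-up example data.
survey <- data.frame(
  age   = c(23, 35, 41, 29),
  agree = factor(c("Agree", "Disagree", "Agree", "Agree"))
)

# Attach a free-text label to each column as an attribute.
attr(survey$age, "label")   <- "Age of respondent in years"
attr(survey$agree, "label") <- "Agreement with statement Q1"

# Hypothetical helper: one row per variable with name, class, label
# and number of missing values.
make_codebook <- function(df) {
  get_label <- function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)[1]
  }
  data.frame(
    variable  = names(df),
    class     = vapply(df, function(col) class(col)[1], character(1),
                       USE.NAMES = FALSE),
    label     = vapply(df, get_label, character(1), USE.NAMES = FALSE),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1),
                       USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

codebook <- make_codebook(survey)
print(codebook)
```

Columns without a label simply get NA in the codebook, so the helper can be run on partially documented data.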
On Thu, Jun 30, 2016 at 2:50 PM, <G.Maubach at gmx.de> wrote:
> Hi Bert,
> Hi Readers,
>
> I did not know much about attributes in R and how to use them. If it is that flexible you are right and I have learnt something.

It is that flexible, but there is a big limitation that makes them
much less useful than Bert suggests. Extending Bert's example:

somedata <- runif(10)
str(somedata)
##  num [1:10] 0.9393 0.59204 0.04016 0.00273 0.02146 ...
attr(somedata,"doc") <- "Anything you want to say about the data"
str(somedata)
##  atomic [1:10] 0.9393 0.59204 0.04016 0.00273 0.02146 ...
##  - attr(*, "doc")= chr "Anything you want to say about the data"

Notice that attaching attributes makes the output of str less
informative. The other main limitation is that attributes tend to get
lost when you manipulate the data:

somedata <- somedata[!is.na(somedata)]
attributes(somedata)
## NULL

Since attributes tend to disappear when you manipulate the data, I
tend to avoid attaching them to the data directly. You can work around
this, of course, and there are several packages that do it for you, but
the combination of these two drawbacks makes the attributes system in R
less useful for documenting data, IMO.

Best,
Ista

>
> Kind regards
>
> Georg
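
One possible workaround for the attribute loss Ista describes is to give the data an S3 class whose subsetting method re-attaches the attribute. The following is a minimal sketch under stated assumptions: the class name "documented" follows Bert's suggestion, while the helper as_documented() is hypothetical. It only covers `[`; other operations, such as c(), may still drop the attribute.

```r
# Hypothetical constructor: a vector that carries a "doc" attribute
# and an extra S3 class.
as_documented <- function(x, doc) {
  structure(x, doc = doc, class = c("documented", oldClass(x)))
}

# Subsetting method: let the default `[` do the work, then restore
# the "doc" attribute and the class on the result.
`[.documented` <- function(x, ...) {
  out <- NextMethod()
  attr(out, "doc") <- attr(x, "doc", exact = TRUE)
  oldClass(out) <- oldClass(x)
  out
}

somedata <- as_documented(runif(10), "Anything you want to say about the data")
kept <- somedata[!is.na(somedata)]
attr(kept, "doc")
```

The design choice here is to keep the documentation on the object itself; keeping it in a separate object, such as a codebook, avoids the problem entirely.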