Dear Richard,

I am very grateful for your informative reply.

The fact is, I am doing a project that is no less complex (if not more so) than those of Microsoft, Accenture, or Google, but I am doing it all by myself. Can you please let me know the full title of the book by Watts Humphrey? Or any online resources for the "Personal Software Process"? Perhaps I can get some tips on how to go about my project. (I've mostly taken into account the standard state-of-the-art methods; I am looking for something "whizzy" that aids development by one person.)

Thanks again,
Yours sincerely,
AKSHAY M KULKARNI

________________________________
From: Richard O'Keefe <raoknz at gmail.com>
Sent: Monday, February 14, 2022 5:23 AM
To: akshay kulkarni <akshay_e4 at hotmail.com>
Cc: R help Mailing list <r-help at r-project.org>
Subject: Re: [R] SDLC methodology for R and Data science......

There are at least two ways to use R. If you have devised a statistical/data science technique and are writing a package to be used by other people, that is normal software development that happens to be using R and the R tool. Lots of attention to documentation and tests. Test-Driven Development is one approach.

Many R users aren't developing code for other people. They are trying to make sense of some kind of data. This is what used to be called "exploratory programming". And heavyweight development processes aren't really appropriate for this kind of work. In traditional terms, when you are doing exploratory programming, you spend most of your time in the requirements phase.

Perhaps the most important thing here is to keep a log of what you are doing and record things that didn't work, why they didn't work, and what you learned from it. When something DOES give you some insight, you want to be able to do it again.

The tricky thing is scaling from exploration to development. After playing around with one data set, you might want to provide a script that other people can use to process similar data sets the same way. Use a lightweight process, but make sure you have plenty of tests, and adequate documentation.

Watts Humphrey developed something he called the "Personal Software Process" and wrote a book about it. I don't like his examples for several reasons, but the point about watching what you do and measuring it so you can improve is well made.

On Mon, 14 Feb 2022 at 05:33, akshay kulkarni <akshay_e4 at hotmail.com> wrote:

Dear members,

I am a stock trader and use R for research.

Until now I was coding very haphazardly, but recently I stumbled upon the Software Development Life Cycle (SDLC), which introduced me to principled software design. I am a college dropout and don't have in-depth knowledge of software engineering principles. However, I now want to proceed in a structured manner.

I googled for an SDLC method (like XP, Agile, and Waterfall) that suits the R programming language, and data science specifically, but the search was fruitless. Do you have any idea which software engineering methodology to use in R and data science, so that I can code efficiently and in a structured manner? The point to note, with regard to R, is that statistical ANALYSIS sometimes takes very little code compared to other programming languages. Is there any SDLC method for these types of analysis, besides rigorous scripting with R?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
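
The advice above about keeping a log of what was tried, what failed, and what was learned could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the helper `log_attempt`, the log format, and the two example entries are hypothetical and do not come from the thread.

```r
# Hypothetical helper (not from the thread): append one dated entry
# to a plain-text analysis log.
log_attempt <- function(what, outcome, lesson, file = "analysis-log.txt") {
  entry <- sprintf("%s | tried: %s | outcome: %s | learned: %s",
                   format(Sys.Date()), what, outcome, lesson)
  cat(entry, "\n", sep = "", file = file, append = TRUE)
  invisible(entry)
}

# Invented placeholder entries, written to a temporary file for the demo.
log_file <- tempfile(fileext = ".txt")
log_attempt("20-day moving average crossover", "did not work",
            "signal lags too much on this data", file = log_file)
log_attempt("log returns instead of raw prices", "worked",
            "variance looks more stable", file = log_file)

# Read the log back so a result that gave insight can be found and repeated.
entries <- readLines(log_file)
print(entries)
```

A plain-text file is used here only because it is easy to search and to keep under version control; a notebook or a knitr document would serve the same purpose.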
A book I have found useful in this regard is The Workflow of Data Analysis Using Stata by J. Scott Long. Obviously the book is targeted towards Stata users but the concepts work just as well for R.

On Tue, Feb 15, 2022 at 11:27 AM akshay kulkarni <akshay_e4 at hotmail.com> wrote:
> Dear richard,
> I am very grateful for your informative reply.
> [...]
1. This dialogue should be taken offlist imo.

2. And really, make some effort of your own before posting: An internet search on "Watts Humphrey Software Development" immediately brought up what appeared to be answers to at least some of your queries.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Feb 15, 2022 at 8:27 AM akshay kulkarni <akshay_e4 at hotmail.com> wrote:
> Dear richard,
> I am very grateful for your informative reply.
> [...]
Dear Khalil,

Thanks a lot.

________________________________
From: Huzefa Khalil <huzefak at umich.edu>
Sent: Tuesday, February 15, 2022 10:04 PM
To: akshay kulkarni <akshay_e4 at hotmail.com>
Cc: Richard O'Keefe <raoknz at gmail.com>; R help Mailing list <r-help at r-project.org>
Subject: Re: [R] SDLC methodology for R and Data science......

A book I have found useful in this regard is The Workflow of Data Analysis Using Stata by J. Scott Long. Obviously the book is targeted towards Stata users but the concepts work just as well for R.
[...]
Come to think of it, there's a CMU report you can get free:
https://resources.sei.cmu.edu/asset_files/TechnicalReport/2000_005_001_13751.pdf

The books are "Introduction to the Personal Software Process" by Watts S. Humphrey,
https://www.amazon.com/Introduction-Personal-Software-Process-Humphrey/dp/0201548097
and "PSP: A Self-Improvement Process for Software Engineers",
https://www.amazon.com/gp/product/B001EWOG8A/ref=dbs_a_def_rwt_bibl_vppi_i0

You might like to look into using R's "testthat" package (or some other testing framework); see
http://rstudio-pubs-static.s3.amazonaws.com/278724_4d8935a2955c49d9934e2113c737e70e.html
for an introduction.

Way back in 1982(?) when I was getting stuck on my PhD thesis, my co-supervisor (Lawrence Byrd) said "write an example of using your system, and then explain how it does that." That unstuck me, and it was pretty much test-driven development in a nutshell (as we put it in those days, program by debugging the empty program).

If you're looking for whizzy tools,
- knitr (documentation, there's a good little book)
- testthat (testing)
- lintr (static checking).

On Wed, 16 Feb 2022 at 05:27, akshay kulkarni <akshay_e4 at hotmail.com> wrote:
> Dear richard,
> I am very grateful for your informative reply.
> [...]
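
As a rough illustration of the testthat suggestion and the "write an example of using your system" advice, the sketch below writes down the intended use of a small function as tests. The function `simple_returns` and its expected values are hypothetical, chosen only because the original poster works with price data; the `test_that`, `expect_equal`, and `expect_error` calls are standard testthat functions, and the tests are skipped if the package is not installed.

```r
# Hypothetical function promoted from an exploratory session into a script:
# period-over-period simple returns of a price series.
simple_returns <- function(prices) {
  stopifnot(is.numeric(prices), length(prices) >= 2)
  diff(prices) / head(prices, -1)
}

# Run the tests only if testthat is available on this machine.
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  # The "example of using your system", written down as a test.
  test_that("simple_returns computes period-over-period changes", {
    expect_equal(simple_returns(c(100, 110, 99)), c(0.10, -0.10))
  })

  # Input the function cannot handle should fail loudly, not silently.
  test_that("simple_returns rejects unusable input", {
    expect_error(simple_returns("100"))
    expect_error(simple_returns(100))
  })
}
```

In a package or project these tests would normally live in their own file under a tests directory rather than next to the function; they are kept together here only to make the sketch self-contained.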