Alan Feuerbacher
2019-Jan-30 03:08 UTC
[R] [FORGED] Newbie Question on R versus Matlab/Octave versus C
On 1/28/2019 7:51 PM, Jeff Newmiller wrote:
> If you forge on with your preconceptions of how such a simulation should be
> implemented, then you will be able to reproduce your failure just as
> spectacularly using R as you did using Octave.

I think I've come to the same conclusion. :-)

> It is crucial to employ vectorization of your algorithms if you want good
> performance with either Octave or R. That vectorization may be either over
> time or over separate simulations.

Please explain further, if you don't mind. My background is not in
programming but in analog microchip circuit design (I'm now retired), so I'm
a user of circuit simulators, not a programmer of them. Also, I'm running
this stuff on my home computers, either Linux or Windows machines.

> I am running simulations of a million cases of power plant performance over
> 25 years in about a minute. I know someone who used R to simulate a CFD
> river flow problem in a class in a few minutes, while others using Fortran
> or Matlab were struggling to get comparable runs completed in many hours. I
> believe the difference was in how the data were structured and manipulated
> more than in the language that was being used. I think R's strong
> capabilities for presenting results make it advantageous over Octave, though.

After my failed attempt at using Octave, I realized that the main
contributing factor was most likely that I could not figure out an efficient
data structure to model one person. But C lent itself perfectly to my idea
of how to go about programming my simulation. So here is a simplified,
pseudocode-style example of what I did. To model a single reproducing woman
I used this C construct:

    typedef struct woman {
        int isAlive;
        int isPregnant;
        double age;
        /* . . . */
    } WOMAN;

Then I allocated memory for a big array of these things using the C malloc()
function, which gave me the equivalent of this statement:

    WOMAN women[NWOMEN];   /* An array of NWOMEN woman-structs */

After some initialization I set up two loops:

    for (j = 0; j < numberOfYears; j++) {
        for (i = 0; i < numberOfWomen; i++) {
            updateWomen(&women[i]);
        }
    }

The function updateWomen() figures out things like whether the woman becomes
pregnant or gives birth on a given day, dies, etc. I added other refinements
that are not relevant here, such as random variations of various parameters,
using the GNU Scientific Library random number generator functions.

If you can suggest a data construct in R or Octave that does something like
this and uses your idea of vectorization, I'd like to hear it. I'd like to
implement it and compare the results with my C implementation.

> If your problems truly need a compiled language, the Rcpp package lets you
> mix C++ with R quite easily, and then you get the best of both worlds. (C
> and Fortran are supported, but they are a bit more finicky to set up than
> C++.)

I don't know the answer to that, but perhaps you can help decide.

Alan

> On January 28, 2019 4:00:07 PM PST, Alan Feuerbacher <alanf00 at comcast.net> wrote:
>> On 1/28/2019 4:20 PM, Rolf Turner wrote:
>>>
>>> On 1/29/19 10:05 AM, Alan Feuerbacher wrote:
>>>
>>>> Hi,
>>>>
>>>> I recently learned of the existence of R through a physicist friend
>>>> who uses it in his research. I've used Octave for a decade, and C for
>>>> 35 years, but would like to learn R. These all have advantages and
>>>> disadvantages for certain tasks, but as I'm new to R I hardly know how
>>>> to evaluate them. Any suggestions?
>>>
>>> * C is fast, but with a syntax that is (to my mind) virtually
>>>   incomprehensible.  (You probably think differently about this.)
>>
>> I've been doing it long enough that I have little problem with it,
>> except for pointers. :-)
>>
>>> * In C, you essentially have to roll your own for all tasks; in R,
>>>   practically anything (well ...) that you want to do has already
>>>   been programmed up.  CRAN is a wonderful resource, and there's more
>>>   on GitHub.
>>>
>>> * The syntax of R meshes beautifully with *my* thought patterns; YMMV.
>>>
>>> * Why not just bog in and try R out?  It's free, it's readily available,
>>>   and there are a number of good online tutorials.
>>
>> I just installed R on my Linux Fedora system, so I'll do that.
>>
>> I wonder if you'd care to comment on my little project that prompted
>> this? As part of another project, I wanted to model population growth
>> starting from a handful of individuals. This is exponential in the long
>> run, of course, but I wanted to see how a few basic parameters affected
>> the outcome. Using Octave, I modeled a single person as a "cell", which
>> in Octave has a good deal of overhead. The program basically looped over
>> the entire population and updated each person according to the
>> parameters, which included random statistical variations. So when the
>> total population reached, say, 10,000, with an update time of 1 day, the
>> program had to execute 10,000 x 365 update operations for each year of
>> growth. For large populations, say 100,000, the program did not return
>> even after 24 hours of run time.
>>
>> So I switched to C and used its "struct" declaration and an array of
>> structs to model the population. This allowed the program to complete
>> in under a minute, as opposed to 24+ hours. So, in line with your
>> comments, C is far more efficient than Octave.
>>
>> How do you think R would fare in this simulation?
>>
>> Alan
>>
>> ---
>> This email has been checked for viruses by Avast antivirus software.
>> https://www.avast.com/antivirus
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
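As a rough illustration of the vectorization Jeff mentions, here is a minimal R sketch, assuming a deliberately simplified daily update: every living woman ages by one day and dies with a fixed daily probability. The parameters `timeStep` and `dailyDeathProb` and the functions `updateLoop()` and `updateVectorized()` are hypothetical names, not taken from the thread. The first function mirrors the C loop one woman at a time; the second stores each field as a column (a "struct of arrays" rather than an array of structs) and updates all rows at once, so the per-element work happens inside R's compiled vector operations instead of interpreted R code.

```r
# Hypothetical parameters, for illustration only
timeStep <- 1 / 365          # one day, in years
dailyDeathProb <- 1e-3       # assumed constant daily mortality

# Element-by-element update, mirroring the C loop over women[i]
updateLoop <- function(women, u) {
  for (i in seq_len(nrow(women))) {
    if (women$isAlive[i]) {
      women$age[i] <- women$age[i] + timeStep
      if (u[i] < dailyDeathProb) women$isAlive[i] <- FALSE
    }
  }
  women
}

# The same update applied to whole columns at once
updateVectorized <- function(women, u) {
  alive <- women$isAlive
  women$age[alive] <- women$age[alive] + timeStep
  women$isAlive[alive & u < dailyDeathProb] <- FALSE
  women
}

set.seed(1)
n <- 5000
women <- data.frame(isAlive = rep(TRUE, n), age = rep(20, n))
u <- runif(n)   # one uniform random draw per woman for this day

a <- updateLoop(women, u)
b <- updateVectorized(women, u)
identical(a, b)   # TRUE: both versions produce the same data frame
```

Wrapping each call in `system.time()` is an easy way to compare the two on your own machine; the gap should widen as `n` grows.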
Jeff Newmiller
2019-Jan-30 06:50 UTC
On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
> On 1/28/2019 7:51 PM, Jeff Newmiller wrote:
>> If you forge on with your preconceptions of how such a simulation should be
>> implemented, then you will be able to reproduce your failure just as
>> spectacularly using R as you did using Octave.
>
> I think I've come to the same conclusion. :-)
>
>> It is crucial to employ vectorization of your algorithms if you want good
>> performance with either Octave or R. That vectorization may be either over
>> time or over separate simulations.
>
> Please explain further, if you don't mind. My background is not in
> programming but in analog microchip circuit design (I'm now retired), so
> I'm a user of circuit simulators, not a programmer of them. Also, I'm
> running this stuff on my home computers, either Linux or Windows machines.
>
>> I am running simulations of a million cases of power plant performance over
>> 25 years in about a minute. I know someone who used R to simulate a CFD
>> river flow problem in a class in a few minutes, while others using Fortran
>> or Matlab were struggling to get comparable runs completed in many hours. I
>> believe the difference was in how the data were structured and manipulated
>> more than in the language that was being used. I think R's strong
>> capabilities for presenting results make it advantageous over Octave, though.
>
> After my failed attempt at using Octave, I realized that the main
> contributing factor was most likely that I could not figure out an
> efficient data structure to model one person. But C lent itself perfectly
> to my idea of how to go about programming my simulation. So here is a
> simplified, pseudocode-style example of what I did:

Don't model one person... model an array of people.

> To model a single reproducing woman I used this C construct:
>
>     typedef struct woman {
>         int isAlive;
>         int isPregnant;
>         double age;
>         /* . . . */
>     } WOMAN;

    # e.g.
    Nwomen <- 100
    women <- data.frame( isAlive = rep( TRUE, Nwomen )
                       , isPregnant = rep( FALSE, Nwomen )
                       , age = rep( 20, Nwomen )
                       )

> Then I allocated memory for a big array of these things using the C
> malloc() function, which gave me the equivalent of this statement:
>
>     WOMAN women[NWOMEN];   /* An array of NWOMEN woman-structs */
>
> After some initialization I set up two loops:
>
>     for (j = 0; j < numberOfYears; j++) {
>         for (i = 0; i < numberOfWomen; i++) {
>             updateWomen(&women[i]);
>         }
>     }

    for ( j in seq.int( numberOfYears ) ) {
      # let vectorized data storage automatically handle the other for loop
      women <- updateWomen( women )
    }

> The function updateWomen() figures out things like whether the woman
> becomes pregnant or gives birth on a given day, dies, etc.

You can use your "fixed size" allocation strategy with flags indicating
whether specific rows are in use, or you can work only with valid rows and
add rows as needed for children. It is best to compute a logical vector that
identifies all of the birthing mothers as a subset of the data frame, build
a set of children rows using the birthing-mothers data frame as input, and
then rbind the new rows to the updated women data frame as appropriate.
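The birth recipe in the paragraph above (a logical vector selecting the birthing mothers, a children data frame built from them, then rbind()) might look like the minimal sketch below. It assumes a constant daily birth probability for every living woman inside a fixed age window and ignores pregnancy duration; the function `birthStep()`, its parameters and defaults, and the `id`/`motherId` columns are hypothetical illustrations, not part of anyone's code in this thread.

```r
# Hypothetical birth step following the logical-vector / rbind() recipe.
# Assumes a constant daily birth probability within an age window and
# ignores pregnancy duration for simplicity.
birthStep <- function(women, dailyBirthProb,
                      minBirthAge = 15, maxBirthAge = 45) {
  # Logical vector marking every woman who gives birth today
  eligible <- women$isAlive &
    women$age >= minBirthAge & women$age <= maxBirthAge
  birthing <- eligible & runif(nrow(women)) < dailyBirthProb

  # The birthing mothers as a data frame, used as input for the children
  mothers <- women[birthing, , drop = FALSE]
  women$nChildren[birthing] <- women$nChildren[birthing] + 1L

  # One new row per birth; new ids continue after the current maximum
  nNew <- nrow(mothers)
  children <- data.frame(
    id        = max(women$id) + seq_len(nNew),
    motherId  = mothers$id,
    isAlive   = rep(TRUE, nNew),
    age       = rep(0, nNew),
    nChildren = rep(0L, nNew)
  )
  rbind(women, children)
}

set.seed(42)
n <- 1000
women <- data.frame(
  id        = seq_len(n),
  motherId  = NA_integer_,
  isAlive   = TRUE,
  age       = runif(n, min = 0, max = 60),
  nChildren = 0L
)

after <- birthStep(women, dailyBirthProb = 0.01)
nBirths <- nrow(after) - n   # number of rows appended for newborns
```

Inside a daily loop this would be called as `women <- birthStep(women, p)`, so the data frame grows only when births actually occur.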

The clearest approach for individual decision calculations is the vectorized
ifelse() function, though under certain circumstances putting an indexed
subset on the left side of an assignment can modify memory "in place". (The
functional-programming restriction against in-place modification is probably
a foreign idea to a dyed-in-the-wool C programmer, but R usually prevents you
from modifying the variable that was passed in to a function, automatically
making a local copy of the input as needed to prevent such backwash into the
caller's context.)

> I added other refinements that are not relevant here, such as random
> variations of various parameters, using the GNU Scientific Library random
> number generator functions.

R has quite sophisticated random number generation by default.

> If you can suggest a data construct in R or Octave that does something
> like this and uses your idea of vectorization, I'd like to hear it. I'd
> like to implement it and compare the results with my C implementation.
>
>> If your problems truly need a compiled language, the Rcpp package lets you
>> mix C++ with R quite easily, and then you get the best of both worlds. (C
>> and Fortran are supported, but they are a bit more finicky to set up than
>> C++.)
>
> I don't know the answer to that, but perhaps you can help decide.
>
> Alan

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
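To make Jeff's contrast between ifelse() and an indexed subset on the left of an assignment concrete, here is a small sketch of both styles for one hypothetical rule: age every living woman by one day. ifelse() builds a complete new vector by choosing, element by element, between its `yes` and `no` vectors; the indexed-assignment form replaces only the selected elements. The three-row data frame and `timeStep` are illustrative assumptions.

```r
timeStep <- 1 / 365   # one day, in years (illustrative)
women <- data.frame(isAlive = c(TRUE, FALSE, TRUE), age = c(20, 35, 41))

# 1. ifelse(): 'yes' and 'no' are whole vectors; for each row the result
#    takes the 'yes' element where the test is TRUE, else the 'no' element.
age1 <- ifelse(women$isAlive, women$age + timeStep, women$age)

# 2. Indexed subset on the left of the assignment: only the selected
#    elements are replaced; the others are left untouched.
age2 <- women$age
age2[women$isAlive] <- age2[women$isAlive] + timeStep

identical(age1, age2)   # TRUE
women$age <- age2       # store the updated column back in the data frame
```

Note that ifelse() chooses between values; it is not a way to run a block of statements for each row, so each column you want to change needs its own ifelse() call or indexed assignment.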
Alan Feuerbacher
2019-Jan-30 16:16 UTC
Thanks very much for providing these coding examples! I think this is a good
way to learn some R.

Alan

On 1/29/2019 11:50 PM, Jeff Newmiller wrote:
> Don't model one person... model an array of people.
Alan Feuerbacher
2019-Feb-02 18:29 UTC
On 1/29/2019 11:50 PM, Jeff Newmiller wrote:
> On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
>
>> After my failed attempt at using Octave, I realized that the main
>> contributing factor was most likely that I could not figure out an
>> efficient data structure to model one person. But C lent itself
>> perfectly to my idea of how to go about programming my simulation. So
>> here is a simplified, pseudocode-style example of what I did:
>
> Don't model one person... model an array of people.
>
>> To model a single reproducing woman I used this C construct:
>>
>>     typedef struct woman {
>>         int isAlive;
>>         int isPregnant;
>>         double age;
>>         /* . . . */
>>     } WOMAN;
>
>     # e.g.
>     Nwomen <- 100
>     women <- data.frame( isAlive = rep( TRUE, Nwomen )
>                        , isPregnant = rep( FALSE, Nwomen )
>                        , age = rep( 20, Nwomen )
>                        )
>
>> Then I allocated memory for a big array of these things using the C
>> malloc() function, which gave me the equivalent of this statement:
>>
>>     WOMAN women[NWOMEN];   /* An array of NWOMEN woman-structs */
>>
>> After some initialization I set up two loops:
>>
>>     for (j = 0; j < numberOfYears; j++) {
>>         for (i = 0; i < numberOfWomen; i++) {
>>             updateWomen(&women[i]);
>>         }
>>     }
>
>     for ( j in seq.int( numberOfYears ) ) {
>       # let vectorized data storage automatically handle the other for loop
>       women <- updateWomen( women )
>     }
>
>> The function updateWomen() figures out things like whether the woman
>> becomes pregnant or gives birth on a given day, dies, etc.
>
> You can use your "fixed size" allocation strategy with flags indicating
> whether specific rows are in use, or you can work only with valid rows
> and add rows as needed for children. It is best to compute a logical
> vector that identifies all of the birthing mothers as a subset of the
> data frame, build a set of children rows using the birthing-mothers data
> frame as input, and then rbind the new rows to the updated women data
> frame as appropriate. The clearest approach for individual decision
> calculations is the vectorized ifelse() function, though under certain
> circumstances putting an indexed subset on the left side of an assignment
> can modify memory "in place". (The functional-programming restriction
> against in-place modification is probably a foreign idea to a
> dyed-in-the-wool C programmer, but R usually prevents you from modifying
> the variable that was passed in to a function, automatically making a
> local copy of the input as needed to prevent such backwash into the
> caller's context.)

Hi Jeff,

I'm well along in implementing your suggestions, but I don't understand the
last paragraph. Here is part of the experimenting I've done so far:

*=======*=======*=======*=======*=======*=======*
updatePerson <- function() {
  ifelse( women$isAlive, {
    # Check whether to kill off this person; if she's pregnant, whether
    # to give birth; whether to make her pregnant again.
    women$age = women$age + timeStep
    # Check if the person has reached maxAge
  } )
}

calculatePopulation <- function() {
  lastDate = 0
  jd = 0
  while( jd < maxDate ) {
    for( i in seq_len( nWomen ) ) {
      updatePerson();
    }
    todaysDateInt = floor(jd/dpy)
    NAlive[todaysDateInt] = nWomen - nDead
    # Do various other things
    todaysDate = todaysDate + timeStep
    jd = jd + timeStep
  }
}

nWomen <- 5
numberOfYears <- 30
women <- data.frame( isAlive = rep_len( TRUE, nWomen )
                   , isPregnant = rep_len( FALSE, nWomen )
                   , nChildren = rep_len( 0L, nWomen )
                   , ageInt = rep_len( 0L, nWomen )
                   , age = rep_len( 0, nWomen )
                   , dateOfPregnancy = rep_len( 0, nWomen )
                   , endDateLastPregnancy = rep_len( 0.0, nWomen )
                   , minBirthAge = rep_len( 0, nWomen )
                   , maxBirthAge = rep_len( 0, nWomen )
                   )
# . . .
calculatePopulation()
*=======*=======*=======*=======*=======*=======*

The above code (in its complete form) executes without errors. There are at
least two things I don't understand. First, in the updatePerson function, in
the ifelse statement, how do I change the appropriate values in the women
data frame? Second, I don't understand most of your last paragraph at all.

Thanks so much for your help in learning R!

Alan
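Alan's first question comes down to the copy semantics Jeff's last paragraph alludes to. In the minimal sketch below (the three-row data frame and `timeStep` are illustrative assumptions), the hypothetical `ageGlobalAttempt()` imitates the assignment inside updatePerson(): reading `women` finds the global data frame, but assigning to `women$age` creates a new local copy inside the function, so the caller's data frame is unchanged. The `updateWomen()` body shown here is likewise a hypothetical stand-in that follows the pattern in Jeff's loop, `women <- updateWomen( women )`: modify the argument (a local copy) and return it, then reassign in the caller.

```r
timeStep <- 1 / 365   # one day, in years (illustrative)

# Imitates updatePerson(): 'women' is read from the global environment,
# but the assignment creates a new local variable inside the function.
ageGlobalAttempt <- function() {
  women$age <- women$age + timeStep
  invisible(NULL)
}

# Functional style: modify the local copy and return it.
updateWomen <- function(women) {
  alive <- women$isAlive
  women$age[alive] <- women$age[alive] + timeStep
  women
}

women <- data.frame(isAlive = c(TRUE, FALSE, TRUE), age = c(20, 35, 41))

ageGlobalAttempt()
unchanged <- women$age        # still c(20, 35, 41)

women <- updateWomen(women)   # the caller replaces its own copy
women$age                     # the two living women are one day older
```

R also has the `<<-` operator for assigning into an enclosing environment, but passing the data frame in and returning it keeps each function self-contained, which is the style used in Jeff's loop.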