g.eastham.gilbert at gmail.com
2019-Aug-18 18:47 UTC
[R] Creating data using multiple for loops
I would like to create pseudo identification numbers in the format of last four of a social security number (0000 to 9999), month of birth (01 to 12), and day of birth (01-28). The IDs can be character.

I have gotten this far:

    for (ssn in 0:9){
      for (month in 1:3){
        for (day in 1:5){
        }
        id <-paste(ssn, month, day, sep="")
      }
    }

limiting each value above for demonstration purposes. I cannot figure out how to store the created IDs. I know I have to create a container, but I don't know, among other things, how to index the container. Any help is appreciated. TIA

-Greg
    id <- do.call(paste0,expand.grid(0:9, 1:3, 1:5))

Comment: If you use R much, you'll do much better using R language constructs than trying to apply those from other languages (Java perhaps?). I realize this can be difficult, especially if you are experienced in another language (or languages), but it's worth the effort.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Greg,
One problem is that you have misplaced the closing brace in the third loop. It should follow the assignment statement. Because you used loops rather than Bert's suggestion, perhaps you are trying to order the values assigned. In your example, the ordering will be ssn, then month of birth, then day of birth. Occasionally people resort to an explicit calculation for the index:

    id<-vector("character",10*3*5)
    for (ssn in 0:9){
      for (month in 1:3){
        for (day in 1:5){
          id[day+(month-1)*5+ssn*15] <-paste0(ssn, month, day)
        }
      }
    }

This would order the values in the opposite precedence. Also, you may not want to create well over 3 million values as in your initial specification, in which case a different strategy using "sample" would be appropriate.

Jim
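Jim's remark about "sample" can be made concrete. What follows is only a minimal sketch of one possible reading, not necessarily what Jim had in mind: the sample size n and the seed are made-up values, and the ranges are the full ones from the original question. Instead of building all 10000 * 12 * 28 = 3,360,000 IDs, it draws n distinct positions in that index space and decodes each position into an (ssn, month, day) triple.

```r
# Sketch: draw n distinct pseudo IDs without creating every combination.
# n and the seed are arbitrary values chosen for illustration.
set.seed(42)
n_ssn   <- 10000L   # ssn runs over 0..9999
n_month <- 12L      # month runs over 1..12
n_day   <- 28L      # day runs over 1..28
n       <- 1000L

# Every integer in 0..(N - 1) corresponds to exactly one triple.
N <- n_ssn * n_month * n_day
k <- sample.int(N, n) - 1L          # n distinct zero-based positions

# Decode each position; day varies fastest, ssn slowest.
day   <- k %% n_day + 1L
month <- (k %/% n_day) %% n_month + 1L
ssn   <- k %/% (n_day * n_month)

# Zero-pad so that every ID is exactly 8 characters long.
id <- sprintf("%04d%02d%02d", ssn, month, day)
head(id)
```

Because the positions are drawn without replacement and each position decodes to a different triple, the resulting IDs cannot collide.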
do.call(paste0,expand.grid(0:1000, 1:12, 1:30)) takes care of storing all the values, but note that paste() doesn't put leading zeroes in front of small numbers, so this maps lots of ssn/month/day combos to the same id. sprintf() can take care of that:

    id <- with(expand.grid(ssn=0:1000, month=1:12, day=1:30), sprintf("%04d%02d%02d", ssn, month, day))

You probably should define a function to map vectors of ssn, month, and day to a vector of ids (it can also check for inappropriate inputs), check that it works, and use it instead of repeating the sprintf() or paste0() code.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
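The function Bill suggests might look roughly like the following. This is a sketch only: the name make_id, the particular input checks, and the demonstration grid are assumptions made for illustration, and the allowed ranges are taken from Greg's original specification (ssn 0 to 9999, month 1 to 12, day 1 to 28).

```r
# Hypothetical helper: map vectors of ssn, month and day to 8-character IDs.
make_id <- function(ssn, month, day) {
  # Reject missing, non-numeric, non-integer or out-of-range inputs.
  stopifnot(
    is.numeric(ssn), is.numeric(month), is.numeric(day),
    !anyNA(ssn), !anyNA(month), !anyNA(day),
    ssn == round(ssn), month == round(month), day == round(day),
    ssn >= 0, ssn <= 9999,
    month >= 1, month <= 12,
    day >= 1, day <= 28
  )
  # Zero-pad each part so that distinct triples give distinct IDs.
  sprintf("%04d%02d%02d", as.integer(ssn), as.integer(month), as.integer(day))
}

# Usage with the small demonstration ranges from the thread.
grid <- expand.grid(ssn = 0:9, month = 1:3, day = 1:5)
id <- make_id(grid$ssn, grid$month, grid$day)
head(id)
```

Keeping the formatting in one place means the padding widths only have to be right once, and a call such as make_id(1, 13, 1) stops with an error instead of silently producing a malformed ID.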
Hi Greg,
I replied because I thought the name of the "expand.grid" function can be puzzling. While "expand.grid" is a very elegant and useful function, it is much easier to see what is happening with explicit loops rather than loops buried deep inside "expand.grid". Also note Bill's comment about producing repeats by converting numeric values to character without the leading zeros. You can also use "formatC" to deal with that problem.

Jim

On Tue, Aug 20, 2019 at 12:05 AM <g.eastham.gilbert at gmail.com> wrote:
>
> Jim,
>
> Thank you very much for your help. I have "unpacked" the code and have a rudimentary understanding of what you did. Thanks again. However, I have no idea to what Bert is referring. Could you help me understand his suggestion? Thanks.
>
> -Greg
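For the "formatC" route mentioned above, a small sketch may help. The input values are invented for illustration; the widths (4, 2, 2) follow the ID format in the original question.

```r
# Made-up example values for the three ID components.
ssn   <- c(0L, 7L, 42L, 9999L)
month <- c(1L, 3L, 11L, 12L)
day   <- c(1L, 5L, 9L, 28L)

# flag = "0" pads with leading zeros up to the requested width.
id <- paste0(
  formatC(ssn,   width = 4, flag = "0", format = "d"),
  formatC(month, width = 2, flag = "0", format = "d"),
  formatC(day,   width = 2, flag = "0", format = "d")
)
id
# "00000101" "00070305" "00421109" "99991228"
```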
From section 9.2.2 (on looping) in "An Introduction to R":

"*Warning*: for() loops are used in R code much less often than in compiled languages. Code that takes a 'whole object' view is likely to be both clearer and faster in R."

Web searching on "for loops in R" and similar will give you further comments and perspectives.

Cheers,
Bert
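One way to see the difference between the two styles is to build the same IDs both ways and compare. The sketch below uses the small demonstration ranges from the thread; the names id_loop and id_vec and the running counter k are illustrative choices, not code from any of the posters. As in the demonstration code in the thread, the values are pasted without zero padding.

```r
ssn   <- 0:9
month <- 1:3
day   <- 1:5

# Loop view: preallocate the container and fill it with a running counter.
id_loop <- character(length(ssn) * length(month) * length(day))
k <- 0L
for (s in ssn) {
  for (m in month) {
    for (d in day) {
      k <- k + 1L
      id_loop[k] <- paste0(s, m, d)
    }
  }
}

# Whole-object view: build three full-length vectors, then paste them once.
# rep() reproduces the loop order: ssn changes slowest, day fastest.
id_vec <- paste0(
  rep(ssn, each = length(month) * length(day)),
  rep(rep(month, each = length(day)), times = length(ssn)),
  rep(day, times = length(ssn) * length(month))
)

identical(id_loop, id_vec)
```

The comparison on the last line confirms that both versions produce the same 150 values in the same order; the vectorized form needs no index bookkeeping at all.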
Perhaps different people find different concepts the most challenging, but I find looking at the output of expand.grid quite informative... do try it out. The do.call function seems to be the more obscure function here, but Bert's code

    id <- do.call( paste0, expand.grid(0:9,1:3,1:5) )

is equivalent to

    all_comb <- expand.grid( 0:9, 1:3, 1:5 )
    all_comb # look at it for learning, remove once you understand
    paste0( all_comb[[1]], all_comb[[2]], all_comb[[3]] )

because all_comb is a data frame, which is a list of column vectors all the same length. The do.call function expects the first argument to be a function symbol, while the second argument to do.call should be a single object that is a list of arguments you want that function to be given as separate arguments. The paste0 function puts the three vectors together into one character vector, element by element.

Read the help pages for each function:

    ?expand.grid
    ?paste0
    ?do.call

On the other hand, nested for loops seem to become spaghetti quickly in my mind... essentially just write-only code because I never want to look at it again.

--
Sent from my phone. Please excuse my brevity.