Dear R helpers,

I want to generate data for, say, 1000 patients (i.e., 1000 unique IDs) who have suffered from various diseases in the past (say diseases A, B, C, D, E, F). The only condition imposed is that each patient should have suffered from *at least* two diseases. So my data frame will have two columns, 'ID' and 'Disease'.

I want to do a basket analysis with this data, where ID will be the identifier and we will establish rules based on the 'Disease' column.

How can I generate this type of data in R?

--
Regards
Abhinaba Roy

[[alternative HTML version deleted]]
On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:

> Dear R helpers,
>
> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases. So my data frame will have two
> columns 'ID' and 'Disease'.
>
> I want to do a basket analysis with this data, where ID will be the
> identifier and we will establish rules based on the 'Disease' column.
>
> How can I generate this type of data in R?
>

Perhaps something along these lines for 20 cases:

> data.frame(patient = 1:20,
+            disease = sapply(pmin(2 + rpois(20, 2), 6), function(n)
+                paste0(sample(c('A','B','C','D','E','F'), n), collapse = "+"))
+ )
   patient     disease
1        1         F+D
2        2     F+A+D+E
3        3     F+D+C+E
4        4     B+D+C+A
5        5     D+A+F+C
6        6       E+A+D
7        7 E+F+B+C+A+D
8        8   A+B+C+D+E
9        9     B+E+C+F
10      10         C+A
11      11 B+A+D+E+C+F
12      12         B+C
13      13     A+D+B+E
14      14 D+C+E+F+B+A
15      15   C+F+D+E+A
16      16       A+C+B
17      17     C+D+B+E
18      18         A+B
19      19   C+B+D+E+F
20      20       D+C+F

> --
> Regards
> Abhinaba Roy
>
> [[alternative HTML version deleted]]

You should read the Posting Guide and learn to post in plain text rather than HTML.

>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
David Winsemius
Alameda, CA, USA
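For the two-column 'ID' / 'Disease' layout asked for in the question, the same idea can be sketched in long format (one row per patient-disease pair) for 1000 patients. This is only a sketch under assumptions: the seed, the Poisson mean of 2, and the names `n_patients`, `diseases`, `baskets` and `patients` are illustrative choices, not something the thread specifies.

```r
# Sketch: long-format data, one row per (ID, Disease) pair.
set.seed(42)                       # arbitrary seed, only for reproducibility
n_patients <- 1000
diseases   <- LETTERS[1:6]         # "A" ... "F"

# Number of diseases per patient: at least 2, capped at 6
# (the same pmin(2 + rpois(...), 6) idea as in the reply above).
n_dis <- pmin(2 + rpois(n_patients, 2), length(diseases))

# Draw that many distinct diseases for each patient (sampling without replacement).
baskets <- lapply(n_dis, function(n) sample(diseases, n))

# Repeat each ID once per disease drawn for that patient.
patients <- data.frame(
  ID      = rep(seq_len(n_patients), times = n_dis),
  Disease = unlist(baskets)
)

# Check the constraint: every ID appears with at least two diseases.
stopifnot(all(table(patients$ID) >= 2))
head(patients)
```

If a list of baskets per patient is needed later, `split(patients$Disease, patients$ID)` returns one character vector per ID. A list of that shape is what rule-mining packages such as arules are usually given as input, although the thread does not say which package is intended.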
Hi,

Check if this works:

library(reshape2)  # melt() is assumed to come from reshape2; the post does not name the package
set.seed(495)
dat <- data.frame(ID = sample(1:10, 20, replace = TRUE),
                  Disease = sample(LETTERS[1:6], 20, replace = TRUE))
# Keep only IDs with more than one distinct disease, one row per (ID, Disease) pair
subset(melt(table(dat)[rowSums(!!table(dat)) > 1, ]), !!value, select = 1:2)
   ID Disease
1   2       A
3   4       A
4   6       A
6  10       A
8   3       B
15  4       C
16  6       C
20  3       D
22  6       D
24 10       D
26  3       E
27  4       E
29  7       E
31  2       F
33  4       F
35  7       F

A.K.

On Wednesday, June 25, 2014 1:17 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:

> Dear R helpers,
>
> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases. So my data frame will have two
> columns 'ID' and 'Disease'.
>
> I want to do a basket analysis with this data, where ID will be the
> identifier and we will establish rules based on the 'Disease' column.
>
> How can I generate this type of data in R?
>
> --
> Regards
> Abhinaba Roy

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
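The same filter can also be sketched in base R, without `melt()`. Note that this approach drops any ID that ends up with fewer than two distinct diseases, so fewer unique IDs may remain than were sampled. The names `pairs`, `keep` and `result` are illustrative, and the random draws for a given seed can differ between R versions (the default behaviour of `sample()` changed in R 3.6.0), so the rows need not match the output shown above.

```r
# Sketch: base-R version of the "at least two diseases per ID" filter.
set.seed(495)   # same seed as above; draws may differ across R versions
dat <- data.frame(
  ID      = sample(1:10, 20, replace = TRUE),
  Disease = sample(LETTERS[1:6], 20, replace = TRUE)
)

# Drop repeated (ID, Disease) pairs, then count distinct diseases per ID.
pairs <- unique(dat)
n_dis <- table(pairs$ID)

# Keep only the IDs that have at least two distinct diseases.
keep   <- as.integer(names(n_dis)[n_dis >= 2])
result <- pairs[pairs$ID %in% keep, ]

# Order by disease, then ID, to mirror the layout of the melted table.
result <- result[order(result$Disease, result$ID), ]
rownames(result) <- NULL
result
```

To end up with exactly 1000 patients this way, one would have to oversample and then relabel the surviving IDs. Drawing the number of diseases per patient first avoids that step.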