Hi all,

I'm new to using R, and apologize for the simplicity of this question.

I'm using a data set with over 60,000 observations. Two variables are patient ID and cost incurred by the patient. I'd like to generate a frequency/table by patient and cost IF the total cost is over 2000.

Right now I'm using:

    by(x$cost, x$patient, sum)

but this generates a huge list with an entry for each patient.

What is the best way to either (1) export the output into a csv so I can visually inspect each patient, or, more helpfully, (2) create the table IF the sum of cost > 1000?

Thanks!
1. Before you post to this list again, please read "An Introduction to R" or another basic R tutorial. "An Introduction to R" ships with every R installation. There's a reason for this: to avoid badgering this list with basic R queries that minimal homework could answer.

2. However, see also ?table and links therein.

3. ?"[" and ?subset are also relevant.

-- Bert

On Tue, Jun 5, 2012 at 8:34 AM, mkm1616 <mkm1616 at gmail.com> wrote:
> Hi all, I'm new to using R, and apologize for simplicity of this
> question.
>
> I'm using a data set with over 60,000 observations, Two variables are
> patient ID, and cost incurred by the patient. I'd like to generate
> frequency/table by patient and cost IF the total cost is over 2000.
>
> Right now I'm using:
>
> by(x$cost, x$patient, sum)
>
> but this generates a huge list for each patient.
>
> What is the best way to either (1) export the output into a csv so I
> can visually inspect each patient or more helpful (2) create the table
> IF sum of cost > 1000
>
> Thanks!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Hello,

Try

    Total <- aggregate(cost ~ patient, data = x, FUN = sum)
    Total[Total$cost > 1000, ]

As for writing to a csv file, see ?write.csv

Hope this helps,

Rui Barradas
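A minimal sketch of both steps together (summing, filtering, and exporting), assuming the data frame is called x with columns patient and cost as in the question. The toy data, the 1000 threshold, and the temporary file path are illustrative assumptions, not the poster's actual values:

```r
# Toy stand-in for the poster's data; the real x has 60,000+ rows.
x <- data.frame(
  patient = c(1, 1, 2, 2, 3),
  cost    = c(600, 700, 100, 200, 1500)
)

# One row per patient with the summed cost.
Total <- aggregate(cost ~ patient, data = x, FUN = sum)

# Keep only patients whose total exceeds the threshold.
high <- subset(Total, cost > 1000)

# Write the filtered table to a CSV file (a temporary path is used here;
# replace it with a real file name for actual use).
out_file <- tempfile(fileext = ".csv")
write.csv(high, file = out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps R's row labels out of the file, which is usually what you want when the table will be opened in a spreadsheet.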
Hi,

Try this example:

    set.seed(1)
    dat2 <- data.frame(patient = rep(c(1:20), 5), cost = rnorm(100, 15, 1))
    agg1 <- aggregate(dat2, by = list(dat2$patient), FUN = sum)
    agg2 <- agg1$cost > 75
    agg3 <- data.frame(agg1[agg2, 2:3])

A.K.

----- Original Message -----
From: mkm1616 <mkm1616 at gmail.com>
To: r-help at r-project.org
Sent: Tuesday, June 5, 2012 11:34 AM
Subject: [R] need descriptive help
Hi,

Oops! I selected the wrong columns in the previous reply. Try this:

    set.seed(1)
    dat2 <- data.frame(patient = rep(c(1:20), 5), cost = rnorm(100, 15, 1))
    # aggregate() returns Group.1 (the patient ID), patient (summed IDs), and cost
    agg1 <- aggregate(dat2, by = list(dat2$patient), FUN = sum)
    agg2 <- agg1$cost > 75
    # keep the grouping column and the summed cost
    agg3 <- data.frame(agg1[agg2, c(1, 3)])
    names(agg3) <- c("patient", "cost")
    agg3
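For comparison, here is a sketch of the same simulated example using the formula interface of aggregate(). It returns only the patient and cost columns, so no column selection or renaming is needed. The names agg and high are introduced here purely for illustration; the data and the threshold of 75 are taken from the example above:

```r
set.seed(1)
dat2 <- data.frame(patient = rep(c(1:20), 5), cost = rnorm(100, 15, 1))

# The formula interface sums cost within each patient and keeps the
# original column names (patient, cost).
agg <- aggregate(cost ~ patient, data = dat2, FUN = sum)

# Keep only patients whose summed cost exceeds the threshold.
high <- agg[agg$cost > 75, ]
high
```

This avoids the summed patient ID column that appears when the whole data frame is passed to aggregate() with by = list(...).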