I know enough about R to be dangerous and our marketing people have asked me to "automate" some reporting. Data comes from an SQL source and graphs and various summaries are currently created manually in Excel. The raw information is invoicing records and the reporting is basically summaries by customer, region, product line etc. With function such as aggregate(), hist() and pareto() (which someone on this list kindly pointed me at) I can produce something roughly equivalent to the current reports. My question is, are there any neat R "lock out" features people here like to use on this kind of info, particularly when the output is very visual (report is intended for marketing people). Another way of looking at this is, What kind of "hidden" information can I extract with R that the Excel solution hasn't touched? For example, even the pareto plot mentioned earlier is something the Excel guys haven't thought of or can't easily produce. regards BTW the tool chain I am using goes something like: Production (run daily): DB -> SQL/python -> CSV -> R/python -> images -> network Presentation: network -> CGI/python -> browser
Hi Paul, I find your question intriguing, but might I ask that you elaborate on your terminology and context of "lock out" and "hidden" in your question? Otherwise I am afraid that my current ideas on an answer will surely be based on the wrong diagnosis of what you are really looking for. Thanks, Bill ------------------------------- Bill Pikounis, PhD Nonclinical Statistics Centocor, Inc. 200 Great Valley Parkway MailStop C4-1 Malvern, PA 19355 610 240 8498 fax 610 651 6717> -----Original Message----- > From: Paul Sorenson [mailto:Paul.Sorenson@vision-bio.com] > Sent: Wednesday, January 19, 2005 5:19 PM > To: r-help@stat.math.ethz.ch > Subject: [R] easing out of Excel > > > I know enough about R to be dangerous and our marketing > people have asked me to "automate" some reporting. Data > comes from an SQL source and graphs and various summaries are > currently created manually in Excel. The raw information is > invoicing records and the reporting is basically summaries by > customer, region, product line etc. > > With function such as aggregate(), hist() and pareto() (which > someone on this list kindly pointed me at) I can produce > something roughly equivalent to the current reports. > > My question is, are there any neat R "lock out" features > people here like to use on this kind of info, particularly > when the output is very visual (report is intended for > marketing people). > > Another way of looking at this is, What kind of "hidden" > information can I extract with R that the Excel solution > hasn't touched? > > For example, even the pareto plot mentioned earlier is > something the Excel guys haven't thought of or can't easily produce. > > regards > > BTW the tool chain I am using goes something like: > Production (run daily): > DB -> SQL/python -> CSV -> R/python -> images -> network > Presentation: > network -> CGI/python -> browser > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >[[alternative HTML version deleted]]
>>> "Paul Sorenson" <Paul.Sorenson at vision-bio.com> 01/19/05 03:18PM >>> >> I know enough about R to be dangerous and our marketing people have >> asked me to "automate" some reporting. Data comes from an SQLsource>> and graphs and various summaries are currently created manually in >> Excel. The raw information is invoicing records and the reportingis>> basically summaries by customer, region, product line etc. >> >> With function such as aggregate(), hist() and pareto() (whichsomeone>> on this list kindly pointed me at) I can produce something roughly >> equivalent to the current reports. >> >> My question is, are there any neat R "lock out" features peoplehere>> like to use on this kind of info, particularly when the output isvery>> visual (report is intended for marketing people). >> >> Another way of looking at this is, What kind of "hidden"information>> can I extract with R that the Excel solution hasn't touched?Since you are looking for summaries within groups, you should look at the lattice package and some of the plots that you can produce with it (maybe for each product line you can produce a lattice/trellis graph with each panel representing a region and different colors symbols within panels to represent different customers). If we had more of an idea of what you are looking for, we could give better suggestions.>> For example, even the pareto plot mentioned earlier is somethingthe>> Excel guys haven't thought of or can't easily produce. >> >> regards >> >> BTW the tool chain I am using goes something like: >> Production (run daily): >> DB -> SQL/python -> CSV -> R/python -> images ->network>> Presentation: >> network -> CGI/python -> browserIt looks like you want the reports fully automated and the final result as HTML (to be viewed with a browser), I suggest you look at the R2HTML package and the sweave function (this lets you write a report in HTML with r-code in place of graphs and output, then a quick run through sweave and you have a final report in HTML ready to be viewed). There are also several tools available for running R through CGI, go to: http://www-r.project.org/ and click on "R web-servers" under the "Related Projects" heading in the left column to get details. Hope this helps, Greg Snow, Ph.D. Statistical Data Center greg.snow at ihc.com (801) 408-8111
Definitely check out the lattice package. One other option is to use sweave/latex mixed with RODBC. This can be used to produce PDF's for easy distribution as well. I would also consider operating this in a batch mode, the R/sweave/latex works very well this way. Shawn Way, PE Engineering Manager -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Snow Sent: Thursday, January 20, 2005 10:52 AM To: r-help at stat.math.ethz.ch; Paul.Sorenson at vision-bio.com Subject: Re: [R] easing out of Excel>>> "Paul Sorenson" <Paul.Sorenson at vision-bio.com> 01/19/05 03:18PM >>> >> I know enough about R to be dangerous and our marketing people have >> asked me to "automate" some reporting. Data comes from an SQLsource>> and graphs and various summaries are currently created manually in >> Excel. The raw information is invoicing records and the reportingis>> basically summaries by customer, region, product line etc. >> >> With function such as aggregate(), hist() and pareto() (whichsomeone>> on this list kindly pointed me at) I can produce something roughly >> equivalent to the current reports. >> >> My question is, are there any neat R "lock out" features peoplehere>> like to use on this kind of info, particularly when the output isvery>> visual (report is intended for marketing people). >> >> Another way of looking at this is, What kind of "hidden"information>> can I extract with R that the Excel solution hasn't touched?Since you are looking for summaries within groups, you should look at the lattice package and some of the plots that you can produce with it (maybe for each product line you can produce a lattice/trellis graph with each panel representing a region and different colors symbols within panels to represent different customers). If we had more of an idea of what you are looking for, we could give better suggestions.>> For example, even the pareto plot mentioned earlier is somethingthe>> Excel guys haven't thought of or can't easily produce. >> >> regards >> >> BTW the tool chain I am using goes something like: >> Production (run daily): >> DB -> SQL/python -> CSV -> R/python -> images ->network>> Presentation: >> network -> CGI/python -> browserIt looks like you want the reports fully automated and the final result as HTML (to be viewed with a browser), I suggest you look at the R2HTML package and the sweave function (this lets you write a report in HTML with r-code in place of graphs and output, then a quick run through sweave and you have a final report in HTML ready to be viewed). There are also several tools available for running R through CGI, go to: http://www-r.project.org/ and click on "R web-servers" under the "Related Projects" heading in the left column to get details. Hope this helps, Greg Snow, Ph.D. Statistical Data Center greg.snow at ihc.com (801) 408-8111 ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Thanks for the responses to this question, I fully realise it is a rather open question and the "open" pointers are the kind of thing I am looking for. I will look into the lattice package and layout. Regarding the HTML output, the current "tool chain" assets that I have have been refactored over time and are almost totally driven by config files so they suit my purposes very well. I will look into other possibilities at a later date. For those looking for a more rigorous specification of the problem, you are well justified in this. I was deliberately fuzzy since managers just want "stuff" and I thought casting a wide net would pay off. The problem is to summarise information which is nothing more than sales data. The kinds of columns I am dealing with look like: date, customer, invoice_no, product, amount, sales_region, etc etc. Managers want to know things like: - which products are doing well - which regions are doing well - who are good customers - etc To me these are simple aggregates and sorts, with visual presentations to match. I figure a bit of effort, R can extract considerably more useful information from the data. To be honest I am just evolving it as I go, using an existing spreadsheet as a basis. I try something and if it is useful then great, if not, put it down to learning. cheers
I hesitate to add this comment since it either completely confuses people or they take to it very quickly. The data that you are using is mostly categorical. I expect that tables will have been used in the past and that to acertain extent the graphics are suppossed to help with getting a quick understanding of the data. There is a package called vcd (Visualizing Categorical Data) which is useful for analysing this type of data. I like the use of the mosaicplot and in particular the shade parameter (which is based on standardized residuals). If set up properly it can be used to very quickly identify sales regions that are doing significantly better than they were last year, customers who have significantly reduced purchases. Basically if you can produce a table that would give this information then a shaded mosaicplot can efficiently highlight the significant parts of the table. They take a little bit of getting used to at first, but if you need to analyse this type of data they take a lot of the guess work out of making commentary on the data. How useful they are depends upon the users, who as I have said seem to be polarised in their reactions to the output. Tom> -----Original Message----- > From: Paul Sorenson [mailto:Paul.Sorenson at vision-bio.com] > Sent: Friday, 21 January 2005 11:33 AM > To: r-help at stat.math.ethz.ch > Subject: RE: [R] easing out of Excel > > > Thanks for the responses to this question, I fully realise it > is a rather open question and the "open" pointers are the > kind of thing I am looking for. > > I will look into the lattice package and layout. > > Regarding the HTML output, the current "tool chain" assets > that I have have been refactored over time and are almost > totally driven by config files so they suit my purposes very > well. I will look into other possibilities at a later date. > > For those looking for a more rigorous specification of the > problem, you are well justified in this. I was deliberately > fuzzy since managers just want "stuff" and I thought casting > a wide net would pay off. The problem is to summarise > information which is nothing more than sales data. The kinds > of columns I am dealing with look like: > > date, customer, invoice_no, product, amount, sales_region, etc etc. > > Managers want to know things like: > - which products are doing well > - which regions are doing well > - who are good customers > - etc > > To me these are simple aggregates and sorts, with visual > presentations to match. > > I figure a bit of effort, R can extract considerably more > useful information from the data. > > To be honest I am just evolving it as I go, using an existing > spreadsheet as a basis. I try something and if it is useful > then great, if not, put it down to learning. > > cheers > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >