thr3ads.net - R help - [R] easing out of Excel [Jan 2005]

If this information is useful, please help other people find it:
Share via:

Paul Sorenson

2005-Jan-19 22:18 UTC

[R] easing out of Excel

I know enough about R to be dangerous and our marketing people have asked me to
"automate" some reporting.  Data comes from an SQL source and graphs
and various summaries are currently created manually in Excel.  The raw
information is invoicing records and the reporting is basically summaries by
customer, region, product line etc.

With function such as aggregate(), hist() and pareto() (which someone on this
list kindly pointed me at) I can produce something roughly equivalent to the
current reports.

My question is, are there any neat R "lock out" features people here
like to use on this kind of info, particularly when the output is very visual
(report is intended for marketing people).

Another way of looking at this is, What kind of "hidden" information
can I extract with R that the Excel solution hasn't touched?

For example, even the pareto plot mentioned earlier is something the Excel guys
haven't thought of or can't easily produce.

regards

BTW the tool chain I am using goes something like:
	Production (run daily):
		DB -> SQL/python -> CSV -> R/python -> images -> network
	Presentation:
		network -> CGI/python -> browser

Pikounis, Bill [CNTUS]

2005-Jan-20 12:21 UTC

head link

[R] easing out of Excel

Hi Paul,
I find your question intriguing, but might I ask that you elaborate on your
terminology and context of "lock out" and "hidden" in your
question?
Otherwise I am afraid that my current ideas on an answer will surely be
based on the wrong diagnosis of what you are really looking for.

Thanks,
Bill

-------------------------------
Bill Pikounis, PhD

Nonclinical Statistics
Centocor, Inc.
200 Great Valley Parkway
MailStop C4-1
Malvern, PA 19355

610 240 8498
fax 610 651 6717 
> -----Original Message-----
> From: Paul Sorenson [mailto:Paul.Sorenson@vision-bio.com]
> Sent: Wednesday, January 19, 2005 5:19 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] easing out of Excel
> 
> 
> I know enough about R to be dangerous and our marketing 
> people have asked me to "automate" some reporting.  Data 
> comes from an SQL source and graphs and various summaries are 
> currently created manually in Excel.  The raw information is 
> invoicing records and the reporting is basically summaries by 
> customer, region, product line etc.
> 
> With function such as aggregate(), hist() and pareto() (which 
> someone on this list kindly pointed me at) I can produce 
> something roughly equivalent to the current reports.
> 
> My question is, are there any neat R "lock out" features 
> people here like to use on this kind of info, particularly 
> when the output is very visual (report is intended for 
> marketing people).
> 
> Another way of looking at this is, What kind of "hidden" 
> information can I extract with R that the Excel solution 
> hasn't touched?
> 
> For example, even the pareto plot mentioned earlier is 
> something the Excel guys haven't thought of or can't easily
produce.
> 
> regards
> 
> BTW the tool chain I am using goes something like:
> 	Production (run daily):
> 		DB -> SQL/python -> CSV -> R/python -> images -> network
> 	Presentation:
> 		network -> CGI/python -> browser
> 
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
	[[alternative HTML version deleted]]

Greg Snow

2005-Jan-20 16:52 UTC

head link

[R] easing out of Excel

>>> "Paul Sorenson" <Paul.Sorenson at vision-bio.com>
01/19/05 03:18PM >>>
>> I know enough about R to be dangerous and our marketing people have
>> asked me to "automate" some reporting.  Data comes from an
SQL
source>> and graphs and various summaries are currently created manually in
>> Excel.  The raw information is invoicing records and the reporting
is>> basically summaries by customer, region, product line etc.
>> 
>> With function such as aggregate(), hist() and pareto() (which
someone>> on this list kindly pointed me at) I can produce something roughly
>> equivalent to the current reports.
>> 
>> My question is, are there any neat R "lock out" features
people
here>> like to use on this kind of info, particularly when the output is
very>> visual (report is intended for marketing people).
>> 
>> Another way of looking at this is, What kind of "hidden"
information>> can I extract with R that the Excel solution hasn't touched?
Since you are looking for summaries within groups, you should look at
the
lattice package and some of the plots that you can produce with it
(maybe
for each product line you can produce a lattice/trellis graph with each
panel
representing a region and different colors symbols within panels to
represent
different customers).

If we had more of an idea of what you are looking for, we could give
better
suggestions.

>> For example, even the pareto plot mentioned earlier is something
the>> Excel guys haven't thought of or can't easily produce.
>> 
>> regards
>> 
>> BTW the tool chain I am using goes something like:
>> 	Production (run daily):
>> 		DB -> SQL/python -> CSV -> R/python -> images ->
network>> 	Presentation:
>> 		network -> CGI/python -> browser
It looks like you want the reports fully automated and the final result
as HTML
(to be viewed with a browser), I suggest you look at the R2HTML package
and
the sweave function (this lets you write a report in HTML with r-code
in place of 
graphs and output, then a quick run through sweave and you have a final
report
in HTML ready to be viewed).

There are also several tools available for running R through CGI, go
to: 
http://www-r.project.org/ and click on "R web-servers" under the
"Related Projects"
heading in the left column to get details.

Hope this helps,


Greg Snow, Ph.D.
Statistical Data Center
greg.snow at ihc.com
(801) 408-8111

Shawn Way

2005-Jan-20 20:41 UTC

head link

[R] easing out of Excel

Definitely check out the lattice package.

One other option is to use sweave/latex mixed with RODBC.  This can be
used to produce PDF's for easy distribution as well.  I would also
consider operating this in a batch mode, the R/sweave/latex works very
well this way.


Shawn Way, PE
Engineering Manager

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Snow
Sent: Thursday, January 20, 2005 10:52 AM
To: r-help at stat.math.ethz.ch; Paul.Sorenson at vision-bio.com
Subject: Re: [R] easing out of Excel
>>> "Paul Sorenson" <Paul.Sorenson at vision-bio.com>
01/19/05 03:18PM >>>
>> I know enough about R to be dangerous and our marketing people have
>> asked me to "automate" some reporting.  Data comes from an
SQL
source>> and graphs and various summaries are currently created manually in
>> Excel.  The raw information is invoicing records and the reporting
is>> basically summaries by customer, region, product line etc.
>> 
>> With function such as aggregate(), hist() and pareto() (which
someone>> on this list kindly pointed me at) I can produce something roughly
>> equivalent to the current reports.
>> 
>> My question is, are there any neat R "lock out" features
people
here>> like to use on this kind of info, particularly when the output is
very>> visual (report is intended for marketing people).
>> 
>> Another way of looking at this is, What kind of "hidden"
information>> can I extract with R that the Excel solution hasn't touched?
Since you are looking for summaries within groups, you should look at
the
lattice package and some of the plots that you can produce with it
(maybe
for each product line you can produce a lattice/trellis graph with each
panel
representing a region and different colors symbols within panels to
represent
different customers).

If we had more of an idea of what you are looking for, we could give
better
suggestions.

>> For example, even the pareto plot mentioned earlier is something
the>> Excel guys haven't thought of or can't easily produce.
>> 
>> regards
>> 
>> BTW the tool chain I am using goes something like:
>> 	Production (run daily):
>> 		DB -> SQL/python -> CSV -> R/python -> images ->
network>> 	Presentation:
>> 		network -> CGI/python -> browser
It looks like you want the reports fully automated and the final result
as HTML
(to be viewed with a browser), I suggest you look at the R2HTML package
and
the sweave function (this lets you write a report in HTML with r-code
in place of 
graphs and output, then a quick run through sweave and you have a final
report
in HTML ready to be viewed).

There are also several tools available for running R through CGI, go
to: 
http://www-r.project.org/ and click on "R web-servers" under the
"Related Projects"
heading in the left column to get details.

Hope this helps,


Greg Snow, Ph.D.
Statistical Data Center
greg.snow at ihc.com
(801) 408-8111

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Paul Sorenson

2005-Jan-21 03:33 UTC

head link

[R] easing out of Excel

Thanks for the responses to this question, I fully realise it is a rather open
question and the "open" pointers are the kind of thing I am looking
for.

I will look into the lattice package and layout.

Regarding the HTML output, the current "tool chain" assets that I have
have been refactored over time and are almost totally driven by config files so
they suit my purposes very well.  I will look into other possibilities at a
later date.

For those looking for a more rigorous specification of the problem, you are well
justified in this.  I was deliberately fuzzy since managers just want
"stuff" and I thought casting a wide net would pay off.  The problem
is to summarise information which is nothing more than sales data.  The kinds of
columns I am dealing with look like:

date, customer, invoice_no, product, amount, sales_region, etc etc.

Managers want to know things like:
	- which products are doing well
	- which regions are doing well
	- who are good customers
	- etc

To me these are simple aggregates and sorts, with visual presentations to match.

I figure a bit of effort, R can extract considerably more useful information
from the data.

To be honest I am just evolving it as I go, using an existing spreadsheet as a
basis.  I try something and if it is useful then great, if not, put it down to
learning.

cheers

Mulholland, Tom

2005-Jan-21 04:04 UTC

head link

[R] easing out of Excel

I hesitate to add this comment since it either completely confuses people or
they take to it very quickly.

The data that you are using is mostly categorical. I expect that tables will
have been used in the past and that to acertain extent the graphics are
suppossed to help with getting a quick understanding of the data.

There is a package called vcd (Visualizing Categorical Data) which is useful for
analysing this type of data. I like the use of the mosaicplot and in particular
the shade parameter (which is based on standardized residuals). If set up
properly it can be used to very quickly identify sales regions that are doing
significantly better than they were last year, customers who have significantly
reduced purchases. Basically if you can produce a table that would give this
information then a shaded mosaicplot can efficiently highlight the  significant
parts of the table.

They take a little bit of getting used to at first, but if you need to analyse
this type of data they take a lot of the guess work out of making commentary on
the data. How useful they are depends upon the users, who as I have said seem to
be polarised in their reactions to the output.

Tom
> -----Original Message-----
> From: Paul Sorenson [mailto:Paul.Sorenson at vision-bio.com]
> Sent: Friday, 21 January 2005 11:33 AM
> To: r-help at stat.math.ethz.ch
> Subject: RE: [R] easing out of Excel
> 
> 
> Thanks for the responses to this question, I fully realise it 
> is a rather open question and the "open" pointers are the 
> kind of thing I am looking for.
> 
> I will look into the lattice package and layout.
> 
> Regarding the HTML output, the current "tool chain" assets 
> that I have have been refactored over time and are almost 
> totally driven by config files so they suit my purposes very 
> well.  I will look into other possibilities at a later date.
> 
> For those looking for a more rigorous specification of the 
> problem, you are well justified in this.  I was deliberately 
> fuzzy since managers just want "stuff" and I thought casting 
> a wide net would pay off.  The problem is to summarise 
> information which is nothing more than sales data.  The kinds 
> of columns I am dealing with look like:
> 
> date, customer, invoice_no, product, amount, sales_region, etc etc.
> 
> Managers want to know things like:
> 	- which products are doing well
> 	- which regions are doing well
> 	- who are good customers
> 	- etc
> 
> To me these are simple aggregates and sorts, with visual 
> presentations to match.
> 
> I figure a bit of effort, R can extract considerably more 
> useful information from the data.
> 
> To be honest I am just evolving it as I go, using an existing 
> spreadsheet as a basis.  I try something and if it is useful 
> then great, if not, put it down to learning.
> 
> cheers
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>

Apparently Analagous Threads

Search for more possibly parallel threads

R help - Jan 2005 - easing out of Excel

[R] easing out of Excel

[R] easing out of Excel

[R] easing out of Excel

[R] easing out of Excel

[R] easing out of Excel

[R] easing out of Excel

Apparently Analagous Threads