Hi,

I'd like to use R to do what excel pivot tables do, and plot results.

I've never used R before, and I've managed to do something, but it's quite a lot of code to do something simple. I can't help but think I'm not "Doing it the R way".

I could be using R for the wrong thing, in which case, please tell me off.

I was hoping something like

plot(by(t, factor(t$snr), summary))

would do something, but it doesn't.

Say my data is (for example)

SNR  timeError
4    1.3
4    2.1
4    1.2
6    2.1
6    2.2
6    2.1
8    3.2
8    3.7
8    3.1

I want to produce a plot of SNR vs mean(timeError) with error bars of magnitude 3 sigma.

Here's what I've got so far (without the error bars. I can't do that yet). I'm sure it's the wrong way to go about this:

******* BEGIN SNIPPET *******
get_stats <- function(t) {
    cnfac <- factor(t$cnset);
    mu <- as.list(by(t$snr, cnfac, mean));
    tvar <- as.list(by(t$snr, cnfac, var));
    t <- list(mu=mu, var=tvar);
}

vn <- read.table('vn.csv', sep=',');
vn_stats <- get_stats(vn);

vsn <- read.table('vsn.csv', sep=',');
vsn_stats <- get_stats(vsn);

snrs <- as.numeric(names(vn_stats$mu))
matplot(snrs, cbind(vn_stats$mu, vsn_stats$mu));
windows();
matplot(snrs, cbind(vn_stats$var, vsn_stats$var));
******* END SNIPPET *******

Appreciate any helpful hints from the pros.

Cheers!

p.s. We've been having rather a good time around the office recently with "International Talk Like a Pirate Day" (www.yarr.org.uk). R fits in very well: "I be usin' Arrrgghhhh for my post processin'".

Keith Bannister
--
Electrical Engineer
Astrium Ltd
>>> "BANNISTER, Keith" <keith.bannister at astrium.eads.net> 09/20/05 09:46AM >>>
>> Hi,
>>
>> I'd like to use R to do what excel pivot tables do, and plot results.

R does not have pivot tables and I hope that it never does.

My experience with pivot tables is that they encourage poor initial design followed by not-easily-reproducible post-hoc twiddling.

R encourages proper initial design followed by fixing the core design in cases where things don't turn out the way you intended.

In R I prefer to work with script files and save the file. If the table or graph does not turn out the way I intended, then I just edit the script file and rerun it. While this may be a little more work than clicking on a pivot table at first, in the long run I find it saves more time.

Consider the situation where you create a table/graph, then a month later your boss/client/coworker finds some typos in the original data and needs the table and/or graph recreated with the corrected data (or maybe a new dataset that needs a similar graph/table). With the pivot table you need to try to remember everything that you clicked on and click on it again. With the R script file you just fix the data (or load in the new data), rerun the script, and you're done.

OK, enough of my ranting, on to helping with your problem.

>> I've never used R before, and I've managed to do something, but it's quite a
>> lot of code to do something simple. I can't help but think I'm not "Doing it
>> the R way".
>>
>> I could be using R for the wrong thing, in which case, please tell me off.

[snip]

"by" is a bit of an overkill for this situation; tapply will probably work better.

Try this basic script as a starting place:

### start ###
my.df <- data.frame( SNR=rep( c(4,6,8), each=3),
    timeError = c(1.3,2.1,1.2,2.1,2.2,2.1,3.2,3.7,3.1))

# per-SNR mean and standard deviation
tmp.mean <- tapply( my.df$timeError, my.df$SNR, mean)
tmp.sd <- tapply( my.df$timeError, my.df$SNR, sd)

# sorted so that x lines up with tapply's group order
tmp.x <- sort(unique(my.df$SNR))

plot( tmp.x, tmp.mean,
    ylim=range(tmp.mean+3*tmp.sd,tmp.mean-3*tmp.sd),
    xlab='SNR',ylab='timeError')

# vertical 3-sigma error bars
segments(tmp.x, tmp.mean-3*tmp.sd, tmp.x, tmp.mean+3*tmp.sd, col='green')

### optional
points(tmp.x, tmp.mean+3*tmp.sd, pch='-',cex=3,col='green')
points(tmp.x, tmp.mean-3*tmp.sd, pch='-',cex=3,col='green')
points(tmp.x, tmp.mean)

### end script ###

This may be even simpler with a loaded package. A quick search shows the following functions (package in parens) that may help:

plotCI(gplots)     Plot Error Bars and Confidence Intervals
errbar(Hmisc)      Plot Error Bars
xYplot(Hmisc)      xyplot and dotplot with Matrix Variables to Plot Error Bars and Bands
plotCI(plotrix)    Plot confidence intervals/error bars
errbar(sfsmisc)    Scatter Plot with Error Bars
plotCI(sfsmisc)    Plot Confidence Intervals / Error Bars

>> Appreciate any helpful hints from the pros.

Hope this helps,

>> Cheers!
>>
>> p.s. We've been having rather a good time around the office recently with
>> "International Talk Like a Pirate Day" (www.yarr.org.uk). R fits in very
>> well: "I be usin' Arrrgghhhh for my post processin'".
>>
>> Keith Bannister

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111
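A minimal base-graphics variant of the script above, offered as a sketch rather than a definitive recipe: arrows() with angle=90 and code=3 draws each bar and both of its caps in one call, and the x positions are taken from the names that tapply() returns, so they stay aligned with the group summaries even when the rows are not sorted by SNR. The variable names (grp.mean, grp.sd, grp.x, lower, upper) are illustrative and do not come from the thread.

```r
# Sample data from the original question
my.df <- data.frame(
  SNR       = rep(c(4, 6, 8), each = 3),
  timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1)
)

# Per-group mean and standard deviation; tapply() names each result by group
grp.mean <- tapply(my.df$timeError, my.df$SNR, mean)
grp.sd   <- tapply(my.df$timeError, my.df$SNR, sd)

# x positions recovered from the group names, in the same order as the summaries
grp.x <- as.numeric(names(grp.mean))

# 3-sigma limits
lower <- grp.mean - 3 * grp.sd
upper <- grp.mean + 3 * grp.sd

plot(grp.x, grp.mean, ylim = range(lower, upper),
     xlab = "SNR", ylab = "mean(timeError)", pch = 19)

# angle = 90 with code = 3 puts a flat cap on both ends of each bar
arrows(grp.x, lower, grp.x, upper, angle = 90, code = 3, length = 0.05)
```

The multiplier 3 is the "3 sigma" from the question; changing it in the two lines that compute lower and upper is enough to get a different bar length.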
Hi Keith,

You might want to check out my reshape package (http://had.co.nz/reshape/), which is very much pivot table inspired. It doesn't produce graphics yet, but the output is very amenable to being fed into existing R graphics functions (especially lattice graphics).

Hadley
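For readers without the reshape package, here is a rough base-R sketch of a pivot-table-style summary. It does not use or imitate the reshape API; it relies only on tapply() with two grouping factors, which returns a matrix with one row per SNR and one column per data set. The `source` column and the "vsn" values are invented for illustration; only the "vn" values come from Keith's example.

```r
# Illustrative data: "vn" values are from the original question,
# "vsn" values are made up for this sketch.
dat <- data.frame(
  source    = rep(c("vn", "vsn"), each = 9),
  SNR       = rep(rep(c(4, 6, 8), each = 3), times = 2),
  timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1,
                1.0, 1.4, 1.2, 1.8, 2.0, 1.9, 2.5, 2.9, 2.7)
)

# Rows = SNR, columns = source, cells = mean(timeError)
pivot.mean <- tapply(dat$timeError,
                     list(SNR = dat$SNR, source = dat$source),
                     mean)
print(round(pivot.mean, 2))

# One line per column of the table, plotted against the row labels
snr <- as.numeric(rownames(pivot.mean))
matplot(snr, pivot.mean, type = "b", pch = 1:2, lty = 1,
        xlab = "SNR", ylab = "mean(timeError)")
legend("topleft", legend = colnames(pivot.mean),
       pch = 1:2, col = 1:2, lty = 1)
```

Swapping mean for sd or var in the tapply() call gives the corresponding table of spreads, which is roughly what the second matplot() in the original snippet was after.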
> From: Greg Snow
>
> >>> "BANNISTER, Keith" <keith.bannister at astrium.eads.net> 09/20/05 09:46AM >>>
> >> Hi,
> >>
> >> I'd like to use R to do what excel pivot tables do, and plot results.
>
> R does not have pivot tables and I hope that it never does.
>
> My experience with pivot tables is that they encourage poor initial
> design followed by not-easily-reproducible post-hoc twiddling.
>
> R encourages proper initial design followed by fixing the core design
> in cases where things don't turn out the way you intended.
>
> In R I prefer to work with script files and save the file. If the
> table or graph does not turn out the way I intended, then I just edit
> the script file and rerun it. While this may be a little more work
> than clicking on a pivot table at first, in the long run I find it
> saves more time.

Actually, it's even better to write functions for repetitive tasks. This is one of the things Martin talked about at useR! 2004:
http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Maechler.pdf

For Keith's problem, here's one possibility (using plotCI() from gplots):

myErrorBarPlot <- function(SNR, timeError, ...) {
    stopifnot(require(gplots))
    # per-SNR mean and standard deviation
    m <- aggregate(timeError, list(SNR), mean)
    d <- aggregate(timeError, list(SNR), sd)
    dat <- cbind(m, d[, 2])
    names(dat) <- c("SNR", "mean", "sd")
    # the grouping column may come back as a factor; convert to numeric
    dat$SNR <- as.numeric(as.character(dat$SNR))
    # 3-sigma error bars; extra arguments are passed on to plotCI()
    with(dat, plotCI(SNR, mean, uiw=3*sd, ...))
    invisible(dat)
}

vn <- read.table("clipboard", header=TRUE)
myErrorBarPlot(vn$SNR, vn$timeError)

Andy

> [snip]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
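To make the "write a function, then rerun it" advice concrete, here is a small self-contained sketch that needs neither the clipboard nor gplots. The helper name error_bar_summary, its argument k, and the "corrected" reading are all hypothetical and serve only to illustrate the workflow; this is not Andy's function.

```r
# Hypothetical helper: per-SNR mean with k-sigma limits, returned as a data frame
error_bar_summary <- function(SNR, timeError, k = 3) {
  m <- tapply(timeError, SNR, mean)
  s <- tapply(timeError, SNR, sd)
  data.frame(SNR   = as.numeric(names(m)),
             mean  = as.numeric(m),
             lower = as.numeric(m - k * s),
             upper = as.numeric(m + k * s))
}

# Sample data from the original question
vn <- data.frame(
  SNR       = rep(c(4, 6, 8), each = 3),
  timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1)
)

first <- error_bar_summary(vn$SNR, vn$timeError)
print(first)

# Suppose the 8th reading (3.7) is later found to be a typo for 3.2.
# This correction is invented for the example.
vn.fixed <- vn
vn.fixed$timeError[8] <- 3.2

# Same call, corrected data: nothing has to be remembered or re-clicked
second <- error_bar_summary(vn.fixed$SNR, vn.fixed$timeError)
print(second)
```

Only the SNR 8 row changes between the two summaries, and the data frame that comes back can be handed to any of the plotting approaches discussed earlier in the thread.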
>>> Gabor Grothendieck <ggrothendieck at gmail.com> 09/20/05 11:31AM >>>
>> Just one comment here lest we be arguing against a strawman.
>> While I agree that reproducibility can be a problem with pivot tables
>> if created interactively and this applies to just about anything you do
>> in Excel if done interactively, it should also be realized that Excel is
>> completely programmable, like R, using VBA or any language (including R!)
>> via its COM object interface.
>>
>> The fact that Excel has both an interactive interface and a script-based
>> interface whereas R has only a script-based interface puts it ahead, not
>> behind, R in at least some respects.

Just one comment here lest we be arguing against a strawman. R has both interactive and script-based interfaces available and has for a long time. (I remember working with an early port of S in the 1980's on VMS machines which, if you used the old Tek10 graphics driver, allowed you to click on a point in your graph and have it labelled. Anyone else remember the days of printer()...show() and Tektronix dumb terminals?)

One of the big differences I see between R and Excel is that while they both have script and gui based interfaces, the gui interfaces for R (take Rcmdr for example) provide an aid to learning while still encouraging the use of command lines, scripts, and functions, whereas Excel hides the script interface from all but experts and encourages non-reproducible clicking.

Just because a software package has a capability does not mean much if the overall design promotes the use of a less desirable feature. I remember one job where, before I came along, they were using a spreadsheet to compute a column of numbers, highlighting and printing out those numbers, then hand entering these same numbers into a different spreadsheet. Dr. Burns has already posted the url that contains another of my experiences with intelligent people getting caught in one of Excel's traps (and yes, Excel has a feature that would have prevented the trap, but Excel conveniently hid the need to use it).

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111
On 9/20/05, Greg Snow <greg.snow at ihc.com> wrote:
> Just one comment here lest we be arguing against a strawman.
> R has both interactive and script-based interfaces available and has
> for a long time (I remember working with an early port of S in the
> 1980's on VMS machines which if you used the old Tek10 graphics driver
> (anyone else remember the days of printer()...show() and
> tektronics(sp?) dumb terminals?) allowed you to click on a point in
> your graph and have it labelled).

This hardly qualifies as comparable in any way to Excel's pervasive, all-encompassing interactive GUI.

> One of the big differences I see between R and Excel is that while
> they both have script and gui based interfaces, the gui interfaces
> for R (take Rcmdr for example) provide an aid to learning, while
> still encouraging the use of command lines, scripts, and functions,
> while Excel hides the script interface from all but experts and
> encourages non-reproducable clicking.

Rcmdr is an excellent package but it is restricted to prewritten sets of functionality. Excel, on the other hand, is completely general: its macro recording facility will automatically write a script for virtually any interactive operation, and that script can then be massaged as needed.

> Just because a software package has a capability does not mean much if
> the overall design promotes the use of a less desirable feature.

Maybe you are not really familiar with Excel. The scripting capability is very powerful and easier to learn than R.

> I remember one job where before I came along they were using a
> spreedsheet to compute a column of numbers, highlighting and printing
> out those numbers, then hand entering these same numbers into a
> different spreadsheet.

Excel can produce output in many ways and one can copy and paste from it too. This is not a valid criticism of Excel. Excel is excellent at interacting with other applications and the operating system.

> Dr. Burns has already posted the url that
> contains another of my experiances with intelligent people getting
> caught in one of Excel's traps (and yes Excel has a feature that would
> have prevented the trap, but Excel convieniently hid the need to use
> it).

One can get caught in many traps with R too. In fact, just about any piece of complex software will have some items that require experience before you figure out the workarounds.