I'm wondering if anyone can give some basic advice about how to approach a specific task in R.

I'm new to R but have used SAS for many years, and while I can muscle through a lot of the code details, I'm unsure of a few things.

Specific questions:

If I have to perform a set of actions on a group of files, should I use a loop? (I feel like I've heard people say to avoid looping in R.)

How do I get means for "by" groups and subset a file based on those (subset the highest and lowest groups)? (I can do this in multiple steps* but wonder what the best, "R way" is to do this.)

How do I draw cutoff lines at specific points on density plots?

How do I create a matrix of plots? (Take 4 separate plots and put them into a single graphic.)

* Get group means, add means back to file, sort by mean, take first and last groups

Feel free to excoriate me if I'm asking for too much help. If possible though, a few words of advice (loops are the best way, just use the "main" parameter to combine plots) would be lovely if you can provide.

Thanks!
R. Michael Weylandt <michael.weylandt@gmail.com>
2011-Jul-31 21:02 UTC
[R] help with algorithm
Loops, as in any programming language, are sometimes unavoidable, but this is rarely the case in R. If you have any experience with MATLAB or similar software, you know the importance of working vectorwise, and the same principles apply to R. For instance, if I wanted a list of all the squares of numbers, I could write:

    x = c(1,2,3,4,5,6,7,8,9,10)
    for (i in 1:10) {
      temp = x[i]
      temp = temp^2
      print(temp)
    }

but it will be much faster to write print(x^2), which handles the call to "^2" once.

More generally, you can work on selected subsets of vectors or matrices using R's powerful indexing. It's hard to give a great example off the cuff (there are great introductions to R online that can give better examples), but if you get started on a project, we can help turn something written as a loop into something more R-like. The "apply" family of functions will also prove more than capable of replacing loops in many circumstances once you get a little more comfortable with R.

I'm not sure what your second question means exactly, but if you have, say, a data frame dat like this:

    X  Type
    1  A
    2  B
    3  A
    4  B
    3  B
    3  B
    2  A
    1  B
    1  A
    3  A
    4  B

something like mean(subset(dat, Type == "A")$X) might do what you're asking, but I'd be happy to help more if you can give a more concrete example.

For the matrix of plots, the layout() command will do this for you.

If you ever get stuck on R there are quite a few good resources: the first is to simply google "introduction to R" and you'll get some wonderful tutorial PDFs. Then, when you start getting your toes wet, you can use

    > ?COMMAND

to get information about any command directly from the terminal. Finally, looking through the archives of this list, or more generally through the r-seek.org engine, will help you find more advanced answers.

Good luck getting started with R,

Michael Weylandt

PS -- One command I wish I knew when I got started is the apropos() command: if you have a guess at the name of a function, but can't quite recall it, it will help you find it.
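As a rough illustration of the points in the reply above, here is a minimal sketch in base R. The data frame name `dat` and the variable names are assumptions made for this example; the values mirror the small X/Type table in the reply. It compares the vectorised square with an sapply() version and then computes the mean of one group and of every group.

```r
# Hypothetical data frame (the name 'dat' is an assumption), with the
# same values as the X/Type table in the reply above.
dat <- data.frame(
  X    = c(1, 2, 3, 4, 3, 3, 2, 1, 1, 3, 4),
  Type = c("A", "B", "A", "B", "B", "B", "A", "B", "A", "A", "B")
)

# Vectorised arithmetic: one call squares every element.
x <- 1:10
squares <- x^2

# The same result element by element with sapply(), for comparison.
squares_apply <- sapply(x, function(v) v^2)

# Mean of X within one group: pick the rows by a logical index.
mean_a <- mean(dat$X[dat$Type == "A"])

# Means for every group at once.
group_means <- tapply(dat$X, dat$Type, mean)
print(group_means)
```

The logical index `dat$Type == "A"` is the piece that replaces an explicit loop over rows; tapply() then generalises it to all groups in one call.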
Hi R Student,

On Sun, Jul 31, 2011 at 10:57 AM, r student <studentofr at gmail.com> wrote:
> If I have to perform a set of actions on a group of files, should I use a
> loop (I feel like I've heard people say to avoid looping in R)?

I think people suggest avoiding loops in R when the loop is replacing the use of vectorized functions. As an absurdly bad example of using a loop:

    > s <- 0
    > for(i in 1:10) s <- s + (11:20)[i]
    > s
    [1] 155
    > sum(11:20)
    [1] 155

If you are doing the same computations on totally different datasets, a loop may be pretty reasonable. Also look at the *apply family of functions. Perhaps ?lapply, though which is best depends on the task.

> How to get means for "by" groups and subset a file based on those (subset
> highest and lowest groups)? (I can do this in multiple steps* but wonder
> what the best, "R way" is to do this.)

Here is one way to get the means by groups:

    tmp <- with(mtcars, tapply(mpg, cyl, mean))
    ## and now subset by it
    subset(mtcars, mtcars$cyl %in% names(c(which.max(tmp), which.min(tmp))))

> How to draw cutoff lines at specific points on density plots?

    plot(density(rnorm(100)))
    abline(v = c(-1, 1))

See ?abline

> How to create a matrix of plots? (Take 4 separate plots and put them into a
> single graphic.)

This depends a bit on the potential complexity of layouts you need.
See ?par and ?layout

    dev.new() # start a new graphics device, just so the different options do not overwrite each other
    ## and you can compare
    par(mfrow = c(2, 2))
    plot(lm(mpg ~ hp * wt, data = mtcars))
    ## layout is more flexible
    dev.new()
    layout(matrix(c(1, 2, 3, 1, 4, 3), ncol = 3, byrow = TRUE))
    plot(lm(mpg ~ hp * wt, data = mtcars))

Basically the figure region is split into as many equal cells as the matrix has, and then cells that have the same number are merged into bigger cells (or at least that is how I conceptualize what is going on). Then the plots are drawn in order (1, 2, 3, 4) wherever those cells are.

> * Get group means, add means back to file, sort by mean, take first and last
> groups

    dat <- mtcars
    dat$gm <- with(mtcars, ave(mpg, cyl, FUN = mean))

and see above for subsetting.

Hope this helps,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
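Putting the pieces of this reply together, the asker's footnoted steps (group means, add them back, sort, take the first and last groups) might look roughly like the sketch below. It uses the built-in mtcars data as in the reply; the names `means`, `extreme_groups` and `extremes` are hypothetical, and the choice to draw the cutoff line at each group's mean is an assumption made only for illustration.

```r
dat <- mtcars

# Step 1: group means, added back to the data as a new column.
dat$gm <- ave(dat$mpg, dat$cyl, FUN = mean)

# Step 2: find the groups with the lowest and highest mean.
means <- tapply(dat$mpg, dat$cyl, mean)
extreme_groups <- names(means)[c(which.min(means), which.max(means))]

# Step 3: keep only the rows in those two groups, sorted by group mean.
extremes <- dat[as.character(dat$cyl) %in% extreme_groups, ]
extremes <- extremes[order(extremes$gm), ]

# Step 4: two density plots side by side, each with a dashed cutoff
# line at the group mean (the cutoff position is an assumption).
pdf(file = NULL)  # null device so the sketch also runs without a display
par(mfrow = c(1, 2))
for (g in extreme_groups) {
  mpg_g <- extremes$mpg[as.character(extremes$cyl) == g]
  plot(density(mpg_g), main = paste("cyl =", g))
  abline(v = mean(mpg_g), lty = 2)
}
invisible(dev.off())
```

At an interactive console the pdf(file = NULL) and dev.off() lines can be dropped so the plots appear on screen. The short loop over two groups is the kind of loop the reply describes as reasonable: it repeats the same plotting steps on different subsets rather than replacing a vectorized function.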
I can help with a few questions:

Q) How to get means for "by" groups and subset a file based on those (subset highest and lowest groups)? (I can do this in multiple steps* but wonder what the best, "R way" is to do this.)

A) Use the aggregate function.

Q) How to create a matrix of plots? (Take 4 separate plots and put them into a single graphic.)

A) I am not sure exactly what you want to do. The mfcol or mfrow parameter of the par function might do what you want. The coplot function might also.

John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
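The aggregate suggestion above is given without an example, so here is one hedged sketch of how it could be used for the "by group means, then keep the highest and lowest groups" task. The built-in mtcars data and the names `group_means`, `keep` and `extremes` are assumptions chosen for illustration, not part of the reply.

```r
# Group means as a data frame: one row per value of cyl.
group_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Sort by the mean and note the first and last groups.
group_means <- group_means[order(group_means$mpg), ]
keep <- group_means$cyl[c(1, nrow(group_means))]

# Subset the original data to those two groups.
extremes <- subset(mtcars, cyl %in% keep)
print(group_means)
```

Because aggregate() returns an ordinary data frame, the sorting and subsetting steps are plain data frame operations, which may feel familiar coming from SAS "by" processing.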
On 07/31/2011 05:57 PM, r student wrote:
> If I have to perform a set of actions on a group of files, should I use a
> loop (I feel like I've heard people say to avoid looping in R)?

Hi,

Looping over several files is best done using the apply family of functions. I especially use the llply, ldply and ddply functions from the plyr package a lot for processing. An example of looping over files and recombining the results would look something like:

    library(plyr)
    # full.names = TRUE returns complete paths that read.table can open
    listoffiles = list.files("/where/the/files/are", full.names = TRUE)
    combinedResult = ldply(listoffiles, function(filename) {
      bla = read.table(filename)
      # ... now maybe do some stuff with it ...
      return(result) # Note that result is a data.frame
                     # Can contain e.g. summary stats of bla
    })

ldply will automatically combine the results of the function calls in an efficient manner. It can take some time to get the hang of these things, but I love working with them when processing data.

> How to get means for "by" groups and subset a file based on those (subset
> highest and lowest groups)? (I can do this in multiple steps* but wonder
> what the best, "R way" is to do this.)

When your data.frame is called dat and has the form:

    value  by
    1      A
    5      A
    3      B
    etc.

you can use ddply like this to get the mean value per category in 'by':

    ddply(dat, .(by), summarise, m = mean(value))

> How to draw cutoff lines at specific points on density plots?
>
> How to create a matrix of plots? (Take 4 separate plots and put them into a
> single graphic.)

I really like the ggplot2 package. It can draw several plots in one graphic using a special syntax construct, with no need to subdivide the canvas manually or to keep the axes of the plots equal by hand.
Take a look at the ggplot2 website, specifically the examples given for the facet_wrap and facet_grid functions.

cheers,
Paul

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
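For readers without plyr installed, the same "apply a function to each file, then combine" pattern from this reply can be sketched in base R with lapply() and do.call(rbind, ...). This swaps base R in for ldply and is not the code from the reply. The temporary directory, file names, and the `summarise_file` helper are all hypothetical and exist only to make the example self-contained.

```r
# Hypothetical setup: write three small files to a temporary directory so
# the sketch is self-contained. In practice the files would already exist.
data_dir <- file.path(tempdir(), "example_files")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(value = i * (1:4)),
              file = file.path(data_dir, paste0("file", i, ".txt")),
              row.names = FALSE)
}

# full.names = TRUE returns paths that read.table() can open directly.
files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Apply the same steps to each file; each call returns a one-row data frame.
summarise_file <- function(filename) {
  d <- read.table(filename, header = TRUE)
  data.frame(file = basename(filename), n = nrow(d), mean = mean(d$value))
}
per_file <- lapply(files, summarise_file)

# Stack the per-file results into a single data frame.
combined <- do.call(rbind, per_file)
print(combined)
```

Keeping the per-file work in a named function makes it easy to test on a single file first and then hand the same function to lapply() or, with plyr, to ldply().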