Hi group,

To mention it in advance, I am an R newbie, and most likely my question is more a mix of smaller, simpler tasks. Anyway, I got mixed up between by, select, aggregate, lapply etc.

My problem is as follows:

I have read data in and transformed them into a matrix, for no special reason so far. This matrix contains a column by which I would like to group, i.e. each distinct value specifies one group. Neither the number of occurrences nor the values themselves are known in advance, which seems to be the major problem. For each group separately, I would then like to compute an aggregation function, namely the sum of the ratio of two columns. These sums should be kept in the form of another vector.

My two questions are then

- Which object type (matrix, dataframe, list) lends itself to such a problem?
- Do I have to create different objects for the groups, or can I compute the vector of sums directly? And how?

Thanks in advance

Alexander Hener

--
GMX - The communication platform on the Internet.
http://www.gmx.net
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On 7.2.2002 15:03, alexander.hener at gmx.de wrote:

> - Which object type (matrix, dataframe, list) lends itself to such a
> problem?

It's probably easiest with a data frame, but you can also use a matrix.

> - Do I have to create different objects for the groups, or can I compute the
> vector of sums directly? And how?

Do it directly, by all means. You can use any of tapply(), by() or aggregate():

    > testdata <- data.frame(a=factor(c(rep('a',4), rep('b',6))), b=rnorm(10))
    > testdata
       a           b
    1  a  0.23158790
    2  a -0.38852120
    [snip]
    9  b -1.81645407
    10 b -0.44034004

    > tapply(testdata$b, testdata$a, sum)
           a        b
    1.282057 1.511260

    > by(testdata$b,testdata$a,sum)
    INDICES: a
    [1] 1.282057
    ------------------------------------------------------------
    INDICES: b
    [1] 1.51126

    > aggregate(testdata$b,list(testdata$a),sum)
      Group.1        x
    1       a 1.282057
    2       b 1.511260

See the functions' help texts and examples for further information. For me, tapply() does all I need.

Cheers

Kaspar Pflugshaupt

--
Kaspar Pflugshaupt
Geobotanisches Institut
Zuerichbergstr. 38
CH-8044 Zuerich
Tel. ++41 1 632 43 19
Fax ++41 1 632 12 15
mailto:pflugshaupt at geobot.umnw.ethz.ch
privat:pflugshaupt at mails.ch
http://www.geobot.umnw.ethz.ch
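The example above sums a single column per group. For the case in the original question, a sum of the ratio of two columns per group, a minimal sketch could look like the following. The data frame `d` and its column names `g`, `num` and `den` are made up for illustration and do not come from the thread.

```r
# Hypothetical data: g is the grouping column, num and den are the two
# columns whose ratio is summed within each group.
d <- data.frame(g   = c("x", "x", "y", "y", "y"),
                num = c(1, 3, 2, 4, 6),
                den = c(2, 4, 1, 2, 3))

# Compute the ratio row by row, then sum it within each group.
# The groups are discovered from the data; they need not be known in advance.
ratio_sums <- tapply(d$num / d$den, d$g, sum)
print(ratio_sums)
```

The result holds one sum per distinct value of `g`, so no separate object per group is needed.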
Generally, the data frame is the most useful object for storing data. It allows each column to have a different type (factor, numeric, ...). This is the default object type returned from read.table(), read.csv(), etc.

If you have your data in a data file named "data.txt" in the format

    Group Value1 Value2
    A 1 2
    A 1 3
    B 2 3
    B 1 3
    C 1 1
    ...

you can read it into R with

    data <- read.table("data.txt", header=TRUE)

Now, to get the summary you want, you should

1) Create a function to compute the summary for a data.frame containing only the data for one group. Something like

    compute.summary <- function(x) sum( x$Value1 / x$Value2 )

2) Use 'split' to break the data frame into one chunk per group, and 'sapply' to call your function on each chunk:

    tmp <- split( data, data$Group )
    results <- sapply( tmp, compute.summary )

You will probably want to look at the help pages for read.table, split, and sapply. You should also (if you haven't already) pick up the manual 'An Introduction to R' from http://cran.r-project.org/manuals.html

-Greg

> -----Original Message-----
> From: alexander.hener at gmx.de [mailto:alexander.hener at gmx.de]
> Sent: Thursday, February 07, 2002 9:03 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Grouping and Computing
>
> [snip]
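The two steps above can be tried without a data file. The sketch below assumes only the five sample rows shown in the message and passes them inline through the `text` argument of read.table(); the names `dat` and `chunks` are chosen here for illustration.

```r
# The five sample rows from the message, supplied inline instead of data.txt.
dat <- read.table(text = "Group Value1 Value2
A 1 2
A 1 3
B 2 3
B 1 3
C 1 1", header = TRUE)

# Step 1: summary for the rows of a single group.
compute.summary <- function(x) sum(x$Value1 / x$Value2)

# Step 2: one data frame per group, then one summary value per data frame.
chunks  <- split(dat, dat$Group)
results <- sapply(chunks, compute.summary)
print(results)
```

Here `results` is a named numeric vector with one entry per group (A, B, C), which is the "vector of sums" asked for.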
Hello Alexander,

the function you are looking for is tapply(), and it works with data frames and matrices. Here is a small example:

    d <- data.frame(group=c(1,4,5,4,5,2,1), value=c(1,54,2,6,87,4,6))
    tapply(d$value, d$group, mean)  # arguments: the data, the groups, the function

will produce

      1    2    4    5
    3.5  4.0 30.0 44.5

You can also use a factor (or character labels, which tapply() converts to a factor) to define the groups. This has the advantage that you can give the groups real names:

    d <- data.frame(group=c('John','Jack','Joan','Jack','Jack','Joan','John'), value=c(1,54,2,6,87,4,6))
    tapply(d$value, d$group, mean)

will produce

    Jack Joan John
    49.0  3.0  3.5

--
Joerg Maeder .:|:||:..:.||.:: maeder at atmos.umnw.ethz.ch
Tel: +41 1 633 36 25 .:|:||:..:.||.:: http://www.iac.ethz.ch/staff/maeder
PhD student at INSTITUTE FOR ATMOSPHERIC AND CLIMATE SCIENCE (IACETH)
ETH ZÜRICH Switzerland
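Since the original data were held in a matrix, here is a small sketch of the same tapply() call on matrix columns. It reuses the numbers from the first example above; the matrix name `m` and its column names are assumptions made for illustration.

```r
# The same numbers as in the first example, stored as a numeric matrix.
m <- cbind(group = c(1, 4, 5, 4, 5, 2, 1),
           value = c(1, 54, 2, 6, 87, 4, 6))

# Select the columns by name; tapply() finds the distinct groups itself.
means <- tapply(m[, "value"], m[, "group"], mean)
print(means)

# The group labels that were found are available as names of the result.
print(names(means))
```

Note that a matrix can hold only one type, so named groups such as 'John' would turn every column into character data; that is one reason a data frame is usually the more convenient container.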
Hi group,

thanks to Gregory, James, Joerg & Kaspar (in alphabetical order) for their quick and friendly answers. Maybe there is someone out there with a related problem, so here is a short summary in the form of an FAQ:

How do I apply functions to certain rows of a matrix or a data frame, depending on certain (other) attributes? For example, compute average weights separately for men and women.

One can use the function "tapply()", which applies a function (mean) to attribute X (weight) grouped by attribute Z (gender):

    "tapply(X,Z,mean)"

When the function returns a single number per group, as mean does, the result is a named vector (a one-dimensional array) with one entry per value of the attribute Z.

Are there any other functions to do this?

Yes, for example "by()" or "aggregate()":

    "by(X,Z,mean)"
    "aggregate(X,list(Z),mean)"

Note that aggregate() expects the grouping argument as a list. But "tapply()" seems to be the most popular, and one should try to keep it simple.

If I use a complicated function, how can I avoid confusion or simply increase clarity?

First, define the function separately:

    "complicated.function <- function(x) ..."

Second, use "split()" to separate the matrix or data frame called data into groups according to attribute Z:

    "groups <- split(data,data$Z)"

Finally, use "sapply()" to apply the function to the different groups:

    "sapply(groups,complicated.function)"

I hope this helps, but I realise that without much experience, as in my case, it can be confusing at first to have such an abundance of possibilities to "do something", and that it is very difficult to treat a topic in isolation. For example, there would be much to say about lists in this context.

CU,

Alexander Hener.
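To make the summary concrete, the sketch below runs the weight-by-gender example through tapply(), aggregate() and split() plus sapply() and shows that they agree. The data frame `people` and its values are invented for illustration.

```r
# Hypothetical data for the weight-by-gender example.
people <- data.frame(gender = c("f", "m", "f", "m", "m"),
                     weight = c(60, 80, 64, 90, 76))

# 1) tapply(): returns a named vector (1-d array), one mean per gender.
m_tapply <- tapply(people$weight, people$gender, mean)

# 2) aggregate(): the grouping must be given as a list; returns a data frame
#    whose value column is called x when a plain vector is aggregated.
m_aggr <- aggregate(people$weight, list(gender = people$gender), mean)

# 3) split() + sapply(): useful when the function needs several columns.
m_split <- sapply(split(people, people$gender), function(g) mean(g$weight))

print(m_tapply)
print(m_aggr)
print(m_split)
```

All three give the same per-group means; they differ mainly in the shape of the result (named vector versus data frame).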