thr3ads.net - similar to: "aggregating columns in a data frame in different ways"

Displaying 20 results from an estimated 2000 matches similar to: "aggregating columns in a data frame in different ways"

understanding output of tapply/by cumsum

2010 Dec 07

Dear R-users, I have a dataset with categories and numbers. I would like to compute and add cumulative numbers to the dataset. I do not understand the structure of by(...) or tapply(...) output enough to handle it. Here is a small example:

    d<-expand.grid(a=1:5,b=1:3,c=1:2)
    d$n = 10 * d$a + d$b +0.1* d$c
    Sn<-by(d$n,list(d$a,d$c),cumsum)
    str(Sn)

    List of 10
     $ : num
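One possible approach to the question in this excerpt, sketched under the assumption that the goal is a running total of n within each combination of a and c, is base R's ave(). Unlike by() or tapply(), it returns a plain vector aligned with the original rows. The column name cum_n is made up for illustration.

```r
# Rebuild the example data from the excerpt
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# ave() applies FUN within each a/c group and returns the values in the
# original row order, so the result can be stored directly as a new column
# (cum_n is a hypothetical column name)
d$cum_n <- ave(d$n, d$a, d$c, FUN = cumsum)

# Inspect one group: a == 1, c == 1
d[d$a == 1 & d$c == 1, ]
```

Note that FUN must be passed by name, because every unnamed argument after the first is treated as a grouping variable.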

Aggregating dataset to means/day

2011 Mar 16

Hi, I have a dataset with many observations on some days but only one on others. I would like to calculate a mean value per day and then do regression analysis on the means. This is what I have:

    Year  Day  Time   herring.density
    2007  47   10.36  2.2
    2007  47   11.50  1.1
    2007  47   14.24  1.4
    2007  66    9.35  2.5

This is what I want:
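A minimal sketch of per-day means, assuming the four rows shown in the excerpt and the formula interface of aggregate(). The object names herring and daily are illustrative, and the desired output is cut off in the excerpt, so this is only one plausible reading of it.

```r
# The rows visible in the excerpt
herring <- data.frame(
  Year = c(2007, 2007, 2007, 2007),
  Day = c(47, 47, 47, 66),
  Time = c(10.36, 11.50, 14.24, 9.35),
  herring.density = c(2.2, 1.1, 1.4, 2.5)
)

# One row per Year/Day combination, holding the mean density for that day
daily <- aggregate(herring.density ~ Year + Day, data = herring, FUN = mean)
daily
```

The resulting data frame could then be passed to a model-fitting function such as lm() for the regression step.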

summarizing replicates with multiple treatments

2008 Mar 04

I have a dataframe with several different treatment variables, and would like to calculate the mean and standard deviation of the replicates for each day and treatment variable. It seems like it should be easy, but I've only managed to do it for one treatment at a time using subset and tapply. Here is an example dataset: > `exampledata` <- structure(list(day = c(1L, 1L, 1L, 1L, 1L,

Aggregating a data frame (was: Re: new R-user needs help)

2006 Oct 18

Please use an informative subject for the sake of the archives. Here are several solutions:

    aggregate(DF[4:8], DF[2], mean)

    library(doBy)
    summaryBy(x1 + x2 + x3 + x4 + x5 ~ name, DF, FUN = mean)

    # if Exp, name and id columns are factors then this can be reduced to
    library(doBy)
    summaryBy(. ~ name, DF, FUN = mean)

    library(reshape)
    cast(melt(DF, id = 1:3), name ~ variable, fun = mean)

On

aggregate(...) with multiple functions

2010 Jul 16

hi all - i'm just wondering what sort of code people write to essentially perform an aggregate call, but with different functions being applied to the various columns. for example, if i have a data frame x and would like to marginalize by a factor f for the rows, but apply mean() to col1 and median() to col2. if i wanted to apply mean() to both columns, i would call: aggregate(x, list(f),
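One way this could be done in base R, offered as a sketch rather than as the answer given in the thread: run aggregate() once per function and merge the pieces on the grouping column. The names x, f, col1 and col2 follow the excerpt; the data values are invented.

```r
set.seed(1)

# Invented data matching the names used in the excerpt
x <- data.frame(col1 = rnorm(12), col2 = rnorm(12))
f <- rep(c("a", "b", "c"), each = 4)

# One aggregate() call per summary function, each on a one-column data frame
means <- aggregate(x["col1"], list(f = f), mean)
medians <- aggregate(x["col2"], list(f = f), median)

# Join the two summaries on the grouping column
result <- merge(means, medians, by = "f")
result
```

This keeps each column paired with its own function at the cost of one pass over the data per function, which is usually acceptable for a handful of columns.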

How to calculate means for multiple variables in samples with different sizes

2011 Mar 11

Hello R-helpers: I have data like this:

    sample  replicate  height  weight  age
    A       1.00       12.0    0.64    6.00
    A       2.00       12.2    0.38    6.00
    A       3.00       12.4    0.49    6.00
    B       1.00       12.7    0.65    4.00
    B       2.00       12.8    0.78    5.00
    C       1.00       11.9    0.45    6.00
    C       2.00       11.84   0.44    2.00
    C       3.00       11.43   0.32    3.00
    C       4.00       10.24   0.84    4.00
    D

Using nrow with summaryBy

2010 Mar 17

Hello Everyone- I'm calculating summary statistics on a dataset (~4000 records, observations are not uniformly distributed) using summaryBy and trying to add a column with the number of observations to the output as well. What occurs to me is to use nrow(), but this doesn't appear to be working. I'm able to replicate the same results with an example from the summaryBy docs:

Using summaryBy with weighted data

2011 Jan 17

Dear Soren and R users: I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows:

    library(doBy)
    ## make up some data
    response = rnorm(100)
    group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
    weights = runif(100, 0, 1)
    mydata = data.frame(response,group,weights)
    ## run summaryBy without weights:

summaryBy: transformed variable on RHS of formula?

2012 Apr 02

Hi Folks, I'm trying to cut my data inside the summaryBy function. Perhaps formulas don't work that way? I'd like to avoid adding another column if possible, but if I have to, I have to. Any ideas? Thanks, Allie

    require(doBy)
    df = dataframe(a <- rnorm(100), b <-rnorm(100))
    summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred solution, but it throws an

how to use "..."

2013 Jan 17

Dear users, I'm trying to learn how to use the "...". I have written a function (simplified here) that uses doBy::summaryBy():

    # 'dat' is a data.frame from which the aggregation is computed
    # 'vec_cat' is an integer vector defining which columns of the data.frame should be used on the right side of the formula
    # 'stat_fun' is the function that will be run to

Data aggregation question

2011 Jul 28

Hi all, I'm working with a sizable dataset that I'd like to summarize, but I can't find a tool or function that will do quite what I'd like. Basically, I'd like to summarize the data by fully crossing three variables and getting a count of the number of observations for every level of that 3-way interaction. For example, if factors A, B, and C each have 3 levels (all of
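The excerpt is cut off, so the exact requirement is unknown. Assuming the aim is a count for every cell of the three-way crossing, including combinations that never occur, a sketch with base R's table() might look like the following. The data frame dat, its values, and the count column name n are all invented for illustration.

```r
set.seed(42)

# Invented data: three factors with three levels each. Levels are set
# explicitly so that unobserved combinations still appear in the output.
dat <- data.frame(
  A = factor(sample(c("a1", "a2", "a3"), 20, replace = TRUE),
             levels = c("a1", "a2", "a3")),
  B = factor(sample(c("b1", "b2", "b3"), 20, replace = TRUE),
             levels = c("b1", "b2", "b3")),
  C = factor(sample(c("c1", "c2", "c3"), 20, replace = TRUE),
             levels = c("c1", "c2", "c3"))
)

# table() cross-tabulates all three factors; as.data.frame() turns the
# 3 x 3 x 3 array into one row per cell, with zero counts kept
counts <- as.data.frame(table(dat[c("A", "B", "C")]), responseName = "n")
head(counts)
```

Because table() works from factor levels rather than observed values, the result always has 27 rows here, one per combination.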

cloud() works but wireframe() is blank

2006 Oct 25

Per the message from Alexander Nervedi, 29 April 2006:
> I have to be making a ridiculously silly omission.
> when I run the following i get the cloud plot ok. But I can't figure
> out what I am missing out when I call wireframe.
> Any help would be appreciated.
> x<-runif(100)
> y<-rnorm(100)
> z<-runif(100)
> temp <-data.frame(x,y,z)
>

summaryBy(): Is it the best option?

2006 Dec 05

Hi, since I have quite large tables and the processing takes quite a while I am curious if I can improve the performance of this aggregation somehow: At the moment I am using summaryBy from the doBy package under R 2.4.0, Win2K.

    summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,soc6a,postfix=c("","","",""),FUN=sum, na.rm=T)

The

Any way to apply TWO functions with tapply()?

2010 May 07

I need to compute the mean and the standard deviation of a data set and would like to have the results in one table/data frame. I call tapply() two times and do then merge the resulting tables to have them all in one table. Is there any way to tell tapply() to use the functions mean and sd within one function call? Something like tapply(data$response, list(data$targets, data$conditions), c(mean,
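As a sketch of one alternative to two tapply() calls plus a merge, aggregate() accepts a function that returns several named values. The column names response, targets and conditions follow the excerpt; the data frame name dat and its values are invented.

```r
set.seed(1)

# Invented data using the column names from the excerpt
dat <- data.frame(
  response = rnorm(24),
  targets = rep(c("t1", "t2"), each = 12),
  conditions = rep(c("c1", "c2", "c3"), times = 8)
)

# FUN returns a named vector, so aggregate() stores both statistics
# in a single matrix-valued column called "response"
stats <- aggregate(response ~ targets + conditions, data = dat,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Flatten the matrix column into ordinary columns
# (response.mean and response.sd)
stats <- do.call(data.frame, stats)
stats
```

The flattening step matters because the matrix column otherwise prints like two columns but behaves like one when subsetting or exporting.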

aggregate(), with multiple functions in FUN?

2008 May 16

I've got a data frame having numerical data by zip code:

    ZIP    DATA
    94111  12135.545
    93105  321354.65654
    94111  545.555
    94706  558858.66
    ...    ...

I'm using this function to group records by ZIP and calculate the median of DATA:

    aggregate(d$DATA, list(Zip = d$ZIP), FUN=median, na.rm=T)

but what I really want to do is to calculate several statistics (median,

Suggestion to extend aggregate() to return multiple and/or named values

2007 Jul 13

Hi all, This is my first post to the developers list. As I understand it, aggregate() currently repeats a function across cells in a dataframe but is only able to handle functions with single value returns. Aggregate() also lacks the ability to retain the names given to the returned value. I've created an agg() function (pasted below) that is apparently backwards compatible (i.e.

column selection for aggregate()

2010 Jan 18

Hi everybody! I'm working on R today so I have a lot of questions (you may have noticed that it's the 3rd email today). I'm new on R, so please excuse the "spam"! I have a dataset "ssfa" with many rows and the column names are:

    > names(ssfa)
    [1] "SPECSHOR" "BONE" "TO_POS" "MEASUREM" "FACETTE"

Problem in summaryBy

2007 Feb 15

The R script below gives values of 1 for all minimum values when I use a custom function in summaryBy. I get the correct values when I use FUN=min directly. Any help is much appreciated. The continuous information provided in this forum is fabulous as are the different R packages available. Rene

    # Simulated simplified data
    Subj <- rep(1:4, each=6)
    Analyte <-

Counting observations split by a factor when there are NAs in the data

2006 Jul 10

I am a very novice R user, a social scientist (linguist) who is trying to learn to use R after being very familiar with SPSS. Please be kind! My concern: I cannot figure out a way to get an accurate count of observations of one column of data split by a factor when there are NAs in the data. I know how to use commands like tapply and summaryBy to obtain other summary statistics I am interested
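A minimal sketch of counting non-missing observations per group in base R, assuming that is what "an accurate count" means here. The vectors score and group are invented for illustration.

```r
# Invented data: a measurement with missing values and a grouping factor
score <- c(3, NA, 5, 2, NA, NA, 4)
group <- factor(c("x", "x", "x", "y", "y", "z", "z"))

# length() would count the NAs too, so count the non-missing values instead
n_obs <- tapply(score, group, function(v) sum(!is.na(v)))
n_obs
```

The same anonymous function could be supplied as FUN to other grouping tools, since it only relies on is.na() and sum().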

Problem mit summaryBy: Group sums gives me "incorrectly" zero for one variable

2007 Aug 20

Hi, first I want to thank all of you for the quick aid which is provided here on the list during all times. Thanks a lot for that! Then, I have a problem using summaryBy which most probably is a problem of wrong use by me or the like: I use this command:

    summaryBy(total+total.inf~gr, aE, FUN=sum)

where aE is a

    > str(aE)
    'data.frame': 127880 obs. of 16 variables:
     $ gr
