thr3ads.net - similar to: "Data aggregation question"

Displaying 20 results from an estimated 10000 matches similar to: "Data aggregation question"

2010 Jun 25

Lattice plotting question

Hi all, I'm working on some plots using lattice (R 2.10.1), and have entered the polish phase. I've produced a satisfactory pair of xyplots ( http://imgur.com/EyXGi.png), but would like to align the y-axes of the top and bottom plots. I assume that I need to adjust axis padding or something, but I can't figure this one out. Thanks for any help! Dave -- Post-doctoral Fellow

2010 Sep 16

Odd graphics output problem

Hi all, I'm having trouble saving graphics output from within a loop, and I can't figure out a solution. I'd like to produce and save lots of individual plots for inspection, so I set up the following script: library( lattice ) wd = "~/Documents/PPM/" ppm = read.table( paste( wd, "ppm_summary.txt", sep = "" ), sep = "\t", header = TRUE )

2009 Mar 20

how to make aggregation in R ?

Hi, I am trying to aggregate the sum of my test data.frame as follows: testDF <- data.frame(v1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d",

2007 Dec 24

aggregation with two statistical functions - mean and variance

Hello, using the syntax aggregate(daten[,c(3,4)], list(A,B), mean) I'm getting the following data.frame: A B C D 1 35 1 6.16000 5 2 47 1 31.24333 20 3 54 1 26.81773 2 4 3 2 12.99000 7 5 4 2 6.49000 1 C and D are both means. But now I want to

2013 Jan 17

how to use "..."

Dear users, I'm trying to learn how to use the "...". I have written a function (simplified here) that uses doBy::summaryBy(): # 'dat' is a data.frame from which the aggregation is computed # 'vec_cat' is an integer vector defining which columns of the data.frame should be used on the right side of the formula # 'stat_fun' is the function that will be run to

2010 May 20

multiple 2 by 2 crosstabulations?

Hello, I have a dataframe (var_1, var_2, ..., var_n) and I would like to export summary statistics to Latex in the form of a table. I want specific summary statistics by crossing numerous variables 2x2 AT ONCE. In each cell I would like sometimes to have the median (Q1 - Q3), or frequency and proportion, etc. CrossTable, xtab, etc... do not allow for multiple 2 by 2 crosstabulation. The table

2010 Jan 18

column selection for aggregate()

Hi everybody! I'm working on R today so I have a lot of questions (you may have noticed that it's the 3rd email today). I'm new on R, so please excuse the "spam"! I have a dataset "ssfa" with many rows and the column names are: > names(ssfa) [1] "SPECSHOR" "BONE" "TO_POS" "MEASUREM" "FACETTE"

2006 Dec 05

summaryBy(): Is it the best option?

Hi, since I have quite large tables and the processing takes quite a while I am curious if I can improve the performance of this aggregation somehow: At the moment I am using summaryBy from the doBy package under R 2.4.0, Win2K. summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,soc6a,postfix=c("","","",""),FUN=sum, na.rm=T) The

2010 Jul 16

aggregate(...) with multiple functions

hi all - i'm just wondering what sort of code people write to essentially perform an aggregate call, but with different functions being applied to the various columns. for example, if i have a data frame x and would like to marginalize by a factor f for the rows, but apply mean() to col1 and median() to col2. if i wanted to apply mean() to both columns, i would call: aggregate(x, list(f),
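One possible approach to the question above, sketched in base R: aggregate each column with its own function, then join the results on the grouping key. The sample values in x and f are made up for illustration; only the names x, f, col1 and col2 come from the post.

```r
# Hypothetical data: 'x' has two numeric columns, 'f' is the grouping factor.
x <- data.frame(col1 = c(1, 2, 3, 10), col2 = c(5, 7, 9, 11))
f <- factor(c("a", "a", "b", "b"))

# Aggregate each column with its own function.
m1 <- aggregate(x["col1"], list(f = f), mean)    # mean of col1 per group
m2 <- aggregate(x["col2"], list(f = f), median)  # median of col2 per group

# Join the two per-group summaries on the grouping key.
res <- merge(m1, m2, by = "f")
print(res)
```

This keeps one aggregate() call per function, which is simple but scales linearly with the number of distinct functions.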

2011 Feb 08

Convert the output of by() to a data frame

I'd like to summarize several variables in a data frame, for multiple groups, and store the results in a data.frame. To do so, I'm using by(). For example: df<-data.frame(a=1:10,b=11:20,c=21:30,grp1=c("x","y"),grp2=c("x","y"),grp3=c("x","y")) dfsum<-by(df[c("a","b","c")],

2010 Mar 17

Using nrow with summaryBy

Hello Everyone- I'm calculating summary statistics on a dataset (~4000 records, observations are not uniformly distributed) using summaryBy and trying to add a column with the number of observations to the output as well. What occurs to me is to use nrow(), but this doesn't appear to be working. I'm able to replicate the same results with an example from the summaryBy docs:

2011 Jan 17

Using summaryBy with weighted data

Dear Soren and R users: I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: library(doBy) ## make up some data response = rnorm(100) group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) weights = runif(100, 0, 1) mydata = data.frame(response,group,weights) ## run summaryBy without weights:

2006 Feb 24

Summarize by two-column factor, retaining original factors

I am having trouble doing the following. I have a data.frame like this, where x and y are a variable that I want to do calculations on: Name Year x y ab 2001 15 3 ab 2001 10 2 ab 2002 12 8 ab 2003 7 10 dv 2002 10 15 dv 2002 3 2 dv 2003 1 15 Before I do all the other things I need to do with this data, I need to summarize or collapse the data by name and year. I've
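A minimal base R sketch of collapsing the data shown above by Name and Year. The post is cut off before it says which summary is wanted, so sum is an assumption here; any other function can be passed as FUN. The object names d and collapsed are illustrative.

```r
# The sample rows from the post.
d <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Collapse x and y over each Name/Year combination.
# ASSUMPTION: sum is the desired summary; the original post does not say.
collapsed <- aggregate(cbind(x, y) ~ Name + Year, data = d, FUN = sum)

# Sort for readability; Name and Year are retained as ordinary columns.
collapsed <- collapsed[order(collapsed$Name, collapsed$Year), ]
print(collapsed)
```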

2012 Apr 02

summaryBy: transformed variable on RHS of formula?

Hi Folks, I'm trying to cut my data inside the summaryBy function. Perhaps formulas don't work that way? I'd like to avoid adding another column if possible, but if I have to, I have to. Any ideas? Thanks, Allie require(doBy) df = dataframe(a <- rnorm(100), b <-rnorm(100)) summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred solution, but it throws an

2009 Aug 05

Counting things

I've completed an experiment and want to summarize the results. There are two things I'd like to create. 1) A simple count of things from the data.frame with predictions 1a) Number of predictions with probability greater than x 1b) Number of predictions with probability greater than x that are really true In SQL, this would be, "Select count(predictions) from
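Counts 1a and 1b can be sketched in base R by summing logical vectors. The data frame pred and its columns prob and truth are hypothetical names, since the post does not show the actual structure.

```r
# Hypothetical predictions: 'prob' is the predicted probability,
# 'truth' records whether the prediction was actually correct.
pred <- data.frame(
  prob  = c(0.9, 0.4, 0.8, 0.7, 0.2),
  truth = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)
x <- 0.5  # probability threshold

# 1a) number of predictions with probability greater than x
n_above <- sum(pred$prob > x)

# 1b) number of those that are really true
n_above_true <- sum(pred$prob > x & pred$truth)

print(c(n_above = n_above, n_above_true = n_above_true))
```

Summing a logical vector counts its TRUE values, which plays the role of SQL's count with a WHERE clause.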

2006 Dec 28

Aggregation using list with Hmisc summarize function

Hi All, I'm using the Hmisc summarize function and used list instead of llist to provide the by variables. It generated an error message. Is this a bug, or do I misunderstand how Hmisc works with lists? The program below demonstrates the error message. Thanks, Bob x<-1:8 group <- c(1,1,1,1,2,2,2,2) gender<- c(1,2,1,2,1,2,1,2) mydata<-data.frame(x,group,gender)

2010 May 07

Any way to apply TWO functions with tapply()?

I need to compute the mean and the standard deviation of a data set and would like to have the results in one table/data frame. I call tapply() two times and then merge the resulting tables to have them all in one table. Is there any way to tell tapply() to use the functions mean and sd within one function call? Something like tapply(data$response, list(data$targets, data$conditions), c(mean,

2008 May 16

aggregate(), with multiple functions in FUN?

I've got a data frame having numerical data by zip code: ZIP DATA 94111 12135.545 93105 321354.65654 94111 545.555 94706 558858.66 ... ... I'm using this function to group records by ZIP and calculate the median of DATA: aggregate(d$DATA, list(Zip = d$ZIP), FUN=median, na.rm=T) but what I really want to do is to calculate several statistics (median,
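One way this might be done in base R is to have FUN return a named vector, then flatten the resulting matrix column. The post is truncated after "median", so the choice of mean and a count as the extra statistics is an assumption; the names stats and flat are illustrative.

```r
# The four sample rows shown in the post.
d <- data.frame(
  ZIP  = c(94111, 93105, 94111, 94706),
  DATA = c(12135.545, 321354.65654, 545.555, 558858.66)
)

# FUN returns several named statistics per group.
# ASSUMPTION: mean and a non-missing count stand in for the unlisted statistics.
stats <- aggregate(
  DATA ~ ZIP, data = d,
  FUN = function(v) c(median = median(v, na.rm = TRUE),
                      mean   = mean(v, na.rm = TRUE),
                      n      = sum(!is.na(v)))
)

# aggregate() stores the result as a single matrix column named DATA;
# do.call(data.frame, ...) expands it into DATA.median, DATA.mean, DATA.n.
flat <- do.call(data.frame, stats)
print(flat)
```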

2007 Jul 13

Suggestion to extend aggregate() to return multiple and/or named values

Hi all, This is my first post to the developers list. As I understand it, aggregate() currently repeats a function across cells in a dataframe but is only able to handle functions with single value returns. Aggregate() also lacks the ability to retain the names given to the returned value. I've created an agg() function (pasted below) that is apparently backwards compatible (i.e.

2007 Jun 26

Aggregation of data frame with calculations of proportions

Dear all, I have been stuck on this problem, am rather struggling and would appreciate some advice if anyone can help. I apologise if this is a bit long-winded, but I've tried to limit it to the bare essentials, but don't know how to make it more generic! I have some slightly odd real world data that I'm looking at representing number of positive diagnoses for different diseases,
