thr3ads.net - similar to: "aggregate / collapse big data frame efficiently"

Displaying 20 results from an estimated 10000 matches similar to: "aggregate / collapse big data frame efficiently"

aggregate data.frame based on column class

2013 Jan 11

Hi, when using the aggregate function to aggregate a data.frame by one or more grouping variables, I often have the problem that I want the mean for some numeric variables but the unique value for factor variables. So, for example, in this data frame:
data <- data.frame(x = rnorm(10,1,2), group = c(rep(1,5), rep(2,5)), gender = c(rep('m',5), rep('f',5)))
aggregate(data,
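The excerpt is cut off before the asker's own attempt, so the following is only a sketch of one possible approach, not the answer given in the thread. It assumes that every non-numeric column holds exactly one distinct value within each group; `summarise_col` is a hypothetical helper name introduced here for illustration.

```r
set.seed(1)
data <- data.frame(x = rnorm(10, 1, 2),
                   group = c(rep(1, 5), rep(2, 5)),
                   gender = c(rep("m", 5), rep("f", 5)))

# Hypothetical helper: mean for numeric columns, the single unique
# value for everything else (stops if a group has more than one value).
summarise_col <- function(v) {
  if (is.numeric(v)) {
    mean(v)
  } else {
    u <- unique(v)
    stopifnot(length(u) == 1)
    u
  }
}

# Split by the grouping variable, summarise each column, and stack the rows.
pieces <- lapply(split(data, data$group), function(d) {
  as.data.frame(lapply(d, summarise_col), stringsAsFactors = FALSE)
})
result <- do.call(rbind, pieces)
result
```

The result has one row per group, with the mean of `x` and the single `gender` value for that group.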

mean-aggregate – but use unique for factor variables

2012 Sep 25

Hi, I have a data.frame which I want to aggregate. There are some grouping variables and some continuous variables for which I would like to have the mean. However, there are also some factor variables in the data frame that are not grouping variables, and I would actually like to aggregate these variables with the unique() function. Is that possible with the standard aggregate function? If I

best way to aggregate / rearrange data.frame with different data types

2011 Jul 11

Hi, I have a data.frame that looks like this:
Subject <- c(rep(1,4), rep(2,4), rep(3,4))
y <- rnorm(12, 3, 2)
gender <- c(rep("w",4), rep("m",4), rep("w",4))
comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
data <- data.frame(Subject,y,gender,comment)
data
Subject y gender

Creating a new table from a set of constraints

2003 Aug 29

Hi Everyone, Here's a silly newbie question. How do I remove unwanted rows from an R table? Say that I read my data as: X <- read.table("mydata.txt") and say that there are columns for age and gender. Call these X[5] and X[10], respectively. Here, X[5] is a column of positive integers and X[10] is binary-valued, i.e., zero (for male) and one (for female). Now, say that I

Collapsing panel data

2009 Feb 03

Dear R-helpers, I've been thinking about this for some time; maybe someone can help. I have a fairly large dataset with thousands of firms, call them a, b, c, etc., such as
     [,1] [,2]
[1,] "A"  0.5
[2,] ""   0.2
[3,] ""   0.3
[4,] "B"  0.1
[5,] ""   0.9
[6,] "C"  0.4
Or, to put it differently, two vectors such as y

"aggregate" in R

2011 Feb 22

Hi, R users, I'm wondering how I can aggregate data in R with different functions for different columns. For example:
x<-rep(1:5,3)
y<-cbind(x,a=1:15,b=21:35)
y<-data.frame(y)
I want to aggregate "a" and "b" in y by "x". With "a", I want to use function "mean"; with "b", I want to use function "sum". I tried:
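The asker's attempt is truncated here. As a minimal sketch of one way to apply a different function to each column with base R (not necessarily the solution given in the thread), each column can be aggregated separately and the results joined on the grouping variable. The names `a_mean`, `b_sum`, and `result` are introduced for illustration.

```r
x <- rep(1:5, 3)
y <- data.frame(cbind(x, a = 1:15, b = 21:35))

# Aggregate each column with its own function, then join on x.
a_mean <- aggregate(a ~ x, data = y, FUN = mean)
b_sum  <- aggregate(b ~ x, data = y, FUN = sum)
result <- merge(a_mean, b_sum, by = "x")
result
# For x = 1 the rows hold a = 1, 6, 11 (mean 6) and b = 21, 26, 31 (sum 78).
```

Running one aggregate() call per function keeps the code readable when only a few columns are involved; with many columns a split-and-apply approach may be more convenient.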

Box-Cox Transformation: Drastic differences when varying added constants

2010 May 16

Dear experts, I tried to learn about the Box-Cox transformation but found the following: when I had to add a constant to make all values of the original variable positive, the lambda estimates (box.cox.powers function) differed dramatically depending on the specific constant chosen. In addition, the correlation between the transformed variable and the original was not 1 (as I

aggregate syntax for grouped column means

2011 Nov 29

I am calculating the mean of each column grouped by the variable 'id'. I do this using aggregate, data.table, and plyr. My aggregate results do not match the other two, and I am trying to figure out what is incorrect with my syntax. Any suggestions? Thanks. Here is the data. myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61, 30.59, 30.84, 30.98, 30.79, 30.79,

Collapsing data frame; aggregate() or better function?

2007 Sep 13

Hello r-help, I am trying to collapse or aggregate 'some' of a data frame. A very simplified version of my data frame looks like:
> tester
  trip set num sex lfs1 lfs2
1  313  15   5   M    2    3
2  313  15   3   F    1    2
3  313  17   1   M    0    1
4  313  17   2   F    1    1
5  313  17   1   U    1    0
And I want to omit sex from the picture and just get an addition of num,
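The excerpt ends before the full requirement is stated, so this sketch covers only the part that is visible: dropping `sex` and adding up `num` within each trip and set. What should happen to `lfs1` and `lfs2` is not known from the excerpt, and they are left out here on purpose. The name `collapsed` is introduced for illustration.

```r
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Sum num over every trip/set combination; sex is simply not mentioned
# in the formula, so it drops out of the result.
collapsed <- aggregate(num ~ trip + set, data = tester, FUN = sum)
collapsed
```

Further columns could be summed in the same call by putting `cbind(num, ...)` on the left-hand side of the formula, if that turns out to be what is wanted.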

Aggregate

2000 Nov 06

Hello to all, I recently downloaded R to my PC and am enjoying getting acquainted with it. Thank you to everyone involved in the R-project! I am interested in doing a log-linear analysis with R on a data set with dichotomous variables. There are 11 variables (columns) and around 1000 subjects (rows). How do I aggregate my data, i.e. how do I make a new dataset that includes the variable giving

algorithm that iteratively drops columns of a data-frame

2011 Nov 09

Dear R-Users, I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step based on a statistical criterion, so that the 'column-space' gets smaller and smaller with each iteration (like when you do stepwise regression). The problem is that in every round I use a new subset of my data.frame. However, as soon as I "generate" this

Help with Iterator

2010 Nov 09

Dear Experts, The following is my "Iterator". When I try to write a new function with itel, I get an error. This is what I have:
> supDist<-function(x,y) return(max(abs(x-y)))
>
> myIterator <- function(xinit,f,data=NULL,eps=1e-6,itmax=5,verbose=FALSE) {
+ xold<-xinit
+ itel<-0
+ repeat {
+ xnew<-f(xold,data)
+ if (verbose) {
+ cat(
+

aggregate

2012 Mar 14

Dear all, I have a very long vector and I would like to ask whether I can aggregate the values by constant-sized windows. For example, for the following vector, I would like to take 30 points at a time until the end and find the mean of each window.
> myData<-seq(1:100000)
>
> c(mean(myData[1:30]),mean(myData[31:60])) #...and so on until the end
[1] 15.5 45.5
I have searched in the R
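One way to compute such fixed-size window means in base R, offered as a sketch rather than as the thread's accepted answer, is to label each element with its window index and let tapply() take the mean per label. The names `window`, `block`, and `window_means` are introduced here; the last window is simply shorter when the length is not a multiple of 30.

```r
myData <- seq_len(100000)
window <- 30

# Window index for every element: 1 for positions 1..30, 2 for 31..60, ...
block <- ceiling(seq_along(myData) / window)

# Mean of each window; the final window holds the 10 leftover values.
window_means <- tapply(myData, block, mean)
head(window_means, 2)  # 15.5 and 45.5, matching the two values in the excerpt
```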

Using functions within functions (environment problems)

2007 Jan 26

Hi everyone, I've been having difficulty writing wrapper functions for some functions where those same functions include other functions with eval() calls where the environment is specified. A very simple example using function lmer from lme4: lmerWrapper <- function(formula, data, family = gaussian, method = c("REML", "ML", "PQL", "Laplace",

How to find moving averages within each subgroup of a data frame

2009 Oct 22

Dear all, If I have the following data frame:
> set.seed(21)
> df1 <- data.frame(col1=c(rep('a',5), rep('b',5), rep('c',5)), col4=rnorm(1:15))
   col1         col4
1     a  0.793013171
2     a  0.522251264
3     a  1.746222241
4     a -1.271336123
5     a  2.197389533
6     b  0.433130777
7     b -1.570199630
8     b -0.934905667
9     b  0.063493345
10    b

Extracting values from a ecdf (empirical cumulative distribution function) curve

2013 Oct 31

Hi R users, I am a new user, still learning the basics of R. Is there any way to extract the y (or x) value for a known x (or y) value from an ecdf (empirical cumulative distribution function) curve? Thanks in advance. Mano.

Prediction when using orthogonal polynomials in regression

2006 Jan 26

Folks, I'm doing fine with using orthogonal polynomials in a regression context:
# We will deal with noisy data from the d.g.p. y = sin(x) + e
x <- seq(0, 3.141592654, length.out=20)
y <- sin(x) + 0.1*rnorm(10)
d <- lm(y ~ poly(x, 4))
plot(x, y, type="l"); lines(x, d$fitted.values, col="blue") # Fits great!
all.equal(as.numeric(d$coefficients[1] + m

aggregate() runs out of memory

2012 Sep 14

I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns). I want to get the result of
table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is 24.3G, and there is no end in sight. Both V1 and V2 are characters (not factors). Is there anything I could do to speed this up? Thanks. -- Sam Steingold
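The expression above counts how many rows each id has and then tabulates those counts. A sketch of an equivalent that avoids aggregate() altogether is a table of a table; whether it actually runs faster on the asker's data is not measured here. The small `Z` below is a made-up stand-in for the real data frame, and `via_aggregate` and `via_table` are names introduced for illustration.

```r
# Made-up stand-in for the large data frame (character columns, as in the post).
Z <- data.frame(V1 = c("a", "b", "c", "d", "e", "f"),
                V2 = c("u", "u", "v", "w", "w", "w"),
                stringsAsFactors = FALSE)

# The original expression: rows per id, then a tabulation of those counts.
via_aggregate <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# Same tabulation without aggregate(): table(Z$V2) gives rows per id.
via_table <- table(table(Z$V2))

via_table
```

Both approaches drop rows whose id is NA by default, so the two tabulations agree on this example.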

Problem reading in external data and assigning data.frames within R

2005 Oct 04

Hey there, I apologize if this is an irritatingly simple question ... I'm a new user. I can't understand why R flips the sign of all data values when reading in external text files (tab delimited or csv) with the read.delim or read.csv functions. The signs of data values also seem to be flipped after assigning a new data.frame from within R (xnew <-- edit(data.frame()). What am

union data in column

2010 Jul 24

Is there any function/way to merge/unite the following data
GENEID  col1 col2 col3 col4
G234064 1    0    0    0
G234064 1    0    0    0
G234064 1    0    0    0
G234064 0    1