similar to: aggregate / collapse big data frame efficiently

Displaying 20 results from an estimated 10000 matches similar to: "aggregate / collapse big data frame efficiently"

2013 Jan 11
3
aggregate data.frame based on column class
Hi, when using the aggregate function to aggregate a data.frame by one or more grouping variables, I often have the problem that I want the mean for some numeric variables but the unique value for factor variables. So, for example, in this data frame:
    data <- data.frame(x = rnorm(10,1,2), group = c(rep(1,5), rep(2,5)), gender = c(rep('m',5), rep('f',5)))
    aggregate(data,
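A minimal sketch of one base-R approach: give aggregate() a small custom function that returns the mean for numeric columns and the pasted unique values otherwise. The helper name mean_or_unique is hypothetical, and set.seed() is added only to make the run reproducible.

```r
# Example data from the post (set.seed() added so the run is reproducible)
set.seed(1)
data <- data.frame(x = rnorm(10, 1, 2),
                   group = c(rep(1, 5), rep(2, 5)),
                   gender = c(rep("m", 5), rep("f", 5)))

# Hypothetical helper: mean for numeric columns; for anything else, the unique
# values pasted together (a single value when the column is constant in a group)
mean_or_unique <- function(v) {
  if (is.numeric(v)) mean(v) else paste(unique(as.character(v)), collapse = "/")
}

# Aggregate the non-grouping columns by 'group' with the mixed-type helper
res <- aggregate(data[c("x", "gender")], by = list(group = data$group),
                 FUN = mean_or_unique)
res
```

Pasting the unique values, rather than taking only the first, makes it visible when a "constant" variable actually varies within a group.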
2012 Sep 25
1
mean-aggregate – but use unique for factor variables
Hi, I have a data.frame that I want to aggregate. There are some grouping variables and some continuous variables for which I would like to have the mean. However, there are also some factor variables in the data frame that are not grouping variables, and I would like to aggregate these with the unique() function. Is that possible with the standard aggregate function? If I
2011 Jul 11
2
best way to aggregate / rearrange data.frame with different data types
Hi, I have a data.frame that looks like this:
    Subject <- c(rep(1,4), rep(2,4), rep(3,4))
    y <- rnorm(12, 3, 2)
    gender <- c(rep("w",4), rep("m",4), rep("w",4))
    comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
    data <- data.frame(Subject,y,gender,comment)
    data
      Subject y gender
2003 Aug 29
3
Creating a new table from a set of constraints
Hi everyone, here's a silly newbie question. How do I remove unwanted rows from an R table? Say that I read my data as X <- read.table("mydata.txt") and that there are columns for age and gender. Call these X[5] and X[10], respectively. Here, X[5] is a column of positive integers and X[10] is binary-valued, i.e., zero (for male) and one (for female). Now, say that I
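The post is cut off before the actual selection rule, so the condition below ("women over 30") is purely hypothetical, as is the small stand-in for mydata.txt. The sketch shows the usual pattern: build one logical vector from column tests with X[[k]], then index rows with it.

```r
# Hypothetical stand-in for X <- read.table("mydata.txt"): age in column 5,
# gender in column 10 (0 = male, 1 = female), other columns are filler
X <- as.data.frame(matrix(0, nrow = 6, ncol = 10))
X[[5]]  <- c(25, 34, 41, 19, 52, 30)
X[[10]] <- c(0, 1, 1, 0, 1, 0)

# X[[5]] extracts column 5 as a plain vector (X[5] would be a one-column
# data frame); combine the per-row tests into one logical index
keep <- X[[5]] > 30 & X[[10]] == 1   # hypothetical condition: women over 30

# Keep only the matching rows, and all columns
Y <- X[keep, ]
Y
# Equivalent with subset(), using the default column names V5 and V10:
# Y <- subset(X, V5 > 30 & V10 == 1)
```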
2009 Feb 03
1
Collapsing panel data
Dear R-helpers, I've been thinking about this for some time; maybe someone can help. I have a fairly large dataset with thousands of firms, call them a, b, c, etc., such as
         [,1] [,2]
    [1,] "A"  0.5
    [2,] ""   0.2
    [3,] ""   0.3
    [4,] "B"  0.1
    [5,] ""   0.9
    [6,] "C"  0.4
Or to put it differently two vectors such as y
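The snippet stops before saying what "collapsing" should produce, so the following is a sketch under an assumption: first carry each firm label down over the blank rows below it, then compute one summary per firm (a sum is used here purely as a hypothetical example). The fill-down step assumes the first row carries a label.

```r
# Values from the matrix in the post, as two vectors
firm  <- c("A", "", "", "B", "", "C")
value <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)

# For each row, the position of the most recent non-blank label
# (assumes the first row carries a label)
last_label <- cummax(ifelse(firm != "", seq_along(firm), 0))
filled <- firm[last_label]   # "A" "A" "A" "B" "B" "C"

# Hypothetical collapsing step: one summed value per firm
totals <- tapply(value, filled, sum)
totals
```

Because cummax() and indexing are vectorized, this scales to thousands of firms without an explicit loop.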
2011 Feb 22
5
"aggregate" in R
Hi, R users, I'm wondering how I can aggregate data in R with different functions for different columns. For example:
    x<-rep(1:5,3)
    y<-cbind(x,a=1:15,b=21:35)
    y<-data.frame(y)
I want to aggregate "a" and "b" in y by "x". With "a", I want to use function "mean"; with "b", I want to use function "sum". I tried:
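One simple base-R way to use a different function per column, sketched on the post's data: run aggregate() once per function and merge() the results on the grouping column. This is one option among several, not the only idiom.

```r
x <- rep(1:5, 3)
y <- data.frame(cbind(x, a = 1:15, b = 21:35))

# One aggregate() call per summary function ...
a_mean <- aggregate(a ~ x, data = y, FUN = mean)
b_sum  <- aggregate(b ~ x, data = y, FUN = sum)

# ... then join the two results on the grouping column
res <- merge(a_mean, b_sum, by = "x")
res
```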
2010 May 16
2
Box-Cox Transformation: Drastic differences when varying added constants
Dear experts, I was learning about the Box-Cox transformation and noticed the following: when I had to add a constant to make all values of the original variable positive, the lambda estimates (box.cox.powers function) differed dramatically depending on the specific constant chosen. In addition, the correlation between the transformed variable and the original was not 1 (as I
2011 Nov 29
2
aggregate syntax for grouped column means
I am calculating the mean of each column grouped by the variable 'id'. I do this using aggregate, data.table, and plyr. My aggregate results do not match the other two, and I am trying to figure out what is incorrect with my syntax. Any suggestions? Thanks. Here is the data. myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61, 30.59, 30.84, 30.98, 30.79, 30.79,
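The snippet doesn't show whether myData contains missing values, but one common cause of this mismatch is NA handling: the formula method of aggregate() uses na.action = na.omit, which drops a whole row if any column is NA, whereas per-column approaches only drop the missing cell. The toy data below is hypothetical and only demonstrates the effect.

```r
# Hypothetical toy data (not the poster's myData) with one missing value
d <- data.frame(id   = c(1, 1, 2, 2),
                var1 = c(1, 3, 5, 7),
                var2 = c(10, NA, 30, 40))

# Formula method: na.omit drops the whole second row before grouping,
# so var1 for id 1 is computed from a single value
f_res <- aggregate(. ~ id, data = d, FUN = mean)

# Keep all rows and let mean() drop NAs column by column instead
f_keep <- aggregate(. ~ id, data = d, FUN = mean,
                    na.rm = TRUE, na.action = na.pass)
f_res
f_keep
```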
2007 Sep 13
1
Collapsing data frame; aggregate() or better function?
Hello r-help, I am trying to collapse or aggregate 'some' of a data frame. A very simplified version of my data frame looks like:
    > tester
      trip set num sex lfs1 lfs2
    1  313  15   5   M    2    3
    2  313  15   3   F    1    2
    3  313  17   1   M    0    1
    4  313  17   2   F    1    1
    5  313  17   1   U    1    0
And I want to omit sex from the picture and just get the sum of num,
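Assuming the goal is to sum num, lfs1 and lfs2 within each trip and set (the sentence is cut off after "num,"), the formula interface of aggregate() handles this in one call. Leaving sex out of the formula is what drops it from the result.

```r
# The example data frame from the post
tester <- data.frame(trip = rep(313, 5),
                     set  = c(15, 15, 17, 17, 17),
                     num  = c(5, 3, 1, 2, 1),
                     sex  = c("M", "F", "M", "F", "U"),
                     lfs1 = c(2, 1, 0, 1, 1),
                     lfs2 = c(3, 2, 1, 1, 0))

# Sum the count columns within each trip x set combination; sex is not named
res <- aggregate(cbind(num, lfs1, lfs2) ~ trip + set, data = tester, FUN = sum)
res
```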
2000 Nov 06
5
Aggregate
Hello to all, I recently downloaded R to my PC and am enjoying getting acquainted with it. Thank you to everyone involved in the R-project! I am interested in doing a log-linear analysis with R on a data set with dichotomous variables. There are 11 variables (columns) and around 1000 subjects (rows). How do I aggregate my data, i.e. how do I make a new dataset that includes the variable giving
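For a log-linear analysis, "aggregating" usually means turning subject-level rows into a table of counts per response pattern. A sketch with a hypothetical stand-in of 3 dichotomous variables (the post has 11, which would give 2^11 = 2048 patterns, many with zero counts at n = 1000): as.data.frame(table(...)) yields one row per pattern plus a Freq column, which can go into a Poisson glm().

```r
set.seed(42)
# Hypothetical stand-in: 1000 subjects, 3 dichotomous (0/1) variables
d <- data.frame(A = rbinom(1000, 1, 0.5),
                B = rbinom(1000, 1, 0.3),
                C = rbinom(1000, 1, 0.7))

# One row per response pattern (2^3 = 8 here) with its count in 'Freq';
# patterns that never occur are kept with Freq = 0
counts <- as.data.frame(table(d))
counts

# The count table can then be fitted as a Poisson log-linear model, e.g.
fit <- glm(Freq ~ A * B + C, family = poisson, data = counts)
```

Base R's loglin() works directly on the table(d) array as an alternative.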
2011 Nov 09
2
algorithm that iteratively drops columns of a data-frame
Dear R-Users, I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step based on a statistical criterion, so that the 'column space' gets smaller and smaller with each iteration (as in stepwise regression). The problem is that in every round I use a new subset of my data.frame. However, as soon as I "generate" this
2010 Nov 09
2
Help with Iterator
Dear Experts, The following is my "Iterator". When I try to write a new function with itel, I get an error. This is what I have:
    > supDist<-function(x,y) return(max(abs(x-y)))
    >
    > myIterator <- function(xinit,f,data=NULL,eps=1e-6,itmax=5,verbose=FALSE) {
    + xold<-xinit
    + itel<-0
    + repeat {
    + xnew<-f(xold,data)
    + if (verbose) {
    + cat(
    +
2012 Mar 14
2
aggregate
Dear all, I have a very long vector, and I would like to ask whether I can aggregate its values over constant-sized windows. For example, for the following vector I would like to take successive blocks of 30 points through to the end and find the mean of each block.
    > myData<-seq(1:100000)
    >
    > c(mean(myData[1:30]),mean(myData[31:60])) #...and so on until the end
    [1] 15.5 45.5
I have searched in the R
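A sketch of the non-overlapping window means with base R: compute a window index for every element with integer division, then let tapply() take the mean per window. Note that 100000 is not a multiple of 30, so in this sketch the final window is kept even though it holds only 10 values.

```r
myData <- 1:100000

# Window index for each element: 1 for positions 1-30, 2 for 31-60, and so on
win <- (seq_along(myData) - 1) %/% 30 + 1

# Mean of each window; the last window is shorter (10 values) because
# 100000 is not a multiple of 30
win_means <- as.numeric(tapply(myData, win, mean))
head(win_means, 2)
```

If the length were an exact multiple of 30, colMeans(matrix(myData, nrow = 30)) would give the same result without the index vector.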
2007 Jan 26
2
Using functions within functions (environment problems)
Hi everyone, I've been having difficulty writing wrapper functions for some functions where those same functions include other functions with eval() calls where the environment is specified. A very simple example using function lmer from lme4: lmerWrapper <- function(formula, data, family = gaussian, method = c("REML", "ML", "PQL", "Laplace",
2009 Oct 22
2
How to find moving averages within each subgroup of a data frame
Dear all, If I have the following data frame:
    > set.seed(21)
    > df1 <- data.frame(col1=c(rep('a',5), rep('b',5), rep('c',5)), col4=rnorm(1:15))
       col1         col4
    1     a  0.793013171
    2     a  0.522251264
    3     a  1.746222241
    4     a -1.271336123
    5     a  2.197389533
    6     b  0.433130777
    7     b -1.570199630
    8     b -0.934905667
    9     b  0.063493345
    10    b
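A base-R sketch of a per-group moving average: ave() applies a function within each level of col1 and puts the results back in the original row order, and stats::filter() computes the rolling mean. The window width k = 3 and the trailing (sides = 1) alignment are assumptions, since the snippet doesn't specify them.

```r
# Data as in the post; rnorm(15) draws the same 15 values as rnorm(1:15)
set.seed(21)
df1 <- data.frame(col1 = c(rep("a", 5), rep("b", 5), rep("c", 5)),
                  col4 = rnorm(15))

# Trailing moving average over the current row and the k - 1 previous rows
k <- 3   # hypothetical window width
moving_avg <- function(v) as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))

# ave() applies moving_avg within each col1 group, so windows never cross
# group boundaries; the first k - 1 rows of every group are NA
df1$ma3 <- ave(df1$col4, df1$col1, FUN = moving_avg)
df1
```

Writing stats::filter explicitly avoids a clash with dplyr::filter if that package is attached.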
2013 Oct 31
1
Extracting values from an ecdf (empirical cumulative distribution function) curve
Hi R users, I am a new user, still learning the basics of R. Is there any way to extract the y (or x) value for a known x (or y) value from an ecdf (empirical cumulative distribution function) curve? Thanks in advance. Mano.
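ecdf() returns a step function, so the y value for a known x is obtained just by calling it. Going the other way, quantile() with type = 1 is documented as the inverse of the empirical distribution function. The sample x below is hypothetical.

```r
set.seed(1)
x <- rnorm(100)      # hypothetical sample

Fn <- ecdf(x)        # ecdf() returns a step function you can call directly

# y for a known x: the proportion of observations <= 0
p0 <- Fn(0)

# x for a known y: quantile type 1 inverts the empirical CDF
# (the smallest observation at which Fn reaches 0.5)
q50 <- quantile(x, probs = 0.5, type = 1)
p0
q50
```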
2006 Jan 26
2
Prediction when using orthogonal polynomials in regression
Folks, I'm doing fine using orthogonal polynomials in a regression context:
    # We will deal with noisy data from the d.g.p. y = sin(x) + e
    x <- seq(0, 3.141592654, length.out=20)
    y <- sin(x) + 0.1*rnorm(20)
    d <- lm(y ~ poly(x, 4))
    plot(x, y, type="l"); lines(x, d$fitted.values, col="blue")
    # Fits great!
    all.equal(as.numeric(d$coefficients[1] + m
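For prediction it should not be necessary to rebuild the orthogonal basis by hand: poly() stores the constants that define its basis, and predict() on the lm fit reuses them for new x values. A minimal sketch on the same data-generating process (set.seed() and the new x values are added for illustration):

```r
set.seed(1)
x <- seq(0, pi, length.out = 20)
y <- sin(x) + 0.1 * rnorm(20)   # one noise draw per x value
d <- lm(y ~ poly(x, 4))

# predict() transforms newdata with the same orthogonal basis as the fit
newx <- data.frame(x = c(0.5, 1.5, 2.5))
pred <- predict(d, newdata = newx)
pred
```

Predicting at the original x values reproduces fitted(d), which is a quick check that the basis is being reused correctly.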
2012 Sep 14
3
aggregate() runs out of memory
I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns). I want to get the result of table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x). Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is 24.3G, and there is no end in sight. Both V1 and V2 are characters (not factors). Is there anything I could do to speed this up? Thanks. -- Sam Steingold
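Since length(V1) within each V2 group is simply the number of rows per V2 value, the same result can be obtained with table(table(Z$V2)). This should be much lighter because it never builds per-group subsets of V1. The sketch checks the equivalence on a hypothetical small Z; it assumes only row counts per V2 are needed (table() drops NA ids by default, as the by= grouping does).

```r
set.seed(1)
# Hypothetical small stand-in for the 10-million-row Z (character columns)
n <- 1e5
Z <- data.frame(V1 = sample(letters, n, replace = TRUE),
                V2 = sample(sprintf("id%04d", 1:5000), n, replace = TRUE),
                stringsAsFactors = FALSE)

# Number of rows per V2 value, counted directly
rows_per_id <- table(Z$V2)

# Distribution of group sizes: how many ids have 1 row, 2 rows, ...
fast <- table(rows_per_id)

# Original formulation, for comparison on this small example only
slow <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)
fast
```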
2005 Oct 04
3
Problem reading in external data and assigning data.frames within R
Hey there, I apologize if this is an irritatingly simple question ... I'm a new user. I can't understand why R flips the sign of all data values when reading in external text files (tab delimited or csv) with the read.delim or read.csv functions. The signs of data values also seem to be flipped after assigning a new data.frame from within R (xnew <-- edit(data.frame()). What am
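The snippet itself contains a likely culprit: R has no `<--` operator, so `xnew <-- edit(data.frame())` is parsed as `xnew <- -edit(data.frame())`, which negates every value before assigning it. If the read.delim()/read.csv() calls (not shown in the snippet) are written the same way, that would explain the flipped signs there too. A small demonstration:

```r
v <- c(1.5, -2, 3)

# "w <-- v" is read as "w <- -v": the values are negated before assignment
w <-- v
w  # c(-1.5, 2, -3)

# The same happens with a data frame: every numeric column is negated
df <- data.frame(a = 1:3, b = c(-1, 4, 2))
df_neg <-- df
df_ok  <- df   # the intended assignment leaves the signs alone
```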
2010 Jul 24
2
union data in column
Is there any function/way to merge/unite the following data:
    GENEID  col1 col2 col3 col4
    G234064    1    0    0    0
    G234064    1    0    0    0
    G234064    1    0    0    0
    G234064    0    1
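Assuming "unite" means one row per GENEID with a 1 wherever any of its rows has a 1, aggregate() with max() does this, since max acts as a logical OR on 0/1 columns. The snippet is cut off mid-row, so the fourth row's col3/col4 values and the second gene "G999999" are hypothetical.

```r
# Toy rows modeled on the snippet; the 4th row's col3/col4 values and the
# second gene "G999999" are hypothetical
d <- data.frame(GENEID = c("G234064", "G234064", "G234064", "G234064", "G999999"),
                col1 = c(1, 1, 1, 0, 0),
                col2 = c(0, 0, 0, 1, 0),
                col3 = c(0, 0, 0, 0, 1),
                col4 = c(0, 0, 0, 0, 0))

# One row per GENEID; max() acts as a logical OR on 0/1 columns
res <- aggregate(. ~ GENEID, data = d, FUN = max)
res
```

Replacing max with sum would instead count how many rows per gene have a 1 in each column.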