thr3ads.net - similar to: "function which can apply a function by a grouping variable and also hand over an additional variable, e.g. a weight"

Displaying 20 results from an estimated 4000 matches similar to: "function which can apply a function by a grouping variable and also hand over an additional variable, e.g. a weight"

summaryBy(): Is it the best option?

2006 Dec 05

Hi, since I have quite large tables and the processing takes quite a while I am curious if I can improve the performance of this aggregation somehow: At the moment I am using summaryBy from the doBy package under R 2.4.0, Win2K. summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,soc6a,postfix=c("","","",""),FUN=sum, na.rm=T) The

Problem with summaryBy: Group sums gives me "incorrectly" zero for one variable

2007 Aug 20

Hi, first I want to thank all of you for the quick help that is provided here on the list at all times. Thanks a lot for that! Then, I have a problem using summaryBy, which is most probably due to incorrect use on my part or the like: I use this command: summaryBy(total+total.inf~gr, aE, FUN=sum) where aE is a > str(aE) 'data.frame': 127880 obs. of 16 variables: $ gr

Using summaryBy with weighted data

2011 Jan 17

Dear Soren and R users: I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: library(doBy) ## make up some data response = rnorm(100) group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) weights = runif(100, 0, 1) mydata = data.frame(response,group,weights) ## run summaryBy without weights:
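One possible way to get a weighted summary per group, sketched here in base R rather than with summaryBy, is to split the whole data frame so that each piece keeps its own weights. The data below is made up to mirror the post, and the names `mydata` and `weighted_by_group` are illustrative assumptions, not part of the original thread.

```r
# Hypothetical data mirroring the post: a response, five groups of 20, and weights.
set.seed(1)
mydata <- data.frame(
  response = rnorm(100),
  group    = rep(1:5, each = 20),
  weights  = runif(100, 0, 1)
)

# Split the data frame by group so every piece carries its own weights column,
# then compute a weighted mean of the response within each piece.
weighted_by_group <- sapply(
  split(mydata, mydata$group),
  function(d) weighted.mean(d$response, d$weights)
)

weighted_by_group
```

Splitting the data frame, not just the response vector, is what lets the additional variable (the weight) travel along with each group.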

Any way to apply TWO functions with tapply()?

2010 May 07

I need to compute the mean and the standard deviation of a data set and would like to have the results in one table/data frame. I call tapply() twice and then merge the resulting tables to have them all in one table. Is there any way to tell tapply() to use the functions mean and sd within one function call? Something like tapply(data$response, list(data$targets, data$conditions), c(mean,
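A minimal sketch of one way to get both statistics in a single call is to pass a function that returns a named vector, here with aggregate() instead of tapply(). The data frame and its values are invented for illustration; only the column names `response`, `targets` and `conditions` come from the post.

```r
# Hypothetical data using the column names mentioned in the post.
set.seed(1)
data <- data.frame(
  response   = rnorm(40),
  targets    = rep(c("a", "b"), each = 20),
  conditions = rep(c("x", "y"), times = 20)
)

# One call, two statistics: the function returns a named vector per group.
stats <- aggregate(
  response ~ targets + conditions,
  data = data,
  FUN  = function(v) c(mean = mean(v), sd = sd(v))
)

# aggregate() stores the two statistics as a matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns.
stats <- do.call(data.frame, stats)
stats
```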

Error in unsplit() with tibbles

2020 Nov 21

Hello, using the `unsplit()` function with tibbles currently leads to the following error: > mtcars_tb <- as_tibble(mtcars, rownames = NULL) > s <- split(mtcars_tb, mtcars_tb$gear) > unsplit(s, mtcars_tb$gear) Error: Must subset rows with a valid subscript vector. ℹ Logical subscripts must match the size of the indexed input. x Input has size 15 but subscript `rep(NA, len)` has

Using unsplit - unsplit does not seem to reverse the effect of split

2005 Sep 27

In data OME in MASS I would like to extract the first 5 observations per subject (=ID). So I do library(MASS) OMEsub <- split(OME, OME$ID) OMEsub <- lapply(OMEsub,function(x)x[1:5,]) unsplit(OMEsub, OME$ID) - which results in [[1]] [1] 1 1 1 1 1 [[2]] [1] 30 30 30 30 30 [[3]] [1] low low low low low Levels: N/A high low [[4]] [1] 35 35 40 40 45 [[5]] [1] coherent incoherent coherent

Error in unsplit() with tibbles

2020 Nov 21

I get the sentiment, but this is really just bad coding (on my own part, I suspect), so we might as well just fix it... -pd > On 21 Nov 2020, at 17:42 , Marc Schwartz via R-devel <r-devel at r-project.org> wrote: > > >> On Nov 21, 2020, at 10:55 AM, Mario Annau <mario.annau at gmail.com> wrote: >> >> Hello, >> >> using the `unsplit()`

NAs in unsplit factor

2006 Jun 08

R-devel, Below is a simple example calling split and unsplit on a numeric vector of length 2 where 'f' is c(1,NA). > unsplit(split(c(1,2), c(1,NA)), c(1,NA)) [1] 1 0 I noticed that the call to vector in unsplit gives us 0 as the 2nd element of the result. Is this the intended result, as opposed to NA? Thanks for your help, Jeff -- Jeff Enos Kane Capital Management jeff at

Using nrow with summaryBy

2010 Mar 17

Hello Everyone- I'm calculating summary statistics on a dataset (~4000 records, observations are not uniformly distributed) using summaryBy and trying to add a column with the number of observations to the output as well. What occurs to me is to use nrow(), but this doesn't appear to be working. I'm able to replicate the same results with an example from the summaryBy docs:

Using split and then unsplit

2010 Apr 19

Hello everyone, I use the split function, splitting with the f function, on a data frame with 3 columns and more than 100 000 rows. Once it's split I have a list of data frames, still with 3 columns and n rows. I manipulate those list elements and get a list of data frames still with 3 columns but fewer rows. So when I unsplit it, I get an error as I use the same factor function I used to split ( f in
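When the pieces end up with fewer rows than the original, the splitting factor no longer matches their combined length, which is why unsplit() cannot put them back. One common alternative, sketched below on invented data, is to bind the pieces together with do.call(rbind, ...). The data frame `df`, its columns and the row filter are assumptions made only for this illustration.

```r
# Hypothetical 3-column data frame with a grouping column.
df <- data.frame(
  id = rep(c("a", "b", "c"), each = 4),
  x  = 1:12,
  y  = 12:1
)

# Split by group, then keep only some rows in each piece (here: even x).
pieces <- split(df, df$id)
pieces <- lapply(pieces, function(d) d[d$x %% 2 == 0, ])

# The original factor has 12 entries but only 6 rows remain,
# so the pieces are stacked with rbind instead of unsplit().
combined <- do.call(rbind, pieces)
rownames(combined) <- NULL
combined
```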

[R] bug in unsplit()? (PR#1843)

2002 Jul 28

Hedderik van Rijn <hedderik@cmu.edu> writes: > If the second argument to unsplit is not a simple vector (but a "list > containing multiple lists"), the function seems to have some problems. > > Given a slight modification of the examples in help(split): > > > xg <- split(x,list(g1=g,g2=g)) > > unsplit(xg,list(g1=g,g2=g)) > [1] -0.7877109

summaryBy: transformed variable on RHS of formula?

2012 Apr 02

Hi Folks, I'm trying to cut my data inside the summaryBy function. Perhaps formulas don't work that way? I'd like to avoid adding another column if possible, but if I have to, I have to. Any ideas? Thanks, Allie require(doBy) df = data.frame(a = rnorm(100), b = rnorm(100)) summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred solution, but it throws an

how to use "..."

2013 Jan 17

Dear users, I'm trying to learn how to use the "...". I have written a function (simplified here) that uses doBy::summaryBy(): # 'dat' is a data.frame from which the aggregation is computed # 'vec_cat' is an integer vector defining which columns of the data.frame should be used on the right side of the formula # 'stat_fun' is the function that will be run to

unsplit list of data.frames with one column

2009 May 08

Perhaps this is the intended behavior, but I discovered that unsplit throws an error when it tries to set rownames of a variable that has no dimension. This occurs when unsplit is passed a list of data.frames that have only a single column. An example: df <- data.frame(letters[seq(25)]) fac <- rep(seq(5), 5) unsplit(split(df, fac), fac) For reference, I'm using R version 2.9.0

getting percentiles by factor

2011 Mar 10

Hello, I'm trying to get percentiles (PERCENTRANK for Excel users) by factor in the following data.frame: myExample <- data.frame(Ret=seq(-2, 2.5, by=0.5),PE=seq(10,19),Sectors=rep(c("Financial","Industrial"),5)) myExample <- na.omit(myExample) Thanks to Patrick I managed to put together the following lines, which do it for the "Ret" column: myecdf

Problems with unsplit()

2011 May 19

Hi everyone, I have already used split() and unsplit() in data frames without problems, but now I’m applying these functions to other data and when using unsplit() I have received the following message: Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", : duplicate 'row.names' are not allowed In

efficiently picking one row from a data frame per unique key

2010 Apr 13

Hello all, I'm trying to transform data frames by grouping the rows by the values in a particular column, ordered by another column, then picking the first row in each group. I'd like to convert a data frame like this:

  x  y  z
  1 10 20
  1 11 19
  2 12 18
  4 13 17

into one with three rows, like this, where I've discarded one row:

    x  y  z
  1 1 11 19
  2 2 12 18
  4 4 13 17

I've got a
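A minimal base R sketch of the transformation described above: sort by the key and the ordering column, then keep the first row of each key with duplicated(). The post does not say which column defines the order, so ordering by `z` is an assumption chosen because it reproduces the example output.

```r
# The example data from the post.
df <- data.frame(
  x = c(1, 1, 2, 4),
  y = c(10, 11, 12, 13),
  z = c(20, 19, 18, 17)
)

# Sort by the key x, then by z (assumed ordering column),
# so the wanted row comes first within each key.
ordered <- df[order(df$x, df$z), ]

# duplicated() flags every repeat of a key after its first occurrence,
# so negating it keeps exactly one row per key.
first_rows <- ordered[!duplicated(ordered$x), ]
first_rows
```

Because this relies on one sort and one vectorised duplicated() call, it avoids looping over the groups.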

Problem in summaryBy

2007 Feb 15

The R script below gives values of 1 for all minimum values when I use a custom function in summaryBy. I get the correct values when I use FUN=min directly. Any help is much appreciated. The continuous information provided in this forum is fabulous as are the different R packages available. Rene # Simulated simplified data Subj <- rep(1:4, each=6) Analyte <-

Counting observations split by a factor when there are NAs in the data

2006 Jul 10

I am a very novice R user, a social scientist (linguist) who is trying to learn to use R after being very familiar with SPSS. Please be kind! My concern: I cannot figure out a way to get an accurate count of observations of one column of data split by a factor when there are NAs in the data. I know how to use commands like tapply and summaryBy to obtain other summary statistics I am interested
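One way to count only the non-missing observations per group, sketched here with tapply() on made-up values, is to sum the logical vector returned by !is.na(). The vectors `scores` and `grp` are hypothetical stand-ins for the poster's data column and factor.

```r
# Hypothetical data: one measured column with NAs and a grouping factor.
scores <- c(3, NA, 5, 2, NA, 4, 1)
grp    <- factor(c("a", "a", "a", "b", "b", "b", "b"))

# length() would count the NAs too; summing !is.na() counts only
# the observations that are actually present in each group.
n_obs <- tapply(scores, grp, function(v) sum(!is.na(v)))
n_obs
```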

Apparent bug in summaryBy (PR#13941)

2009 Sep 04

Full_Name: Marc Paterno Version: 2.9.2 OS: Mac OS X 10.5.8 Submission from: (NULL) (99.53.212.55) summaryBy() produces incorrect results when given some data frames. Below is a transcript of a session showing the result, in a data frame with 2 observations of 2 variables. ------------------- thomas:999 paterno$ R --vanilla R version 2.9.2 (2009-08-24) Copyright (C) 2009 The R Foundation for
