similar to: Apparent bug in summaryBy (PR#13941)

Displaying 20 results from an estimated 2000 matches similar to: "Apparent bug in summaryBy (PR#13941)"

2010 Mar 17
2
Using nrow with summaryBy
Hello everyone, I'm calculating summary statistics on a dataset (~4000 records, observations are not uniformly distributed) using summaryBy, and I'm trying to add a column with the number of observations to the output as well. The obvious approach seems to be nrow(), but this doesn't appear to work. I'm able to replicate the same results with an example from the summaryBy docs:
2012 Apr 02
2
summaryBy: transformed variable on RHS of formula?
Hi Folks, I'm trying to cut my data inside the summaryBy function. Perhaps formulas don't work that way? I'd like to avoid adding another column if possible, but if I have to, I have to. Any ideas? Thanks, Allie require(doBy) df = data.frame(a = rnorm(100), b = rnorm(100)) summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred solution, but it throws an
2011 Jan 17
2
Using summaryBy with weighted data
Dear Soren and R users: I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: library(doBy) ## make up some data response = rnorm(100) group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) weights = runif(100, 0, 1) mydata = data.frame(response,group,weights) ## run summaryBy without weights:
2006 Dec 05
1
summaryBy(): Is it the best option?
Hi, since I have quite large tables and the processing takes quite a while I am curious if I can improve the performance of this aggregation somehow: At the moment I am using summaryBy from the doBy package under R 2.4.0, Win2K. summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,soc6a,postfix=c("","","",""),FUN=sum, na.rm=T) The
2007 Feb 15
1
Problem in summaryBy
The R script below gives values of 1 for all minimum values when I use a custom function in summaryBy. I get the correct values when I use FUN=min directly. Any help is much appreciated. The continuous information provided in this forum is fabulous as are the different R packages available. Rene # Simulated simplified data Subj <- rep(1:4, each=6) Analyte <-
2013 Jan 17
3
how to use "..."
Dear users, I'm trying to learn how to use the "...". I have written a function (simplified here) that uses doBy::summaryBy(): # 'dat' is a data.frame from which the aggregation is computed # 'vec_cat' is an integer vector defining which columns of the data.frame should be used on the right side of the formula # 'stat_fun' is the function that will be run to
2009 Sep 10
2
R 2.9.2 memory max - object vector size
Me: Win XP 4 gig ram R 2.9.2 library(foreign) # to read/write SPSS files library(doBy) # for summaryBy library(RODBC) setwd("C:\\Documents and Settings\\............00909BR") gc() memory.limit(size=4000) ## PROBLEM: I have memory limit problems. R and otherwise. My dataframes for merging or subsetting are about 300k to 900k records. I've had errors such as vector size too large.
2007 Aug 20
1
Problem with summaryBy: Group sums give me "incorrectly" zero for one variable
Hi, first I want to thank all of you for the quick help that is provided here on the list at all times. Thanks a lot for that! Then, I have a problem using summaryBy, which is most probably due to incorrect use on my part or the like: I use this command: summaryBy(total+total.inf~gr, aE, FUN=sum) where aE is a > str(aE) 'data.frame': 127880 obs. of 16 variables: $ gr
2012 May 15
0
Indexing in summaryBy
I'm trying to use a self-written function with the summaryBy function (doBy package). I have lots of data from Monte Carlo experiments comparing different estimators across different (combinations of) parameter values, similar to the following form: colnames(mydata) <- c("X", "b0", "b1", # parameter combination, corresponding (true) parameter values
2010 Feb 08
1
Follow-up Question: data frames; matching/merging
Wow.. thanks for the deluge of responses! Aggregate seems like the way to go here. But, suppose that instead of integers in column V2, I actually have dates (and instead of keeping the minimum integer, I want to keep the earliest date): > df =
2007 Aug 31
3
data frame row manipulation
Hello, struggling with the very basic needs... :( any help appreciated. #using the package doBy #who drinks how much beer per day and therefore cannot calculate row-wise maxvals evaluation=data.frame(date=c(1,2,3,4,5,6,7,8,9), name=c("Michael","Steve","Bob", "Michael","Steve","Bob","Michael","Steve","Bob"),
2010 May 07
4
Any way to apply TWO functions with tapply()?
I need to compute the mean and the standard deviation of a data set and would like to have the results in one table/data frame. I call tapply() two times and do then merge the resulting tables to have them all in one table. Is there any way to tell tapply() to use the functions mean and sd within one function call? Something like tapply(data$response, list(data$targets, data$conditions), c(mean,
2011 Nov 18
1
counting events by subject with "black out" windows
I have a large dataset that includes subjects (ID), dates, and events that need to be counted. Not every date includes an event, and I need to count only one event per 30 days, per subject. So in essence, I need to create a 30-day "black out" period during which an event cannot be "counted" for each subject. The reason is that a rule has been set up, whereby a subject can only be
2007 Dec 26
1
data.frame - how to calculate the number of rows
Hello, it seems to be a simple problem, but I couldn't find an answer in the archive. (I think it must have something to do with the group-select, like in php) I have the following data.frame: A B C 1 3 6 5 2 4 4 20 3 5 8 2 I want to get the number of the
2007 Aug 27
1
Column naming mystery
Hi, I hope somebody can help explain something that seems mysterious to me. I use this line on a data frame ae: summaryBy(total_inflated+total~gr1, data=ae, FUN=sum, na.rm=T) and it returns 3 columns as expected. The columns "gr1" and "total_inflated.sum" are correct, but the "total.sum" column consists of only zeros, which is not correct. The same happens when I rename the
2005 Feb 01
2
How to write a new "top-level" Trellis/lattice function?
Hello, I am trying to write a new "top level" Trellis/lattice function. By "top-level", I mean a function like 'xyplot', 'histogram', 'bwplot', etc. These functions all call 'trellis.skeleton', which I am unable to call; an attempt to invoke the function that does so yields the error message: ----- Error in do.call("trellis.skeleton",
2010 Jul 16
2
aggregate(...) with multiple functions
hi all - i'm just wondering what sort of code people write to essentially perform an aggregate call, but with different functions being applied to the various columns. for example, if i have a data frame x and would like to marginalize by a factor f for the rows, but apply mean() to col1 and median() to col2. if i wanted to apply mean() to both columns, i would call: aggregate(x, list(f),
2011 Feb 01
2
Problems with sample means and standard deviations
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110201/fe2362c4/attachment.pl>
2011 Feb 01
5
(no subject)
Hello, I am trying to find the max value for only a subset of a dataframe, depending on how the data is grouped. For example, how would I find the maximum responce for all the GPR119a condition below: responce,mouce,condition 0.105902,KO,con 0.232018561,KO,con 0.335008375,KO,con 0.387025433,KO,GPR119a 0.576769897,KO,GPR119a 0.645120419,KO,GPR119a 0.2538608,KO,GPR119b
2007 Oct 30
2
flexible processing
Hello, unfortunately, I don't know a better subject. I would like to be very flexible in how to process my data. Assume the following dataset: par1 <- seq(0,1,length.out = 100) par2 <- seq(1,100) fac1 <- factor(rep(c("group1", "group2"), each = 50)) fac2 <- factor(rep(c("group3", "group4", "group5", "group6"), each =