Displaying 20 results from an estimated 1000 matches similar to: "data.frame - how to calculate the number of rows"
2007 Dec 24
1
aggregation with two statistical functions - mean and variance
Hello,
using the syntax
aggregate(daten[,c(3,4)], list(A,B), mean)
I'm getting the following data.frame:
A B C D
1 35 1 6.16000 5
2 47 1 31.24333 20
3 54 1 26.81773 2
4 3 2 12.99000 7
5 4 2 6.49000 1
C and D are both means. But now I want to
2013 Jan 17
3
how to use "..."
Dear users,
I'm trying to learn how to use the "...".
I have written a function (simplified here) that uses doBy::summaryBy():
# 'dat' is a data.frame from which the aggregation is computed
# 'vec_cat' is an integer vector defining which columns of the data.frame
# should be used on the right side of the formula
# 'stat_fun' is the function that will be run to
2010 Mar 17
2
Using nrow with summaryBy
Hello Everyone-
I'm calculating summary statistics on a dataset (~4000 records,
observations are not uniformly distributed) using summaryBy and trying
to add a column with the number of observations to the output as well.
What occurs to me is to use nrow(), but this doesn't appear to be working.
I'm able to replicate the same results with an example from the
summaryBy docs:
2005 Apr 29
1
na.action
Hi,
I had the following code:
testp <- rcorr(t(datcm1),type = "pearson")
mat1 <- testp[[1]][,] > 0.6
mat2 <- testp[[3]][,] < 0.05
mat3 <- mat1 + mat2
The resulting mat3 (smaller version) matrix looks like:
NA 0 0 0
0 NA 0 NA
0 0 NA 2
0 0 2 NA
To get to the number of times a '2' appears in the rows, I was
2012 Apr 02
2
summaryBy: transformed variable on RHS of formula?
Hi Folks,
I'm trying to cut my data inside the summaryBy function. Perhaps
formulas don't work that way? I'd like to avoid adding another column
if possible, but if I have to, I have to. Any ideas?
Thanks,
Allie
require(doBy)
df = data.frame(a = rnorm(100), b = rnorm(100))
summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred
# solution, but it throws an
2011 Jan 17
2
Using summaryBy with weighted data
Dear Soren and R users:
I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows:
library(doBy)
## make up some data
response = rnorm(100)
group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
weights = runif(100, 0, 1)
mydata = data.frame(response,group,weights)
## run summaryBy without weights:
2009 Sep 04
1
Apparent bug in summaryBy (PR#13941)
Full_Name: Marc Paterno
Version: 2.9.2
OS: Mac OS X 10.5.8
Submission from: (NULL) (99.53.212.55)
summaryBy() produces incorrect results when given some data frames. Below is a
transcript of a session showing the result, in a data frame with 2 observations
of 2 variables.
-------------------
thomas:999 paterno$ R --vanilla
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for
2012 Jun 05
1
Fwd: --link-dest does not appear to be linking on Cygwin
Hi:
I have attempted to follow some instructions to use --link-dest in
order to preserve space for multiple backups. I'm using rsync on
Cygwin with a NAS (ext4) which does support hard-links on the
filesystem. I've written a short program that does attempt to create a
hard-link on this NAS from Cygwin and it does look to be working. If I
run ls -li on the NAS the inodes are the same.
2006 Dec 05
1
summaryBy(): Is it the best option?
Hi,
Since I have quite large tables and the processing takes quite a while, I am
curious if I can improve the performance of this aggregation somehow: At the
moment I am using summaryBy from the doBy package under R 2.4.0, Win2K.
summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,
          soc6a, postfix=c("","","",""), FUN=sum, na.rm=T)
The
2012 May 15
0
Indexing in summaryBy
I'm trying to use a self-written function with the summaryBy function (doBy
package).
I have lots of data from Monte Carlo experiments comparing different
estimators across different (combinations of) parameter values, similar to
the following form:
colnames(mydata) <- c("X", "b0", "b1", # parameter combination,
                        # corresponding (true) parameter values
2007 Feb 15
1
Problem in summaryBy
The R script below gives values of 1 for all minimum values when I use a
custom function in summaryBy. I get the correct values when I use FUN=min
directly. Any help is much appreciated.
The continuous information provided in this forum is fabulous as are the
different R packages available.
Rene
# Simulated simplified data
Subj <- rep(1:4, each=6)
Analyte <-
2006 Oct 18
0
Aggregating a data frame (was: Re: new R-user needs help)
Please use an informative subject for the sake of the archives.
Here are several solutions:
aggregate(DF[4:8], DF[2], mean)
library(doBy)
summaryBy(x1 + x2 + x3 + x4 + x5 ~ name, DF, FUN = mean)
# if Exp, name and id columns are factors then this can be reduced to
library(doBy)
summaryBy(. ~ name, DF, FUN = mean)
library(reshape)
cast(melt(DF, id = 1:3), name ~ variable, fun = mean)
On
2010 Feb 08
1
Follow-up Question: data frames; matching/merging
Wow.. thanks for the deluge of responses!
Aggregate seems like the way to go here.
But, suppose that instead of integers in column V2, I actually have
dates (and instead of keeping the minimum integer, I want to keep the
earliest date):
> df =
2006 Dec 29
0
BradleyTerry "subscript out of bounds"
I don't see the problem with the following... the citations and
baseball data work fine, but my simulated data seems to give
BTm a headache. What am I missing?
---
library(BradleyTerry)
library(doBy)
ng <- 100
players <- factor( sort( c( "jeff", "mike", "paul", "rich" ) ) )
np <- length( players )
p1 <- factor( c( rep( "jeff", ng )
2011 Nov 18
1
counting events by subject with "black out" windows
I have a large dataset that includes subjects (ID), dates and events that need to be counted. Not every date includes an event, and I need to count only one event per 30 days, per subject. So in essence, I need to create a 30-day "black out" period during which an event cannot be "counted" for each subject. The reason is that a rule has been set up, whereby a subject can only be
2011 Oct 23
0
code review: is it too much to ask?
Hello all,
I really appreciate how helpful the people on this list are. Would it be too much to ask to send a small script to have it peer-reviewed, to make sure I am not making blatant mistakes? The script takes an experiment.dat as input and generates system throughput using ggplot2. It works now ... [sigh] but I have this nasty feeling that I might be doing something wrong :). Changing
2007 Aug 31
3
data frame row manipulation
Hello,
struggling with the very basic needs... :( any help appreciated.
#using the package doBy
#who drinks how much beer per day and therefore cannot calculate row-wise
#maxvals
evaluation=data.frame(date=c(1,2,3,4,5,6,7,8,9),
name=c("Michael","Steve","Bob",
"Michael","Steve","Bob","Michael","Steve","Bob"),
2009 Sep 10
2
R 2.9.2 memory max - object vector size
Me:
Win XP
4 gig ram
R 2.9.2
library(foreign) # to read/write SPSS files
library(doBy) # for summaryBy
library(RODBC)
setwd("C:\\Documents and Settings\\............00909BR")
gc()
memory.limit(size=4000)
## PROBLEM:
I have memory limit problems. R and otherwise. My dataframes for
merging or subsetting are about 300k to 900k records.
I've had errors such as vector size too large.
2013 May 17
0
Mean using different group for a real r beginner
Hi,
Try either:
tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
aggregate(exposure~male,data=tolerance,mean)
#  male exposure
#1    0 1.246667
#2    1 1.120000
#or
library(plyr)
ddply(tolerance,.(male),summarize,exposure=mean(exposure))
#  male exposure
#1    0 1.246667
#2    1 1.120000
#or
2011 Aug 30
1
R crash
Dear users,
By running the script below, R crashes systematically at the last
command, namely dev.off(), on Windows 7, but not on Windows XP.
I therefore don't provide a reproducible example and do not really
extract the relevant parts of the script because it has most likely
nothing to do with the script itself. I can do it though if you think it
might be relevant.
R crashes on Windows