thr3ads.net - similar to: "Follow-up Question: data frames; matching/merging"

Displaying 20 results from an estimated 8000 matches similar to: "Follow-up Question: data frames; matching/merging"

2010 Feb 08

data frames; matching/merging

Hi all, I'm feeling a little guilty to ask this question, since I've written a solution using a rather clunky for loop that gets the job done. But I'm convinced there must be a faster (and probably more elegant) way to accomplish what I'm looking to do (perhaps using the "merge" function?). I figured somebody out there might've already figured this out: I have
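The poster suggests `merge()` as the likely replacement for the loop. The excerpt cuts off before the data, so this is only a minimal sketch of `merge()` and `match()`. It assumes two hypothetical data frames, `people` and `scores`, that share an `id` key.

```r
# Hypothetical lookup table and measurements sharing an `id` key
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"),
                     stringsAsFactors = FALSE)
scores <- data.frame(id = c(2, 3, 3, 4), score = c(10, 20, 30, 40))

# Inner join: only ids present in both tables (2, 3, 3)
inner <- merge(people, scores, by = "id")

# Left join: keep every row of `people`; unmatched ids get NA scores
left <- merge(people, scores, by = "id", all.x = TRUE)

# match() gives a vectorised lookup with no explicit loop: it returns
# the position of each scores$id within people$id (NA if absent)
scores$name <- people$name[match(scores$id, people$id)]
```

`match()` returns only the first match, so it suits one-to-one lookups. `merge()` handles one-to-many keys by repeating rows.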

how to use "..."

2013 Jan 17

Dear users, I'm trying to learn how to use the "...". I have written a function (simplified here) that uses doBy::summaryBy(): # 'dat' is a data.frame from which the aggregation is computed # 'vec_cat' is an integer vector defining which columns of the data.frame should be used on the right side of the formula # 'stat_fun' is the function that will be run to
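This sketch shows how `...` forwards extra arguments to a user-supplied statistic. It uses base R's `tapply()` in place of `doBy::summaryBy()` so there is no package dependency. The wrapper name `summarise_by` and its arguments are hypothetical.

```r
# Hypothetical wrapper: `stat_fun` is applied per group, and any extra
# arguments supplied through `...` are passed on to it unchanged
summarise_by <- function(dat, value, group, stat_fun, ...) {
  tapply(dat[[value]], dat[[group]], stat_fun, ...)
}

dat <- data.frame(g = c("a", "a", "b", "b"), y = c(1, NA, 3, 4))

# na.rm = TRUE is not a formal argument of summarise_by(); it travels
# through `...` into tapply() and from there into mean()
res <- summarise_by(dat, "y", "g", mean, na.rm = TRUE)
```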

Using nrow with summaryBy

2010 Mar 17

Hello everyone, I'm calculating summary statistics on a dataset (~4000 records, observations are not uniformly distributed) using summaryBy and trying to add a column with the number of observations to the output as well. What occurs to me is to use nrow(), but this doesn't appear to be working. I'm able to replicate the same results with an example from the summaryBy docs:
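The summary function receives each group's values as a plain vector, and `nrow()` of a vector is `NULL`, so `length()` is the usual way to count observations. The sketch below uses base R's `aggregate()` on hypothetical data. The same kind of function should also work as the `FUN` argument of `summaryBy()`, which accepts functions returning several named statistics.

```r
# Hypothetical data: a grouping column and a measurement
d <- data.frame(g = c("a", "a", "b", "b", "b"), y = c(1, 3, 2, 4, 6))

# FUN gets each group's values as a vector; nrow() would return NULL,
# so length() supplies the observation count
stats <- aggregate(y ~ g, data = d,
                   FUN = function(v) c(mean = mean(v), n = length(v)))

# aggregate() stores both results in a matrix column; flatten it into
# ordinary columns named y.mean and y.n
flat <- do.call(data.frame, stats)
```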

summaryBy: transformed variable on RHS of formula?

2012 Apr 02

Hi Folks, I'm trying to cut my data inside the summaryBy function. Perhaps formulas don't work that way? I'd like to avoid adding another column if possible, but if I have to, I have to. Any ideas? Thanks, Allie require(doBy) df = dataframe(a <- rnorm(100), b <-rnorm(100)) summaryBy(a ~ cut(b,c(-100,-1,1,100)), data=df) # preferred solution, but it throws an
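Whether `summaryBy()` accepts a call such as `cut()` on the right-hand side is the open question in the post, and it is not settled here. This sketch sidesteps it with base R's `aggregate()` and its `by` argument. The bins are computed on the fly and no column is added to `df`. It uses fixed example values instead of `rnorm()` so the output is reproducible. Note that the base constructor is `data.frame()`, and its columns are named with `=` rather than `<-`.

```r
# Fixed example data in place of rnorm(), so the result is reproducible
df <- data.frame(a = 1:6, b = c(-2, -1.5, 0, 0.5, 2, 3))

# Bin b on the fly and use the bins as the grouping variable;
# df itself is left unchanged
breaks <- c(-100, -1, 1, 100)
res <- aggregate(df$a, by = list(bin = cut(df$b, breaks)), FUN = mean)
```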

Aggregating a data frame (was: Re: new R-user needs help)

2006 Oct 18

Please use an informative subject for sake of the archives. Here are several solutions: aggregate(DF[4:8], DF[2], mean) library(doBy) summaryBy(x1 + x2 + x3 + x4 + x5 ~ name, DF, FUN = mean) # if Exp, name and id columns are factors then this can be reduced to library(doBy) summaryBy(. ~ name, DF, FUN = mean) library(reshape) cast(melt(DF, id = 1:3), name ~ variable, fun = mean) On

Using summaryBy with weighted data

2011 Jan 17

Dear Soren and R users: I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: library(doBy) ## make up some data response = rnorm(100) group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) weights = runif(100, 0, 1) mydata = data.frame(response,group,weights) ## run summaryBy without weights:
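This excerpt does not say whether `summaryBy()` takes weights directly. The sketch below uses base R instead. It computes per-group weighted means with `split()` and `weighted.mean()` and reuses the poster's made-up data, with a seed added for reproducibility.

```r
set.seed(1)
response <- rnorm(100)
group <- rep(1:5, each = 20)   # same as c(rep(1,20), ..., rep(5,20))
weights <- runif(100, 0, 1)
mydata <- data.frame(response, group, weights)

# Split the rows by group, then take a weighted mean within each piece
wmeans <- sapply(split(mydata, mydata$group),
                 function(d) weighted.mean(d$response, d$weights))
```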

data frame row manipulation

2007 Aug 31

Hello, I'm struggling with the very basic needs... :( Any help appreciated. #using the package doBy #who drinks how much beer per day and therefore cannot calculate rowwise maxvals evaluation=data.frame(date=c(1,2,3,4,5,6,7,8,9), name=c("Michael","Steve","Bob", "Michael","Steve","Bob","Michael","Steve","Bob"),
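The posted data are cut off after the `name` column, so this sketch assumes hypothetical numeric columns `v1` to `v3` on each row. It computes a row-wise maximum with `pmax()` and, for comparison, with `apply()`.

```r
# Hypothetical layout: date and name as in the post, plus value columns
evaluation <- data.frame(date = 1:3,
                         name = c("Michael", "Steve", "Bob"),
                         v1 = c(2, 5, 1), v2 = c(4, 3, 0), v3 = c(1, 6, 2))

val_cols <- c("v1", "v2", "v3")

# pmax() compares its arguments element-wise, so passing the value
# columns as separate arguments yields one maximum per row
evaluation$maxval <- do.call(pmax, evaluation[val_cols])

# apply() over rows gives the same result, though it is typically
# slower on large data because it converts to a matrix and loops
evaluation$maxval2 <- apply(evaluation[val_cols], 1, max)
```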

Apparent bug in summaryBy (PR#13941)

2009 Sep 04

Full_Name: Marc Paterno Version: 2.9.2 OS: Mac OS X 10.5.8 Submission from: (NULL) (99.53.212.55) summaryBy() produces incorrect results when given some data frames. Below is a transcript of a session showing the result, in a data frame with 2 observations of 2 variables. ------------------- thomas:999 paterno$ R --vanilla R version 2.9.2 (2009-08-24) Copyright (C) 2009 The R Foundation for

summaryBy(): Is it the best option?

2006 Dec 05

Hi, since I have quite large tables and the processing takes quite a while I am curious if I can improve the performance of this aggregation somehow: At the moment I am using summaryBy from the doBy package under R 2.4.0, Win2K. summaryBy(soc_s6aq5 + soc_s6aq7 + soc_s6aq9 + soc_s6aq11 ~ hh + comgroup,soc6a,postfix=c("","","",""),FUN=sum, na.rm=T) The
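For plain group sums, base R's `rowsum()` is implemented in compiled code and is often much faster than formula-based wrappers. That is a general expectation, not a benchmark of the poster's tables. In this sketch, the hypothetical columns `v1` and `v2` stand in for `soc_s6aq5` and the other value columns.

```r
# Hypothetical stand-in for soc6a: two grouping keys, two value columns
d <- data.frame(hh = c(1, 1, 2, 2, 2),
                comgroup = c("a", "a", "a", "b", "b"),
                v1 = c(1, 2, NA, 4, 5),
                v2 = c(10, 20, 30, 40, NA))

# Combine the two keys into one factor, dropping empty combinations
key <- interaction(d$hh, d$comgroup, drop = TRUE)

# rowsum() sums matrix rows within each key level; na.rm = TRUE skips
# missing values, matching na.rm = T in the original call
sums <- rowsum(as.matrix(d[c("v1", "v2")]), key, na.rm = TRUE)
```

The result is a matrix with one row per key, such as `"1.a"`. Wrap it in `as.data.frame()` if a data frame is needed.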

Indexing in summaryBy

2012 May 15

I'm trying to use a self-written function with the summaryBy function (doBy package). I have lots of data from Monte Carlo experiments comparing different estimators across different (combinations of) parameter values, similar to the following form: colnames(mydata) <- c("X", "b0", "b1", # parameter combination, corresponding (true) parameter values

Problem in summaryBy

2007 Feb 15

The R script below gives values of 1 for all minimum values when I use a custom function in summaryBy. I get the correct values when I use FUN=min directly. Any help is much appreciated. The continuous information provided in this forum is fabulous as are the different R packages available. Rene # Simulated simplified data Subj <- rep(1:4, each=6) Analyte <-

code review: is it too much to ask?

2011 Oct 23

Hello all, I really appreciate how helpful the people in this list are. Would it be too much to ask to send a small script to have it peer-reviewed? to make sure I am not making blatant mistakes? The script takes an experiment.dat as input and generates system Throughput using ggplot2. It works now ... [sigh] but I have this nasty feeling that I might be doing something wrong :). Changing

BradleyTerry "subscript out of bounds"

2006 Dec 29

I don't see the problem with the following... the citations and baseball data work fine, but my simulated data seems to give BTm a headache. What am I missing? --- library(BradleyTerry) library(doBy) ng <- 100 players <- factor( sort( c( "jeff", "mike", "paul", "rich" ) ) ) np <- length( players ) p1 <- factor( c( rep( "jeff", ng )

data.frame - how to calculate the number of rows

2007 Dec 26

Hello, it seems to be a simple problem, but I couldn't find an answer in the archive. (I think it must have something to do with the group-select, like in php.) I have the following data.frame:
      A B  C
    1 3 6  5
    2 4 4 20
    3 5 8  2
I want to get the number of the
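The question is cut off before saying exactly which rows should be counted. Common counting idioms on the posted table are sketched below. The condition `B > 5` is purely illustrative.

```r
# The table from the post
df <- data.frame(A = c(3, 4, 5), B = c(6, 4, 8), C = c(5, 20, 2))

nrow(df)                  # total number of rows
sum(df$B > 5)             # rows meeting a condition (TRUE counts as 1)
nrow(subset(df, B > 5))   # same count via subset()
table(df$B > 5)           # counts per value, similar to GROUP BY ... COUNT(*)
```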

read.table: mysterious line omissions

2009 Dec 20

Hello again, I am simply trying to import a rectangular table of strings. The table's dimensions are 1990 x 2, yet my read.table() command can only find 362 of the rows (and they're not the first 362). I would've taken the time to figure out how to use scan, readLines, or some other tool that can read in character strings, and then parse and input to a table, but that seems like
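By default, `read.table()` treats `"` and `'` as quote characters and `#` as a comment marker. A stray apostrophe can therefore merge many lines into one field, and a `#` can truncate a line. The excerpt does not show whether that is the cause here. This sketch writes a small hypothetical file and reads it with quoting and comments switched off.

```r
# Hypothetical two-column, tab-separated file containing an apostrophe
# and a '#'
f <- tempfile(fileext = ".txt")
writeLines(c("O'Brien\tfirst",
             "item#3\tsecond",
             "Smith's\tthird",
             "plain\tfourth"), f)

# count.fields() reports how many fields each line yields under the
# given settings, a quick way to spot where parsing goes wrong
count.fields(f, sep = "\t", quote = "", comment.char = "")

# Disable quote and comment handling so every line is read verbatim
tab <- read.table(f, sep = "\t", quote = "", comment.char = "",
                  stringsAsFactors = FALSE)
```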

counting events by subject with "black out" windows

2011 Nov 18

I have a large dataset that includes subjects (ID), dates and events that need to be counted. Not every date includes an event, and I need to count only one event per 30 days, per subject. So in essence, I need to create a 30-day "black out" period for each subject, during which an event cannot be "counted". The reason is that a rule has been set up, whereby a subject can only be
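This sketch uses one reading of the rule: an event counts only if it falls at least 30 days after the subject's previously counted event. It assumes hypothetical columns `ID` and `Date`, made-up dates, and that rows without an event have already been filtered out. Check the boundary (`>= 30` versus `> 30`) against the actual rule.

```r
# Hypothetical event data: one row per event, with subject ID and date
events <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  Date = as.Date(c("2011-01-01", "2011-01-15", "2011-02-05", "2011-02-20",
                   "2011-03-01", "2011-03-10"))
)

# Count an event only if it is at least `window` days after the last
# *counted* event; events inside the black-out window are skipped
count_with_blackout <- function(dates, window = 30) {
  last <- NA
  n <- 0L
  for (d in as.numeric(sort(dates))) {
    if (is.na(last) || d - last >= window) {
      n <- n + 1L
      last <- d
    }
  }
  n
}

counts <- sapply(split(events$Date, events$ID), count_with_blackout)
```

A loop is used because each decision depends on the previous counted event, which makes a fully vectorised version awkward.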

Mean using different group for a real r beginner

2013 May 17

Hi, try either: tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt") aggregate(exposure~male,data=tolerance,mean) # male exposure #1 0 1.246667 #2 1 1.120000 #or library(plyr) ddply(tolerance,.(male),summarize,exposure=mean(exposure)) # male exposure #1 0 1.246667 #2 1 1.120000 #or

R 2.9.2 memory max - object vector size

2009 Sep 10

Me: Win XP 4 gig ram R 2.9.2 library(foreign) # to read/write SPSS files library(doBy) # for summaryBy library(RODBC) setwd("C:\\Documents and Settings\\............00909BR") gc() memory.limit(size=4000) ## PROBLEM: I have memory limit problems. R and otherwise. My dataframes for merging or subsetting are about 300k to 900k records. I've had errors such as vector size too large.

Nested select

2009 Sep 25

my data : library(doBy) lines<-"lo ptcl5 ptcl99 variable 430 . 8787 a 430 342 2343 m 430 . 89 mr 431 456 4774 a 431 299 2777 m 431 99 96 mr 432 333 3433 a 432 . 7377 m 432 .
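The question itself is cut off, but the posted data use `.` for missing values. This sketch reads the complete rows from the excerpt with `na.strings = "."` so that `ptcl5` becomes numeric with `NA` entries. The final, truncated row is omitted.

```r
txt <- "lo ptcl5 ptcl99 variable
430 . 8787 a
430 342 2343 m
430 . 89 mr
431 456 4774 a
431 299 2777 m
431 99 96 mr
432 333 3433 a
432 . 7377 m"

# "." marks a missing value; without na.strings the ptcl5 column
# would not be read as numeric
dat <- read.table(text = txt, header = TRUE, na.strings = ".",
                  stringsAsFactors = FALSE)
```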

aggregate(...) with multiple functions

2010 Jul 16

hi all - i'm just wondering what sort of code people write to essentially perform an aggregate call, but with different functions being applied to the various columns. for example, if i have a data frame x and would like to marginalize by a factor f for the rows, but apply mean() to col1 and median() to col2. if i wanted to apply mean() to both columns, i would call: aggregate(x, list(f),
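One base-R pattern makes one `aggregate()` call per column/function pair and joins the results on the grouping factor with `merge()`. In this sketch, the data and the column holding `f` are hypothetical.

```r
# Hypothetical data: grouping factor f and two numeric columns
x <- data.frame(f = c("a", "a", "a", "b", "b", "b"),
                col1 = c(1, 2, 3, 4, 10, 20),
                col2 = c(5, 6, 7, 8, 9, 100))

# mean() for col1 and median() for col2, joined on f
res <- merge(aggregate(col1 ~ f, data = x, FUN = mean),
             aggregate(col2 ~ f, data = x, FUN = median),
             by = "f")
```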