thr3ads.net - similar to: "Help with ddply to eliminate a for..loop"

Displaying 20 results from an estimated 10000 matches similar to: "Help with ddply to eliminate a for..loop"

2009 Sep 25

summarize-plyr package

Hi, I am using the amazing package 'plyr'. I have one problem. I would appreciate help to fix the following error: Thanks. ______________________________ > library(plyr) > data(baseball) > summarise(baseball, + duration = max(year) - min(year), + nteams = length(unique(team))) Error: could not find function "summarise" > ddply(baseball, "id", summarise, +

2013 Mar 13

Determining maximum hourly slope per day

Hello, I have a challenge! I have a large dataset with three columns, "date", "temp", "location". "date" is in the format %m/%d/%y %H:%M, with a "temp" recorded every 10 minutes. These temperatures are surface temperatures and so fluctuate during the day, heating up and then cooling down, so the data is a series of peaks and troughs. I would like

2012 Sep 06

use of ddply() within function

Dear all, I am encountering problems with the application of ddply within the body of a self-defined function. The script is the following: moncostcarmoto <- function(costtype){ costaux_result <- data.frame() for (purp in PURPcount){for (per in PERcount){ costcarin =

2011 Jun 07

extract data features from subsets

I have a large dataset similar to this:

ID time result
A  1    5
A  2    2
A  3    1
A  4    1
A  5    1
A  6    2
A  7    3
A  8    4
B  1    3
B  2    2
B  3    4
B  4    6
B  5    8

I need to extract a number of features for each individual in it (identified by "ID"). These are:
* The lowest result (the nadir)
* The time of the nadir - but if the nadir level is present at >1 time point, I need the minimum and maximum time of nadir

2010 Jun 29

Model validation and penalization with rms package

I've been using Frank Harrell's rms package to do bootstrap model validation. Is it the case that the optimum penalization may still give a model which is substantially overfitted? I calculated corrected R^2, optimism in R^2, and corrected slope for various penalties for a simple example: x1 <- rnorm(45) x2 <- rnorm(45) x3 <- rnorm(45) y <- x1 + 2*x2 + rnorm(45,0,3) ols0 <- ols(y

2011 Sep 12

regression on data subsets in datafile

I have data of the form

tC <- textConnection("
Subject Date parameter1
bob 3/2/99 10
bob 4/2/99 10
bob 5/5/99 10
bob 6/27/99 NA
bob 8/35/01 10
bob 3/2/02 10
steve 1/2/99 4
steve 2/2/00 7
steve 3/2/01 10
steve 4/2/02 NA
steve 5/2/03 16
kevin 6/5/04 24
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

I am trying to calculate rate of change of parameter1 in

2000 Sep 29

Is it R or I?

Salutations: I have been trying to translate an S-PLUS/ArcInfo (GIS software) application that I wrote on an SGI (IRIX) platform to public domain R and GrassGIS on a Linux platform. I am almost on the verge of abandoning it as I find R to be rather unstable, slow and frustrating. I enclose a section of my code for R experts to examine hoping that they'll point out that all the above three are

2010 Dec 06

[plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

Dear R-Helpers: I am trying to use *ddply* to extract min and max of a particular column in a data.frame. I am using two different forms of the function: ## var_name_to_split is a string -- something like "var1" which is the name of a column in data.frame ddply( df, .(as.name(var_name_to_split)), function(x) c(min(x[ , 3] , max(x[ , 3]))) ## fails with an error - case 1 ddply(

2010 Apr 07

unexpected behaviour with ddply and colwise

Hi, I am confused by results from: > ddply(aa, names(aa), colwise(sum)) I thought ddply was just calling colwise(sum)() with each column. However, ddply() returns a 13 x 5 result! The general result I expected is similar to that of apply(), or using colwise(sum)() alone. Shouldn't ddply() produce the same? Thanks in advance for your help, - Stuart Andrews

2011 Jun 21

ddply to count frequency of combinations

I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example x<- c(1,2,3,4,5,1,2,3,4) y<- c(1,2,3,4,5,1,2,4,1) df<-as.data.frame(cbind(x, y)) #what is the correct way to use ddply for this example? ddply(df, c('x','y', summarize, ??) #desired output -- format and order doesn't matter # (x, y)

2011 May 11

ddply with mean and max...

I'm trying to use ddply to compute summary statistics for many variables, splitting on the variable site. However, it seems to work fine for mean(), but if I use max() or min() things fall apart. What's going on? test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100)) means<-ddply(test.set,.(site),mean) means site x y 1 1 -97459496 -0.14826303 2

2009 Nov 05

Using a by() function to process several regression (lm()) functions

Hello, Thank you very much for looking at this. I am a "seasonal" user of R. I teach my undergraduate and graduate students statistics using R and often find myself trying to solve problems to process student-collected data in an efficient way. In this case, I have a data.frame with multiple observations. These are gas concentrations in a chamber and are used to measure into rates,

2012 Apr 01

Degrees of Freedom for lme.

Hi, I am trying to run a linear mixed effect model on data. I have 17 longitudinal subjects and 36 single subjects, and this is the code I'm using (below). So, INDEX1 is the column with brain volumes, and the predictors are gort and age, by time ID (time they were seen). I believe my data is set up the right way, but when I run it, I get DF for Intercept is 49, and DF for slope is 13?

2009 Nov 19

ddply function nesting problems

While putting my R code into functions, I've encountered a ddply function nesting issue and need a bit of advice on the proper way to fix it. I've tried several approaches, but none worked, and I need to have the ability to include the "cut", "range", and "fullseq" methods within ddply. (For a bit of that explanation refer to

2011 Aug 24

ddply from plyr package - any alternatives?

Hello everyone, I was asked to repost this, sorry for any inconvenience. I'm looking for a replacement for the ddply function from the plyr package. The function allows applying a function by category stored in any column or columns. Regular loops or lapply calls slow down greatly because my unique combination count exceeds 9000. Is there any available solution that allows me to apply a function by category?

2003 Feb 04

testing slope

Hi all, I try to test a linear slope using offset. I have: > m2 <- glm(Y~X*V) > summary(m2) Call: glm(formula = Y ~ X * V) Deviance Residuals: Min 1Q Median 3Q Max -2.01688 -0.56028 0.05224 0.53213 3.60216 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.3673 0.8476 1.613 0.119788 X

2012 Jul 24

Function for ddply

Hello, all. I'm new to R and just beginning to learn to write functions. I know I'm out of my depth posting here, and I'm sure my issue is mundane. But here goes. I'm analyzing the American National Election Study (nes), looking at mean values of a numeric dep_var (environ.therm) across values of a factor (partyid3). I use ddply from plyr and wtd.mean from Hmisc. The nes requires a

2012 Aug 16

How to extract from a column in a table?

Hi, I have a table in which one column has the name of the objects as shown below. Name Budlamp-Woodcutter Complex - 15 to 60% slope (60/25/15) Budlamp-Woodcutter Complex - 15 to 60% slope (60/25/15) Terrarossa-Blacktail-Pyeatt Complex - 1 to 40% slope (40/35/15/10) Terrarossa-Blacktail-Pyeatt Complex - 1 to 40% slope (40/35/15/10) How can I split the single column into three columns

2012 May 29

a question about "by" and "ddply"

Hi all, I have a data set (df, n=10 for the sake of simplicity here) where I have two continuous variables (age and weight) and I also have a grouping variable (group, with two levels). I want to run correlations for each group separately (kind of similar to "split file" in SPSS). I've been experimenting with different functions, and I was able to do this correctly using ddply

2012 May 05

Correct use of ddply with own function

Hi, I am really confused about how ddply works, so maybe you can help me. I created a function that sorts a vector etc. fn <- function(x){ x1 <- sort(x) x2 <- seq(length(x)) x3 <- x2/max(x2) df <- data.frame(x1,x2,x3) df } Probably this is not the best form of the function, but at least it produces what I want (data to plot a cumulative count curve). This function works on a
