thr3ads.net - similar to: "ddply from plyr package

Displaying 20 results from an estimated 5000 matches similar to: "ddply from plyr package - any alternatives?"

[plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function

2010 Dec 06

Dear R-Helpers: I am trying to use *ddply* to extract the min and max of a particular column in a data.frame. I am using two different forms of the function:
    ## var_name_to_split is a string -- something like "var1" which is the name of a column in data.frame
    ddply( df, .(as.name(var_name_to_split)), function(x) c(min(x[ , 3]), max(x[ , 3])))
    ## fails with an error - case 1
    ddply(

Apply functions along "layers" of a data matrix

2011 Nov 18

Hello, how can I apply functions along "layers" of a data matrix? Example:
    daf <- data.frame(
      'id' = rep(1:5, 3),
      matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )),
      rep = rep(1:3, each=5)
    )
The data frame "daf" contains 3 repetitions/layers (rep) of 4 variables of 5 persons (id). For some reason, I want to calculate

apply and functions with many arguments

2011 Oct 06

Dear all, I would like to use the following function fitdist(data, distr, method=c("mle", "mme", "qme", "mge"), start=NULL, fix.arg=NULL, ...) for many different distr values like distr=c("norm","lnorm","pois") (just a small example) and take back into a list the parameter name which is what is inside distr plus what the

Problem with ddply in the plyr-package: surprising output of a date-column

2011 Apr 25

Hi all, I have a problem with the plyr package - more precisely with the ddply function - and would be very grateful for any help. I hope the example here is precise enough for someone to identify the problem. Basically, in this step I want to identify observations that are identical in terms of certain identifiers (ID1, ID2, ID3) and just want to save those observations (in this step,

ggplot2 - extracting values of smooth

2011 Sep 30

Suppose that I'm working on Hadley's diamond dataset and I want to review the relationship between price, colour and carat. I might run the following:
    library(ggplot2)
    #plot scatter and add some hex binning
    q<-qplot(carat,price,data=diamonds, geom=c("hex"), main="Variability of Diamond Prices by Carat and Colour")
    #facet to get one scatter for

slow computation of functions over large datasets

2011 Aug 03

Hello there, I’m computing the total value of an order from the price of the order items using a “for” loop and the “ifelse” function. I do this on a large dataframe (close to 1m lines). The computation of this function is painfully slow: in 1min only about 90 rows are calculated. The computation time taken for a given number of rows increases with the size of the dataset, see the example with

3 Overlayed simple plots

2011 Oct 31

Dear all, I am plotting 3 plots on the same x and y axes. I want the first one drawn in red with a continuous line, the second one in green with a continuous line, and the third one in blue with a continuous line.
    plot(max_power(data),ylim=c(-120,-20))
    par(new=T)
    plot(min_power(data),ylim=c(-120,-20))
    par(new=T)
    plot(mean_power(data),ylim=c(-120,-20))
    par(new=F)
Is it also a

Regression by factor using "sapply"

2011 Aug 24

Apologies for the elementary nature of the question (yes, I'm another newbie)... I'd like to perform a multiple regression on a single data set containing a representation of energy consumption and temperatures containing account number, usage (KWh), heating degree days (HDD) and cooling degree (CDD) days. I want to get the coefficients back from the following equation: lm(AvgKWh ~

help with algorithm

2011 Jul 31

I'm wondering if anyone can give some basic advice about how to approach a specific task in R. I'm new to R but have used SAS for many years, and while I can muscle through a lot of the code details, I'm unsure of a few things. Specific questions: If I have to perform a set of actions on a group of files, should I use a loop (I feel like I've heard people say to avoid looping

ddply - how to transform df column "in place"

2011 Aug 23

Dear R-users, I am trying to get the plyr syntax right, without much success. Given:
    d<- data.frame(cbind(x=1,y=seq(20100801,20100830,1)))
    names(d)<-c("first", "daterep")
    d2<-d
    # I can convert the daterep column in place the classic way:
    d$daterep<-as.Date(strptime(d$daterep, format="%Y%m%d"))
    # How to do it the plyr way?
    ddply(d2,

New PLYR issue

2012 Jan 17

Hello everyone, I have the same problem, with the same error message. Using R 2.14.1, plyr 1.7.1, RStudio 0.94.110, Windows XP. The plyr mailing list has not provided any help so far.
    >require(plyr)
    >c(sample(c(1:100), 50, replace=TRUE))->V1
    >c(rep( 1:5, 10))->f1 #variable to group V1
    >data.frame(cbind(V1, f1))->DF
    >str(DF)
    >ddply(DF$V1, DF$f1,

ddply to count frequency of combinations

2011 Jun 21

I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example:
    x<- c(1,2,3,4,5,1,2,3,4)
    y<- c(1,2,3,4,5,1,2,4,1)
    df<-as.data.frame(cbind(x, y))
    #what is the correct way to use ddply for this example?
    ddply(df, c('x','y', summarize, ??)
    #desired output -- format and order doesn't matter
    # (x, y)

plyr and table question

2009 Apr 03

Dear all, I'm puzzled by the following example inspired by a recent question on R-help,
    cc <- textConnection("user_id website time
    20 google 0930
    21 yahoo 0935
    20 facebook 1000
    25 facebook 1015
    61 google 0940")
    d <- read.table(cc, head=T) ; close(cc)
    table(d$user_id) # count the

parallel computation with plyr 1.2.1

2010 Sep 16

Hi, I have been trying to use the new .parallel argument with the most recent version of plyr [1] to speed up some tasks. I can run the example in the NEWS file [1], and it seems to be working correctly. However, R will only use a single core when I try to apply this same approach with ddply(). 1. http://cran.r-project.org/web/packages/plyr/NEWS Watching my CPUs I see that in both cases

[plyr] Moving average filter with plyr

2013 Aug 27

Dear all, I'm stuck with a problem using plyr to process a rather large chunk of data. What I'm trying to do is apply a moving average to all the subparts of the dataframe (the example data can be found here https://dl.dropboxusercontent.com/u/2414056/testData.Rdata).
    require(plyr)
    load("testData.Rdata")
    applyfilter<-function(x){
      return(filter(x,rep(1/5, times=5)))
    }

references in R

2011 May 24

Hi everybody, my problem is the following: I have a function which produces, for an input number, a dataframe with a fixed number of columns and 0 to 10 rows. Now I want to apply this function to a vector with different inputs and merge all these result dataframes into one data.frame. Can I give a data.frame reference to the function, so that the result of the function is not returned but appended

using ddply but preserving some of the outside data

2009 Aug 05

I have a bit of a quandary. I'm working with a data set for which I have sampled sites at a variety of dates. I want to use this data, and get a running average of the sampled values for the current and previous date. I originally thought something like ddply would be ideal for this, however, I cannot break up my data by date, and then apply a function that requires information

Function for ddply

2012 Jul 24

Hello, all. I'm new to R and just beginning to learn to write functions. I know I'm out of my depth posting here, and I'm sure my issue is mundane. But here goes. I'm analyzing the American National Election Study (nes), looking at mean values of a numeric dep_var (environ.therm) across values of a factor (partyid3). I use ddply from plyr and wtd.mean from Hmisc. The nes requires a

Can package plyr also calculate the mode?

2013 Apr 03

I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using:
    all1 <- ddply(all,"ACT_NAME", summarise,
                  mean=mean(COUNTS), sd=sd(COUNTS),
                  q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50),
                  q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90),
                  q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) )

Using plyr::dply more (memory) efficiently?

2010 Apr 29

Hi all, In short: I'm running ddply on an admittedly (somewhat) large data.frame (not that large). It runs fine until it finishes and gets to the "collating" part where all subsets of my data.frame have been summarized and they are being reassembled into the final summary data.frame (sorry, don't know the correct plyr terminology). During collation, my R workspace RAM usage goes
