similar to: ddply from plyr package - any alternatives?

2010 Dec 06
3
[plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
Dear R-Helpers: I am trying to use *ddply* to extract the min and max of a particular column in a data.frame. I am using two different forms of the function: ## var_name_to_split is a string -- something like "var1" which is the name of a column in data.frame ddply( df, .(as.name(var_name_to_split)), function(x) c(min(x[ , 3]), max(x[ , 3]))) ## fails with an error - case 1 ddply(
2011 Nov 18
3
Apply functions along "layers" of a data matrix
Hello How can I apply functions along "layers" of a data matrix? Example: daf <- data.frame( 'id' = rep(1:5, 3), matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )), rep = rep(1:3, each=5) ) The data frame "daf" contains 3 repetitions/layers (rep) of 4 variables of 5 persons (id). For some reason, I want to calculate
2011 Oct 06
1
apply and functions with many arguments
Dear all, I would like to use the following function fitdist(data, distr, method=c("mle", "mme", "qme", "mge"), start=NULL, fix.arg=NULL, ...) for many different distr values like distr=c("norm","lnorm","pois") (just a small example) and take back into a list the parameter name which is what is inside distr plus what the
2011 Apr 25
2
Problem with ddply in the plyr-package: surprising output of a date-column
Hi Together, I have a problem with the plyr package - more precisely with the ddply function - and would be very grateful for any help. I hope the example here is precise enough for someone to identify the problem. Basically, in this step I want to identify observations that are identical in terms of certain identifiers (ID1, ID2, ID3) and just want to save those observations (in this step,
2011 Sep 30
2
ggplot2 - extracting values of smooth
Suppose that I'm working on Hadley's diamond dataset and I want to review the relationship between price, colour and carat. I might run the following: library(ggplot2) #plot scatter and add some hex binning q<-qplot(carat,price,data=diamonds, geom=c("hex"), main="Variability of Diamond Prices by Carat and Colour") #facet to get one scatter for
2011 Aug 03
4
slow computation of functions over large datasets
Hello there, I’m computing the total value of an order from the price of the order items using a “for” loop and the “ifelse” function. I do this on a large dataframe (close to 1m lines). The computation of this function is painfully slow: in 1min only about 90 rows are calculated. The computation time taken for a given number of rows increases with the size of the dataset, see the example with
2011 Oct 31
2
3 Overlayed simple plots
Dear all, I am plotting 3 plots on the same x and y axes. I want the first one to be painted red with a continuous line, the second one green with a continuous line, and the third one blue with a continuous line. plot(max_power(data),ylim=c(-120,-20)) par(new=T) plot(min_power(data),ylim=c(-120,-20)) par(new=T) plot(mean_power(data),ylim=c(-120,-20)) par(new=F) Is it also a
2011 Aug 24
2
Regression by factor using "sapply"
Apologies for the elementary nature of the question (yes, I'm another newbie)... I'd like to perform a multiple regression on a single data set representing energy consumption and temperatures, containing account number, usage (KWh), heating degree days (HDD) and cooling degree days (CDD). I want to get the coefficients back from the following equation: lm(AvgKWh ~
2011 Jul 31
4
help with algorithm
I'm wondering if anyone can give some basic advice about how to approach a specific task in R. I'm new to R but have used SAS for many years, and while I can muscle through a lot of the code details, I'm unsure of a few things. Specific questions: If I have to perform a set of actions on a group of files, should I use a loop (I feel like I've heard people say to avoid looping
2011 Aug 23
3
ddply - how to transform df column "in place"
Dear R-users, I am trying to get the plyr syntax right, without much success. Given: d<- data.frame(cbind(x=1,y=seq(20100801,20100830,1))) names(d)<-c("first", "daterep") d2<-d # I can convert the daterep column in place the classic way: d$daterep<-as.Date(strptime(d$daterep, format="%Y%m%d")) # How to do it the plyr way? ddply(d2,
2012 Jan 17
1
New PLYR issue
Hello everyone, I have got the same problem, with the same error message. Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP. The plyr mailing list has not provided any help so far. >require(plyr) >c(sample(c(1:100), 50, replace=TRUE))->V1 >c(rep( 1:5, 10))->f1 #variable to group V1 >data.frame(cbind(V1, f1))->DF >str(DF) >ddply(DF$V1, DF$f1,
2011 Jun 21
4
ddply to count frequency of combinations
I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example x<- c(1,2,3,4,5,1,2,3,4) y<- c(1,2,3,4,5,1,2,4,1) df<-as.data.frame(cbind(x, y)) #what is the correct way to use ddply for this example? ddply(df, c('x','y'), summarize, ??) #desired output -- format and order doesn't matter # (x, y)
2009 Apr 03
3
plyr and table question
Dear all, I'm puzzled by the following example inspired by a recent question on R-help, cc <- textConnection("user_id website time 20 google 0930 21 yahoo 0935 20 facebook 1000 25 facebook 1015 61 google 0940") d <- read.table(cc, head=T) ; close(cc) table(d$user_id) # count the
2010 Sep 16
2
parallel computation with plyr 1.2.1
Hi, I have been trying to use the new .parallel argument with the most recent version of plyr [1] to speed up some tasks. I can run the example in the NEWS file [1], and it seems to be working correctly. However, R will only use a single core when I try to apply this same approach with ddply(). 1. http://cran.r-project.org/web/packages/plyr/NEWS Watching my CPUs I see that in both cases
2013 Aug 27
1
[plyr] Moving average filter with plyr
Dear all, I'm stuck with a problem using plyr to process a rather large chunk of data. What I'm trying to do is apply a moving average to all the subparts of the dataframe (the example data can be found here https://dl.dropboxusercontent.com/u/2414056/testData.Rdata). require(plyr) load("testData.Rdata") applyfilter<-function(x){ return(filter(x,rep(1/5, times=5))) }
2011 May 24
1
references in R
Hi everybody, my problem is the following: I have a function which, for an input number, produces a dataframe with a fixed number of columns and 0 to 10 rows. Now I want to apply this function to a vector of different inputs and merge all these result dataframes into one data.frame. Can I give a data.frame reference to the function, so that the result of the function is not returned but appended
2009 Aug 05
2
using ddply but preserving some of the outside data
I have a bit of a quandary. I'm working with a data set for which I have sampled sites at a variety of dates. I want to use this data, and get a running average of the sampled values for the current and previous date. I originally thought something like ddply would be ideal for this, however, I cannot break up my data by date, and then apply a function that requires information
2012 Jul 24
1
Function for ddply
Hello, all. I'm new to R and just beginning to learn to write functions. I know I'm out of my depth posting here, and I'm sure my issue is mundane. But here goes. I'm analyzing the American National Election Study (nes), looking at mean values of a numeric dep_var (environ.therm) across values of a factor (partyid3). I use ddply from plyr and wtd.mean from Hmisc. The nes requires a
2013 Apr 03
5
Can package plyr also calculate the mode?
I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 <- ddply(all,"ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) )
2010 Apr 29
1
Using plyr::dply more (memory) efficiently?
Hi all, In short: I'm running ddply on an admittedly (somewhat) large data.frame (not that large). It runs fine until it finishes and gets to the "collating" part where all subsets of my data.frame have been summarized and they are being reassembled into the final summary data.frame (sorry, don't know the correct plyr terminology). During collation, my R workspace RAM usage goes