Displaying 20 results from an estimated 4000 matches similar to: "New PLYR issue"
2011 Aug 24
3
ddply from plyr package - any alternatives?
Hello everyone,
I was asked to repost this again, sorry for any inconvenience.
I'm looking for a replacement for the ddply function from the plyr package.
The function allows applying a function by category stored in any column or columns.
Regular loops or lapply calls slow down greatly because my unique combination
count exceeds 9000. Is there any available solution that would allow me to apply
a function by category?
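A minimal sketch of one common alternative (not from the original thread): data.table's by-group aggregation copes well with thousands of groups. The grp/val column names and the toy data are made up for illustration.
library(data.table)
dat <- data.frame(grp = sample(1:9000, 1e5, replace = TRUE),
                  val = rnorm(1e5))
dt <- as.data.table(dat)
# grouped aggregation; by= handles very large group counts quickly
res <- dt[, .(mean_val = mean(val), n = .N), by = grp]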
2009 Apr 03
3
plyr and table question
Dear all,
I'm puzzled by the following example inspired by a recent question on
R-help,
cc <- textConnection("user_id website time
20 google 0930
21 yahoo 0935
20 facebook 1000
25 facebook 1015
61 google 0940")
d <- read.table(cc, header = TRUE); close(cc)
table(d$user_id) # count the
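The snippet breaks off at the table() call; for comparison, a hedged sketch of the plyr way to get the same per-user count, using the d built in the snippet above:
library(plyr)
visits <- ddply(d, .(user_id), summarise, n_visits = length(website))
visits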
2010 Sep 16
2
parallel computation with plyr 1.2.1
Hi,
I have been trying to use the new .parallel argument with the most recent
version of plyr [1] to speed up some tasks. I can run the example in the NEWS
file [1], and it seems to be working correctly. However, R will only use a
single core when I try to apply this same approach with ddply().
1. http://cran.r-project.org/web/packages/plyr/NEWS
Watching my CPUs I see that in both cases
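The excerpt stops before a resolution. As a hedged sketch rather than the poster's actual setup: ddply's .parallel argument relies on a registered foreach backend, and doParallel (assumed here; the post does not name a backend) is one way to provide it.
library(plyr)
library(doParallel)
cl <- makeCluster(2)          # two worker processes
registerDoParallel(cl)
d <- data.frame(g = rep(1:4, each = 250), x = rnorm(1000))
res <- ddply(d, .(g), summarise, m = mean(x), .parallel = TRUE)
stopCluster(cl)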
2013 Aug 27
1
[plyr] Moving average filter with plyr
Dear all,
I'm stuck with a problem using plyr to process a rather large chunk of data. What I'm trying to do is apply a moving average to all the subparts of the data frame (the example data can be found here https://dl.dropboxusercontent.com/u/2414056/testData.Rdata).
require(plyr)
load("testData.Rdata")
# 5-point moving average using stats::filter
applyfilter <- function(x) {
  return(filter(x, rep(1/5, times = 5)))
}
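The real testData.Rdata is not reproduced here, so the following is only a sketch of applying that filter per subgroup with ddply; the group and value column names are stand-ins for the actual ones.
library(plyr)
applyfilter <- function(x) {
  as.numeric(stats::filter(x, rep(1/5, times = 5)))   # 5-point moving average
}
demo <- data.frame(group = rep(c("a", "b"), each = 50),
                   value = rnorm(100))
smoothed <- ddply(demo, .(group), transform,
                  value_smooth = applyfilter(value))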
2010 Dec 06
3
[plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
Dear R-Helpers:
I am trying to use *ddply* to extract the min and max of a particular
column in a data.frame. I am using two different forms of the function:
## var_name_to_split is a string -- something like "var1" which is the name
of a column in data.frame
ddply(df, .(as.name(var_name_to_split)), function(x) c(min(x[, 3]),
max(x[, 3]))) ## fails with an error - case 1
ddply(
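The second call is cut off, but for case 1 a working route is to skip .(as.name(...)) entirely: ddply also accepts the column name as a plain character string. A hedged sketch on toy data:
library(plyr)
df <- data.frame(var1 = rep(c("a", "b"), each = 5),
                 x    = 1:10,
                 y    = rnorm(10))
var_name_to_split <- "var1"
ddply(df, var_name_to_split, function(x) {
  c(min = min(x[, 3]), max = max(x[, 3]))
})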
2013 Apr 03
5
Can package plyr also calculate the mode?
I am trying to replicate SAS PROC UNIVARIATE in R. I got most of the
stats I needed for a by-group summary of a data frame using:
all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75),
              q90 = quantile(COUNTS, .90), q95 = quantile(COUNTS, .95),
              q99 = quantile(COUNTS, .99))
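plyr has no built-in statistical mode, but a small helper can be dropped into the same summarise() call. A hedged sketch; the all data frame below is a made-up stand-in with the columns named in the post.
library(plyr)
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]   # most frequent value, first on ties
}
all <- data.frame(ACT_NAME = rep(c("a", "b"), each = 50),
                  COUNTS   = rpois(100, 10))
all1 <- ddply(all, "ACT_NAME", summarise,
              mean = mean(COUNTS),
              mode = stat_mode(COUNTS))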
2011 Apr 25
2
Problem with ddply in the plyr-package: surprising output of a date-column
Hi everyone,
I have a problem with the plyr package - more precisely with the ddply
function - and would be very grateful for any help. I hope the example
here is precise enough for someone to identify the problem. Basically,
in this step I want to identify observations that are identical in
terms of certain identifiers (ID1, ID2, ID3) and just want to save
those observations (in this step,
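The excerpt stops before the surprising output itself. As a hedged guess at the symptom: some plyr versions of that era could return a Date column as plain numbers (days since 1970-01-01); if that is what happened, the class can be restored afterwards. A sketch with toy data using the ID1/ID2/ID3 names from the post:
library(plyr)
d <- data.frame(ID1  = c(1, 1, 2),
                ID2  = c("a", "a", "b"),
                ID3  = c(1, 1, 1),
                date = as.Date(c("2011-04-01", "2011-04-02", "2011-04-03")))
res <- ddply(d, .(ID1, ID2, ID3), summarise, first_date = min(date))
# if first_date comes back as a bare number, convert it back to Date
res$first_date <- as.Date(res$first_date, origin = "1970-01-01")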
2010 Apr 29
1
Using plyr::ddply more (memory) efficiently?
Hi all,
In short:
I'm running ddply on an admittedly somewhat large data.frame (though not
that large). It runs fine until it finishes and gets to the
"collating" part where all subsets of my data.frame have been
summarized and they are being reassembled into the final summary
data.frame (sorry, don't know the correct plyr terminology). During
collation, my R workspace RAM usage goes
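The message is cut off mid-description. One option plyr itself offers for reducing copying is its (experimental) immutable data frame; whether it helps with the collation-stage memory spike here is an assumption, not something the post confirms. A sketch modelled on the idata.frame() help page:
library(plyr)
data(baseball)
# compare a plain data frame with the immutable wrapper
system.time(dlply(baseball, "id", nrow))
system.time(dlply(idata.frame(baseball), "id", nrow))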
2009 Aug 18
1
Plyr and memory allocation issue
Dear R users
I am trying to create some new variables for a 4401 x 30 dataframe using
ddply and transform. The "id" variable I am using is a factor with 1330
levels, e.g.
bb <- function(df) {
  transform(df,
            years   = study.year - min(study.year) + 1,
            periods = length(study.year))
}
test <- ddply(x, .(id), bb)
I haven't copied the data to avoid clogging the
2009 Sep 25
2
summarize-plyr package
Hi, I am using the amazing package 'plyr'. I have one problem. I would
appreciate help fixing the following error. Thanks.
______________________________
> library(plyr)
> data(baseball)
> summarise(baseball,
+ duration = max(year) - min(year),
+ nteams = length(unique(team)))
Error: could not find function "summarise"
> ddply(baseball, "id", summarise,
+
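The ddply call is cut off; presumably it was heading for the same two summaries computed per player id. With a current plyr (where summarise is exported) the full call would look along these lines:
library(plyr)
data(baseball)
ddply(baseball, "id", summarise,
      duration = max(year) - min(year),
      nteams   = length(unique(team)))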
2011 Apr 21
1
Stymied by plyr
Hello, This is my first time trying to use plyr, and I'm getting
nowhere. I have teacher ratings data (1:4), on 10 components, by
external observers and internal observers, in schools in areas. I want
to calculate the percentage of each rating given on each component, by
each type of observer, within each school, within each area. The data
look like this:
unit area ext.obs rating comp
11
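The data excerpt is cut off after the header row, so the sketch below reuses the column names from the description (unit, area, ext.obs, rating, comp) on made-up data: one way to get the percentage of each rating within area / unit / observer type / component.
library(plyr)
ratings <- data.frame(unit    = rep(1:2, each = 20),
                      area    = 1,
                      ext.obs = rep(c(TRUE, FALSE), 20),
                      comp    = sample(1:10, 40, replace = TRUE),
                      rating  = sample(1:4, 40, replace = TRUE))
pct_by_rating <- ddply(ratings, .(area, unit, ext.obs, comp), function(x) {
  tab <- prop.table(table(factor(x$rating, levels = 1:4))) * 100
  data.frame(rating = as.integer(names(tab)), pct = as.numeric(tab))
})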
2011 Oct 12
3
Applying function to only numeric variable (plyr package?)
My data frame consists of character variables, factors, and proportions,
something like
c1 <- c("A", "B", "C", "C")
c2 <- factor(c(1, 1, 2, 2), labels = c("Y","N"))
x <- c(0.5234, 0.6919, 0.2307, 0.1160)
y <- c(0.9251, 0.7616, 0.3624, 0.4462)
df <- data.frame(c1, c2, x, y)
pct <- function(x) round(100*x, 1)
I want to
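The request is cut off, but if the goal is to apply pct() only to the numeric columns, plyr's numcolwise() wraps a function so it touches numeric columns and nothing else. A hedged sketch continuing the example above:
library(plyr)
c1  <- c("A", "B", "C", "C")
c2  <- factor(c(1, 1, 2, 2), labels = c("Y", "N"))
x   <- c(0.5234, 0.6919, 0.2307, 0.1160)
y   <- c(0.9251, 0.7616, 0.3624, 0.4462)
df  <- data.frame(c1, c2, x, y)
pct <- function(x) round(100 * x, 1)
numcolwise(pct)(df)                      # just the numeric columns, scaled
df[c("x", "y")] <- numcolwise(pct)(df)   # or write them back into df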
2011 Apr 27
3
MASS fitdistr with plyr or data.table?
I am trying to extract the shape and scale parameters of a wind speed
distribution for different sites. I can do this in a clunky way, but
I was hoping to find a way using data.table or plyr. However, when I
try I am met with the following:
library(data.table)
set.seed(144)
weib.dist <- rweibull(10000, shape = 3, scale = 8)
weib.test <- data.table(cbind(1:10, weib.dist))
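The error itself is cut off. For the plyr route, a hedged sketch of fitting a Weibull per group with MASS::fitdistr; the site/speed column names are made up, since the post's real layout is not shown.
library(MASS)
library(plyr)
set.seed(144)
wind <- data.frame(site  = rep(1:10, each = 1000),
                   speed = rweibull(10000, shape = 3, scale = 8))
params <- ddply(wind, .(site), function(x) {
  fit <- fitdistr(x$speed, densfun = "weibull")
  data.frame(shape = fit$estimate[["shape"]],
             scale = fit$estimate[["scale"]])
})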
2011 Aug 10
1
Sequential Naming of ggplot .pngs using plyr
If I have data:
dat <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20),
                  site = rep(letters[5:8], each = 5))
And want to plot like this:
ctr <- 1
for (i in c('a', 'b', 'c', 'd')) {
  png(file = paste('/tmp/plot_number_', ctr, '.png', sep = ''), height = 8.5,
      width = 11, units = 'in', pointsize = 9, res = 300)
print(ggplot(dat[,names(dat) %in%
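The loop is cut off mid-call. One hedged way to get sequential file names without a hand-maintained ctr is to loop over an index and build the name with sprintf; the ggplot call below is simplified because the original one is truncated.
library(ggplot2)
dat <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20),
                  site = rep(letters[5:8], each = 5))
vars <- c("a", "b", "c", "d")
for (i in seq_along(vars)) {
  png(file = sprintf("/tmp/plot_number_%d.png", i),
      height = 8.5, width = 11, units = "in", pointsize = 9, res = 300)
  print(ggplot(dat, aes_string(x = "site", y = vars[i])) + geom_boxplot())
  dev.off()
}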
2011 Sep 03
2
problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome
Dear R experts.
I might be missing something obvious. I have been trying to fix this problem
for some weeks. Please help.
#data
ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
y <- rnorm(10, 8, 2)
# variable set 1
M1a <- sample(c(1, 2, 3), 10, replace = TRUE)
M1b <- sample(c(1, 2, 3), 10, replace = TRUE)
M1aP1 <- sample(c(1, 2, 3), 10, replace = TRUE)
M1bP2 <- sample(c(1, 2, 3), 10, replace = TRUE)
2012 Mar 28
1
Why does this work? plyr within-subset normalization
Working code that normalizes each row's value against the subset's maximum.
Does the invocation of max() somehow instruct R to 'step back' and evaluate
the subset?
Thanks, Zack
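The code being asked about is not quoted in the excerpt, but the "step back" impression matches how ddply works: it splits the data frame by group, evaluates the expression once per piece, and reassembles the pieces, so max() only ever sees the rows of the current subset. A hedged sketch with made-up column names:
library(plyr)
dat <- data.frame(group = rep(c("a", "b"), each = 5),
                  value = c(1:5, 10 * (1:5)))
normalized <- ddply(dat, .(group), transform,
                    norm = value / max(value))   # max() is per group here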
2012 Jul 11
1
do I need plyr, apply or something else?
Dear all,
This is what I'd like to do (I have an implementation using for loops, which I designed before I realised just how slow R is at executing them - this process currently takes days to run).
I have a large dataframe containing corporate bond data, columns are:
BondID
Date (goes back 5 years)
Var1
Var2
Term2Maturity
What I want to do is this:
1) For each bond, at each given date,
2011 Nov 13
1
New PLYR issue
Issue with plyr.
I am now using R 2.14; this data and plyr command line worked with 2.13.
I am also loading the same saved data that worked previously, but now
there is some issue.
> library(plyr)
> UNESCO <- dget('C:/Carbon-GJ/BZE_ecosys.robj')
> df2 <- ddply(df, "UNESCO", summarise, total_ha = sum(Ha))
Error in if (empty(.data)) return(.data) :
missing value where
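The traceback is cut off, but one detail in the snippet stands out: the loaded data are assigned to UNESCO while ddply() is called on df, so ddply may not be seeing the intended data frame at all. A hedged sketch with the names lined up, on a toy stand-in (assuming the real object has columns UNESCO and Ha, as the original call implies):
library(plyr)
UNESCO <- data.frame(UNESCO = c("mangrove", "mangrove", "seagrass"),
                     Ha     = c(10, 20, 5))
df2 <- ddply(UNESCO, "UNESCO", summarise, total_ha = sum(Ha))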
2011 May 17
1
Subsetting depth profiles based on maximum depth by group with plyr
Hello,
Apologies for a similar earlier post. I didn't include enough details in
that one.
I am having a little trouble subsetting some data based on a grouping
variable. I am using an instrument that does depth profiles of a water
column. The instrument records on the way down as well as the way up. So
thanks to an off-list reply I can subset the data so that all data collected
at the
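The description is cut off, but trimming each profile to the rows recorded on the way down is a natural ddply job: keep rows up to the maximum depth within each cast. A hedged sketch with made-up cast/depth column names:
library(plyr)
profiles <- data.frame(cast  = rep(1:2, each = 7),
                       depth = c(0, 5, 10, 15, 10, 5, 0,
                                 0, 8, 16, 24, 16, 8, 0))
downcasts <- ddply(profiles, .(cast), function(x) {
  x[seq_len(which.max(x$depth)), ]   # rows up to and including max depth
})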
2013 Apr 20
7
Reshape or Plyr?
Hi all,
I have relative abundance data from >100 sites. This is from acoustic
monitoring; usually the data cover 2-3 nights, but in some cases they may
be longer, like months or years, for each location.
The data output from my management database is provided by species by
night for each location, so the data frame would look like the one below. What I
need to do is sum the Survey_time by Spec_Code for
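The sentence is cut off, so the exact grouping is a guess; a hedged sketch that sums Survey_time by Spec_Code within each location, on made-up data (the Location column is an assumption):
library(plyr)
bats <- data.frame(Location    = rep(c("site1", "site2"), each = 6),
                   Spec_Code   = rep(c("MYLU", "EPFU", "LANO"), 4),
                   Survey_time = runif(12, 0, 60))
totals <- ddply(bats, .(Location, Spec_Code), summarise,
                total_time = sum(Survey_time))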