thr3ads.net - search: "dapli"

2010 Sep 09

1

Strange output daply with empty strata

Dear list, I get some strange results with daply from the plyr package. In the example below, the average age per municipality for employed and unemployed is calculated. If I do this using tapply (see code below) I get the following result:

         no      yes
A        NA 36.94931
B  51.22505 34.24887
C  48.05759 51.00198

If I do this using daply: municipality no yes
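The tapply behaviour described in this post, where a municipality/employment combination with no observations shows up as NA, can be reproduced with a minimal sketch. The data frame `people` and all of its values are invented for illustration; they are not the poster's data, and the daply side of the comparison is not shown because the excerpt is cut off before it.

```r
# Hypothetical data: municipality "A" has no rows with employed == "no".
people <- data.frame(
  municipality = c("A", "A", "B", "B", "C", "C"),
  employed     = c("yes", "yes", "no", "yes", "no", "yes"),
  age          = c(30, 40, 50, 35, 48, 52)
)

# Cross-classify by both grouping variables and average age within each cell.
avg_age <- tapply(people$age,
                  list(people$municipality, people$employed),
                  mean)

# The empty cell (A, no) is reported as NA rather than being dropped.
print(avg_age)
```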

2012 Apr 10

1

plyr: set '.progress' argument to default to "text"

Dear all, is it possible to globally set the option .progress = "text" for all the apply functions in 'plyr'? For example, the current default is daply(..., .progress = "none"). I would like to set it to daply(..., .progress = "text"), so as to avoid writing the argument every time I call such a function. I looked into ?daply and ?create_progress_bar without much

2011 Mar 11

1

dataframe to a timeseries object

I'm wondering which is the most efficient (time, then memory usage) way to obtain a multivariate time series object from a data frame (the easiest data structure to get data from a database through RODBC). I have a starting point using the timeSeries or xts library (these libraries can handle time zones); below you can find code to test. Merging parallelization (cbind) is something I'm thinking at

2010 Aug 25

3

frequency, count rows, data for heat map

Hi all, I have read posts of heat map creation but I am one step prior -- Here is what I am trying to do and wonder if you have any tips? We are trying to map sequence reads from tumors to viral genomes. Example input file:

111  abc
111  sdf
111  xyz
1079 abc
1079 xyz
1079 xyz
5576 abc
5576 sdf
5576 sdf

How many xyz's are there for 1079 and 111? How many abc's, etc?
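One way to get the counts asked about in this post is a two-way frequency table. The sketch below uses base R only; the vector names `id` and `tag` are hypothetical labels for the two columns of the example input, which the post leaves unnamed.

```r
# The example input from the post, as two parallel character vectors.
id  <- c("111", "111", "111", "1079", "1079", "1079", "5576", "5576", "5576")
tag <- c("abc", "sdf", "xyz", "abc", "xyz", "xyz", "abc", "sdf", "sdf")

# table() cross-tabulates the two vectors into a matrix of counts,
# including zero counts for combinations that never occur.
counts <- table(id, tag)
print(counts)

# Individual counts can be looked up by row and column name.
xyz_1079 <- counts["1079", "xyz"]
xyz_111  <- counts["111", "xyz"]
```

A rectangular count matrix of this shape, with one row per id and one column per tag, is the kind of input heat map functions generally expect.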

2011 Apr 04

3

How to speed up grouping time series, help please

I retrieve for a few hundred times a group of time series (10-15 ts with 10000 values each), on every group I do some calculation, graphs etc. I wonder if there is a faster method than what is presented below to get an appropriate timeseries object. Making a query with RODBC for every group I get a data frame like this:

> X
   ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334

2008 Sep 30

0

New package: plyr

plyr is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. It's already possible to do this with split and the apply functions, but plyr just makes it all a bit easier with:
* consistent names, arguments and outputs
* input from and output to data.frames,
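The announcement notes that this break-down, operate, put-back-together pattern is already possible with split and the apply functions. A minimal base R sketch of that pattern follows; the data frame `scores` and its columns are invented for illustration, and this is not code from the announcement.

```r
# Hypothetical data: one measurement per row, grouped by a label.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# Break down: one data frame per group.
pieces <- split(scores, scores$group)

# Operate: summarise each piece as a one-row data frame.
summaries <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1],
             n     = nrow(piece),
             mean  = mean(piece$value))
})

# Put back together: stack the per-group rows into a single data frame.
result <- do.call(rbind, summaries)
print(result)
```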

2008 Oct 02

0

[solutions] "tapply versus by" in function with more than 1 arguments

Thanks to all. I summarized (in order to thank the list) the solutions to help future workers searching subjects like this at R help.
# Number of rows
nr = 10
# Data set
dataf = as.data.frame(matrix(c(rnorm(nr),rnorm(nr)*2,runif(nr),sort(c(1,1,2,2,3,3,sample(1:3,nr-6,replace=TRUE)))),ncol=4))
names(dataf)[4] = "class"
#-----------------------------------------------------
#Solution 1:

2012 Jan 12

1

parallel computation in plyr 1.7

Dear all, I have a question regarding the possibility of parallel computation in plyr version 1.7. The help files of the following functions mention the argument '.parallel': ddply, aaply, llply, daply, adply, dlply, alply, ldply, laply. However, the help files of the following functions do not mention this argument: ?d_ply, ?a_ply, ?l_ply. Is it because parallel computation is not

2009 Apr 15

0

plyr version 0.1.7

plyr is a set of tools for a common set of problems: you need to break down a big data structure into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to:
* fit the same model to subsets of a data frame
* quickly calculate summary statistics for each group
* perform group-wise transformations like scaling or standardising
*

2010 Sep 10

0

plyr: version 1.2

plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to:
* fit the same model to each patient subset of a data frame
* quickly calculate summary statistics for each group
* perform group-wise transformations

2011 Nov 01

1

Counting entries to create a new table

Hi, I am an R novice and I am trying to do something that it seems should be fairly simple, but I can't quite figure it out and I must not be using the right words when I search for answers. I have a dataset with a number of individuals and observations for each day (7 possible codes plus missing data). So it looks something like this:
Individual A, B, C, D
Day1 1,1,1,1
Day 2 1,3,4,2
Day3

2013 Jan 24

2

Question on matrix calculation

Hello again, Let's say I have 1 matrix and 1 data frame:
> mat <- matrix(1:15, 5)
> match_df <- data.frame(Seq = 1:5, criteria = sample(letters[1:5], 5, replace = T))
> mat
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
> match_df
  Seq criteria
1   1        c
2   2        e
3   3        c
4   4        c
5

2011 Mar 22

1

help need on working in subset within a dataframe

Dear R-experts, excuse me for an easy question, but I need help, sorry for that. For days I have been working with a large dataset, where operations are needed within a component of the dataset. Here is my question: I have a big dataset where x1:.....x1000 or so. What I need to do is to work on 4 consecutive variables to calculate a statistic and output it. So far so good. There are more vector

2010 Sep 15

3

aggregate, by, *apply

Dear R gurus, I regularly come across a situation where I would like to apply a function to a subset of data in a dataframe, but I have not found an R function to facilitate exactly what I need. More specifically, I'd like my function to have a context of where the data it's analyzing came from. Here is an example:
### BEGIN ###
func<-function(x){
  m<-median(x$x)
  if(m > 2 &

2010 Aug 24

2

How to remove rows based on frequency of factor and then difference date scores

Hello- A basic question which has nonetheless floored me entirely. I have a dataset which looks like this:

Type ID Date       Value
A    1  16/09/2020 8
A    1  23/09/2010 9
B    3  18/8/2010  7
B    1  13/5/2010  6

There are two Types, which correspond to different individuals in different conditions, and loads of ID labels (1:50)

2016 Apr 06

0

Memory problem

As Jim has indicated, memory usage problems can require very specific diagnostics and code changes, so generic help is tough to give. However, in most cases I have found the dplyr package to be more memory efficient than plyr, so you could consider that. Also, you can be explicit about only saving the minimum results you want to keep rather than making a list of complete results and extracting

2008 Oct 01

3

"tapply versus by" in function with more than 1 arguments

Hi. I searched the list and didn't find anything similar to this. I simplified my example like below:
#I need to calculate correlation (for example) between 2 columns classified by a third one in a data.frame, like below:
#number of rows
nr = 10
#the third column is to enforce that I need correlation on two variables only
dataf =
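The difficulty behind this question is that tapply hands its function a single vector per group, while a correlation needs two columns from the same rows. One common base R approach is to split the whole data frame by the grouping column and apply the two-argument function to each piece. The sketch below is only an illustration with invented data and hypothetical column names `x` and `y`; it is not one of the solutions posted to the list.

```r
# Hypothetical data: two numeric columns and a grouping column "class".
dataf <- data.frame(
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(2, 4, 6, 3, 2, 1),
  class = c(1, 1, 1, 2, 2, 2)
)

# Split the rows by class, then compute cor() on the two columns of each piece.
cor_by_class <- sapply(split(dataf, dataf$class),
                       function(piece) cor(piece$x, piece$y))
print(cor_by_class)
```

The base function by() follows the same idea, passing each subset of rows to the supplied function as a data frame.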
