Hi,
In addition to Rainer's suggestion (which is to give a small example
of what your input data look like and an example of what you want the
output to be), given the size of your input data you might want to try
the data.table package instead of plyr::ddply -- especially while you
are exploring different combinations/calculations over your data.
The equivalent data.table approach tends to be orders of magnitude
faster than the ddply one, and usually more memory efficient as well.
When my data are small I often use both (I think the plyr/ddply
"language" is rather beautiful), but once they grow into the thousands
of rows, I invariably switch to data.table.
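To make that concrete, here is a rough sketch of the two approaches
side by side. The column names (id, value) and the grouped calculation
(a per-group mean) are made up, since we haven't seen your data, but
the pattern -- group by one column, compute something on each subset --
is what you described:

  library(plyr)
  library(data.table)

  ## Toy data standing in for your real table (made-up columns;
  ## substitute your own):
  df <- data.frame(id    = sample(letters, 1e5, replace = TRUE),
                   value = rnorm(1e5))

  ## plyr: mean of `value` within each level of `id`
  res.plyr <- ddply(df, .(id), summarise, mean.value = mean(value))

  ## data.table: same result, typically much faster on large data
  dt <- as.data.table(df)
  res.dt <- dt[, list(mean.value = mean(value)), by = id]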
HTH,
-steve
On Sat, Aug 17, 2013 at 4:33 PM, Dylan Doyle <ddoyle.dub at gmail.com> wrote:
>
> Hello R users,
>
>
> I have recently begun a project to analyze a large data set of
> approximately 1.5 million rows and 9 columns. My objective is to
> locate particular subsets within this data, i.e. take all rows with
> the same value in column 9 and perform a function on that subset. It
> was suggested to me that I use the ddply() function from the plyr
> package. Any advice would be greatly appreciated.
>
>
> Thanks much,
>
> Dylan
>
>
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech