Hi,
In addition to Rainer's suggestion (which is to give a small example
of what your input data look like and an example of what you want the
output to be), given the size of your input data you might want to try
the data.table package instead of plyr::ddply -- especially while you
are exploring different combinations/calculations over your data.
The equivalent data.table approach tends to be orders of magnitude
faster than the ddply one, and usually more memory efficient as well.
When my data are small I often use both (I think the plyr/ddply
"language" is rather beautiful), but once they grow into the thousands
of rows, I invariably switch to data.table.
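To make that concrete, here is a rough sketch of the two approaches
side by side. The column names (id, value) and the grouped calculation
(a per-group mean) are made up, since we haven't seen your data, but
the pattern -- group by one column, compute something on each subset --
is what you described:

  library(plyr)
  library(data.table)

  ## Toy data standing in for your real table (made-up columns;
  ## substitute your own):
  df <- data.frame(id    = sample(letters, 1e5, replace = TRUE),
                   value = rnorm(1e5))

  ## plyr: mean of `value` within each level of `id`
  res.plyr <- ddply(df, .(id), summarise, mean.value = mean(value))

  ## data.table: same result, typically much faster on large data
  dt <- as.data.table(df)
  res.dt <- dt[, list(mean.value = mean(value)), by = id]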
HTH,
-steve
On Sat, Aug 17, 2013 at 4:33 PM, Dylan Doyle <ddoyle.dub at gmail.com> wrote:
>
> Hello R users,
>
>
> I have recently begun a project to analyze a large data set of
> approximately 1.5 million rows and 9 columns. My objective is to
> locate particular subsets within this data, i.e. take all rows with
> the same value in column 9 and perform a function on that subset. It
> was suggested to me that I use the ddply() function from the plyr
> package. Any advice would be greatly appreciated.
>
>
> Thanks much,
>
> Dylan
>
>
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech