thr3ads.net - R help - [R] Beyond reshape: automatically streamlining data [Apr 2010]


Marshall Feldman

2010-Apr-09 12:59 UTC

[R] Beyond reshape: automatically streamlining data

Hello,

I've been very impressed by the reshape package and how easy it makes 
reorganizing statistical data structures. This makes me wonder if 
there's another package out there that addresses another set of tasks 
that one often does when preparing data for analysis.

For any particular set of analyses, one typically recodes variables and 
deletes cases and variables. It would be really nice to have a package 
that, for example, if one selected cases from a larger data set based on 
the values of certain variables would inspect the resulting data and 
drop any variables that have the same value for all cases. Similarly, if 
any cases are entirely zero or NA, the package could (under user 
control) drop these cases. Finally, it could take a set of data 
transformations and keep them as an object, so that the same 
selection/reshape/streamlining can easily be applied to similar data sets.
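A minimal base R sketch of the streamlining described above, offered as an illustration and not as an existing package. Every name here (`drop_constant_vars`, `drop_empty_cases`, `make_pipeline`, `make_state_pipeline`, and the toy `employment` data with its columns) is hypothetical. It assumes that "entirely zero or NA" is judged only over the measure columns the user names.

```r
# Drop variables whose non-missing values are identical for all cases.
drop_constant_vars <- function(df) {
  is_constant <- vapply(
    df,
    function(col) length(unique(col[!is.na(col)])) <= 1,
    logical(1)
  )
  df[, !is_constant, drop = FALSE]
}

# Drop cases in which every value in `cols` is NA (or zero, if requested).
drop_empty_cases <- function(df, cols = names(df), zero_is_empty = TRUE) {
  empty_cells <- lapply(df[cols], function(col) {
    empty <- is.na(col)
    if (zero_is_empty && is.numeric(col)) empty <- empty | (col %in% 0)
    empty
  })
  all_empty <- Reduce(`&`, empty_cells, rep(TRUE, nrow(df)))
  df[!all_empty, , drop = FALSE]
}

# Store a sequence of transformations as one reusable function object.
make_pipeline <- function(...) {
  steps <- list(...)
  function(df) Reduce(function(d, step) step(d), steps, df)
}

# Hypothetical toy data standing in for the employment file.
employment <- data.frame(
  state    = c("RI", "RI", "RI", "MA", "MA"),
  industry = c("retail", "health", "retail", "retail", "health"),
  year     = rep(2009, 5),
  jobs     = c(120, 0, 95, 300, 210),
  wages    = c(30, NA, 28, 45, 50),
  stringsAsFactors = FALSE
)

# The same selection and streamlining, parameterized by state.
make_state_pipeline <- function(st) {
  force(st)
  make_pipeline(
    function(d) d[d$state == st, , drop = FALSE],
    function(d) drop_empty_cases(d, cols = c("jobs", "wages")),
    drop_constant_vars
  )
}

ri <- make_state_pipeline("RI")(employment)
ma <- make_state_pipeline("MA")(employment)
```

Because the pipeline is an ordinary function, it can be saved and applied to each state's data in turn. The order of steps matters: constant variables are dropped last, after the selection and the empty-case filter have determined which cases remain.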

My motivation for this came from working with employment data this 
morning. I started out with 11 variables and 35569 cases for Rhode 
Island; a few selections later I had only 420 cases and 3 variables. It 
struck me that the process I went through, which included not only 
making selections but also inspecting the results and deleting 
unnecessary cases/variables, could be automated at least to eliminate 
the inspection step. Also, since I want to do the same thing with data 
for other states, automation would be very nice indeed.

I realize that programming this kind of stuff in R is relatively easy, 
but the reshape package makes me wonder if someone has already done it.

Thanks
     Marsh Feldman

Steve Lianoglou

2010-Apr-09 15:20 UTC



Hi Marshall,

On Fri, Apr 9, 2010 at 8:59 AM, Marshall Feldman <marsh at uri.edu> wrote:
> ...
> For any particular set of analyses, one typically recodes variables and
> deletes cases and variables. It would be really nice to have a package that,
> for example, if one selected cases from a larger data set based on the
> values of certain variables would inspect the resulting data and drop any
> variables that have the same value for all cases. Similarly, if any cases
> are entirely zero or NA, the package could (under user control) drop these
> cases. Finally, it could take a set of data transformations and keep them as
> an object, so that the same selection/reshape/streamlining can easily be
> applied to similar data sets.
> ...

Some of the utilities in the caret package might be related to the
things you're after:
http://cran.r-project.org/package=caret

There is a writeup about using caret to build predictive models in R
in the Journal of Statistical Software (it's a PDF):
http://www.jstatsoft.org/v28/i05/paper

I'd recommend reading through that if you haven't before, since caret
offers many handy wrapper/utility functions. In particular, check out
section 3, Data Preparation, where Max talks about zero-variance
predictors and the multicollinearity problem.
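
As a rough sketch of those two data preparation steps, assuming the caret package is installed: `nearZeroVar()` flags constant or nearly constant columns and `findCorrelation()` suggests columns to drop from a correlation matrix. The `predictors` data frame and its column names are made up for illustration, and the 0.9 cutoff is an arbitrary choice.

```r
# Runs only if caret is available; the data below are simulated.
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(50)
  predictors <- data.frame(
    constant = rep(1, 50),                 # same value for all cases
    x        = x,
    x_copy   = x + rnorm(50, sd = 0.01),   # nearly a duplicate of x
    z        = rnorm(50)
  )

  # Column indices of zero- or near-zero-variance predictors.
  nzv <- caret::nearZeroVar(predictors)
  filtered <- if (length(nzv) > 0) predictors[, -nzv, drop = FALSE] else predictors

  # Column indices suggested for removal because of high pairwise correlation.
  high_cor <- caret::findCorrelation(cor(filtered), cutoff = 0.9)
  if (length(high_cor) > 0) filtered <- filtered[, -high_cor, drop = FALSE]

  print(names(filtered))
}
```

Note that this only covers dropping variables; it does not drop cases or store the selection steps as a reusable object.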

Hope that helps.

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
