thr3ads.net - R help - [R] Hierarchical data sets: which software to use? [Feb 2010]

Anton du Toit

2010-Feb-01 04:24 UTC

[R] Hierarchical data sets: which software to use?

Dear R-helpers,

I’m writing for advice on whether I should use R or a different package or
language. I’ve looked through the R-help archives, some manuals, and some
other sites as well, and I haven’t done too well finding relevant info,
hence my question here.

I’m working with hierarchical data (in SPSS lingo). That is, for each case
(person) I read in three types of (medical) record:

1. demographic data: name, age, sex, address, etc

2. ‘admissions’ data: this generally repeats, so I will have 20 or so
variables relating to their first hospital admission, then the same 20 again
for their second admission, and so on

3. ‘collections’ data, about 100 variables containing the results of a
battery of standard tests. These are administered at intervals and so this
is repeating data as well.

The number of repetitions varies between cases, so in its one case per line
format the data is non-rectangular.

At present I have shoehorned all of this into SPSS, with each case on one
line. My test database has 2,500 variables and 1,500 cases (or persons), and
in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger
again, though likely within one order of magnitude. Down the track, funding
permitting, I hope to be working with tens of thousands of cases.

I am wondering if I should keep using SPSS, or try something else.

The types of analysis I’ll typically have to do will involve comparing
measurements at different times, e.g. before/after treatment. I’ll also
need to compare groups of people, e.g. treatment / no treatment. Regression
and factor analyses will doubtless come into it at some point too.

So:

1. should I use R or try something else?

2. can anyone advise me on using R with the type of data I’ve described?


Many thanks,

Anton du Toit

Juliet Hannah

2010-Feb-05 01:29 UTC

Check out the book "Linear Mixed Models: A Practical Guide Using
Statistical Software" by Brady West.

It sets up analyses, similar to the ones you described, in SPSS, R, and
others as well.

In general, I think it is good to know a couple of different packages,
especially if you plan on doing a lot of data analysis and data
manipulation.




Douglas Bates

2010-Feb-05 16:22 UTC

On Sun, Jan 31, 2010 at 10:24 PM, Anton du Toit <atdutoitrhelp at gmail.com> wrote:
> Dear R-helpers,
>
> I’m writing for advice on whether I should use R or a different package or
> language. I’ve looked through the R-help archives, some manuals, and some
> other sites as well, and I haven’t done too well finding relevant info,
> hence my question here.
>
> I’m working with hierarchical data (in SPSS lingo). That is, for each case
> (person) I read in three types of (medical) record:
>
> 1. demographic data: name, age, sex, address, etc
>
> 2. ‘admissions’ data: this generally repeats, so I will have 20 or so
> variables relating to their first hospital admission, then the same 20 again
> for their second admission, and so on
>
> 3. ‘collections’ data, about 100 variables containing the results of a
> battery of standard tests. These are administered at intervals and so this
> is repeating data as well.
>
> The number of repetitions varies between cases, so in its one case per line
> format the data is non-rectangular.
>
> At present I have shoehorned all of this into SPSS, with each case on one
> line. My test database has 2,500 variables and 1,500 cases (or persons), and
> in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger
> again, though likely within one order of magnitude. Down the track, funding
> permitting, I hope to be working with tens of thousands of cases.

Although this may not be helpful for your immediate goal, storing and
manipulating data of this size and complexity (and, I expect, cost for
collection) really calls for tools like relational databases.  A
single flat file of 2500 variables by 1500 cases is almost never the
best way to organize such data.  A normalized representation as a
collection of interlinked tables in a relational database is much
more effective and less error-prone.  The widespread use of
spreadsheets or SPSS data sets or SAS data sets which encourage the
"single table with a gargantuan number of columns, most of which are
missing data in most cases" approach to organization of longitudinal
data is regrettable.
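
As a minimal sketch of the normalized layout described above, the
following base R code keeps demographics and admissions in separate
tables linked by a person identifier, then joins them on demand.  All
table names, column names, and values here are hypothetical and chosen
only for illustration; a real project would more likely hold these
tables in a database rather than in data frames.

```r
# Hypothetical normalized tables; every name and value is invented.
# Demographics: exactly one row per person.
demographics <- data.frame(
  person_id = c(1, 2, 3),
  sex       = c("F", "M", "F"),
  age       = c(34, 51, 47)
)

# Admissions: one row per admission, so the number of rows per person varies.
# Person 1 has two admissions, person 2 has one, person 3 has none.
admissions <- data.frame(
  person_id      = c(1, 1, 2),
  admission_no   = c(1, 2, 1),
  length_of_stay = c(3, 7, 2)
)

# Join the demographics onto the admissions to get one row per person per
# admission. The demographics are stored once, so the repeated values in the
# joined result cannot disagree with each other.
admissions_long <- merge(admissions, demographics,
                         by = "person_id", all.x = TRUE)
print(admissions_long)
```

Because each person's demographics live in a single row, a correction
made there propagates to every joined view, and no columns of missing
values are needed for people with fewer admissions.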

For later analysis in R it is better to start with the "long" form of
the data, as opposed to the "wide" form, even if it means repeating
demographic information over several occasions.  Using a relational
database allows for a long view to be generated without the
possibility of inconsistency in the demographics.  I am using the
descriptions "long" and "wide" in the sense that they are used in the
reshape help page.  See

?reshape

in R.  The long view is also called the subject/occasion view in the
sense that each row corresponds to one subject on one occasion.
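
To make the wide versus long distinction concrete, here is a small
sketch using base R's reshape().  The data frame, its column names
(score.1, score.2), and its values are assumptions made up for this
example; they are not taken from the original poster's data.

```r
# Hypothetical wide data: one row per person, one column per occasion.
# Person 2 was measured only once, so the second occasion is NA.
wide <- data.frame(
  person_id = c(1, 2),
  sex       = c("F", "M"),
  score.1   = c(10, 20),
  score.2   = c(12, NA)
)

# Convert to the long (subject/occasion) form: one row per person per occasion.
long <- reshape(
  wide,
  direction = "long",
  idvar     = "person_id",               # identifies the subject
  varying   = c("score.1", "score.2"),   # the repeated-measure columns
  v.names   = "score",                   # name of the stacked measurement
  timevar   = "occasion",                # name of the new occasion column
  times     = 1:2
)

# Drop occasions that never happened and sort by subject, then occasion.
long <- long[!is.na(long$score), ]
long <- long[order(long$person_id, long$occasion), ]
print(long)
```

In the long form the varying number of repetitions per person shows up
as a varying number of rows, not as padding columns, and the
demographic column (sex) is repeated on each row for that person.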

Robert Gentleman's book "R Programming for Bioinformatics" provides
background on linking R to relational databases.


As I said at the beginning, you may not want to undertake the
necessary study and effort to reorganize your data for this specific
project but if you do this a lot you may want to consider it.

> I am wondering if I should keep using SPSS, or try something else.
>
> The types of analysis I’ll typically will have to do will involve comparing
> measurements at different times, e.g. before/ after treatment. I’ll also
> need to compare groups of people, e.g. treatment / no treatment. Regression
> and factor analyses will doubtless come into it at some point too.
>
> So:
>
> 1. should I use R or try something else?
>
> 2. can anyone advise me on using R with the type of data I’ve described?
>
>
> Many thanks,
>
> Anton du Toit
