thr3ads.net - R help - [R] How to represent tree-structured values [May 2022]


Richard O'Keefe

2022-May-30 04:54 UTC

[R] How to represent tree-structured values

There is a kind of data I run into fairly often
which I have never known how to represent in R,
and nothing I've tried really satisfies me.

Consider for example
 ...
 - injuries
   ...
   - injuries to limbs
     ...
     - injuries to extremities
       ...
       - injuries to hands
         - injuries to dominant hand
         - injuries to non-dominant hand
       ...
     ...
   ...

This isn't ordinal data, because there is no
"left to right" order on the values.  But there
IS a "part/whole" order, which an analysis should
respect, so it's not pure nominal data either.

As one particular example, if I want to
tabulate data like this, an occurrence of one
value should be counted as an occurrence of
*every* superordinate value.

Examples of such data include "why is this patient
being treated", "what drug is this patient being
treated with", "what geographic region is this
school from", "what biological group does this
insect belong to".

So what is the recommended way to represent
and the recommended way to analyse such data in R?


Jeff Newmiller

2022-May-30 05:23 UTC


Really this depends on the analysis you want to perform.

In the past, I have used a super/sub two-column format as a compact,
non-redundant representation for data entry, and then applied a recursive
algorithm to convert it to a super/sub/level/id table in which _all_ sub
components have (duplicative) entries corresponding to each super component.
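
A minimal sketch of that conversion, assuming a strict tree (each sub has
at most one super). The names edges, ancestors, expand_tree and obs are
made up for illustration, the id column is left out, and level is taken to
mean the distance between the super and the sub:

```r
# Hypothetical super/sub edge list: one row per direct parent-child link.
edges <- data.frame(
  super = c("injuries", "injuries to limbs", "injuries to extremities",
            "injuries to hands", "injuries to hands"),
  sub   = c("injuries to limbs", "injuries to extremities", "injuries to hands",
            "injuries to dominant hand", "injuries to non-dominant hand"),
  stringsAsFactors = FALSE
)

# Recursively collect every ancestor of a node, nearest first.
# Assumes each node has at most one parent.
ancestors <- function(node, edges) {
  parent <- edges$super[edges$sub == node]
  if (length(parent) == 0) return(character(0))
  c(parent, ancestors(parent, edges))
}

# Expand to one row per (ancestor-or-self, node) pair.
# level is the distance between the two (0 means the node itself).
expand_tree <- function(edges) {
  nodes <- unique(c(edges$super, edges$sub))
  rows <- lapply(nodes, function(n) {
    up <- c(n, ancestors(n, edges))
    data.frame(super = up, sub = n, level = seq_along(up) - 1L,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

closure <- expand_tree(edges)

# Made-up observations, recorded at whatever level of detail was available.
obs <- c("injuries to dominant hand", "injuries to dominant hand",
         "injuries to non-dominant hand", "injuries to limbs")

# Join each observation to itself and all of its ancestors, then tabulate,
# so one occurrence counts towards every superordinate value.
hits <- merge(data.frame(sub = obs, stringsAsFactors = FALSE), closure, by = "sub")
counts <- table(hits$super)
```

The expanded table is redundant on purpose: once it exists, the roll-up is
an ordinary join followed by table(), with no further recursion.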

But there is always the recursive list structure that functions for reading
formats such as YAML and JSON typically return.
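
As a rough illustration of the recursive-list route, here is a sketch in
which each node is a list holding its own direct count and its children.
The field names n and children and the function rollup are assumptions
made for this example, not what any particular YAML or JSON reader
returns:

```r
# Hypothetical nested list: each node carries its own direct count `n`
# and a named list of `children`.
tree <- list(
  n = 0,
  children = list(
    "injuries to limbs" = list(
      n = 1,
      children = list(
        "injuries to hands" = list(
          n = 0,
          children = list(
            "injuries to dominant hand"     = list(n = 2, children = list()),
            "injuries to non-dominant hand" = list(n = 1, children = list())
          )
        )
      )
    )
  )
)

# Return a named vector of totals for a node and everything below it.
# A node's total is its own count plus the totals of its children.
rollup <- function(node, name) {
  out <- numeric(0)
  total <- node$n
  for (child in names(node$children)) {
    sub <- rollup(node$children[[child]], child)
    total <- total + sub[[child]]   # first element of `sub` is the child itself
    out <- c(out, sub)
  }
  c(setNames(total, name), out)
}

totals <- rollup(tree, "injuries")
```

This keeps the hierarchy explicit, but every analysis then needs its own
recursive walk, which is the main cost compared with a flat table.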


Jan van der Laan

2022-May-30 06:07 UTC


The most common way of handling this, from what I have seen, and what I
am also using myself, is to have multiple columns, one for each level of
the tree. So in your example you would have (at least) five columns. A
record with an injury to the dominant hand would have the following
values in these columns:

level0: injuries
level1: injuries to limbs
level2: injuries to extremities
level3: injuries to hands
level4: injuries to dominant hand

That way you can easily aggregate to each of the levels. This solution
becomes slightly more difficult if not every branch of the tree has the
same depth. You can use NAs to fill non-existing levels.
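
A small sketch of this layout with made-up records, assuming node labels
are unique across levels. The data frame d and its contents are invented
for illustration:

```r
# Hypothetical records, one column per level; NA where a record is
# not classified any deeper (the last record stops at "injuries to limbs").
d <- data.frame(
  level0 = rep("injuries", 4),
  level1 = rep("injuries to limbs", 4),
  level2 = c(rep("injuries to extremities", 3), NA),
  level3 = c(rep("injuries to hands", 3), NA),
  level4 = c("injuries to dominant hand", "injuries to dominant hand",
             "injuries to non-dominant hand", NA),
  stringsAsFactors = FALSE
)

# Aggregate at each level separately. table() drops NA by default, so a
# record only counts at the levels it actually reaches.
by_level <- lapply(d, table)

# One combined tabulation in which every occurrence also counts towards
# each of its superordinate values (relies on labels being unique).
counts <- table(unlist(d, use.names = FALSE))
```

Because each row already carries its whole path, no tree traversal is
needed; the price is the repeated labels in every record.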

Often these types of variables come with codes that have a nesting
structure. For example, in the regional codes we use, the first 4 digits
are the municipality, the next two the city district and the next two
the neighbourhood. So neighbourhoods have 8-digit codes. If I want to
aggregate to municipality I can take the first 4 digits (substr). That
way I only need one column and can calculate the others when needed.
However, aggregating to municipality is so common that I often end up
with a separate column anyway.
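
For instance, with codes laid out as 4 + 2 + 2 digits, the higher levels
can be derived on the fly. The codes and the pupils values below are
invented for illustration:

```r
# Hypothetical neighbourhood codes: 4 digits municipality, 2 digits city
# district, 2 digits neighbourhood. Stored as character so that any
# leading zeros survive.
code   <- c("12340101", "12340102", "12340201", "56780110")
pupils <- c(10, 20, 30, 40)   # made-up values to aggregate

# Derive the coarser levels from the prefix of the code.
municipality <- substr(code, 1, 4)
district     <- substr(code, 1, 6)

# Aggregate to each level.
by_municipality <- tapply(pupils, municipality, sum)
by_district     <- tapply(pupils, district, sum)
```

Keeping the codes as character rather than numeric matters here, since
substr() on a number that has lost a leading zero would pick the wrong
digits.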

In a hackathon at the uRos conference a while back, we tried to make a
new type for this (https://github.com/uRosConf/categorical), but it
never got complete enough to push to CRAN.

HTH,
Jan


Jim Lemon

2022-May-30 06:37 UTC


Hi Richard,
Some years ago I had a try at illustrating Multiple Causes of Death
(MCoD) data. I settled on what is sometimes called a "sizetree". You
can see some examples in the sizetree function help page in "plotrix".
Unfortunately I can't use the original data as it was confidential.

Jim

