Hi Toby,
Thanks for your reply! I haven?t heard about the R project sprint, but I?ll
definitely check it out. UK is going to be a little hard for me to get to
funding-wise, but I?ll try to apply for funding.
I appreciate your other comments. As far as coding style, I did do everything I
could think of to make sure it?s a drop-in replacement for the current version
with the default settings, so all the user-exposed arguments/variables should be
identical. I used the conventions in
https://github.com/wch/r-source/wiki/Contributing for commenting and whitespace,
so hopefully that all looks okay. I?m realizing there may be some differences in
tab widths, but I can fix that later today.
-Aidan
-----------------------
Aidan Lakshman (he/him)<https://www.ahl27.com/>
Doctoral Candidate, Wright Lab<https://www.wrightlabscience.com/>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ahl27 at pitt.edu
(724) 612-9940
From: Toby Hocking <tdhock5 at gmail.com>
Date: Friday, February 24, 2023 at 06:57
To: Lakshman, Aidan H <AHL27 at pitt.edu>
Cc: R-devel at r-project.org <R-devel at r-project.org>
Subject: Re: [Rd] `dendrapply` Enhancements
Hi Aidan, I think you are on the right email list.
I'm not R-core, but this looks like an interesting/meaningful/significant
contribution to base R.
I'm not sure what the original dendrapply looks like in terms of code style
(variable names/white space formatting/etc) but in my experience it is important
that your code contribution makes minimal changes in that area.
Did you hear about the R project sprint 2023?
https://contributor.r-project.org/r-project-sprint-2023/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcontributor.r-project.org%2Fr-project-sprint-2023%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=oB%2BBivUsBjIgBtZNU8mh%2Fz2rujD3bv9MbWdxqNtyUFk%3D&reserved=0>
Your work falls into the "new developments" category so I think you
could apply for that funding to participate.
Toby
On Fri, Feb 24, 2023 at 3:47 AM Lakshman, Aidan H <AHL27 at
pitt.edu<mailto:AHL27 at pitt.edu>> wrote:
Hi everyone,
My apologies if this isn?t the right place to submit this?I?m new to the R-devel
community and still figuring out what is where.
If people want to skip my writeup and just look at the code, I?ve made a
repository for it here:
https://github.com/ahl27/new_dendrapply/tree/master<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fahl27%2Fnew_dendrapply%2Ftree%2Fmaster&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=QI%2B5t1C%2BJB15D8o8noZra4W87fgyITm12nluGN%2BFNoE%3D&reserved=0>.
I?m not quite sure how to integrate it into a fork of R-devel; the package
structure is different from what I?m used to.
I had written a slightly improved version of dendrapply for one of my research
projects, and my advisor encouraged me to submit it to the R project. It took me
longer than I expected, but I?ve finally gotten my implementation to be a
drop-in replacement for `stats::dendrapply`. The man page for
`stats::dendrapply` says ?The implementation is somewhat experimental and
suggestions for enhancements (or nice examples of usage) are very welcome,? so I
figured this had the potential to be a worthwhile contribution. I wanted to send
it out to R-devel to see if this was something worth pursuing as an enhancement
to R.
The implementation I have is based in C, which I understand implies an increased
burden of maintenance over pure R code. However, it does come with the following
benefits:
- Completely eliminates recursion, so no memory overhead from function calls or
possibility of stack overflows (this was a major issue reported on some of the
functions in one of our Bioconductor packages that previously used
`dendrapply`).
- Modest runtime improvement, around 2x on my computer (2021 MBP, 32GB RAM). I?m
relatively confident this could be optimized more.
- Seemingly significant reduction in memory reduction, still working on a robust
benchmark. Suggestions for the best way to do that are welcome.
- Support for applying functions with an inorder traversal (as in
`stats::dendrapply`) as well as using a postorder traversal.
This implementation was tested manually as well as running all the unit tests in
`dendextend`, which comprises a lot of applications of `dendrapply`.
The postorder traversal would be a significant new functionality to dendrapply,
as it would allow for functions that use the child nodes to correctly execute. A
toy example of this is something like:
```
exFunc <- function(x){
attr(x, 'newA') <- 'a'
if(is.null(attr(x, 'leaf'))){
cat(attr(x[[1]], 'newA'), attr(x[[2]], 'newA'))
cat('\n')
}
x
})
dendrapply(dend, exFunc)
```
With the current version of dendrapply, this prints nothing, but the postorder
traversal version will print ?a? twice for each internal branch. If this would
be a worthwhile addition, I can refactor the code for brevity and add a
`how=c("in.order", "post.order")`, with the default value
?in.order? to maintain backwards compatibility. A preorder traversal version
should also be possible, I just haven?t gotten to it yet.
I think the runtime could be optimized more as well.
Thank you in advance for looking at my code and offering feedback; I?m excited
at the possibility of helping contribute to the R project! I?m happy to discuss
more either here, on GitHub, or on the R Contributors Slack.
Sincerely,
Aidan Lakshman
-----------------------
Aidan Lakshman
(he/him)<https://www.ahl27.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ahl27.com%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=7KUpJpdulSIzSXbpDJlyUV8pMJm%2BSVFvDOJTlVs9lhc%3D&reserved=0>>
Doctoral Candidate, Wright
Lab<https://www.wrightlabscience.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.wrightlabscience.com%2F&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=JMgw%2BMiQ6xdp3OokToJ2nyyco%2BryiFH%2B9ap5iU3yJH8%3D&reserved=0>>
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ahl27 at pitt.edu<mailto:ahl27 at pitt.edu>
(724) 612-9940
[[alternative HTML version deleted]]
______________________________________________
R-devel at r-project.org<mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-devel&data=05%7C01%7CAHL27%40pitt.edu%7C41b702f139034cd7165608db165e4530%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C638128366414606277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=I88%2FQhGHXDRS2yHqvh53k3MSWHSd5z2KBgORUHIxfG0%3D&reserved=0>
[[alternative HTML version deleted]]