I went back and looked at the current survey, and I have a lot of thoughts I
wanted to share. I apologize for sending such a dense response to a thread that
has been dormant for days.

What information do we want to get out of the survey? The current survey is
mostly just giving people a way to vote for their preferred solution. I
think this is a huge missed opportunity. One of the things I've found most
frustrating about the Git-related threads is that there have been several
assertions of the form "most people <blah>". The survey is a great
opportunity for us to gather real data that proves or disproves these
assertions. As data points, here are a few such assertions I pulled out of
the mono-repo thread:

David Chisnall wrote:
"clang-tools-extra is explicitly a bunch of stuff that doesn’t belong in
the main clang repo because it’s not of interest to most people doing clang
work"

Paul Robinson wrote:
"I'm not clear why imposing this cost on everybody who wants
less-than-all (which I'd think would be most people)"

Justin Lebar wrote:
"If you use the workflow that we currently have, then on the client side,
there is no guarantee that your subprojects will be sync'ed. (This is the
same as most peoples' client-side git workflows today.)"

I wrote:
"I think we have some pretty strong evidence in the form of the github fork
counts (https://github.com/llvm-mirror/) that most people aren’t using all of
the LLVM projects."

Broadly, I think the survey as written is little more than a vote for the
option people prefer, plus a chance to rate how good or bad they think the
alternative is. As a result, the current survey has a selection bias: it will
exclude people who don't have clear or strong opinions on the proposals. As I
said in an earlier response, I also think the reliance on free-text fields will
make the data harder to process and understand if we get a large number of
responses (and I really hope we get a lot of responses).

I think we should consider approaching this problem differently. Instead of
structuring a vote, we could focus on gathering data about users and workflows,
and use that real-world data to guide a decision that is best for the most
common use cases. Correlating people's answers about their workflows with their
relationship to the community will allow us to categorize and weight the
results.

I've compiled a list of a few pieces of data I think we should gather. If we
take the approach I'm proposing for the survey, we would want more people in
the community to suggest additional things to gather information about.
My list is:

(1) Which projects people contribute to, and which ones they use (separately)
By combining the projects you use and the projects you contribute to into a
single question, we lose a lot of relevant information. I believe a lot of
people contribute to Clang but only use libcxx, based on the number of
contributors to clang and libcxx over the last year (284 and 41 respectively).
In particular, I believe it is common for clang contributors to use projects
that they don't contribute to, and we should try to quantify that. If we don't
want multiple questions for this, we could instead infer which projects a
person contributes to by matching the email address given in the survey
against the email addresses on commits; that would also be an acceptable route
to this information.

(2) How many people build clang against an installed LLVM?
I know it is used this way, but I have no idea how common it is. We recently
had a series of changes because cc1_main.cpp was including LLVM's Config.h,
which isn't installed. I think this is a very uncommon use case; my evidence is
that the change breaking the standalone build was months old before it was
detected. Alternatively, it might be a common use case only on the release
branches (which would make some sense). Either way, it would be good to gather
data on it. Knowing how end users and package maintainers use our existing
source distributions is useful information when thinking about infrastructure
changes. This doesn't necessarily mean we shouldn't do something that impacts
them, but it allows us to make informed decisions.

(3) How many people use runtime projects without LLVM or Clang?
There have been several discussions lately about supporting runtimes without
LLVM sources; we might want to figure out how common that desire is. It would
also be nice to correlate the people who want that support with the people who
contribute to the runtimes.
Data points:
C Bergström wrote on llvm-dev:
/* Side rant - I wish I didn't even need the llvm sources. I just want to
build libcxxrt */
Michał Górny filed:
Bug 18331 - [cmake] Please make compiler-rt's build system stand-alone
Bug 29109 - [cmake / compiler-rt] Please make tests runnable against installed LLVM

(4) How are people getting LLVM sources today?
Despite the many discussions about moving to Git, we still don't know how many
people are already using Git. How many people use Git or git-svn to interact
with LLVM sources is a really simple question that will tell us a lot about the
impact of a move to Git on the wider community. We also don't know whether
people are getting sources from the LLVM SVN repository, the git mirrors, the
GitHub mirrors, or Takumi's mono-repo. It would be really valuable to gather
information about where people get LLVM sources and how they interact with
them.

If we structure the survey to gather information, either in addition to or
instead of opinions, we can back any decision with data that justifies it.

-Chris

> On Aug 17, 2016, at 2:23 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 17 August 2016 at 22:18, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> I think the survey should be regarding question based-off a single document
>> putting side-by-side the options that we came-up with on the mailing-list.
>> Indeed I don’t plan to write a document describing a “mono-repo” proposal to
>> counter the submodules one, but I plan instead to unify it with the existing
>> one (submodules…) along with the possible variants/options in a single
>> document.
>
> I agree this is probably the most sensible solution. Thanks for
> merging the options.
>
>
>> I plan to include examples of workflow today and after for each scenario,
>> side-by-side. I hope to have it up for public review by the end of the
>> month.
>
> Excellent! I'll get the form rolling in parallel, and hopefully we'll
> reach maturity around the same time.
>
>
>> I’d regret not having the results of the survey for the BoF as these data
>> seem critical to drive the discussion.
>
> Agreed. Let's aim for that.
>
> cheers,
> --renato