Hi,
In response to Chris's suggestion to construct a more precise callgraph
builder.
I express my personal thinking here on possible refinement. Any advice is
welcome.
1. Basic data structure changes. Current callgraph builder maintains two null
nodes, expressing the meaning of "anything calls !internal linkage
functions" and "functions not defined in this unit call anything"
respectively. However, although we know some functions call "anything"
and
others called by "anything", they just don't have explicit
callgraph edges
between them.
So I just want to:
Merge these two null nodes to one representing "external unknown
functions". It has edges to everything *might* be visible outside this
unit,
and has edges to itself from everything might call external functions having
not appeared in this unit. External functions already declared in this unit
have their own callnodes and should have explicit edges to/from others. It
gets its improvement if we have some "pre-knowledge" of external
function
already declared in this unit. For example, we are sure that standard libc
function "strcmp" never calls *anything*(no matter what linkage it
might be)
in our own unit.
2. Improvements regarding indirect calls and functions whose address being
taken. Existing callgraph builder considers them rather conservatively.
I think the possible refinements are:
a). Further classify the call edges to: MUST, ALIAS, MAY edges.
MUST: for edges found though direct callsites or if we can prove that a
function pointer called by some indirect call must alias some function
(Actually I think this should be promoted to be direct callsite by
some optimization). We have strong evidence that if A --> B, there must be
some(if not all) run traces of the program that do make A call B.
ALIAS: for edges comes from indirect calls that may alias a set of
functions. Furthur information could be maintained for which alias set a
ALIAS edge belongs to. It means we have strong evidence that there must be
some run traces of the program that make a caller call at least one of the
functions belonging to an alias set, but we are not sure about which subset
of it must be get called due to the convertive nature of any alias analysis.
May: If there is a MAY edge from A to B, it only means we do not have
strong evidence that A cannot call B. For example, if B's address is passed
as a parameter to external A.
Besides these three calling behaviors, if node A does not have an edge
to B, We *MUST* have strong evidence that A cannot call B. Otherwise at least
a MAY edge must exist from A to B.
b). Get more strong evidence about indirect calls and for functions whose
address is being taken more than direct calls.
Indirect call information could be constructed either from partial result
of DSA as suggested by Andrew. And alternatively, it can use the alias
information constructed by any AA. I think both interface may be
provided to let clients chose between the generality and the tight
integration with wonderful DSA.
As for the case of function address being taken. I think further analysis
could help to find the evidence if the address is only propagated to be
used within this unit and thus should not be visable to external node.
c) Light weight alias analysis for only function pointers may be good for
clients whose interest is only function pointers alias or the construction of
callgraph but not general point-to analysis. Because just as Milanova, et al
pointed out in "Precise and Efficient Call Graph Construction for C
programs
with function pointers", "even inexpensive pointer analyses may
provide
precision comparable to the precision of expensive pointer anlayses, and
therefore the use of more expensive analyses may be unnecessary." It could
improve the performance and space used.
--
Regards,
Nai