CodeQL analysis
By default the call graph comes entirely from Jedi’s lexical analysis. That’s fast and needs no external tooling, but lexical resolution can’t see every edge — calls through dynamic dispatch, RPC, and some third-party boundaries slip past it. Passing --codeql adds a second engine that resolves those, then merges its edges with Jedi’s.
What it adds
With --codeql, codeanalyzer does two extra things:
- Resolves additional edges, including RPC, third-party, and dynamically dispatched targets, tagged provenance=["codeql"], and merges them with the Jedi-derived edges. An edge both engines see carries both provenance tokens.
- Backfills call sites: where Jedi left a PyCallsite.callee_signature unresolved, CodeQL fills it in. The single CodeQL query is shared (cached on the analysis instance), so this costs no extra database work.
flowchart LR
ST[symbol table] --> J[Jedi edges]
ST --> Q["CodeQL query
(direct + constructor calls)"]
Q --> B[backfill unresolved call sites]
Q --> CE[CodeQL edges]
J --> M[merge_edges]
CE --> M
M --> CG[call graph]
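The merge step in the diagram can be sketched as follows. This is a minimal illustration, not codeanalyzer's actual merge_edges: the Edge record, its field names, and the "jedi" provenance token are assumptions made for the example. Only provenance=["codeql"] and the rule that a shared edge carries both tokens come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    # Hypothetical edge record; the real codeanalyzer schema may differ.
    source: str
    target: str
    provenance: list[str] = field(default_factory=list)


def merge_edges(jedi_edges, codeql_edges):
    """Union two edge lists, combining provenance tokens for shared edges."""
    merged = {}
    for edge in [*jedi_edges, *codeql_edges]:
        key = (edge.source, edge.target)
        if key not in merged:
            merged[key] = Edge(edge.source, edge.target, [])
        for token in edge.provenance:
            # Keep each provenance token once, in first-seen order.
            if token not in merged[key].provenance:
                merged[key].provenance.append(token)
    return list(merged.values())


# "jedi" as a token name is an assumption made for this sketch.
jedi = [Edge("app.main", "app.load", ["jedi"])]
codeql = [
    Edge("app.main", "app.load", ["codeql"]),
    Edge("app.main", "rpc.Client.call", ["codeql"]),
]
merged = merge_edges(jedi, codeql)
```

Keying on the (source, target) pair means an edge seen by both engines appears once in the result, with both tokens attached.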
How it runs
The first time you enable CodeQL on a project, codeanalyzer sets up everything it needs under the cache directory:
- CLI binary. It looks for a binary in <cache-dir>/codeql/bin/, then for codeql on your PATH, and otherwise downloads the CLI into <cache-dir>/codeql/bin/. The project-local copy is preferred over PATH so the version it installed stays deterministic.
- Query library pack. The CLI install ships only the language extractors, so codeanalyzer materializes a small qlpack.yml depending on codeql/python-all and runs codeql pack install once, colocating the temporary query inside that pack so import python resolves cleanly.
- Database. It builds a CodeQL database for the project under <cache-dir>/codeql/<project>-db.
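The lookup order for the CLI binary could look roughly like this. It is a sketch under assumptions: the function name, the executable's file name inside bin/, and the injected download and which callables are all hypothetical. Only the order (cache directory, then PATH, then download) is taken from the list above.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def find_codeql_binary(
    cache_dir: Path,
    download: Callable[[Path], Path],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Path:
    """Return a CodeQL CLI path, preferring the copy under the cache dir."""
    bin_dir = cache_dir / "codeql" / "bin"
    # Assumption: the executable is named "codeql" inside the bin directory.
    local = bin_dir / "codeql"
    if local.is_file():
        return local  # 1. project-local copy wins, keeping the version pinned
    on_path = which("codeql")
    if on_path:
        return Path(on_path)  # 2. fall back to whatever PATH provides
    return download(bin_dir)  # 3. last resort: fetch into the cache dir
```

Passing download and which in as parameters keeps the sketch free of network access and makes the ordering easy to test in isolation.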
Database caching
The CodeQL database is keyed by a checksum over all .py files in the project. On a later run, codeanalyzer reuses the cached database when the checksum still matches and the db-python directory exists; otherwise it rebuilds. --eager forces a rebuild regardless.
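One way to picture the cache key and the reuse decision is the sketch below. The hash algorithm (SHA-256), the choice to hash relative paths along with file contents, and both function names are assumptions; the source only states that the key is a checksum over the project's .py files and that reuse also requires the db-python directory.

```python
import hashlib
from pathlib import Path
from typing import Optional


def project_checksum(project_root: Path) -> str:
    """Hash every .py file (relative path and contents) in a stable order."""
    digest = hashlib.sha256()  # assumption: the real algorithm is unspecified
    for path in sorted(project_root.rglob("*.py")):
        if not path.is_file():
            continue
        digest.update(path.relative_to(project_root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def can_reuse_database(
    db_dir: Path,
    stored_checksum: Optional[str],
    current_checksum: str,
    eager: bool = False,
) -> bool:
    """Reuse only if not forced, the checksum matches, and db-python exists."""
    if eager:
        return False  # mirrors --eager: always rebuild
    return stored_checksum == current_checksum and (db_dir / "db-python").is_dir()
```

Sorting the paths keeps the checksum independent of filesystem enumeration order, so the same tree always yields the same key.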
The resolution ladder
CodeQL and Jedi describe the same definitions slightly differently, so CodeQL endpoints have to be mapped back into Jedi’s PyCallable.signature space. codeanalyzer uses a resolution ladder rather than a brittle exact match:
- Exact (file, start_line) match.
- Same (file, short_name): if there’s a single candidate, take it; otherwise pick the nearest start_line among those whose parameter count matches CodeQL’s positional arity.
- No match: the caller is skipped, or the callee becomes a ghost node (as it would have been without CodeQL).
This matters because CodeQL and Jedi often disagree on a definition’s start line — commonly for decorated functions, where an exact-only join would silently drop the edge. The CodeQL query emits each endpoint’s function name and positional arity to drive the tiebreak. (Jedi’s parameter count includes *args/**kwargs/keyword-only slots while CodeQL’s arity is positional only, so the arity filter is exact for plain signatures and yields to the nearest-line tiebreak otherwise.)
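The ladder can be sketched as a single function. The Candidate record and resolve_endpoint are hypothetical names, not codeanalyzer's API. Falling back to all same-name candidates when none matches the arity is this sketch's reading of "yields to the nearest-line tiebreak"; the actual implementation may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical view of a Jedi-derived callable; field names are assumed.
    signature: str
    file: str
    short_name: str
    start_line: int
    param_count: int


def resolve_endpoint(
    candidates: list[Candidate],
    file: str,
    short_name: str,
    start_line: int,
    arity: int,
) -> Optional[str]:
    """Map a CodeQL endpoint to a Jedi signature, or None if nothing fits."""
    # Rung 1: exact (file, start_line) match.
    for c in candidates:
        if c.file == file and c.start_line == start_line:
            return c.signature
    # Rung 2: same (file, short_name).
    same_name = [c for c in candidates if c.file == file and c.short_name == short_name]
    if not same_name:
        return None  # Rung 3: no match; the caller decides skip vs. ghost node.
    if len(same_name) == 1:
        return same_name[0].signature
    # Prefer candidates whose parameter count equals the positional arity;
    # if none do (assumed behavior), tiebreak over all same-name candidates.
    pool = [c for c in same_name if c.param_count == arity] or same_name
    return min(pool, key=lambda c: abs(c.start_line - start_line)).signature
```

A decorated function illustrates the second rung: if CodeQL reports the decorator's line and Jedi reports the def line, the exact join misses, but the (file, short_name) lookup still recovers the edge.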
Graceful degradation
If CodeQL extraction fails for any reason, codeanalyzer logs a warning and falls back to the Jedi-only call graph; the run still completes and still produces a valid artifact. CodeQL deepens the graph; it never gates it.
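The fallback pattern might be pictured like this. The function name, the logger name, the run_codeql callable, and the use of plain tuples for edges are illustrative assumptions rather than codeanalyzer's real interfaces.

```python
import logging

logger = logging.getLogger("codeanalyzer.sketch")  # hypothetical logger name


def build_call_graph(jedi_edges, run_codeql):
    """Return Jedi edges, extended with CodeQL edges when CodeQL succeeds."""
    try:
        codeql_edges = run_codeql()
    except Exception as exc:
        # Any CodeQL failure degrades to the Jedi-only graph, never an abort.
        logger.warning("CodeQL failed; using Jedi-only call graph: %s", exc)
        return list(jedi_edges)
    extra = [edge for edge in codeql_edges if edge not in jedi_edges]
    return list(jedi_edges) + extra
```

Catching the failure at this one boundary is what lets the CodeQL path be purely additive: the Jedi result is computed first and returned unchanged if anything goes wrong.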