What is codeanalyzer-python?
codeanalyzer-python is a static-analysis tool for Python source code. You point it at a project directory and it produces one typed artifact — a PyApplication — that captures the project’s symbol table (modules, classes, callables, fields), its call graph (who-calls-whom), and its framework entrypoints (the routes, tasks, and commands a framework dispatches into). You stop grepping source by hand and start querying a structured model of the program.
It is the Python backend behind CLDK, the multilingual analysis SDK — the same role codeanalyzer plays for Java. You can use it through CLDK’s typed facade, or directly: as a CLI that writes analysis.json, or as a Python library that hands you PyApplication objects.
The mental model
Every run follows the same shape: point at a project, build the artifact, consume the typed model.
- Point at a project. Run codeanalyzer --input ./my-project. The tool discovers every .py file (test files excluded by default) and creates an isolated virtual environment so dependencies resolve.
- It builds a PyApplication. Jedi and Tree-sitter extract the symbol table; a call graph is derived from it; optional CodeQL resolution and a pluggable pass pipeline enrich it with extra edges and entrypoints.
- Consume the typed model. Get analysis.json (or msgpack) on disk, or the in-memory PyApplication. Everything is a Pydantic model: symbol_table, call_graph, entrypoints.
flowchart LR
A["codeanalyzer --input"] --> B[Symbol table<br/>Jedi + Tree-sitter]
B --> C[Call graph<br/>Jedi edges]
B -.->|--codeql| D[CodeQL edges]
C --> E[Analysis passes<br/>entrypoints + synthetic edges]
D -.-> E
E --> F["PyApplication<br/>analysis.json / msgpack"]
What you get back
The artifact is a single PyApplication with three top-level pieces:
| Field | Type | What it holds |
|---|---|---|
| symbol_table | Dict[str, PyModule] | One PyModule per source file: its imports, classes, functions, and module-level variables. |
| call_graph | List[PyCallEdge] | Identity-keyed source -> target edges (by PyCallable.signature) with a weight and provenance. |
| entrypoints | Dict[str, List[PyEntrypoint]] | Framework-dispatched roots, keyed by framework name. |
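As a rough illustration of the three top-level pieces, the sketch below summarizes a dictionary shaped like a loaded analysis.json. Only the keys symbol_table, call_graph, and entrypoints come from the table above. The module paths, signatures, the nested "source" and "target" keys, and the helper name summarize are assumptions made for the example, not the tool's documented schema.

```python
from typing import Any, Dict

# Hypothetical miniature of a loaded analysis.json. Only the three top-level
# keys come from the table above; module names, signatures, and the nested
# "source"/"target" keys are illustrative assumptions.
SAMPLE_APP: Dict[str, Any] = {
    "symbol_table": {
        "app/routes.py": {},
        "app/services.py": {},
    },
    "call_graph": [
        {"source": "app.routes.index", "target": "app.services.load_user"},
        {"source": "app.routes.profile", "target": "app.services.load_user"},
        {"source": "app.services.load_user", "target": "app.services.query"},
    ],
    "entrypoints": {
        "flask": [{}, {}],
    },
}


def summarize(app: Dict[str, Any]) -> Dict[str, Any]:
    """Count modules, call edges, and entrypoints per framework."""
    entrypoints = app.get("entrypoints", {})
    return {
        "modules": len(app.get("symbol_table", {})),
        "edges": len(app.get("call_graph", [])),
        "entrypoints": {name: len(items) for name, items in entrypoints.items()},
    }


summary = summarize(SAMPLE_APP)
print(summary)
```

With a real run, SAMPLE_APP would be replaced by the result of json.load on the analysis.json file the CLI writes, and the nested keys should be checked against the actual output before relying on them.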
Two ways to use it
As a CLI:

# Write analysis.json to ./out
codeanalyzer --input ./my-project --output ./out

# Or stream JSON to stdout (no --output)
codeanalyzer --input ./my-project | jq '.entrypoints'

As a Python library:

from pathlib import Path
from codeanalyzer.core import Codeanalyzer
from codeanalyzer.options import AnalysisOptions

options = AnalysisOptions(input=Path("./my-project"))
with Codeanalyzer(options) as analyzer:
    app = analyzer.analyze()  # -> PyApplication

print(len(app.symbol_table), "modules")
print(len(app.call_graph), "edges")

Through CLDK:

from cldk import CLDK
from cldk.analysis import AnalysisLevel

analysis = CLDK(language="python").analysis(
    project_path="my-project",
    analysis_level=AnalysisLevel.call_graph,
)
print(analysis.get_call_graph())  # -> networkx.DiGraph

Why a dedicated tool
Without analysis, a code LLM asked “what calls this function?” crawls: file read after file read, grep after grep, burning tokens on an answer it still can’t be sure of. codeanalyzer-python resolves that once, statically, into a graph, so the answer is a lookup, not a guess. Jedi gives you that for free on every run; CodeQL deepens it when dynamic dispatch and third-party calls matter; the pass pipeline surfaces the framework roots that make reachability questions meaningful.
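To make "a lookup, not a guess" concrete, here is a minimal sketch of the two questions a call graph answers: who calls a given function, and what is reachable from the framework roots. It works on plain (source, target) signature pairs rather than real PyCallEdge objects. The signatures, the root list, and the helper names build_indexes and reachable_from are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source signature, target signature)

# Hypothetical edges; in practice these would be read from the call graph.
EDGES: List[Edge] = [
    ("app.routes.index", "app.services.load_user"),
    ("app.routes.profile", "app.services.load_user"),
    ("app.services.load_user", "app.db.query"),
    ("app.tasks.cleanup", "app.db.query"),
    ("app.legacy.unused", "app.db.query"),
]


def build_indexes(edges: Iterable[Edge]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build reverse (callers) and forward (callees) adjacency maps."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    callees: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        callers[target].add(source)
        callees[source].add(target)
    return callers, callees


def reachable_from(roots: Iterable[str], callees: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search over forward edges, starting at the roots."""
    seen: Set[str] = set(roots)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for nxt in callees.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


callers, callees = build_indexes(EDGES)

# "What calls this function?" becomes a dictionary lookup.
print(sorted(callers["app.services.load_user"]))

# Hypothetical entrypoints standing in for framework-dispatched roots.
ROOTS = ["app.routes.index", "app.routes.profile", "app.tasks.cleanup"]
live = reachable_from(ROOTS, callees)
print("app.legacy.unused" in live)
```

The second print shows why the roots matter: a function with no path from any entrypoint never appears in the reachable set, even though it has outgoing edges of its own.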