
What is codeanalyzer-python?

codeanalyzer-python is a static-analysis tool for Python source code. You point it at a project directory and it produces one typed artifact — a PyApplication — that captures the project’s symbol table (modules, classes, callables, fields), its call graph (who-calls-whom), and its framework entrypoints (the routes, tasks, and commands a framework dispatches into). You stop grepping source by hand and start querying a structured model of the program.

It is the Python backend behind CLDK, the multilingual analysis SDK — the same role codeanalyzer plays for Java. You can use it through CLDK’s typed facade, or directly: as a CLI that writes analysis.json, or as a Python library that hands you PyApplication objects.

Every run follows the same shape: point at a project, build the artifact, consume the typed model.

  1. Point at a project. Run codeanalyzer --input ./my-project. The tool discovers every .py file (test files are excluded by default) and creates an isolated virtual environment so that dependencies resolve.

  2. Build the PyApplication. Jedi and Tree-sitter extract the symbol table, and a call graph is derived from it. Optional CodeQL resolution and a pluggable pass pipeline then enrich the graph with extra edges and entrypoints.

  3. Consume the typed model. Get analysis.json (or msgpack) on disk, or the in-memory PyApplication. Everything is a Pydantic model: symbol_table, call_graph, entrypoints.
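
Step 3 can be sketched with nothing but the standard library, assuming the JSON on disk mirrors the three top-level fields named above (symbol_table, call_graph, entrypoints). The helper name summarize_analysis and the miniature sample document are hypothetical; a real analysis.json carries far more detail per entry, and the inner keys shown here are placeholders rather than the confirmed schema.

```python
import json
import tempfile
from pathlib import Path


def summarize_analysis(path: Path) -> dict[str, int]:
    """Count modules, call edges, and entrypoints in an analysis.json file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    entrypoints = data.get("entrypoints", {})
    return {
        "modules": len(data.get("symbol_table", {})),
        "call_edges": len(data.get("call_graph", [])),
        # entrypoints is keyed by framework name; each value is a list of roots.
        "entrypoints": sum(len(roots) for roots in entrypoints.values()),
    }


# Hypothetical stand-in for real tool output: only the three top-level
# keys come from the documentation, everything inside them is assumed.
sample = {
    "symbol_table": {"app/main.py": {}, "app/db.py": {}},
    "call_graph": [{"source": "app.main.index", "target": "app.db.query"}],
    "entrypoints": {"flask": [{}]},
}

with tempfile.TemporaryDirectory() as tmp:
    analysis_path = Path(tmp) / "analysis.json"
    analysis_path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_analysis(analysis_path)

print(summary)
```

Against real output, you would point summarize_analysis at the analysis.json written under the directory passed to --output instead of the temporary file.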

```mermaid
flowchart LR
    A["codeanalyzer --input"] --> B[Symbol table<br/>Jedi + Tree-sitter]
    B --> C[Call graph<br/>Jedi edges]
    B -.->|--codeql| D[CodeQL edges]
    C --> E[Analysis passes<br/>entrypoints + synthetic edges]
    D -.-> E
    E --> F["PyApplication<br/>analysis.json / msgpack"]
```

The artifact is a single PyApplication with three top-level pieces:

| Field | Type | What it holds |
| --- | --- | --- |
| symbol_table | Dict[str, PyModule] | One PyModule per source file: its imports, classes, functions, and module-level variables. |
| call_graph | List[PyCallEdge] | Identity-keyed source -> target edges (by PyCallable.signature) with a weight and provenance. |
| entrypoints | Dict[str, List[PyEntrypoint]] | Framework-dispatched roots, keyed by framework name. |
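
Because call_graph is a flat list of edges, "who calls X?" reduces to building a reverse index once and then doing dictionary lookups. The sketch below works on plain dictionaries and assumes each edge serializes with source and target keys holding callable signatures. Those key names, the sample signatures, and the helper build_caller_index are illustrative assumptions, not the confirmed schema.

```python
from collections import defaultdict


def build_caller_index(call_graph):
    """Map each target signature to the set of signatures that call it."""
    index = defaultdict(set)
    for edge in call_graph:
        index[edge["target"]].add(edge["source"])
    return index


# Hypothetical edges; "source" and "target" are assumed key names.
edges = [
    {"source": "app.views.index", "target": "app.db.query"},
    {"source": "app.views.detail", "target": "app.db.query"},
    {"source": "app.db.query", "target": "app.db.connect"},
]

callers = build_caller_index(edges)
print(sorted(callers["app.db.query"]))
```

The index is built in one pass over the edges, so repeated caller queries cost a dictionary lookup each rather than another scan of the source tree.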

```shell
# Write analysis.json to ./out
codeanalyzer --input ./my-project --output ./out
# Or stream JSON to stdout (no --output)
codeanalyzer --input ./my-project | jq '.entrypoints'
```

Without analysis, a code LLM asked “what calls this function?” has to crawl: file read after file read, grep after grep, burning tokens on an answer it still can’t be sure of. codeanalyzer-python resolves that question once, statically, into a graph, so the answer is a lookup, not a guess. The Jedi-derived graph comes for free on every run; CodeQL deepens it when dynamic dispatch and third-party calls matter; the pass pipeline surfaces the framework roots that make reachability questions meaningful.
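
To make the reachability point concrete, here is a minimal breadth-first sketch over call edges, starting from a set of entrypoint signatures. It assumes edges expose source and target keys and that you have already pulled root signatures out of entrypoints; how a PyEntrypoint exposes its callable is not described here, so the roots are passed in directly. The function name reachable_from and all sample signatures are hypothetical.

```python
from collections import defaultdict, deque


def reachable_from(roots, call_graph):
    """Return every signature reachable from the given roots via call edges."""
    successors = defaultdict(list)
    for edge in call_graph:
        successors[edge["source"]].append(edge["target"])

    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for callee in successors.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical edges and root; key names are assumptions.
edges = [
    {"source": "app.views.index", "target": "app.db.query"},
    {"source": "app.db.query", "target": "app.db.connect"},
    {"source": "app.jobs.orphan", "target": "app.db.connect"},
]
roots = ["app.views.index"]

reachable = reachable_from(roots, edges)
all_callables = {e["source"] for e in edges} | {e["target"] for e in edges}
unreachable = all_callables - reachable
print(sorted(unreachable))
```

Anything left in unreachable is code that no known framework root dispatches into, which is only a meaningful signal once the entrypoints are complete.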