vllm.compilation.codegen
Code generation for split_gm stitching graph execution.
Generates a plain Python function that replaces the FX GraphModule's interpreter-based execution of the stitching graph, eliminating `nn.Module.__call__` overhead and `getattr`-based submodule dispatch.
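To make the overhead argument concrete, here is a torch-free sketch contrasting the two execution styles. The interpreter style walks a list of steps and resolves each submodule by name with `getattr` on every call; the generated style is straight-line code whose submodule calls index directly into a list bound once. The `Stitcher` class, the step format, the `submods` list, and `generated_execute` are hypothetical stand-ins, not vLLM's actual representation, and plain Python functions stand in for compiled submodules.

```python
from typing import Any, Callable


# Hypothetical stand-ins for compiled piecewise submodules.
def submod_0(x: int) -> int:
    return x + 1


def submod_1(x: int, y: int) -> int:
    return x * y


class Stitcher:
    """Interpreter style: look up each submodule by name on every step."""

    def __init__(self) -> None:
        self.submod_0 = submod_0
        self.submod_1 = submod_1
        # Each step: (output name, submodule attribute name, input names).
        self.steps = [
            ("a", "submod_0", ("x",)),
            ("b", "submod_1", ("a", "x")),
        ]

    def __call__(self, x: int) -> int:
        env: dict[str, Any] = {"x": x}
        for out, target, inputs in self.steps:
            fn = getattr(self, target)  # attribute lookup on every step
            env[out] = fn(*(env[name] for name in inputs))
        return env["b"]


# Generated style: submodules are bound once into a list, then called by index.
submods: list[Callable[..., Any]] = [submod_0, submod_1]


def generated_execute(x: int) -> int:
    a = submods[0](x)
    b = submods[1](a, x)
    return b
```

Both paths compute the same result; the generated one simply skips the per-step name resolution and environment bookkeeping.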
_node_ref
Convert an FX node argument to a source code reference.
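A minimal sketch of what such a conversion might look like, assuming a stand-in `Node` dataclass with a `name` attribute in place of `torch.fx.Node`, and a `consts` list that pools non-primitive values. The function name `node_ref` and the `consts[i]` spelling in the emitted source are hypothetical; the real helper's rules may differ.

```python
import math
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Node:
    """Stand-in for torch.fx.Node; only the name matters here."""
    name: str


def node_ref(arg: Any, consts: list[Any]) -> str:
    """Return a Python source expression that evaluates to `arg`."""
    if isinstance(arg, Node):
        # Graph values become local variables named after the node.
        return arg.name
    if arg is None or isinstance(arg, (bool, int, str)):
        # repr() of these types round-trips through eval().
        return repr(arg)
    if isinstance(arg, float) and math.isfinite(arg):
        # repr(float("inf")) is "inf", which is not valid source, so only
        # finite floats are emitted inline.
        return repr(arg)
    if isinstance(arg, tuple):
        parts = [node_ref(a, consts) for a in arg]
        # A one-element tuple needs a trailing comma.
        return f"({parts[0]},)" if len(parts) == 1 else "(" + ", ".join(parts) + ")"
    if isinstance(arg, list):
        return "[" + ", ".join(node_ref(a, consts) for a in arg) + "]"
    if isinstance(arg, dict):
        items = (f"{node_ref(k, consts)}: {node_ref(v, consts)}" for k, v in arg.items())
        return "{" + ", ".join(items) + "}"
    # Anything else is pooled and referenced by its index in the list.
    consts.append(arg)
    return f"consts[{len(consts) - 1}]"
```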
Source code in vllm/compilation/codegen.py
compile_execution_fn
```python
compile_execution_fn(
    code: str,
    submod_callables: dict[str, Callable[..., Any]],
    submod_names: list[str],
    consts: list[Any] | None = None,
) -> Callable[..., Any]
```
Compile execution code and bind submodule callables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| code | str | Python source from generate_execution_code(). | required |
| submod_callables | dict[str, Callable[..., Any]] | Mapping of submodule names to their callables. | required |
| submod_names | list[str] | Ordered list of submodule names matching the indices used in the generated code. | required |
| consts | list[Any] \| None | List of non-primitive constant objects referenced by the generated code via vllm_consts. None for legacy cached code generated before constant collection was introduced. | None |
Returns:
| Type | Description |
|---|---|
| Callable[..., Any] | A callable that executes the stitching logic. |
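A minimal sketch of the compile-and-bind step, assuming the generated source defines a single function named `execute` that reads submodules from a `submods` list and constants from a `consts` list in its module globals. Those names and the `compile_sketch` helper are hypothetical, not the identifiers vLLM generates; the sketch only illustrates the technique of `exec`-ing source into a prepared namespace.

```python
from __future__ import annotations

from typing import Any, Callable


def compile_sketch(
    code: str,
    submod_callables: dict[str, Callable[..., Any]],
    submod_names: list[str],
    consts: list[Any] | None = None,
) -> Callable[..., Any]:
    namespace: dict[str, Any] = {
        # Resolve names once, in the order the generated code indexes them.
        "submods": [submod_callables[name] for name in submod_names],
        # Legacy code without constants still gets a valid (empty) list.
        "consts": consts if consts is not None else [],
    }
    # The compiled function's globals are `namespace`, so the submodule and
    # constant lists stay alive as long as the function does.
    exec(compile(code, "<generated>", "exec"), namespace)
    return namespace["execute"]


code = """
def execute(x):
    y = submods[0](x)
    return submods[1](y, consts[0])
"""
fn = compile_sketch(
    code,
    {"submod_0": lambda x: x + 1, "submod_1": lambda a, b: a * b},
    ["submod_0", "submod_1"],
    consts=[10],
)
```

Binding through module globals means each call pays only a global name lookup plus a list index, rather than an attribute or dict lookup per submodule.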
generate_execution_code
Generate Python source code from a split_gm's stitching graph.
Walks split_gm.graph.nodes and produces a function that calls submodules by index into a vllm_submods list, avoiding both FX GraphModule overhead and per-call dict lookups.
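A simplified sketch of the walk, assuming a hypothetical node list of `(op, name, target, args)` tuples modeled loosely on FX's `placeholder`, `call_module`, and `output` opcodes, with every argument being the name of an earlier node. Each distinct submodule target gets the next free index in `submod_names`, and calls are emitted as `submods[i](...)`. The `generate_sketch` helper, the `execute` function name, and the `submods` global are illustrative assumptions; a real implementation would convert arguments with a helper such as `_node_ref`.

```python
# Each node: (op, name, target, args); args name earlier nodes.
Node = tuple[str, str, str, tuple[str, ...]]


def generate_sketch(graph: list[Node]) -> tuple[str, list[str]]:
    submod_names: list[str] = []
    params: list[str] = []
    body: list[str] = []
    for op, name, target, args in graph:
        if op == "placeholder":
            # Graph inputs become function parameters.
            params.append(name)
        elif op == "call_module":
            # Reuse the index if the same submodule is called more than once.
            if target not in submod_names:
                submod_names.append(target)
            idx = submod_names.index(target)
            body.append(f"    {name} = submods[{idx}]({', '.join(args)})")
        elif op == "output":
            body.append(f"    return {args[0]}")
    header = f"def execute({', '.join(params)}):"
    return "\n".join([header, *body]) + "\n", submod_names


graph: list[Node] = [
    ("placeholder", "x", "", ()),
    ("call_module", "a", "submod_0", ("x",)),
    ("call_module", "b", "submod_1", ("a", "x")),
    ("output", "output", "", ("b",)),
]
code, names = generate_sketch(graph)
```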
Non-primitive constant arguments (e.g. torch.device, DTensor placement types) are collected into a constants list and referenced by index in the generated code, so the generated source does not depend on their repr() being eval-able.
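Why the constants list matters can be seen with any object whose repr() is not valid Python. This sketch uses a hypothetical `Placement` class as a stand-in for something like a DTensor placement type, so it runs without PyTorch.

```python
class Placement:
    """Hypothetical non-primitive constant that keeps the default object repr."""


placement = Placement()

# Embedding repr() yields source like "result = <...Placement object at 0x...>",
# which does not parse.
try:
    exec(f"result = {placement!r}", {})
    repr_is_evalable = True
except SyntaxError:
    repr_is_evalable = False

# Passing the object itself in a list and referencing it by index sidesteps
# repr() entirely; the generated code sees the identical object.
namespace = {"consts": [placement]}
exec("result = consts[0]", namespace)
```

Index-based references also preserve object identity, which a repr() round trip could not do even for objects whose repr happens to be eval-able.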
If a submodule is a plain torch.fx.GraphModule, it is inlined directly into the generated code, so it does not need to be serialized in the artifact.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split_gm | GraphModule | The split graph module produced by split_graph(). | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, list[str], list[Any]] | A tuple of (code, submod_names, consts), where code is the Python source; submod_names is the ordered list of submodule target names corresponding to the list indices used in the generated code; and consts is the list of non-primitive constant objects referenced by the generated code via vllm_consts. These objects are kept alive for the lifetime of the compiled function. |