vllm.compilation.partition_rules ¶
inductor_partition_rule_context ¶
inductor_partition_rule_context(overloads: list[OpOverload])
Context manager to temporarily register Inductor partition rules.
Registers custom partition rules for specified operators, forcing the Inductor scheduler to partition the graph at these operators. The rules are automatically restored to their previous state on exit.
Note: Callers should use resolve_defined_ops() to convert operator names to OpOverload objects before calling this function.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| overloads | list[OpOverload] | List of resolved operator overload objects. | required |
Source code in vllm/compilation/partition_rules.py
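A minimal usage sketch, assuming only the names documented on this page (resolve_defined_ops and inductor_partition_rule_context) plus a toy module standing in for a real vLLM model; any operator names that do not resolve are simply skipped, so the context manager may receive a shorter (or empty) list:

```python
import torch
import torch.nn as nn

from vllm.compilation.partition_rules import (
    inductor_partition_rule_context,
    resolve_defined_ops,
)


class ToyModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1.0


# Resolve the operators to partition around. Unregistered names are
# skipped, so this list may be empty for a toy model like this one.
overloads = resolve_defined_ops(["vllm::unified_attention"])

model = ToyModel()
x = torch.randn(4, 8)

with inductor_partition_rule_context(overloads):
    # Inductor compilation performed inside this block honors the
    # temporary partition rules; the previous rules are restored
    # when the block exits.
    compiled = torch.compile(model, backend="inductor")
    out = compiled(x)
```

Scoping the registration to a with-block keeps the scheduler-level partition rules local to one compilation rather than mutating global state for the rest of the process.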
resolve_defined_ops ¶
resolve_defined_ops(op_names: list[str]) -> list[OpOverload]
Resolve operator names to OpOverload objects.
Skips operators that fail to resolve (e.g., operators not registered or model-specific operators not present in the current model).
Note: Users should inspect the operator graph before lowering and ensure the specified operators are present in the final graph. Built-in PyTorch operators (aten::, torch::) may be decomposed, fused, or transformed during Inductor's compilation passes, so use them with caution.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op_names | list[str] | List of operator names in PyTorch format (e.g., "vllm::unified_attention"). | required |
Returns:
| Type | Description |
| --- | --- |
| list[OpOverload] | List of successfully resolved operator overloads. |
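A short sketch of the skip behavior described above; the second name is deliberately bogus, and the printed overload form is illustrative rather than guaranteed:

```python
from vllm.compilation.partition_rules import resolve_defined_ops

# One documented vLLM custom op plus a name that will not resolve;
# unresolved names are skipped rather than raising an error.
overloads = resolve_defined_ops([
    "vllm::unified_attention",
    "vllm::this_op_does_not_exist",
])

for overload in overloads:
    print(overload)  # e.g. vllm.unified_attention.default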