vllm.model_executor.layers.quantization.qutlass_utils
ceil_div
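The page gives no signature or description for `ceil_div`. Judging by its name and its use in the `to_blocked` output shape, it is presumably integer ceiling division. The following is a minimal sketch under that assumption, not the module's confirmed implementation; the parameter names `a` and `b` are illustrative.

```python
def ceil_div(a: int, b: int) -> int:
    """Assumed behavior: smallest integer n such that n * b >= a.

    Valid for non-negative a and positive b.
    """
    return (a + b - 1) // b


# 128 rows fit exactly in one 128-row block; 129 rows need a second block.
print(ceil_div(128, 128))  # 1
print(ceil_div(129, 128))  # 2
```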
to_blocked
Rearrange a large matrix by breaking it into blocks and applying the rearrangement pattern.
See: https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_matrix | Tensor | Input tensor of shape (H, W) | required |
backend | Literal['torch', 'triton'] | "torch" (PyTorch path) or "triton" (Triton kernel) | 'triton' |
Returns:
Type | Description |
---|---|
Tensor | Rearranged tensor of shape (32 * ceil_div(H, 128), 16 * ceil_div(W, 4)) |
Source code in vllm/model_executor/layers/quantization/qutlass_utils.py
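As a rough illustration of the documented return shape, the sketch below computes it for a few input sizes. The helper `blocked_shape` is hypothetical (it is not part of the module), and `ceil_div` is assumed to be integer ceiling division.

```python
def ceil_div(a: int, b: int) -> int:
    # Assumed behavior: integer ceiling division for non-negative a, positive b.
    return (a + b - 1) // b


def blocked_shape(h: int, w: int) -> tuple[int, int]:
    """Hypothetical helper: the shape documented for to_blocked's result."""
    return (32 * ceil_div(h, 128), 16 * ceil_div(w, 4))


print(blocked_shape(128, 4))   # (32, 16)
print(blocked_shape(256, 16))  # (64, 64)
print(blocked_shape(129, 5))   # (64, 32)
```

The element count of the result equals the input size rounded up to whole 128 x 4 blocks: 32 * 16 = 512 = 128 * 4 elements per block.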
triton_mx_block_rearrange
Rearranges an E8M0 scale tensor from row-major format to block-scaled swizzle format.
This format is suitable for TMEM (tensor memory), as described in the NVIDIA documentation: https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale_tensor | Tensor | Input tensor in row-major format with 8-bit elements | required |
Returns:
Type | Description |
---|---|
Tensor | Rearranged tensor in block-scaled swizzle format |
Source code in vllm/model_executor/layers/quantization/qutlass_utils.py
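To make the swizzle concrete, here is a small sketch of the within-tile index mapping given in the linked cuBLAS documentation for a 128 x 4 tile of scale factors. The function name `swizzled_offset` is hypothetical, and the sketch covers only the mapping inside one tile, not how the Triton kernel orders tiles relative to each other.

```python
TILE_ROWS = 128  # rows per scale-factor tile
TILE_COLS = 4    # columns per scale-factor tile


def swizzled_offset(row: int, col: int) -> int:
    """Offset of element (row, col) inside one swizzled 128 x 4 tile.

    The tile is stored as 32 rows of 16 bytes: each group of 32 source rows
    contributes one 4-byte column group to every 16-byte output row.
    """
    if not (0 <= row < TILE_ROWS and 0 <= col < TILE_COLS):
        raise ValueError("index outside the 128 x 4 tile")
    return (row % 32) * 16 + (row // 32) * 4 + col


print(swizzled_offset(0, 0))    # 0
print(swizzled_offset(1, 0))    # 16
print(swizzled_offset(32, 0))   # 4
print(swizzled_offset(127, 3))  # 511
```

Every one of the 512 source positions maps to a distinct offset in the range 0 to 511, so the rearrangement is a pure permutation of the tile.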
triton_scale_swizzle
triton_scale_swizzle(
    scale_ptr: Tensor,
    scale_rows: int,
    scale_cols: int,
    output_ptr: Tensor,
    input_row_stride: int,
    output_block_stride: int,
    BLOCK_ROWS: constexpr,
    BLOCK_COLS: constexpr,
)
Rearranges tensor data from row-major to block-scaled swizzle format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale_ptr | Tensor | Pointer to the input scale tensor | required |
scale_rows | int | Number of rows in the scale tensor | required |
scale_cols | int | Number of columns in the scale tensor | required |
output_ptr | Tensor | Pointer to the output tensor | required |
input_row_stride | int | Stride between rows in the input tensor | required |
output_block_stride | int | Stride between blocks in the output tensor | required |
BLOCK_ROWS | constexpr | Number of rows in a tile (compile-time constant) | required |
BLOCK_COLS | constexpr | Number of columns in a tile (compile-time constant) | required |
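The parameters above can be read as a tiled copy with a permuted destination index. The pure-Python emulation below is a sketch of that reading, not the kernel itself. It assumes `BLOCK_ROWS = 128` and `BLOCK_COLS = 4`, one program per tile, zero fill for out-of-bounds elements, tiles stored row-block-major, and `output_block_stride` equal to the number of elements in one row of tiles. None of these details are confirmed by this page, and the function name `emulate_scale_swizzle` is hypothetical.

```python
BLOCK_ROWS = 128  # assumed tile height
BLOCK_COLS = 4    # assumed tile width


def ceil_div(a: int, b: int) -> int:
    # Assumed behavior: integer ceiling division.
    return (a + b - 1) // b


def emulate_scale_swizzle(scale, scale_rows, scale_cols, input_row_stride):
    """Emulate the swizzle on a flat, row-major list of scale values."""
    n_row_blocks = ceil_div(scale_rows, BLOCK_ROWS)
    n_col_blocks = ceil_div(scale_cols, BLOCK_COLS)
    tile_numel = BLOCK_ROWS * BLOCK_COLS
    # Assumption: one row of tiles occupies this many output elements.
    output_block_stride = tile_numel * n_col_blocks
    output = [0] * (n_row_blocks * output_block_stride)

    # Each (pid_row, pid_col) pair stands in for one kernel program.
    for pid_row in range(n_row_blocks):
        for pid_col in range(n_col_blocks):
            base = pid_row * output_block_stride + pid_col * tile_numel
            for r in range(BLOCK_ROWS):
                for c in range(BLOCK_COLS):
                    src_row = pid_row * BLOCK_ROWS + r
                    src_col = pid_col * BLOCK_COLS + c
                    # Masked load: positions past the input edge read as 0.
                    in_bounds = src_row < scale_rows and src_col < scale_cols
                    value = (
                        scale[src_row * input_row_stride + src_col]
                        if in_bounds
                        else 0
                    )
                    # Within-tile swizzle from the cuBLAS layout description.
                    dest = (r % 32) * 16 + (r // 32) * 4 + c
                    output[base + dest] = value
    return output


# A 130 x 5 input pads to 2 x 2 tiles, i.e. 2048 output elements.
scale = list(range(1, 130 * 5 + 1))
out = emulate_scale_swizzle(scale, 130, 5, input_row_stride=5)
print(len(out))  # 2048
```

Keeping the emulation on a flat list with an explicit `input_row_stride` mirrors how the kernel addresses memory through pointers rather than through tensor indexing.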