Prepare arrays in device memory that hold the pointers to each individual matrix in the group (e.g., one array of pointers to all $A_i$ matrices, and likewise for the $B_i$ and $C_i$ matrices).
Enter Grouped GEMM, a game changer for batched, variable-sized matmul operations.
October 26, 2023
Subject: Documentation and Usage of Grouped GEMM in NVIDIA cuBLASLt
Standard GEMM performs a single matrix multiplication ($C = \alpha AB + \beta C$). Grouped GEMM extends this by processing a list of independent GEMM problems ($A_i, B_i, C_i$) in parallel.
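To make the semantics concrete, the following is a minimal CPU reference sketch of what a grouped GEMM computes. It is not the cuBLASLt API: the names `GemmProblem`, `grouped_gemm_reference`, and `demo_grouped_gemm` are hypothetical, and row-major storage is an assumption made for readability. It mirrors the pointer-array layout, with one array of pointers per operand and one entry per problem. On a GPU those pointer arrays would live in device memory, and the problems would run in parallel rather than in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one problem in the group (not a cuBLAS type).
// A_i is m x k, B_i is k x n, C_i is m x n, all row-major (an assumption).
struct GemmProblem {
    std::size_t m, n, k;
};

// CPU reference: C_i = alpha * A_i * B_i + beta * C_i for every problem i.
// Each problem may have different dimensions; the problems are independent.
void grouped_gemm_reference(const std::vector<GemmProblem>& problems,
                            const std::vector<const float*>& a_ptrs,
                            const std::vector<const float*>& b_ptrs,
                            const std::vector<float*>& c_ptrs,
                            float alpha, float beta) {
    // One pointer per operand per problem.
    assert(a_ptrs.size() == problems.size());
    assert(b_ptrs.size() == problems.size());
    assert(c_ptrs.size() == problems.size());

    for (std::size_t g = 0; g < problems.size(); ++g) {
        const GemmProblem& p = problems[g];
        const float* A = a_ptrs[g];
        const float* B = b_ptrs[g];
        float* C = c_ptrs[g];
        for (std::size_t i = 0; i < p.m; ++i) {
            for (std::size_t j = 0; j < p.n; ++j) {
                float acc = 0.0f;
                for (std::size_t l = 0; l < p.k; ++l) {
                    acc += A[i * p.k + l] * B[l * p.n + j];
                }
                C[i * p.n + j] = alpha * acc + beta * C[i * p.n + j];
            }
        }
    }
}

// Two problems with different shapes, processed in one grouped call.
std::vector<std::vector<float>> demo_grouped_gemm() {
    // Problem 0: (1 x 2) * (2 x 1) -> 1 x 1, with an initial C value of 10.
    std::vector<float> a0 = {1.0f, 2.0f};
    std::vector<float> b0 = {3.0f, 4.0f};
    std::vector<float> c0 = {10.0f};
    // Problem 1: (2 x 1) * (1 x 2) -> 2 x 2, with C initialised to zero.
    std::vector<float> a1 = {1.0f, 2.0f};
    std::vector<float> b1 = {3.0f, 4.0f};
    std::vector<float> c1 = {0.0f, 0.0f, 0.0f, 0.0f};

    std::vector<GemmProblem> problems = {{1, 1, 2}, {2, 2, 1}};
    std::vector<const float*> a_ptrs = {a0.data(), a1.data()};
    std::vector<const float*> b_ptrs = {b0.data(), b1.data()};
    std::vector<float*> c_ptrs = {c0.data(), c1.data()};

    grouped_gemm_reference(problems, a_ptrs, b_ptrs, c_ptrs, 1.0f, 1.0f);
    return {c0, c1};
}
```

With $\alpha = \beta = 1$, `demo_grouped_gemm()` yields a 1 x 1 result of 21 for the first problem and the 2 x 2 result {3, 4, 6, 8} for the second. A reference like this can be useful for validating GPU output on small inputs.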