Writing a reduction algorithm for a CUDA GPU can be tricky.
NumbaPro provides a @reduce decorator for converting a simple binary operation into a reduction kernel.
```python
import numpy
from numbapro import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

A = numpy.arange(1234, dtype=numpy.float64) + 1
expect = A.sum()     # numpy sum reduction
got = sum_reduce(A)  # cuda sum reduction
assert expect == got
```
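The tree-shaped parallel reduction such a kernel performs can be sketched in pure Python. This is a conceptual illustration of the pattern only, not the actual CUDA kernel: on the GPU, each level of the tree is computed by many threads in parallel rather than by a list comprehension.

```python
def tree_reduce(op, values):
    """Pairwise (tree) reduction: repeatedly combine strided pairs.

    Mirrors the pattern a GPU reduction kernel uses, where each level
    of the tree halves the number of active elements.
    """
    values = list(values)
    while len(values) > 1:
        half = len(values) // 2
        # Combine element i with element i + half, as a reduction
        # kernel does with strided pairs; carry any odd leftover.
        tail = [values[2 * half]] if len(values) % 2 else []
        values = [op(values[i], values[i + half]) for i in range(half)] + tail
    return values[0]

# Same result as the sequential sum, since addition is associative.
assert tree_reduce(lambda a, b: a + b, range(1, 1235)) == 761995
```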
A lambda function can also be used:
```python
sum_reduce = cuda.reduce(lambda a, b: a + b)
```
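Other associative binary operations follow the same pattern; for instance, a max reduction might be created as `cuda.reduce(lambda a, b: max(a, b))` (hypothetical usage, not tested here). Because the operation is reduced in a hardware-dependent order, it should be associative so that any grouping yields the same answer. The host-side equivalent can be checked with functools.reduce:

```python
import functools
import numpy

# The binary op the reduction would use; associativity of max means
# any grouping of the elements produces the same result.
op = lambda a, b: max(a, b)

A = numpy.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
# Sequential host-side reduction agrees with numpy's max.
assert functools.reduce(op, A) == A.max() == 9.0
```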
The decorated function must not use any CUDA-specific features, because it is also executed on the host for the final round of the reduction.
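One way to picture why the function also runs on the host: the kernel reduces each block of the input to a partial result on the device, and the final round combines those partials on the CPU using the same Python function. A pure-Python sketch of this two-stage scheme follows; the block size and staging here are illustrative assumptions, not NumbaPro's actual implementation.

```python
import functools

def two_stage_reduce(op, values, block_size=128):
    # Stage 1 (device side in a real kernel): each "block" reduces
    # its slice of the input to one partial result.
    partials = [
        functools.reduce(op, values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
    # Stage 2 (host side): the same binary op combines the partials,
    # which is why it must not rely on CUDA-specific features.
    return functools.reduce(op, partials)

A = list(range(1, 1235))
assert two_stage_reduce(lambda a, b: a + b, A) == sum(A)
```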
The reduce decorator creates an instance of the Reduce class. (Currently, reduce is an alias for Reduce, but this behavior is not guaranteed.)