GPU Reduction

Writing a reduction algorithm for a CUDA GPU can be tricky. NumbaPro provides a @reduce decorator that converts a simple binary operation into a reduction kernel.

@reduce

Example:

import numpy
from numbapro import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

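A = numpy.arange(1234, dtype=numpy.float64) + 1   # values 1.0 .. 1234.0
expect = A.sum()      # numpy sum reduction
got = sum_reduce(A)   # cuda sum reduction
# Exact equality is safe here because every partial sum is a small integer;
# for general floating-point data, compare with numpy.allclose instead.
assert expect == got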

You can also use a lambda function:

sum_reduce = cuda.reduce(lambda a, b: a + b)

The decorated function must not use CUDA-specific features, because it is also executed on the host for the final round of the reduction.
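
To illustrate why the same function has to be usable on the host, here is a conceptual sketch of a two-stage reduction written in plain Python and NumPy. It is not NumbaPro's implementation: the helper two_stage_reduce and its block_size parameter are hypothetical, and the "blocks" are ordinary array slices standing in for GPU thread blocks. The point is only that the binary operation is applied once to produce per-block partial results and then again, on the host, to combine those partials.

```python
import numpy as np

def add(a, b):
    # Plain binary operation with no CUDA-specific features,
    # so it can be called from device-style and host-style code alike.
    return a + b

def two_stage_reduce(binop, arr, block_size=256):
    """Host-only model of a two-stage reduction (hypothetical helper)."""
    if len(arr) == 0:
        raise ValueError("cannot reduce an empty array")
    # Stage 1: each "block" folds its own slice into one partial result.
    partials = []
    for start in range(0, len(arr), block_size):
        chunk = arr[start:start + block_size]
        acc = chunk[0]
        for x in chunk[1:]:
            acc = binop(acc, x)
        partials.append(acc)
    # Stage 2: the final round combines the partials with the same binop.
    result = partials[0]
    for p in partials[1:]:
        result = binop(result, p)
    return result

A = np.arange(1234, dtype=np.float64) + 1
total = two_stage_reduce(add, A)
assert total == A.sum()
```

Because the operation is applied in a different order than a left-to-right fold, it should be associative for the result to be well defined; addition, multiplication, min and max are typical choices.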

class Reduce

The reduce decorator creates an instance of the Reduce class. (Currently, reduce is an alias for Reduce, but this behavior is not guaranteed.)