CUDA Sorting¶
Accelerate provides routines for sorting arrays on CUDA GPUs.
Sorting Large Arrays¶
The accelerate.cuda.sorting.RadixSort
class is recommended for
sorting large (approx. more than 1 million items) arrays of numeric types.

class
accelerate.cuda.sorting.
RadixSort
(maxcount, dtype, descending=False, stream=0)¶ Provides radix sort and radix select.
The algorithm implemented here is best for large arrays (
N > 1e6
) due to the latency introduced by its use of multiple kernel launches. It is recommended to usesegmented_sort
instead for batches of smaller arrays.Parameters:  maxcount (int) – Maximum number of items to sort
 dtype (numpy.dtype) – The element type to sort
 descending (bool) – Sort in descending order?
 stream – The CUDA stream to run the kernels in

argselect
(k, keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.select
but returns the new sorted indices.Parameters:  keys (numpy.ndarray) – Keys to sort inplace
 begin_bit (int) – The first bit to sort
 end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.

argsort
(keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.sort
but returns the new sorted indices.Parameters:  keys (numpy.ndarray) – Keys to sort inplace
 begin_bit (int) – The first bit to sort
 end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.

close
()¶ Explicitly release internal resources
Called automatically when the object is deleted.

init_arg
(size)¶ Initialize an empty CUDA ndarray of uint32 with ascending integers starting from zero
Parameters: size (int) – Number of elements for the output array Returns: An array with values [0, 1, 2, ...m size  1 ]

select
(k, keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace kselect on
keys
.Memory transfer is performed automatically.
Parameters:  keys (numpy.ndarray) – Keys to sort inplace
 vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32
dtype is supported in this version.  begin_bit (int) – The first bit to sort
 end_bit (int) – Optional. The last bit to sort

sort
(keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace sort on
keys
. Memory transfer is performed automatically.Parameters:  keys (numpy.ndarray) – Keys to sort inplace
 vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32
dtype is supported in this version.  begin_bit (int) – The first bit to sort
 end_bit (int) – Optional. The last bit to sort
Sorting Many Small Arrays¶
Using accelerate.cuda.sorting.RadixSort
on small (approx. less than
1 million items) arrays has significant overhead due to multiple kernel
launches. A better alternative is to use
accelerate.cuda.sorting.segmented_sort()
which launches a single kernel
for sorting a batch of many small arrays.

accelerate.cuda.sorting.
segmented_sort
(keys, vals, segments, stream=0)¶ Performs an inplace sort on small segments (N < 1e6).
Parameters:  keys (numpy.ndarray) – Keys to sort inplace.
 vals (numpy.ndarray) – Values to be reordered inplace along the sort. Only the
uint32
dtype is supported in this implementation.  segments (numpy.ndarray) – Segment separation location. e.g.
array([3, 6, 8])
for segments ofkeys[:3]
,keys[3:6]
,keys[6:8]
,keys[8:]
.  stream – Optional. A cuda stream in which the kernels are executed.