@FabianSchubert and @tnowotny - our discussion got me thinking: these operations would be pretty simple to implement.
Batch reduction

Max reduction currently generates code something like this (thread per synapse/neuron):
```c++
scalar lrMax = SCALAR_MIN;
for(unsigned int batch = 0; batch < 512; batch++) {
    const unsigned int batchOffset = size * batch;
    const scalar lValue = group->Value[batchOffset + lid];
    lrMax = fmax(lrMax, lValue);
}
group->Max[lid] = lrMax;
```
Argmax would simply look like:
```c++
unsigned int lrArgMax = 0;
scalar lrMax = SCALAR_MIN;
for(unsigned int batch = 0; batch < 512; batch++) {
    const unsigned int batchOffset = size * batch;
    const scalar lValue = group->Value[batchOffset + lid];
    if(lValue > lrMax) {
        lrMax = lValue;
        lrArgMax = batch;
    }
}
group->ArgMax[lid] = lrArgMax;
```
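Purely as a sanity check outside of GeNN's generated code, a standalone version of this pattern could look like the following (all illustrative assumptions: `scalar` is `float`, `SCALAR_MIN` is `-FLT_MAX`, and the group structure is replaced by a flat array with small fixed sizes):

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

// Standalone sketch of the thread-per-neuron batch argmax pattern.
// numBatches and size are compile-time constants purely for this test.
constexpr unsigned int numBatches = 8;
constexpr unsigned int size = 64;

__global__ void batchArgMax(const float *value, unsigned int *argMax)
{
    const unsigned int lid = (blockIdx.x * blockDim.x) + threadIdx.x;
    if(lid < size) {
        unsigned int lrArgMax = 0;
        float lrMax = -FLT_MAX;
        for(unsigned int batch = 0; batch < numBatches; batch++) {
            const float lValue = value[(size * batch) + lid];
            if(lValue > lrMax) {
                lrMax = lValue;
                lrArgMax = batch;
            }
        }
        argMax[lid] = lrArgMax;
    }
}

int main()
{
    float hValue[numBatches * size];
    for(unsigned int b = 0; b < numBatches; b++) {
        for(unsigned int i = 0; i < size; i++) {
            // Plant the maximum for neuron i in batch (i % numBatches)
            hValue[(b * size) + i] = (b == (i % numBatches)) ? 1.0f : 0.0f;
        }
    }

    float *dValue;
    unsigned int *dArgMax;
    cudaMalloc(&dValue, sizeof(hValue));
    cudaMalloc(&dArgMax, size * sizeof(unsigned int));
    cudaMemcpy(dValue, hValue, sizeof(hValue), cudaMemcpyHostToDevice);

    batchArgMax<<<1, size>>>(dValue, dArgMax);

    unsigned int hArgMax[size];
    cudaMemcpy(hArgMax, dArgMax, sizeof(hArgMax), cudaMemcpyDeviceToHost);
    for(unsigned int i = 0; i < size; i++) {
        if(hArgMax[i] != (i % numBatches)) {
            printf("Mismatch at %u: %u\n", i, hArgMax[i]);
        }
    }
    cudaFree(dValue);
    cudaFree(dArgMax);
    return 0;
}
```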
Neuron reductions

These are a little gnarlier, but max currently looks like this (32-thread warp per batch):
```cuda
scalar lrMax = SCALAR_MIN;
for(unsigned int idx = lane; idx < group->size; idx += 32) {
    const scalar lVal = group->Val[batchOffset + idx];
    lrMax = fmax(lrMax, lVal);
}
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 16));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 8));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 4));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 2));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 1));
if(lane == 0) {
    group->Max[batch] = lrMax;
}
```
Argmax would look like (totally untested):
```cuda
unsigned int lrLaneArgMax = 0;
scalar lrLaneMax = SCALAR_MIN;
for(unsigned int idx = lane; idx < group->size; idx += 32) {
    const scalar lVal = group->Val[batchOffset + idx];
    if(lVal > lrLaneMax) {
        lrLaneMax = lVal;
        lrLaneArgMax = idx;
    }
}
// Reduce per-lane maxima across the warp
scalar lrMax = lrLaneMax;
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 16));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 8));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 4));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 2));
lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, 1));
// After __shfl_down_sync, only lane 0 holds the true maximum, so broadcast it to the whole warp
lrMax = __shfl_sync(0xFFFFFFFF, lrMax, 0);
// Find which lane(s) provided max value
const unsigned int ballot = __ballot_sync(0xFFFFFFFF, lrMax == lrLaneMax);
// If this is the first such lane (arbitrary decision), use the index of the max within the neurons we've considered
if(lane == (__ffs(ballot) - 1)) {
    group->ArgMax[batch] = lrLaneArgMax;
}
```
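Since the above is untested, here is a minimal standalone sketch of the same ballot-based lane selection that can actually be compiled and run (again assuming `scalar` is `float` and replacing the group structure with a flat array; all names and sizes here are illustrative). Note the extra `__shfl_sync` broadcast - after the `__shfl_down_sync` reduction only lane 0 holds the true maximum, so it has to be shared before every lane compares against its own value:

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

// One 32-thread warp scans one "batch" of size elements
constexpr unsigned int size = 100;

__global__ void warpArgMax(const float *val, unsigned int *argMax)
{
    const unsigned int lane = threadIdx.x;
    unsigned int lrLaneArgMax = 0;
    float lrLaneMax = -FLT_MAX;
    for(unsigned int idx = lane; idx < size; idx += 32) {
        const float lVal = val[idx];
        if(lVal > lrLaneMax) {
            lrLaneMax = lVal;
            lrLaneArgMax = idx;
        }
    }

    // Warp-reduce the per-lane maxima
    float lrMax = lrLaneMax;
    for(unsigned int offset = 16; offset > 0; offset /= 2) {
        lrMax = fmax(lrMax, __shfl_down_sync(0xFFFFFFFF, lrMax, offset));
    }
    // Broadcast the true maximum from lane 0 to the whole warp
    lrMax = __shfl_sync(0xFFFFFFFF, lrMax, 0);

    // Find which lane(s) held the maximum and let the first one write its index
    const unsigned int ballot = __ballot_sync(0xFFFFFFFF, lrMax == lrLaneMax);
    if(lane == (__ffs(ballot) - 1)) {
        *argMax = lrLaneArgMax;
    }
}

int main()
{
    float hVal[size];
    for(unsigned int i = 0; i < size; i++) {
        hVal[i] = (float)i;
    }
    hVal[42] = 1000.0f;  // plant the maximum at a known index

    float *dVal;
    unsigned int *dArgMax;
    cudaMalloc(&dVal, sizeof(hVal));
    cudaMalloc(&dArgMax, sizeof(unsigned int));
    cudaMemcpy(dVal, hVal, sizeof(hVal), cudaMemcpyHostToDevice);

    warpArgMax<<<1, 32>>>(dVal, dArgMax);

    unsigned int hArgMax;
    cudaMemcpy(&hArgMax, dArgMax, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("argmax = %u (expected 42)\n", hArgMax);

    cudaFree(dVal);
    cudaFree(dArgMax);
    return 0;
}
```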
Most of this will be trivial and will involve:

- Adding a new `VarAccessMode` and corresponding `VarAccess` to https://github.com/genn-team/genn/blob/master/include/genn/genn/varAccess.h
- Extending `getReductionOperation` and `getReductionInitialValue` in https://github.com/genn-team/genn/blob/master/src/genn/genn/code_generator/codeGenUtils.cc to generate and update multiple values; see the sketch after this list (the convention is that, when the code being generated is potentially a bit longer, you take a `CodeStream` by reference and write to that, as it will respect indentation etc.)
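Purely to illustrate the multiple-value point (all names and signatures below are hypothetical, not GeNN's actual API, and a plain `std::ostream` stands in for `CodeStream`), an argmax reduction operation could be emitted through a stream rather than returned as a single update expression, since both the running maximum and its index need updating:

```c++
#include <iostream>
#include <string>

// Hypothetical sketch only - not the real getReductionOperation signature.
// Emits a two-value update that can't be expressed as a single assignment.
void genArgMaxReductionOp(std::ostream &os, const std::string &maxVar,
                          const std::string &argMaxVar,
                          const std::string &value, const std::string &index)
{
    os << "if(" << value << " > " << maxVar << ") {" << std::endl;
    os << "    " << maxVar << " = " << value << ";" << std::endl;
    os << "    " << argMaxVar << " = " << index << ";" << std::endl;
    os << "}" << std::endl;
}

int main()
{
    // Would emit the inner update used in the batch argmax example above
    genArgMaxReductionOp(std::cout, "lrMax", "lrArgMax", "lValue", "batch");
    return 0;
}
```

Taking the stream by reference keeps nesting and indentation consistent with the surrounding generated kernel, matching the convention described above.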