cpu: pooling: fix crashes of large tensor processing #2875
+45
−38
This PR fixes crashes in the simple_nchw and simple_nhwc pooling implementations for some of the cases found in MFDNN-13286.
It also reduces memory allocation in simple_nchw during back propagation for non-f32 data types. Previously, simple_nchw requested scratchpad memory for every available thread even when the number of work items was smaller. For example, according to the logs in MFDNN-13286, it requested more than 4 TB for mb1ic1iw4294967311ow858993461kw7sw5pw0; the exact size depends on the number of available threads. The reduced allocation helps in some cases, in particular those from MFDNN-13286.
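
The sizing idea can be illustrated with a minimal sketch. This is not the code from this PR: the helper names (`slots_per_thread`, `slots_capped`, `scratchpad_bytes`) and the parameters `work_amount` and `elems_per_slot` are hypothetical, and the sketch assumes that each thread doing work needs one conversion buffer of a fixed size. It only shows why capping the number of buffers at the number of work items shrinks the request when there are fewer work items than threads.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: the behaviour described above as the old one,
// where one scratchpad slot is requested per available thread,
// regardless of how much work there is.
inline std::int64_t slots_per_thread(std::int64_t nthr, std::int64_t /*work_amount*/) {
    return nthr;
}

// Hypothetical helper: request no more slots than there are work items,
// since threads without work never touch their slot. At least one slot
// is kept so the result is never zero.
inline std::int64_t slots_capped(std::int64_t nthr, std::int64_t work_amount) {
    return std::max<std::int64_t>(1, std::min(nthr, work_amount));
}

// Total scratchpad size in bytes. All arithmetic is done in 64 bits so
// that large spatial sizes do not wrap around a 32-bit integer.
inline std::int64_t scratchpad_bytes(std::int64_t slots,
                                     std::int64_t elems_per_slot,
                                     std::int64_t elem_size) {
    return slots * elems_per_slot * elem_size;
}
```

Under these assumptions, with 64 available threads, a single work item, and 4-byte elements, the per-thread scheme requests 64 times as much memory as the capped one. The actual work partitioning and buffer sizes in simple_nchw may differ.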