Optimize memory usage #97

Open · lorenzoridolfi wants to merge 3 commits into main
Conversation

lorenzoridolfi

Small Change, Big Impact: Optimizing GPU Memory Usage

This pull request introduces a small but impactful optimization to GPU memory usage in the AutoModel class, leveraging PyTorch's Automatic Mixed Precision (AMP) feature. By wrapping the model's inference code in the autocast context manager from torch.cuda.amp, we significantly reduce memory usage during GPU operations. The change is particularly beneficial for users with lower-memory GPUs, as it makes more efficient use of the available resources.

Key Change

  • Wrapped the inference code in a with autocast(): block inside the translate_sentences method of AutoModel (see the sketch after this list).
  • This minor change enables mixed-precision computation, reducing the GPU memory footprint by using float16 precision where possible without affecting model performance.
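A minimal sketch of the wrapped method, assuming a Hugging Face-style tokenizer and generation API (self.tokenizer, self.model, and the method body here are illustrative placeholders, not the exact code in this repo):

```python
import torch

class AutoModel:
    # ... existing attributes (self.model, self.tokenizer) elided ...

    def translate_sentences(self, sentences):
        # autocast runs eligible CUDA ops in float16, roughly halving
        # activation memory; on CPU-only systems it is effectively a no-op.
        with torch.cuda.amp.autocast():
            with torch.no_grad():
                batch = self.tokenizer(
                    sentences, return_tensors="pt", padding=True
                ).to(self.model.device)
                output = self.model.generate(**batch)
        return self.tokenizer.batch_decode(output, skip_special_tokens=True)
```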

Impact

  • Notable Efficiency Gains: This change substantially reduces GPU memory consumption during inference. It is especially valuable for users with lower-memory GPUs, allowing smoother operation and fewer out-of-memory errors.
  • Maintains Compatibility: The update maintains full backward compatibility. On systems without a GPU or those not supporting AMP, autocast simply becomes a no-operation, preserving existing functionality.
  • Expands Accessibility: By making the package more memory-efficient, we're opening the doors to a broader range of users who might have been limited by hardware constraints.

@lorenzoridolfi lorenzoridolfi marked this pull request as draft December 2, 2023 15:07
@lorenzoridolfi (Author)

I've recently added a commit introducing a new optional parameter, with_autocast, to the translate_sentences method in the AutoModel class. This parameter defaults to False, maintaining the current behavior for existing codebases.

The key enhancement offered by this update is the conditional use of PyTorch's autocast feature. Users now have the option to enable autocast by setting with_autocast=True when invoking the method. This flexibility allows for more efficient GPU memory usage, which is especially beneficial for those with memory constraints on their GPUs or those who wish to optimize performance on compatible hardware.
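A minimal sketch of the conditional path, again with hypothetical self.tokenizer and self.model attributes; contextlib.nullcontext keeps the with_autocast=False path identical to the old behavior:

```python
import contextlib
import torch

def translate_sentences(self, sentences, with_autocast=False):
    # Choose the context up front: autocast when requested, otherwise a
    # no-op, so the default call preserves the existing behavior exactly.
    ctx = torch.cuda.amp.autocast() if with_autocast else contextlib.nullcontext()
    with ctx, torch.no_grad():
        batch = self.tokenizer(
            sentences, return_tensors="pt", padding=True
        ).to(self.model.device)
        output = self.model.generate(**batch)
    return self.tokenizer.batch_decode(output, skip_special_tokens=True)
```

Existing calls such as model.translate_sentences(sentences) are unaffected; opting in is a one-argument change: model.translate_sentences(sentences, with_autocast=True).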

This addition preserves the existing functionality for all current users while providing an accessible pathway to mixed-precision computation for improved memory efficiency. It balances the reliability of the existing code path against an opt-in performance optimization for those who need it.

@lorenzoridolfi lorenzoridolfi marked this pull request as ready for review December 2, 2023 15:17