Optimize memory usage #97

Open · lorenzoridolfi wants to merge 3 commits into main
Conversation

lorenzoridolfi

Small Change, Big Impact: Optimizing GPU Memory Usage

This pull request introduces a small but impactful optimization to GPU memory usage in the AutoModel class, leveraging PyTorch's Automatic Mixed Precision (AMP) feature. By wrapping the model's inference code in the autocast context manager from torch.cuda.amp, we significantly reduce memory usage during GPU operations. The change is particularly beneficial for users with lower-memory GPUs, as it makes more efficient use of the available resources.

Key Change

  • Wrapped the inference code in a with autocast(): block inside the translate_sentences method of AutoModel (see the sketch after this list).
  • This minor change enables mixed-precision computation, reducing the GPU memory footprint by using float16 precision where possible without affecting model performance.
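A minimal sketch of the wrapped method, assuming a Hugging Face-style tokenizer and generation API (self.tokenizer, self.model, and the method body here are illustrative placeholders, not the exact code in this repo):

```python
import torch

class AutoModel:
    # ... existing attributes (self.model, self.tokenizer) elided ...

    def translate_sentences(self, sentences):
        # autocast runs eligible CUDA ops in float16, roughly halving
        # activation memory; on CPU-only systems it is effectively a no-op.
        with torch.cuda.amp.autocast():
            with torch.no_grad():
                batch = self.tokenizer(
                    sentences, return_tensors="pt", padding=True
                ).to(self.model.device)
                output = self.model.generate(**batch)
        return self.tokenizer.batch_decode(output, skip_special_tokens=True)
```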

Impact

  • Notable Efficiency Gains: This change substantially reduces GPU memory consumption during inference. It is especially valuable for users with lower-memory GPUs, allowing smoother operation and fewer out-of-memory errors.
  • Maintains Compatibility: The update maintains full backward compatibility. On systems without a GPU or those not supporting AMP, autocast simply becomes a no-operation, preserving existing functionality.
  • Expands Accessibility: By making the package more memory-efficient, we're opening the doors to a broader range of users who might have been limited by hardware constraints.

@lorenzoridolfi lorenzoridolfi marked this pull request as draft December 2, 2023 15:07
@lorenzoridolfi (Author)

I've recently added a commit introducing a new optional parameter, with_autocast, to the translate_sentences method in the AutoModel class. This parameter defaults to False, maintaining the current behavior for existing codebases.

The key enhancement offered by this update is the conditional use of PyTorch's autocast feature. Users now have the option to enable autocast by setting with_autocast=True when invoking the method. This flexibility allows for more efficient GPU memory usage, which is especially beneficial for those with memory constraints on their GPUs or those who wish to optimize performance on compatible hardware.
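A minimal sketch of the conditional path, again with hypothetical self.tokenizer and self.model attributes; contextlib.nullcontext keeps the with_autocast=False path identical to the old behavior:

```python
import contextlib
import torch

def translate_sentences(self, sentences, with_autocast=False):
    # Choose the context up front: autocast when requested, otherwise a
    # no-op, so the default call preserves the existing behavior exactly.
    ctx = torch.cuda.amp.autocast() if with_autocast else contextlib.nullcontext()
    with ctx, torch.no_grad():
        batch = self.tokenizer(
            sentences, return_tensors="pt", padding=True
        ).to(self.model.device)
        output = self.model.generate(**batch)
    return self.tokenizer.batch_decode(output, skip_special_tokens=True)
```

Existing calls such as model.translate_sentences(sentences) are unaffected; opting in is a one-argument change: model.translate_sentences(sentences, with_autocast=True).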

This addition preserves the existing functionality for all current users while providing an accessible pathway to mixed-precision computation for improved memory efficiency. It balances the reliability of the existing code path against an opt-in performance optimization for those who need it.

@lorenzoridolfi lorenzoridolfi marked this pull request as ready for review December 2, 2023 15:17