ssm_enhancement #689
base: main
Conversation
These enhancements provide additional flexibility and options for implementing and experimenting with different recurrence methods in the Mamba and Jamba models, potentially improving performance and accuracy for various tasks.
Hey @markblee, could you please take a look at my PR when you get a chance? Thanks!
Hey @swiseman, could you please take a look at my PR when you get a chance? Thanks!
Thanks for the PR!
`class HybridMambaRecurrence(BaseMambaRecurrence):`
Thanks for these new classes. Do people use either the hybrid recurrences or the alternative recurrences defined below? Is there evidence that they are useful empirically? If not, I think it would be simpler to leave these classes out for now and, if necessary, let people define them in downstream experiment files which import `axlearn.common.ssm`.
Hey @swiseman , Thank you for your valuable input. I've reviewed the hybrid recurrences and alternative recurrences, and it seems that they haven't been used extensively in practice. Based on your benchmarking results, it appears that the AssociativeScanMambaRecurrence is more efficient than the HybridMambaRecurrence.
Given the lack of empirical evidence and the performance advantage of the AssociativeScanMambaRecurrence, I agree that it's reasonable to remove the HybridMambaRecurrence and other less-used recurrences from the core axlearn.common.ssm module for now.
This will simplify the codebase and make it easier for users to understand and use. If there's a strong need for these recurrences in the future, they can be defined in downstream experiment files as you suggested.
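For context on why the associative-scan recurrence tends to win, the diagonal SSM update `h_t = a_t * h_{t-1} + b_t` composes associatively, so all T steps can be combined in O(log T) parallel depth (e.g. with `jax.lax.associative_scan`), whereas a step-by-step hybrid loop pays O(T) sequential depth. A minimal pure-Python sketch of the combine rule; the function names here are illustrative, not part of axlearn:

```python
# Sketch: the diagonal SSM recurrence h_t = a_t * h_{t-1} + b_t can be
# computed with an associative scan because composing two affine steps
# yields another affine step of the same form.

def combine(left, right):
    """Associative operator on (a, b) pairs: applying `left` then `right`
    to h gives a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def scan_recurrence(a, b, h0=0.0):
    """Prefix-combine all steps, then apply each prefix to h0.

    Written sequentially here for clarity; the point is that the
    prefix-combines themselves are associative, so a parallel scan
    can evaluate them in logarithmic depth.
    """
    out, acc = [], None
    for step in zip(a, b):
        acc = step if acc is None else combine(acc, step)
        out.append(acc[0] * h0 + acc[1])
    return out

def sequential_recurrence(a, b, h0=0.0):
    """Reference step-by-step recurrence for comparison."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out
```

Both functions produce the same hidden states (up to floating-point rounding); the scan formulation is what makes an `AssociativeScanMambaRecurrence`-style implementation parallelizable over the sequence length.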
-Vishesh
- Fixed redundant function definitions
- Fixed an incorrect module import in layers.py
Hi, shall we close the PR or turn it into a draft? Thanks.
Hey @ruomingp, we can close this PR. I am working on these hybrid structures and it will take a while. Thanks for reading!
Pull Request: Enhancements to Mamba and Jamba State-space Models
Summary
This pull request introduces several enhancements to the Mamba and Jamba state-space models (SSMs) implementation, including new recurrence methods, hybrid approaches, and comprehensive testing.
Changes
1. New Recurrence Methods
   - Added `HybridMambaRecurrence` and `AlternativeMambaRecurrence` classes.
2. Enhancements to `ssm.py`
   - Updated `MambaMixerLayer` and `JambaMixerLayer` to integrate the new recurrence methods.
3. Comprehensive Testing in `ssm_test.py`
   - Added tests for `HybridMambaRecurrence` and `AlternativeMambaRecurrence` in `MambaMixerLayerTest`.
   - Updated `StackedMambaTest`.
   - Updated `StackedMixedSSMTransformerTest`.
4. Documentation and Examples
   - Updated docstrings and comments to reflect the new features and changes.
Testing
All new features have been thoroughly tested with the following configurations:
- Data types: `jnp.float32` and `jnp.bfloat16`.
- Layers: `MambaBlock`, `JambaMambaBlock`, and `StackedMixedSSMTransformerLayer`.
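A dtype-by-layer sweep like the one above is commonly expressed as a product over the two configuration axes. A hedged pure-Python sketch: the layer names come from this PR, but `run_case` is a hypothetical stand-in for building each layer and running a forward pass, and NumPy `float16` stands in for `jnp.bfloat16`, which requires JAX:

```python
import itertools
import numpy as np

# Layer names under test (from the PR description).
LAYERS = ("MambaBlock", "JambaMambaBlock", "StackedMixedSSMTransformerLayer")
# The real tests use jnp.float32 / jnp.bfloat16; float16 is a stand-in here.
DTYPES = (np.float32, np.float16)

def run_case(layer_name, dtype):
    """Hypothetical placeholder for one test case: runs a tiny computation
    and checks that the requested dtype survives it."""
    x = np.ones((2, 4), dtype=dtype)
    y = x * 2.0  # stand-in for layer(x)
    return layer_name, y.dtype

def sweep():
    """Run every (layer, dtype) combination, mirroring a parameterized test."""
    return [run_case(name, dt) for name, dt in itertools.product(LAYERS, DTYPES)]
```

In the actual test suite this pattern would typically be expressed with `absl.testing.parameterized` rather than a hand-rolled loop.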
Conclusion
These enhancements provide additional flexibility and options for implementing and experimenting with different recurrence methods in the Mamba and Jamba models, potentially improving performance and accuracy for various tasks.