Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix snakecase converter for strings with multi-digit numbers #519

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jesp1999
Copy link

@jesp1999 jesp1999 commented Feb 12, 2024

Resolves #518

Change to the snake_case letter converter to give more consistent behavior for strings containing multiple numeric characters in sequence. For instance:

>>print(old_snakecase('Alice'), new_snakecase('Alice'))
alice alice

>>print(old_snakecase('Alice123'), new_snakecase('Alice123'))
alice_1_2_3 alice123

>>print(old_snakecase('AliceBob123'), new_snakecase('AliceBob123'))
alice_bob_1_2_3 alice_bob123

Edit 1

@USSX-Hares: Updated description formatting.

@USSX-Hares
Copy link
Collaborator

USSX-Hares commented Feb 14, 2024

Wait, there are no tests for this?

@jesp1999, would you mind writing units for the stringcase.py? This will prevent issues like this in the future.

@USSX-Hares USSX-Hares changed the title Fix snakecase converter for multi-numeric strings Fix snakecase converter for strings with multi-digit numbers Feb 14, 2024
Copy link
Collaborator

@USSX-Hares USSX-Hares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with the implementation. From my POV, Alice123 should be alice_123, not alice123 or alice_1_2_3. Same for AliceBob123.

As I see, these should be the cases:

'AliceBib' -> 'alice_bob'
'TAliceBob' -> 't_alice_bob'
'ConfigDCJMeta' -> 'config_dcj_meta'

I see the following patterns to be a group:

  • [A-Z]+, followed by another group or end of string
  • [A-Z][a-z]+
  • [0-9]+ (or \d)

If @lidatong, @george-zubrienko, or any other collaborator disagree, I will listen to their arguments.

Copy link
Collaborator

@USSX-Hares USSX-Hares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that:

import re
from unittest import TestCase

def snake_case(s: str) -> str:
    return re.sub(r'[_\W]+', '_', re.sub(r'[A-Z][a-z]+|\d+|[A-Z]+(?=[A-Z\d\W]|\b)', r'_\g<0>', s)).lower().strip('_')

class StringCaseTestCase(TestCase):
    def test_snakecase_from_camel(self):
        with self.subTest("Normal case"):
            self.assertEqual(snake_case('Alice'), 'alice')
            self.assertEqual(snake_case('AliceBob'), 'alice_bob')
        
        with self.subTest("With digits"):
            self.assertEqual(snake_case('Alice1'), 'alice_1')
            self.assertEqual(snake_case('Alice123'), 'alice_123')
            self.assertEqual(snake_case('123Alice'), '123_alice')
            self.assertEqual(snake_case('Alice123Bob'), 'alice_123_bob')
            self.assertEqual(snake_case('Alice1Bob'), 'alice_1_bob')
        
        with self.subTest("With abbreviations"):
            self.assertEqual(snake_case('TAliceBob'), 't_alice_bob')
            self.assertEqual(snake_case('ConfigDCJMeta'), 'config_dcj_meta')
    
    def test_snakecase_from_snake(self):
        # Should be identity
        with self.subTest("Normal case"):
            self.assertEqual(snake_case('alice'), 'alice')
            self.assertEqual(snake_case('alice_bob'), 'alice_bob')
        
        with self.subTest("With digits"):
            self.assertEqual(snake_case('alice_1'), 'alice_1')
            self.assertEqual(snake_case('alice_123'), 'alice_123')
            self.assertEqual(snake_case('123_alice'), '123_alice')
            self.assertEqual(snake_case('alice_123_bob'), 'alice_123_bob')
            self.assertEqual(snake_case('alice_1_bob'), 'alice_1_bob')
        
        with self.subTest("With abbreviations"):
            self.assertEqual(snake_case('t_alice_bob'), 't_alice_bob')
            self.assertEqual(snake_case('config_dcj_meta'), 'config_dcj_meta')


if (__name__ == '__main__'):
    from unittest import main as unittest_main
    unittest_main()

@george-zubrienko
Copy link
Collaborator

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released.
I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

@jesp1999
Copy link
Author

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

I'll implement some tests soon to rigidly define the behavior of the casing in the scenarios you present, as well as any more I can think of.

Continuing the discussion, @george-zubrienko do you or anyone else have any suggestions for an appropriate name for this modified snake casing style based on the current conventions in the codebase?

@USSX-Hares
Copy link
Collaborator

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released. I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version. Also, I don't know any person who wants alice_1_2_3 naming (especially when converting from alice123) and thus I believe this will not be a breaking change (yeah, I know the story behind jQuery v4.0.0, but, let's be honest, we are both not jQuery and not v3.x).

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Feb 20, 2024

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Feb 20, 2024

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

No need to apologize! It is always good to open a PR, because that shows your ideas best. Rest is technical stuff :)

@USSX-Hares
Copy link
Collaborator

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

Well, I just said my opinion on the topic -- however, I don't go against your proposal.

The only thing I am concired is the name of converter you have suggested, it's a bit confusing. It took me some time that SNAKE_V1 is not the old syntax. Usually, when something has v1 in its name when there are alternatives, v1 is the oldest. How about smth like SNAKE_FUTURE, SNAKE_V2, or SNAKE_NEW? Any other variants and/or suggestions are welcomed.

@jesp1999
Copy link
Author

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

My primary concern with this approach is that this will force those who are watching library updates to either

  • make a breaking change now and revert it at V1 release (SNAKE -> SNAKE_V1 -> SNAKE)
    or
  • make a potentially breaking change at the V1 release

I feel like the first option is an undesirable one for most consumers, including myself, as it leaves a lingering threat of deprecation over my head whether I choose to comply with the change now or later. I'm still trying to think if I have a better solution, though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] SNAKE LetterCase conversion adding extra underscores for numeric parts
3 participants