Fix snakecase converter for strings with multi-digit numbers #519

jesp1999 · 2024-02-12T21:38:50Z

Resolves #518

Change to the snake_case letter converter to give more consistent behavior for strings containing multiple numeric characters in sequence. For instance:

>>print(old_snakecase('Alice'), new_snakecase('Alice'))
alice alice

>>print(old_snakecase('Alice123'), new_snakecase('Alice123'))
alice_1_2_3 alice123

>>print(old_snakecase('AliceBob123'), new_snakecase('AliceBob123'))
alice_bob_1_2_3 alice_bob123

Edit 1

@USSX-Hares: Updated description formatting.

USSX-Hares · 2024-02-14T11:07:35Z

Wait, there are no tests for this?

@jesp1999, would you mind writing units for the stringcase.py? This will prevent issues like this in the future.

USSX-Hares

I disagree with the implementation. From my POV, Alice123 should be alice_123, not alice123 or alice_1_2_3. Same for AliceBob123.

As I see, these should be the cases:

'AliceBib' -> 'alice_bob'
'TAliceBob' -> 't_alice_bob'
'ConfigDCJMeta' -> 'config_dcj_meta'

I see the following patterns to be a group:

[A-Z]+, followed by another group or end of string
[A-Z][a-z]+
[0-9]+ (or \d)

If @lidatong, @george-zubrienko, or any other collaborator disagree, I will listen to their arguments.

USSX-Hares

Something like that:

import re
from unittest import TestCase

def snake_case(s: str) -> str:
    return re.sub(r'[_\W]+', '_', re.sub(r'[A-Z][a-z]+|\d+|[A-Z]+(?=[A-Z\d\W]|\b)', r'_\g<0>', s)).lower().strip('_')

class StringCaseTestCase(TestCase):
    def test_snakecase_from_camel(self):
        with self.subTest("Normal case"):
            self.assertEqual(snake_case('Alice'), 'alice')
            self.assertEqual(snake_case('AliceBob'), 'alice_bob')
        
        with self.subTest("With digits"):
            self.assertEqual(snake_case('Alice1'), 'alice_1')
            self.assertEqual(snake_case('Alice123'), 'alice_123')
            self.assertEqual(snake_case('123Alice'), '123_alice')
            self.assertEqual(snake_case('Alice123Bob'), 'alice_123_bob')
            self.assertEqual(snake_case('Alice1Bob'), 'alice_1_bob')
        
        with self.subTest("With abbreviations"):
            self.assertEqual(snake_case('TAliceBob'), 't_alice_bob')
            self.assertEqual(snake_case('ConfigDCJMeta'), 'config_dcj_meta')
    
    def test_snakecase_from_snake(self):
        # Should be identity
        with self.subTest("Normal case"):
            self.assertEqual(snake_case('alice'), 'alice')
            self.assertEqual(snake_case('alice_bob'), 'alice_bob')
        
        with self.subTest("With digits"):
            self.assertEqual(snake_case('alice_1'), 'alice_1')
            self.assertEqual(snake_case('alice_123'), 'alice_123')
            self.assertEqual(snake_case('123_alice'), '123_alice')
            self.assertEqual(snake_case('alice_123_bob'), 'alice_123_bob')
            self.assertEqual(snake_case('alice_1_bob'), 'alice_1_bob')
        
        with self.subTest("With abbreviations"):
            self.assertEqual(snake_case('t_alice_bob'), 't_alice_bob')
            self.assertEqual(snake_case('config_dcj_meta'), 'config_dcj_meta')


if (__name__ == '__main__'):
    from unittest import main as unittest_main
    unittest_main()

george-zubrienko · 2024-02-14T14:31:47Z

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released.
I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

jesp1999 · 2024-02-20T15:12:53Z

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

I'll implement some tests soon to rigidly define the behavior of the casing in the scenarios you present, as well as any more I can think of.

Continuing the discussion, @george-zubrienko do you or anyone else have any suggestions for an appropriate name for this modified snake casing style based on the current conventions in the codebase?

USSX-Hares · 2024-02-20T15:22:09Z

I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released. I can suggest adding a new case converter instead - much smoother and can be used w/o breaking stuff for other people.

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version. Also, I don't know any person who wants alice_1_2_3 naming (especially when converting from alice123) and thus I believe this will not be a breaking change (yeah, I know the story behind jQuery v4.0.0, but, let's be honest, we are both not jQuery and not v3.x).

george-zubrienko · 2024-02-20T20:45:00Z

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

george-zubrienko · 2024-02-20T20:46:29Z

Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source.

No need to apologize! It is always good to open a PR, because that shows your ideas best. Rest is technical stuff :)

USSX-Hares · 2024-02-20T20:53:15Z

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

Well, I just said my opinion on the topic -- however, I don't go against your proposal.

The only thing I am concired is the name of converter you have suggested, it's a bit confusing. It took me some time that SNAKE_V1 is not the old syntax. Usually, when something has v1 in its name when there are alternatives, v1 is the oldest. How about smth like SNAKE_FUTURE, SNAKE_V2, or SNAKE_NEW? Any other variants and/or suggestions are welcomed.

jesp1999 · 2024-02-20T22:06:59Z

I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version.

That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional letter_case.SNAKE_V1. We allow people to use V1 and the old SNAKE as-is, and announce that old SNAKE will be removed in v1 release and SNAKE_V1 renamed to regular SNAKE.

Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE

WDYT? @USSX-Hares @jesp1999

My primary concern with this approach is that this will force those who are watching library updates to either

make a breaking change now and revert it at V1 release (SNAKE -> SNAKE_V1 -> SNAKE)
or
make a potentially breaking change at the V1 release

I feel like the first option is an undesirable one for most consumers, including myself, as it leaves a lingering threat of deprecation over my head whether I choose to comply with the change now or later. I'm still trying to think if I have a better solution, though...

Fix snakecase converter for multi-numeric strings

21dfb4b

USSX-Hares changed the title ~~Fix snakecase converter for multi-numeric strings~~ Fix snakecase converter for strings with multi-digit numbers Feb 14, 2024

USSX-Hares requested changes Feb 14, 2024

View reviewed changes

USSX-Hares reviewed Feb 14, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix snakecase converter for strings with multi-digit numbers #519

Fix snakecase converter for strings with multi-digit numbers #519

jesp1999 commented Feb 12, 2024 •

edited by USSX-Hares

Loading

USSX-Hares commented Feb 14, 2024 •

edited

Loading

USSX-Hares left a comment •

edited

Loading

USSX-Hares left a comment •

edited

Loading

george-zubrienko commented Feb 14, 2024

jesp1999 commented Feb 20, 2024

USSX-Hares commented Feb 20, 2024

george-zubrienko commented Feb 20, 2024 •

edited

Loading

george-zubrienko commented Feb 20, 2024 •

edited

Loading

USSX-Hares commented Feb 20, 2024

jesp1999 commented Feb 20, 2024

Fix snakecase converter for strings with multi-digit numbers #519

Are you sure you want to change the base?

Fix snakecase converter for strings with multi-digit numbers #519

Conversation

jesp1999 commented Feb 12, 2024 • edited by USSX-Hares Loading

Edit 1

USSX-Hares commented Feb 14, 2024 • edited Loading

USSX-Hares left a comment • edited Loading

Choose a reason for hiding this comment

USSX-Hares left a comment • edited Loading

Choose a reason for hiding this comment

george-zubrienko commented Feb 14, 2024

jesp1999 commented Feb 20, 2024

USSX-Hares commented Feb 20, 2024

george-zubrienko commented Feb 20, 2024 • edited Loading

george-zubrienko commented Feb 20, 2024 • edited Loading

USSX-Hares commented Feb 20, 2024

jesp1999 commented Feb 20, 2024

jesp1999 commented Feb 12, 2024 •

edited by USSX-Hares

Loading

USSX-Hares commented Feb 14, 2024 •

edited

Loading

USSX-Hares left a comment •

edited

Loading

USSX-Hares left a comment •

edited

Loading

george-zubrienko commented Feb 20, 2024 •

edited

Loading

george-zubrienko commented Feb 20, 2024 •

edited

Loading