-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix snakecase converter for strings with multi-digit numbers #519
base: master
Are you sure you want to change the base?
Conversation
Wait, there are no tests for this? @jesp1999, would you mind writing units for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with the implementation. From my POV, Alice123
should be alice_123
, not alice123
or alice_1_2_3
. Same for AliceBob123
.
As I see, these should be the cases:
'AliceBib' -> 'alice_bob'
'TAliceBob' -> 't_alice_bob'
'ConfigDCJMeta' -> 'config_dcj_meta'
I see the following patterns to be a group:
[A-Z]+
, followed by another group or end of string[A-Z][a-z]+
[0-9]+
(or\d
)
If @lidatong, @george-zubrienko, or any other collaborator disagree, I will listen to their arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that:
import re
from unittest import TestCase
def snake_case(s: str) -> str:
return re.sub(r'[_\W]+', '_', re.sub(r'[A-Z][a-z]+|\d+|[A-Z]+(?=[A-Z\d\W]|\b)', r'_\g<0>', s)).lower().strip('_')
class StringCaseTestCase(TestCase):
def test_snakecase_from_camel(self):
with self.subTest("Normal case"):
self.assertEqual(snake_case('Alice'), 'alice')
self.assertEqual(snake_case('AliceBob'), 'alice_bob')
with self.subTest("With digits"):
self.assertEqual(snake_case('Alice1'), 'alice_1')
self.assertEqual(snake_case('Alice123'), 'alice_123')
self.assertEqual(snake_case('123Alice'), '123_alice')
self.assertEqual(snake_case('Alice123Bob'), 'alice_123_bob')
self.assertEqual(snake_case('Alice1Bob'), 'alice_1_bob')
with self.subTest("With abbreviations"):
self.assertEqual(snake_case('TAliceBob'), 't_alice_bob')
self.assertEqual(snake_case('ConfigDCJMeta'), 'config_dcj_meta')
def test_snakecase_from_snake(self):
# Should be identity
with self.subTest("Normal case"):
self.assertEqual(snake_case('alice'), 'alice')
self.assertEqual(snake_case('alice_bob'), 'alice_bob')
with self.subTest("With digits"):
self.assertEqual(snake_case('alice_1'), 'alice_1')
self.assertEqual(snake_case('alice_123'), 'alice_123')
self.assertEqual(snake_case('123_alice'), '123_alice')
self.assertEqual(snake_case('alice_123_bob'), 'alice_123_bob')
self.assertEqual(snake_case('alice_1_bob'), 'alice_1_bob')
with self.subTest("With abbreviations"):
self.assertEqual(snake_case('t_alice_bob'), 't_alice_bob')
self.assertEqual(snake_case('config_dcj_meta'), 'config_dcj_meta')
if (__name__ == '__main__'):
from unittest import main as unittest_main
unittest_main()
I agree with @USSX-Hares. Quick search reveals there is no standard for snake case, so this subject is rather tricky to discuss. Also considering this is not backwards-compat, we'll have to bump major for this change to be released. |
Thanks for following up with all of the useful feedback, apologies if I'm violating any sort of convention by submitting a code change PR without a discussion first -- I'm new to open source. I'll implement some tests soon to rigidly define the behavior of the casing in the scenarios you present, as well as any more I can think of. Continuing the discussion, @george-zubrienko do you or anyone else have any suggestions for an appropriate name for this modified snake casing style based on the current conventions in the codebase? |
I don't see a problem releasing this in a minor version, especially since we are in the v0 stage where it is legal (according to semver) to release breaking changes in non-patch version. Also, I don't know any person who wants |
That is true, but I won't be so optimistic about the rest of internet accepting v0 conventions. DCJ is part of many large-scale frameworks (including langchain), and if we can contribute to creating slightly less chaos in python library ecosystem, I say we should. I propose we add this change as a normal patch release as an additional Given that V1 release might see the light of day by end of this year, that gives plenty of time for people to adjust. Can also put a deprecation warning on old SNAKE WDYT? @USSX-Hares @jesp1999 |
No need to apologize! It is always good to open a PR, because that shows your ideas best. Rest is technical stuff :) |
Well, I just said my opinion on the topic -- however, I don't go against your proposal. The only thing I am concired is the name of converter you have suggested, it's a bit confusing. It took me some time that |
My primary concern with this approach is that this will force those who are watching library updates to either
I feel like the first option is an undesirable one for most consumers, including myself, as it leaves a lingering threat of deprecation over my head whether I choose to comply with the change now or later. I'm still trying to think if I have a better solution, though... |
Resolves #518
Change to the snake_case letter converter to give more consistent behavior for strings containing multiple numeric characters in sequence. For instance:
Edit 1
@USSX-Hares: Updated description formatting.