Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow backslash-escaped special characters in double-quoted strings (#700) #703

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jmknoble
Copy link

@jmknoble jmknoble commented Dec 3, 2024

@coveralls
Copy link

coveralls commented Dec 3, 2024

Coverage Status

coverage: 99.825%. remained the same
when pulling 3f84294 on jmknoble:fix/issue-700-double-quoted-special-chars
into 8513d9b on adrienverge:master.

@jmknoble
Copy link
Author

jmknoble commented Dec 3, 2024

Tests succeed before any changes:

$ git url
https://github.com/jmknoble/adrienverge_yamllint
$ git branch
* (HEAD detached at 8513d9b)
  fix/issue-700-double-quoted-special-chars
  master
$ git log -n 1
commit 8513d9b97da3b32453b3fccb221f4ab134a028d7 (HEAD, origin/master, origin/HEAD, master)
Author: Adrien Vergé <[email protected]>
Date:   Sat Nov 16 17:55:08 2024 +0100

    docs: Import 'yamllint.config' rather than 'yamllint'
    
    Since yamllint version 0.5.2 (in 2016, just after yamllint was created)
    this example in documentation doesn't work. Indeed, commit dbbecb5
    (which aimed to solve another problem) removed the ability to import
    yamllint submodules directly.
    
    This commit makes it clearer inside documentation.
    
    Fixes https://github.com/adrienverge/yamllint/issues/698
$ ./run-tests.sh 
+ uv run python -m unittest -v tests/rules/test_quoted_strings.py
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_any_quotes_not_required) ... ok
test_default (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_default) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_single_quotes) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_single_quotes_not_required) ... ok
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_any_quotes_not_required) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_single_quotes) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_single_quotes_not_required) ... ok

----------------------------------------------------------------------
Ran 25 tests in 0.486s

OK
$ 

@jmknoble
Copy link
Author

jmknoble commented Dec 3, 2024

Repro test case fails properly before fix applied:

$ git url
https://github.com/jmknoble/adrienverge_yamllint
$ git branch
* (HEAD detached at 9b8ce28)
  fix/issue-700-double-quoted-special-chars
  master
$ git log -n 1
commit 9b8ce28d05597470042c49c6b6d9159c412f9e00 (HEAD)
Author: Jim Knoble <[email protected]>
Date:   Tue Dec 3 15:06:09 2024 -0800

    Add repro test case for #700
$ ./run-tests.sh 
+ uv run python -m unittest -v tests/rules/test_quoted_strings.py
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_any_quotes_not_required) ... ok
test_default (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_default) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_single_quotes) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_single_quotes_not_required) ... ok
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_any_quotes_not_required) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_single_quotes) ... ok
test_only_when_needed_special_characters (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_special_characters) ... ERROR
test_only_when_needed_special_characters_exceptions (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_special_characters_exceptions) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_single_quotes_not_required) ... ok

======================================================================
ERROR: test_only_when_needed_special_characters (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_special_characters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/tests/rules/test_quoted_strings.py", line 481, in test_only_when_needed_special_characters
    self.check('---\n'
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/tests/common.py", line 55, in check
    real_problems = list(linter.run(source, self.build_fake_config(conf)))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/yamllint/linter.py", line 199, in _run
    for problem in get_cosmetic_problems(buffer, conf, filepath):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/yamllint/linter.py", line 136, in get_cosmetic_problems
    for problem in rule.check(rule_conf,
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/yamllint/rules/quoted_strings.py", line 301, in check
    not _quotes_are_needed(token.value,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/yamllint/rules/quoted_strings.py", line 212, in _quotes_are_needed
    loader = yaml.BaseLoader('key: ' + string)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/.venv/lib/python3.12/site-packages/yaml/loader.py", line 14, in __init__
    Reader.__init__(self, stream)
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/.venv/lib/python3.12/site-packages/yaml/reader.py", line 74, in __init__
    self.check_printable(stream)
  File "/Users/jmknoble/Stuff/RevisionControl/git/personal/fork/adrienverge_yamllint/.venv/lib/python3.12/site-packages/yaml/reader.py", line 143, in check_printable
    raise ReaderError(self.name, position, ord(character),
yaml.reader.ReaderError: unacceptable character #x001b: special characters are not allowed
  in "<unicode string>", position 5

----------------------------------------------------------------------
Ran 27 tests in 0.494s

FAILED (errors=1)
$ 

@jmknoble
Copy link
Author

jmknoble commented Dec 3, 2024

Tests pass, including repro test case, after fix and linting applied:

$ git url
https://github.com/jmknoble/adrienverge_yamllint
$ git branch
* fix/issue-700-double-quoted-special-chars
  master
$ git log -n 1
commit 4cb3eedea2709de13c77152c7c65b89a84777c70 (HEAD -> fix/issue-700-double-quoted-special-chars, origin/fix/issue-700-double-quoted-special-chars)
Author: Jim Knoble <[email protected]>
Date:   Tue Dec 3 15:22:20 2024 -0800

    React to flake8 linting
$ ./run-tests.sh 
+ uv run python -m unittest -v tests/rules/test_quoted_strings.py
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_any_quotes_not_required) ... ok
test_default (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_default) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_only_when_needed_single_quotes) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedKeysTestCase.test_single_quotes_not_required) ... ok
test_allow_quoted_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_allow_quoted_quotes) ... ok
test_any_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_any_quotes_not_required) ... ok
test_disabled (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_disabled) ... ok
test_octal_values (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_octal_values) ... ok
test_only_when_needed (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed) ... ok
test_only_when_needed_corner_cases (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_corner_cases) ... ok
test_only_when_needed_extras (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_extras) ... ok
test_only_when_needed_single_quotes (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_single_quotes) ... ok
test_only_when_needed_special_characters (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_special_characters) ... ok
test_only_when_needed_special_characters_exceptions (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_only_when_needed_special_characters_exceptions) ... ok
test_quote_type_any (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_any) ... ok
test_quote_type_double (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_double) ... ok
test_quote_type_single (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_quote_type_single) ... ok
test_single_quotes_not_required (tests.rules.test_quoted_strings.QuotedValuesTestCase.test_single_quotes_not_required) ... ok

----------------------------------------------------------------------
Ran 27 tests in 0.483s

OK
$ 

@jmknoble
Copy link
Author

jmknoble commented Dec 3, 2024

It could be argued that we should accept single-quoted strings containing special characters as well, since the strings are quoted. Obviously, though, since backslash-escapes are not active in single-quoted YAML scalars, the special characters would be embedded in such strings without escaping.

However:

  1. Yamllint is likely already raising exceptions about those.
  2. Folks are either depending on those exceptions as feedback or are working around them in other ways (such as changing the quote settings or excluding the files from being checked).

Any wider change in this area likely requires more thought to understand whether it would be a breaking change.

(Note, this fix actually accepts such embedded special characters in double-quoted strings, since we can't really tell that they were escaped in the original YAML, unless we do something a lot more invasive; however, testing seems to show that yaml.reader.ReaderError is raised for that case prior to calling _quotes_are_needed()).

It's my opinion that the narrower fix is most appropriate here, and that the wider, more invasive change should go into a backlog if it's at all desired.

@jmknoble
Copy link
Author

jmknoble commented Dec 5, 2024

Coverage decreased (-0.03%) to 99.8%

Unsure why this would be the case, as the self.check() calls in the added test cases should be covering all the added code paths in _quotes_are_needed() and its caller.

@jmknoble
Copy link
Author

jmknoble commented Dec 9, 2024

@adrienverge, this feels ready for review.

@adrienverge
Copy link
Owner

Hello @jmknoble, thanks for contributing!

This fix looks good.

Any wider change in this area likely requires more thought to understand whether it would be a breaking change.

Understood. I agree with your conclusion: let's fix double-quotes for now.

However I'm not comfortable with relying on an English string (e.reason == "special characters are not allowed"), it could change in the future and break. Instead, what about using yaml.reader.Reader.check_printable(), which is the source of this string? For example:

>>> from yaml.reader import Reader
>>> Reader('').check_printable('\u0020')
>>> Reader('').check_printable('\u001b')
yaml.reader.ReaderError: unacceptable character #x001b: special characters are not allowed

Repro test case fails properly before fix applied:
Tests pass, including repro test case, after fix and linting applied:

👌

Also can you squash all your commits into one, with a commit message starting with quoted-strings: … and a minimal rationale + reproducer inside it?

@jmknoble
Copy link
Author

I'm not comfortable with relying on an English string (e.reason == "special characters are not allowed"), it could change in the future and break. Instead, what about using yaml.reader.Reader.check_printable(), [which is the source of this string]

I get the brittleness concern. How does this look?

--- a/yamllint/rules/quoted_strings.py
+++ b/yamllint/rules/quoted_strings.py
@@ -204,12 +204,21 @@ def _quote_match(quote_type, token_style):
             (quote_type == 'double' and token_style == '"'))
 
 
-def _quotes_are_needed(string, is_inside_a_flow):
+def _quotes_are_needed(string, style, is_inside_a_flow):
     # Quotes needed on strings containing flow tokens
     if is_inside_a_flow and set(string) & {',', '[', ']', '{', '}'}:
         return True
 
+    if style == '"':
+        try:
+            yaml.reader.Reader('').check_printable('key: ' + string)
+        except yaml.reader.ReaderError:
+            # Special characters in a double-quoted string
+            # are assumed to have been backslash-escaped
+            return True
+
     loader = yaml.BaseLoader('key: ' + string)
+
     # Remove the 5 first tokens corresponding to 'key: ' (StreamStartToken,
     # BlockMappingStartToken, KeyToken, ScalarToken(value=key), ValueToken)
     for _ in range(5):

Also can you squash all your commits into one, with a commit message starting with quoted-strings: … and a minimal rationale + reproducer inside it?

Let me know what you think of the above, please, before I go to the trouble of doing that.

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look?

This looks good! Except for the last added empty line: it seems unrelated?

tests/rules/test_quoted_strings.py Outdated Show resolved Hide resolved
tests/rules/test_quoted_strings.py Outdated Show resolved Hide resolved
@jmknoble
Copy link
Author

jmknoble commented Dec 11, 2024

Ok, @adrienverge, let me know if you approve of the changes as of 3f84294, and I will then close this PR and open a new one with a squashed commit.

Proposed commit message:

quoted-strings: only-when-needed: allow escaped special characters in double-quoted strings

- Fixes #700 
- Accepts escaped special characters in double quoted strings (`key: "\u001b"`)
- Still throws exception for unescaped (embedded) special characters, whether or not quotes are present
- To reproduce: `yamllint -d 'rules: {quoted-strings: {required: only-when-needed}}' - <<<'key: "\u001b"'`

@adrienverge
Copy link
Owner

Hello @jmknoble,

Thanks for the update!

No need to open a new pull request, let's track this feature's history at a single place. You can push force to this one. Otherwise I'll squash myself before merging, with a commit message like:

quoted-strings: Fix crash with only-when-needed and escaped special characters

Prevent yamllint crash upon non-printable characters:

    $ yamllint -d 'rules: {quoted-strings: {required: only-when-needed}}' - <<<'key: "\u001b"'
    …
    yaml.reader.ReaderError: unacceptable character #x001b: special characters are not allowed
      in "<unicode string>", position 5

Accept escaped special characters in double quoted strings (`key:
"\u001b"`). But still throw exceptions for unescaped (embedded) special
characters, whether or not quotes are present.

When checking whether quotes are needed, assume that double-quoted
strings containing special characters have already been escaped (and
claim the quotes are in fact needed).

Fixes https://github.com/adrienverge/yamllint/issues/700

Before this I just have a question, regarding your comment #703 (comment). Do you confirm that writing k1: '\u001b' or k1: \u001b is valid YAML, and that the 2 are equivalent? If my understanding is correct, then what do you think of adding the following code to make it clear?

     def test_only_when_needed_special_characters(self):
         conf = 'quoted-strings: {required: only-when-needed}\n'
         self.check('---\n'
                    'k1: "\\u001b"\n',
                    conf)
+        self.check('---\n'
+                   "k1: '\\u001b'\n",
+                   conf, problem=(2, 5))
+        self.check('---\n'
+                   'k1: \\u001b\n',
+                   conf)
         self.assertRaises(yaml.reader.ReaderError, self.check,
                           '---\n'
                           'k1: "\u001b"\n',
                           conf)
         self.assertRaises(yaml.reader.ReaderError, self.check,
                           '---\n'
                           "k1: '\u001b'\n",
                           conf)
         self.assertRaises(yaml.reader.ReaderError, self.check,
                           '---\n'
                           'k1: \u001b\n',
                           conf)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants