Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of --skip with folder names #1915

Open
peternewman opened this issue May 8, 2021 · 12 comments · May be fixed by #2058
Open

Behavior of --skip with folder names #1915

peternewman opened this issue May 8, 2021 · 12 comments · May be fixed by #2058

Comments

@peternewman
Copy link
Collaborator

Also these should all work the same:

  • --skip ./foo/ - Currently doesn't work
  • --skip ./foo - Currently works
  • --skip foo/ - Untested
  • --skip foo - Untested

We also need to check if --skip ./foo is equivalent to --skip ./foobar (on a file/folder named foobar) and likewise for --skip foo.

Originally posted by @peternewman in #1528 (comment)

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 22, 2021

In my case, none of these work:

--skip ./.git/
--skip ./.git
--skip .git/
--skip .git
--skip=./.git/
--skip=./.git
--skip=.git/
--skip=.git
--skip ./tests/ci/checkpatch/
--skip ./tests/ci/checkpatch
--skip tests/ci/checkpatch/
--skip tests/ci/checkpatch
--skip=./tests/ci/checkpatch/
--skip=./tests/ci/checkpatch
--skip=tests/ci/checkpatch/
--skip=tests/ci/checkpatch

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Sep 10, 2021

codespell -S ./.git attempts the following fnmatch calls and eventually succeeds:

>>> fnmatch.fnmatch(".git", "./.git")
False
>>> fnmatch.fnmatch("./.git", "./.git")
True
>>> 

codespell -S ./.git/ attempts the following fnmatch calls, all of them fail:

>>> fnmatch.fnmatch(".git", "./.git/")
False
>>> fnmatch.fnmatch("./.git", "./.git/")
False
>>> 

I guess the key is in the fnmatch documentation:

Note that the filename separator ('/' on Unix) is not special to this module. See module glob for pathname expansion (glob uses filter() to match pathname segments).

@DimitriPapadopoulos
Copy link
Collaborator

With the following setup:

$ tree
.
└── tests
    └── ci
        └── checkpatch
            └── spelling.txt

3 directories, 1 file
$ 
  1. First in my list, codespell -S tests/ci/checkpatch attempts among other the following fnmatch call, which fails:
>>> fnmatch.fnmatch("./tests/ci/checkpatch", "tests/ci/checkpatch")
False
>>> 
  1. Same for codespell -S tests/ci/checkpatch/:
>>> fnmatch.fnmatch("./tests/ci/checkpatch", "tests/ci/checkpatch/")
False
>>> 
  1. Same for codespell -S ./tests/ci/checkpatch/:
>>> fnmatch.fnmatch("./tests/ci/checkpatch", "./tests/ci/checkpatch/")
False
>>> 
  1. On the other hand, codespell -S ./tests/ci/checkpatch does succeed after all:
>>> fnmatch.fnmatch("./tests/ci/checkpatch", "./tests/ci/checkpatch")
True
>>> 

I don't know why I thought it didn't in #2047 - I was probably tired and confused by too many attempts to understand.

So we just need to:

  1. take care of the leading ./: either add it to both the pattern and path under test, when missing, or remove it from both the pattern and path under test, if present,
  2. remove any trailing / from the pattern.

@DimitriPapadopoulos DimitriPapadopoulos linked a pull request Sep 10, 2021 that will close this issue
@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Sep 10, 2021

The trailing / is easy to fix.

About the leading ./, here are a few things I have noticed:

  • In the absence of arguments that specify the files or directories to check, codespell will start with the current directory, initialised as .:
    https://github.com/codespell-project/codespell/blob/f3901ed/codespell_lib/_codespell.py#L427-L428
    Hence, when matching, we need to remove the initial ./ added to all file paths, or add ./ to all patterns.
  • We can also run codespell on the current directory using codespell $PWD. If we do that, adding ./ to the pattern won't help, we would have to add $PWD/.

Clearly, something's wrong here. We want to match relative paths - relative to the directory passed as an argument to codespell. Instead, codespell prepends the directory to the file path, or ./ if no argument is given. This is a nuisance because it is counterintuitive, because the -skip argument needs to be adapted (--skip ./foo with plain codespell and --skip /current/dir/foo with codespell /current/dir), and because the special case of the current directory ./ feels completely wrong.

@akien-mga
Copy link
Contributor

For the record, --skip also seems to be broken when used together with explicit paths:

$ codespell --skip ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:7: Retruns ==> Returns

My use case is running codespell via GitHub Actions only on files changed in a PR instead of the whole repository (so passing an explicit list of changed files, using the method in codespell-project/actions-codespell#65), but still using a skip list to ignore thirdparty files, various specific files with false positive typos, and .git (which codespell seems to consistently fail at ignoring despite its claims).

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented May 8, 2023

You're doing the exact opposite of what you want, you are actually instructing codespell to check ./thirdparty/README.md instead of skipping it:

$ mkdir thirdparty
$ 
$ cat > thirdparty/README.md 
Retruns
$ 
$ codespell -S thirdparty
$ codespell -S ./thirdparty
$ codespell -S ./thirdparty/README.md
$ codespell -S ./thirdparty,./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:1: Retruns ==> Returns
$ 

@akien-mga
Copy link
Contributor

akien-mga commented May 8, 2023

I think you misunderstood my example. I want codespell to ignore all files in the thirdparty folder, even if some or all of them get manually listed as paths to check. In other word, I want the skip list to take precedence over the paths to check list.

It works fine with individual files on the skip list, but not with folders:

$ codespell -S thirdparty/README.md thirdparty/README.md 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:8: Retruns ==> Returns

There's still this issue of mismatch between paths with or without the ./ prefix (so codespell -S ./thirdparty/README.md thirdparty/README.md doesn't actually skip the file as it doesn't match).

You may ask why I'm specifying the file manually in the list of paths to check if I also want to skip it, but that was just to have a simple example.

My real use case is as described above. In script form:

files=$(gh pr diff ${{ github.event.pull_request.number }} --name-only | xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ')
codespell -w -q 3 -S "./.git,./bin,./thirdparty,*.desktop,*.gen.*,*.po,*.pot,*.rc,./AUTHORS.md,./COPYRIGHT.txt,./DONORS.md,./core/input/gamecontrollerdb.txt,./core/string/locales.h,./editor/project_converter_3_to_4.cpp,./misc/scripts/codespell.sh,./platform/android/java/lib/src/com,./platform/web/node_modules,./platform/web/package-lock.json" -L "curvelinear,doubleclick,expct,findn,gird,hel,inout,lod,nd,numer,ot,te,vai" --builtin "clear,rare,en-GB_to_en-US" "$files"

Or shorter with what matters:

files=$(gh pr diff ${{ github.event.pull_request.number }} --name-only | xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ')
codespell -S "./thirdparty" "$files"

"$files is computed and may include files which might the pattern of the skip list. When the skip list includes folders, codespell doesn't match them properly.

(And the xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ' part is to workaround the weird ./ prefix issue.)

@DimitriPapadopoulos
Copy link
Collaborator

I think you misunderstood me, and I still think you do the exact opposite of what you would want.

Have a look at my examples: ---skip requires a comma-separated list of files and folders.

@akien-mga
Copy link
Contributor

akien-mga commented May 8, 2023

I'm fully aware, and I know what I'm doing. I think the workaround I had to do in godotengine/godot#76842 for this bug makes my use case very explicit. The non pseudo code example I gave here makes it very clear that I know how to use --skip.

In case it's not clear:

usage: codespell [-h] [--version] [-d] [-c] [-w] [-D DICTIONARY] [--builtin BUILTIN-LIST] [--ignore-regex IGNORE_REGEX] [-I FILE] [-L WORDS] [--uri-ignore-words-list WORDS] [-r REGEX] [--uri-regex URI_REGEX]
                 [-s] [--count] [-S SKIP] [-x FILE] [-i INTERACTIVE] [-q QUIET_LEVEL] [-e] [-f] [-H] [-A LINES] [-B LINES] [-C LINES] [--config CONFIG] [--toml TOML]
                 [files ...]

I'm making use of this optional [files ...] argument to specify which files I want codespell to check. Some of those may or may not match patterns in the skip list, and I don't want codespell to check them if they do. It works with files, not with folders. That's all.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented May 8, 2023

In the example you have sent, it may be expected that codespell does not skip ./thirdparty/README.md because you pass it as an explicit argument:

$ codespell -S thirdparty/README.md thirdparty/README.md 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:8: Retruns ==> Returns

This is a corner case, where at the same time you ask codespell to check a file and to skip it:

codespell -S <filename> <filename>

Should the --skip option win or the explicit file argument?

In any case, the whole skipping business needs to be overhauled, a large part of the broken use cases involve ./path vs. path.

This works:

$ codespell -S ./thirdparty
$ 

@akien-mga
Copy link
Contributor

The three examples you quoted show clearly that what I'm expecting (skip takes precedence over explicit files listing when there's a conflict) works fine for files, and not for folders in the skip list.

But I leave it at that, we won't understand each other evidently.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented May 8, 2023

Let's start over. These examples actually raise a precedence question. Explicit file argument or skip list? I think the skip list should win, and that is what usually happens. However, in this case, we do seem to have a precedence issue with folders:

$ mkdir thirdparty
$ 
$ cat > thirdparty/README.md 
Retruns
$ 
$ 
$ 
$ codespell -S ./thirdparty ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S ./thirdparty thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ 
$ 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md
$ 
$ codespell -S thirdparty/README.md ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S ./thirdparty/README.md thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty/README.md thirdparty/README.md
$ 
$ 
$  
$ codespell -S ./thirdparty
$ 
$ codespell -S thirdparty
$ 

But then, I feel this a different issue, not the usual ./path vs. path issue, not an issue with folder names. You should probably open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants