COMP: Remove UTF-8 BOM from source files #1215

Merged (1 commit) on Aug 17, 2024

jcfr (Member) commented Aug 16, 2024

For now, the easiest approach is to keep using ASCII in all CTK source files. If there is a good reason to change this in the future, then it should be changed for all source files.

To enforce this, is there a check we could add to https://github.com/commontk/CTK/blob/master/.pre-commit-config.yaml ?

Punzo (Contributor, PR author) commented Aug 16, 2024

I am not familiar with the pre-commit check infrastructure, but if possible we could run a script that checks the first 3 bytes of each file, something like this:

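#!/usr/bin/env bash
# Report, for each file passed as an argument, whether it starts with a UTF-8 BOM
for FILE in "$@"; do
  # Read the first three bytes of the file as plain hex
  BOM=$(head -c 3 "$FILE" | xxd -p -c 3)

  # Check if the BOM matches the UTF-8 BOM (EF BB BF)
  if [ "$BOM" = "efbbbf" ]; then
    echo "The file '$FILE' contains a UTF-8 BOM."
  else
    echo "The file '$FILE' does not contain a UTF-8 BOM."
  fi
done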
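Since the goal of this PR is to remove the BOMs, a "fixer" variant of the check above could strip the BOM in place instead of only reporting it. The following is a minimal sketch under stated assumptions, not part of the merged PR: it uses `od` instead of `xxd` (which usually ships with Vim and may be missing on CI machines), rewrites each affected file with `tail -c +4`, and returns 1 whenever it changes a file so that a hook runner sees a failure. The function names `has_utf8_bom` and `strip_utf8_bom` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a "fixer" hook: strip a leading UTF-8 BOM (bytes EF BB BF) from
# every file passed as an argument. Returns 1 if any file was changed.

# Succeeds if the first three bytes of file $1 are EF BB BF.
has_utf8_bom() {
  # od prints the bytes as " ef bb bf"; tr removes the spaces and newline.
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

strip_utf8_bom() {
  local status=0 file tmp
  for file in "$@"; do
    if has_utf8_bom "$file"; then
      tmp=$(mktemp) || return 2
      # tail -c +4 outputs everything from byte 4 onward, i.e. after the BOM.
      # Copying back with cat (rather than mv) keeps the file's permissions.
      tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
      rm -f "$tmp"
      echo "Removed UTF-8 BOM from '$file'"
      status=1
    fi
  done
  return "$status"
}

# Only act when file names are given (e.g. by a hook runner).
if [ "$#" -gt 0 ]; then
  strip_utf8_bom "$@"
fi
```

For a one-off cleanup of the whole repository, it could be run as `git ls-files -z -- '*.cpp' '*.h' | xargs -0 bash strip_bom.sh`, where `strip_bom.sh` is a hypothetical name for the script.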

lassoan (Member) commented Aug 16, 2024

Any non-ASCII character (code > 127), such as an accented character, might cause trouble. Also, we should only check source code files.

I had Copilot generate this code. Does this look good?

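# Get the list of staged source files (added, copied, modified or renamed;
# deleted files are skipped), NUL-separated so paths with spaces survive.
# '*CMakeLists.txt' also matches CMakeLists.txt files in subdirectories.
# Then search them for non-ASCII bytes: LC_ALL=C makes grep compare raw
# bytes, -H always prints the file name and -n the line number.
non_ascii_files=$(git diff --cached --name-only --diff-filter=ACMR -z -- \
    '*.cpp' '*.h' '*.in' '*.cmake' '*CMakeLists.txt' '*.py' \
  | LC_ALL=C xargs -0 -r grep -P -n -H '[^\x00-\x7F]')

if [ -n "$non_ascii_files" ]; then
  echo "Error: The following source files contain non-ASCII characters:"
  echo "$non_ascii_files"
  exit 1
fi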
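When a script is registered as a hook in `.pre-commit-config.yaml`, pre-commit passes the staged file names to it as arguments by default, and file-type filtering can be done in the config, so the `git diff --cached` step would not be needed there. The sketch below assumes that setup. It also avoids `grep -P`, which not every grep implementation supports, by matching the byte range 0x80-0xFF directly; this relies on grep comparing raw bytes under `LC_ALL=C`, as GNU grep does. Because the BOM bytes EF BB BF are all above 0x7F, this check also catches a UTF-8 BOM. The function name `check_non_ascii` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report every line that contains a non-ASCII byte (0x80-0xFF) in the
# files passed as arguments, and return 1 if any such line was found.

# Bracket expression matching a single byte in the octal range \200-\377.
NON_ASCII_PATTERN=$(printf '[\200-\377]')

check_non_ascii() {
  local found
  [ "$#" -gt 0 ] || return 0
  # LC_ALL=C makes grep compare raw bytes instead of decoding UTF-8;
  # -H always prints the file name and -n the line number.
  found=$(LC_ALL=C grep -n -H -e "$NON_ASCII_PATTERN" -- "$@")
  if [ -n "$found" ]; then
    echo "Error: The following source files contain non-ASCII characters:"
    echo "$found"
    return 1
  fi
  return 0
}

# Only act when file names are given (pre-commit passes the staged files).
if [ "$#" -gt 0 ]; then
  check_non_ascii "$@"
fi
```

The same script can audit files that are already committed, e.g. `git ls-files -z -- '*.cpp' '*.h' | xargs -0 bash check_non_ascii.sh`, where `check_non_ascii.sh` is again a hypothetical file name.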

lassoan merged commit 6292d09 into commontk:master on Aug 17, 2024 (4 checks passed).
Punzo deleted the removeUTF-8BOM branch on August 18, 2024 at 11:07.

3 participants