filenames and directory names need to be detoxed #71

Open
whitis opened this issue Apr 11, 2023 · 0 comments

Not only does the program fail to detox the filenames and directory names, but it also interferes when you do so yourself.

hbd -s "$(cat cookie)" --library-path autodownload/ --progress
detox -r autodownload/
hbd -s "$(cat cookie)" --library-path autodownload/ --progress
cd autodownload
ls
'Humble Book Bundle - The Joy of Coding by No Starch Press'
 Humble_Book_Bundle-The_Joy_of_Coding_by_No_Starch_Press
'Humble Tech Book Bundle - Cookbooks for Coders'
 Humble_Tech_Book_Bundle-Cookbooks_for_Coders
'Humble Tech Book Bundle - Linux MEGA Bundle by Packt'
 Humble_Tech_Book_Bundle-Linux_MEGA_Bundle_by_Packt
'Humble Tech Book Bundle - Pocket Guides 2023 by OReilly'
 Humble_Tech_Book_Bundle-Pocket_Guides_2023_by_OReilly

For each repaired directory name, the program has recreated the original directory with spaces in its name.

And each of those extraneous directories contains additional unwanted and improperly named directories:

ls  *\ Book\ *
'Humble Book Bundle - The Joy of Coding by No Starch Press':
'Algorithmic Thinking - A Problem-Based Introduction'
'Clojure for the Brave and True - Learn the Ultimate Language and Become a Better Programmer'
'Computer Graphics from Scratch'
'Effective C'
'Hardcore Programming for Mechanical Engineers'
'If Hemingway Wrrote Javascript'
'Introduction to Computer Organization'
'Land of Lisp - Learn to Program in Lisp One Game at a Time'
'Learn to Code by Solving Problems'
'Learn You A Haskell for Great Good'
'Network Programming with Go'
'Racket Programming the Fun Way'
'Rust for Rustaceans'
'The Rust Programming Language Rust 2018'
'The Secret Life of Programs'
'Write Great Code Vol. 1 2nd Edition'
'Write Great Code Vol. 2 2nd Edition'
'Write Great Code Vol. 3 2nd Edition'

'Humble Tech Book Bundle - Cookbooks for Coders':
'AWS Cookbook'
'bash Cookbook 2E'
'C Cookbook'
'Cloud Native Security Cookbook'
'Deep Learning Cookbook'
'Google Cloud Cookbook'
'Java Cookbook 4th Edition'
'JavaScript Cookbook 3E'
'Linux Cookbook 2E'
'Machine Learning with Python Cookbook'
'MySQL Cookbook'
'Powershell Cookbook 4E'
'Python Cookbook 3E'
'Raspberry Pi Cookbook 3rd Edition'
'R Cookbook 2nd Edition'
'React Cookbook'
'Regular Expressions Cookbook 2E'
'RESTful Web API Patterns  Practices Cookbook'

'Humble Tech Book Bundle - Linux MEGA Bundle by Packt':
'Digital Forensics with Kali Linux - Second Edition'
'Hands-On Enterprise Automation on Linux'
'Hands-On Linux Administration on Azure - Second Edition'
'Linux Administration Best Practices'
'Linux Command Line and Shell Scripting Techniques'
'Linux Device Driver Development'
'Linux for Networking Professionals'
'Linux Kernel Debugging'
'Linux Kernel Programming'
'Linux Kernel Programming Part 2 - Char Device Drivers and Kernel Synchronization'
'Linux Service Management Made Easy with systemd'
'Linux System Programming Techniques'
'Mastering Embedded Linux Programming'
'Mastering Kali Linux for Advanced Penetration Testing'
'Mastering Linux Administration'
'Mastering Linux Device Driver Development'
'Mastering Linux Security and Hardening'
'Mastering Linux Security and Hardening - Third Edition'
'Migrating Linux to Microsoft Azure'
'Red Hat Enterprise Linux 8 Administration'
'Red Hat Enterprise Linux 9 Administration'
'SELinux System Administration - Third Edition'
'The Ultimate Kali Linux Book'
'Windows and Linux Penetration Testing from Scratch'
'Windows Subsystem for Linux 2 WSL 2 Tips Tricks and Techniques'

'Humble Tech Book Bundle - Pocket Guides 2023 by OReilly':
'Bash Pocket Reference 2E'           'PowerShell Pocket Reference 3E'
'C 10 Pocket Guide'                  'PyTorch Pocket Reference'
'C Pocket Reference'                 'Qiskit Pocket Guide'
'Data Pipelines Pocket Reference'    'Q Pocket Guide'
'Git Pocket Guide 1E'                'Regular Expressions Pocket Reference'
'grep Pocket Reference'              'sed and awk Pocket Reference 2E'
'Linux Pocket Guide'                 'SQL Pocket Guide 4E'
'Machine Learning Pocket Reference'  'vi and Vim Editors Pocket Reference 2E'

crude workaround:
find autodownload/*\ Book\ * -depth -type d -exec rmdir {} \;

In this case, it didn't download the files again, apparently because they were recorded in autodownload/.cache.json, but it did recreate the badly named directories.
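The same cleanup can be sketched in Python. This is only an illustration of the workaround, not part of the downloader; `remove_empty_dirs` is a hypothetical helper name. It walks bottom-up so that empty subdirectories are removed before their parents, and it leaves any directory that still contains files alone.

```python
import os


def remove_empty_dirs(root: str) -> list:
    """Hypothetical helper: remove empty directories under root, children first.

    Returns the list of directories that were actually removed.
    """
    removed = []
    # topdown=False yields the deepest directories first, so a parent is
    # only visited after its children have had a chance to be removed.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        try:
            os.rmdir(dirpath)  # only succeeds on an empty directory
        except OSError:
            continue  # not empty (or not removable): keep it
        removed.append(dirpath)
    return removed


# Example call (the path is an assumption taken from the listing above):
# remove_empty_dirs("autodownload/Humble Tech Book Bundle - Cookbooks for Coders")
```

Like `rmdir`, this never deletes files, so directories that already hold downloads are left untouched.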

Detox itself can be used to process a name (the name must actually exist in the current directory):
detox --dry-run *
too many spaces -> too_many_spaces
detox --dry-run too\ many\ spaces
too many spaces -> too_many_spaces

Or it can be built in, if you don't need to worry about UTF-8 characters.
Characters other than [A-Za-z0-9._-] should be changed to "_" and adjacent underscores replaced with a single underscore.
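A minimal sketch of that built-in variant in Python, assuming the replacement character is an underscore and that ASCII-only output is acceptable. `detox_name` is a hypothetical helper, not an existing function in the downloader.

```python
import re


def detox_name(name: str) -> str:
    """Hypothetical helper: make a single path component shell-friendly."""
    # Replace each run of characters outside [A-Za-z0-9._-] with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Collapse adjacent underscores into a single underscore.
    return re.sub(r"_{2,}", "_", cleaned)


print(detox_name("too many spaces"))  # prints: too_many_spaces
```

This sketch only applies the two rules stated above. Real detox does more; for example, it shortens " - " to "-", as the directory listing earlier in this report shows.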

Characters that need to be escaped in the shell should definitely not be allowed:

badchars=' !"#$&'\''()*,;<>?[\]^`{|}'   # note: '\'' embeds a literal single quote
goodchars="%+-./:=@_"
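As a rough illustration, a name can be checked against the bad-character list like this. `BAD_CHARS` copies the list above and `find_bad_chars` is a hypothetical helper; neither exists in the downloader.

```python
# Characters from the badchars list above (space through closing brace).
BAD_CHARS = set(" !\"#$&'()*,;<>?[\\]^`{|}")


def find_bad_chars(name: str) -> list:
    """Hypothetical helper: return the shell-unsafe characters found in
    name, in order of first appearance and without duplicates."""
    seen = []
    for ch in name:
        if ch in BAD_CHARS and ch not in seen:
            seen.append(ch)
    return seen


print(find_bad_chars("Write Great Code Vol. 1 2nd Edition"))  # prints: [' ']
```

An empty result means the name needs no quoting or escaping under this list.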

/ is the path separator.
See "Which characters need to be escaped when using Bash?"
detox converts & to _and_ and converts a number of other specific ISO 8859/Unicode characters to ASCII equivalents; see detox/src/builtin_table.c.

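Also, in addition to the usual detox rules, any leading text matching the regex "^Humble [A-Za-z ]*Book Bundle ?- ?" should be deleted. (The character class needs to include a space so that "Humble Tech Book Bundle - " matches as well as "Humble Book Bundle - ".)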
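A sketch of that prefix removal in Python, assuming the character class is meant to include a space so that "Humble Tech Book Bundle - " matches too. `strip_bundle_prefix` is a hypothetical helper name.

```python
import re

# Leading "Humble ... Book Bundle - " prefix; the space inside the
# character class is an assumption so that "Tech " can be matched.
HUMBLE_PREFIX = re.compile(r"^Humble [A-Za-z ]*Book Bundle ?- ?")


def strip_bundle_prefix(name: str) -> str:
    """Hypothetical helper: drop the redundant bundle prefix, if present."""
    return HUMBLE_PREFIX.sub("", name, count=1)


print(strip_bundle_prefix("Humble Tech Book Bundle - Cookbooks for Coders"))
# prints: Cookbooks for Coders
```

Names without the prefix pass through unchanged, so the helper is safe to apply before any other cleanup step.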

ReportABug Summary

Generated 2023-04-11 05:43:42.428100 with arguments ['humblebundle-downloader']

  • Python 3.6.9 64bit
  • Platform Linux-5.4.0-144-generic-x86_64-with-Ubuntu-18.04-bionic

Module info

humblebundle-downloader

_error_full = "No module named 'humblebundle-downloader'"
_error_type = 'ModuleNotFoundError'

sys

argv = ['humblebundle-downloader']
executable = '/usr/bin/python3'
implementation = {'_multiarch': "'x86_64-linux-gnu'", 'cache_tag': "'cpython-36'", 'hexversion': '50727408', 'name': "'cpython'"}
path = [
    '/home/$USER/.local/bin',
    '/usr/lib/python36.zip',
    '/usr/lib/python3.6',
    '/usr/lib/python3.6/lib-dynload',
    '/home/$USER/.local/lib/python3.6/site-packages',
    '/usr/local/lib/python3.6/dist-packages',
    '/usr/lib/python3/dist-packages',
]
platform = 'linux'
prefix = '/usr'

platform

architecture = '64bit'
build = 'default'
build_date = 'Mar 10 2023 16:46:00'
machine = 'x86_64'
os = 'posix'
platform = 'Linux-5.4.0-144-generic-x86_64-with-Ubuntu-18.04-bionic'
version = '3.6.9'

Environment

PATH = '/home/$USER/.local/bin'
       '/home/$USER/bin'
       '/usr/local/sbin'
       '/usr/local/bin'
       '/usr/sbin'
       '/usr/bin'
       '/sbin'
       '/bin'
       '/usr/games'
       '/usr/local/games'
       '/snap/bin'
       '/usr/bin'
cwd = '/home/$USER/limbo_work_area/ebooks/purchased/humble_bundle/autodownload'

Censored words

Key Info
$HOST md5=3be8c5739d8f056b124838de345dec56, Unicode=Ll
$USER md5=f50c8fae274ecef0bd44f9ad874340c1, Unicode=Ll