Identify pickle files #239

Open
kam193 opened this issue Jul 7, 2024 · 0 comments
Labels: assess (we still haven't decided if this will be worked on or not), enhancement (new feature or request)

kam193 commented Jul 7, 2024

Is your feature request related to a problem? Please describe.
Recently, I came across Python pickle files used in a suspicious package, and I'd like to eventually run some analysis on pickles. To implement any such analysis in an AL service, AssemblyLine would first need to identify pickle files correctly. Currently, they are identified as unknown.

Describe the solution you'd like
I looked at the definition of the protocol. While it looks impossible to detect every pickle version, newer versions (protocol 2 and later, introduced in Python 2.3) start with specific bytes indicating the protocol version, and protocol 4 (introduced in Python 3.4, the default since 3.8) also uses frames, which seems to introduce another static byte.

See following Python source:

Thus, objects pickled in protocol v4 start with 80 04 95, and in v5 with 80 05 95 (the exception seems to be very small payloads, which CPython may write without a frame). So far, I haven't found any other file format using those bytes as a header. Given that AssemblyLine doesn't assign any type to pickle files, it looks safe to introduce those bytes as identifiers.
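To illustrate the header bytes described above, here is a minimal sketch in Python. The names `PICKLE_MAGICS` and `guess_pickle_protocol` are hypothetical and are not part of AssemblyLine; the sketch only shows the three-byte prefix check, not how the rule would actually be expressed in the identification configuration.

```python
import pickle

# Hypothetical lookup table: PROTO opcode (0x80), protocol number, FRAME opcode (0x95).
PICKLE_MAGICS = {
    b"\x80\x04\x95": 4,
    b"\x80\x05\x95": 5,
}


def guess_pickle_protocol(data: bytes):
    """Return 4 or 5 if data starts with a framed v4/v5 pickle header, else None."""
    return PICKLE_MAGICS.get(data[:3])


sample = {"name": "example", "values": [1, 2, 3]}
v4 = pickle.dumps(sample, protocol=4)
v5 = pickle.dumps(sample, protocol=5)

print(v4[:3].hex(" "))  # 80 04 95
print(v5[:3].hex(" "))  # 80 05 95
print(guess_pickle_protocol(v4), guess_pickle_protocol(v5))  # 4 5
```

One caveat worth checking before relying on this: recent CPython versions appear to skip the frame for payloads of only a few bytes, so a pickle of a single small integer may not carry the 95 byte and would not match.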

I'd suggest adding them to the file identification.

Describe alternatives you've considered

  • It looks like 80 02 and 80 03 may also be unique, but I'm unsure about relying on just those two bytes. In addition, v4 has been the default for years, so I don't think covering the older protocols is necessary.
  • Every pickle should end with the STOP opcode; however, I don't think it makes sense to check this.
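For comparison, the two alternatives above could be combined into a looser check, sketched below. The function name `looks_like_pickle_v2_plus` is hypothetical. It accepts any protocol from 2 to 5 based on the two-byte `PROTO` header and additionally requires the trailing `STOP` opcode, which partly compensates for the shorter prefix.

```python
import pickle


def looks_like_pickle_v2_plus(data: bytes) -> bool:
    """Loose heuristic: PROTO opcode, a protocol number 2..5, and a trailing STOP."""
    if len(data) < 3:
        return False
    has_proto_header = data[:1] == pickle.PROTO and 2 <= data[1] <= 5
    ends_with_stop = data.endswith(pickle.STOP)
    return has_proto_header and ends_with_stop


payload = [1, 2, 3]
for protocol in (2, 3, 4, 5):
    blob = pickle.dumps(payload, protocol=protocol)
    print(protocol, looks_like_pickle_v2_plus(blob))
```

The trailing-byte test has an obvious weakness: a file with extra data appended after the pickle would be rejected, and reading the end of a file may not fit a header-based identification step.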

Additional context
For some reason, identifying pickles isn't popular yet. Python's pickle module does not perform any special verification; it simply fails once it encounters an unknown opcode.
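If a stricter confirmation than header bytes were ever wanted, one option (not proposed in this issue, only a sketch) is to walk the opcode stream with the standard library's `pickletools.genops`, which parses a pickle without executing it. The helper name `is_well_formed_pickle` is hypothetical.

```python
import pickle
import pickletools


def is_well_formed_pickle(data: bytes) -> bool:
    """Return True if every opcode parses and a STOP opcode is reached.

    Nothing is unpickled here: genops only decodes opcodes and their arguments.
    """
    try:
        for _opcode, _arg, _pos in pickletools.genops(data):
            pass
    except Exception:
        # Treat any parsing error (unknown opcode, truncated stream) as "not a pickle".
        return False
    return True


good = pickle.dumps({"a": [1, 2, 3]}, protocol=4)
print(is_well_formed_pickle(good))           # True
print(is_well_formed_pickle(good[:-1]))      # False: STOP is missing
print(is_well_formed_pickle(b"\x80\x04\xff"))  # False: 0xff is not a known opcode
```

This costs a full pass over the file, so it would fit a later analysis stage better than the initial identification.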

kam193 added the assess and enhancement labels on Jul 7, 2024