Is your feature request related to a problem? Please describe.
Recently, I came across Python pickle files used in a suspicious package, and I'd like to eventually run some analysis on pickles. To implement that in an AL service, AssemblyLine would need to identify pickle files correctly. Currently, they are identified as unknown.
Describe the solution you'd like
I looked at the definition of the protocol (see the PROTO and FRAME opcodes in the Python source). While it looks impossible to detect every pickle version, newer protocols (2 and up, since Python 2.3) start with a PROTO opcode followed by the protocol version, and protocol 4 (since Python 3.4, the default since 3.8) also uses framing, which introduces another fixed byte for the FRAME opcode.
Thus, an object pickled with protocol v4 normally starts with 80 04 95, and with v5 80 05 95. (Very small payloads may be written without a frame, in which case only the first two bytes match.) So far, I haven't found any other file format using those bytes as a header. Given that AssemblyLine doesn't assign any type to pickle files, it looks safe to introduce those bytes as identifiers.
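To illustrate the byte patterns described above, here is a minimal sketch of a prefix check. The names `PICKLE_SIGNATURES` and `guess_pickle_protocol` are made up for this example and are not part of AssemblyLine; how the rule would actually be expressed in AL's identification config is not covered here.

```python
import pickle

# Assumed signatures: PROTO opcode (0x80), protocol number, FRAME opcode (0x95).
PICKLE_SIGNATURES = {
    b"\x80\x04\x95": 4,
    b"\x80\x05\x95": 5,
}


def guess_pickle_protocol(data: bytes):
    """Return the protocol number if data starts with a known signature, else None."""
    return PICKLE_SIGNATURES.get(bytes(data[:3]))


# A payload large enough to be written inside a frame.
sample = pickle.dumps({"key": [1, 2, 3]}, protocol=4)
print(sample[:3].hex(" "))            # 80 04 95
print(guess_pickle_protocol(sample))  # 4
```

This only looks at the first three bytes, so it is a hint rather than proof that the file is a valid pickle.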
I'd suggest adding them to the file identification.
Describe alternatives you've considered
It looks like 80 02 and 80 03 (protocols 2 and 3) may also be unique, but I'm unsure about relying on just two bytes. In addition, v4 has been the default for years, so I don't think covering them is necessary.
Every pickle should end with the STOP opcode (2E, an ASCII period); however, I don't think it makes sense to check this.
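For reference, a small sketch of what the two alternatives would look like as checks. `has_proto_header` and `ends_with_stop` are hypothetical helper names used only for illustration.

```python
import pickle

STOP = b"."  # STOP opcode, byte 0x2E


def has_proto_header(data: bytes, protocol: int) -> bool:
    """True if data starts with the PROTO opcode (0x80) and the given protocol number."""
    return data[:2] == bytes([0x80, protocol])


def ends_with_stop(data: bytes) -> bool:
    """True if the last byte is the STOP opcode."""
    return data.endswith(STOP)


for protocol in (2, 3):
    blob = pickle.dumps(["a", "b"], protocol=protocol)
    print(protocol, blob[:2].hex(" "), ends_with_stop(blob))
# 2 80 02 True
# 3 80 03 True
```

A two-byte prefix plus a trailing period is a weak signal on its own, which matches the hesitation above about using these as identifiers.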
Additional context
For some reason, identifying pickles isn't common yet. Python's pickle module does not perform any up-front verification; it just fails once it hits an unknown opcode.
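If a stronger check than the header bytes is ever wanted, one possible approach is to walk the opcodes with the standard library's `pickletools.genops`, which parses the stream without executing it. The helper name `looks_like_pickle` is an assumption for this sketch. Note that `genops` only decodes opcodes and their arguments and does not validate the stack, so some non-pickle data can still pass.

```python
import pickle
import pickletools


def looks_like_pickle(data: bytes) -> bool:
    """Walk the opcode stream without unpickling; True if it parses through STOP."""
    try:
        last = None
        for opcode, _arg, _pos in pickletools.genops(data):
            last = opcode.name
        return last == "STOP"
    except Exception:
        # Unknown opcode, truncated argument, or missing STOP.
        return False


print(looks_like_pickle(pickle.dumps({"a": 1}, protocol=4)))  # True
print(looks_like_pickle(b"\xff\xff"))                         # False
```

This is slower than a magic-byte match, so it would fit better as a follow-up check inside a service than in the identification step itself.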