pre-filter strings, bytes based on what's found in the file #2126

williballenthin · 2024-06-06T08:25:32Z

To avoid searching for strings/bytes that won't ever be found at a particular scope, we could first check whether each string/bytes feature is present anywhere in the file.
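A minimal sketch of the up-front presence check, assuming the raw file bytes are available and that a string may be stored either as UTF-8/ASCII or as UTF-16LE (Windows wide strings). The function names are hypothetical, not capa's API. Plain substring checks also cover substring-style features, but regex features are not handled here, and the check is only sound for features that are extracted from bytes literally present in the file.

```python
from __future__ import annotations

from typing import Iterable


def string_in_file(data: bytes, s: str) -> bool:
    # Assumption: strings may appear as UTF-8/ASCII or as UTF-16LE.
    return s.encode("utf-8") in data or s.encode("utf-16-le") in data


def find_present_strings(data: bytes, strings: Iterable[str]) -> set[str]:
    """Return the subset of `strings` found anywhere in the raw file bytes."""
    return {s for s in set(strings) if string_in_file(data, s)}


def find_present_bytes(data: bytes, patterns: Iterable[bytes]) -> set[bytes]:
    """Return the subset of byte patterns found anywhere in the file."""
    return {p for p in set(patterns) if p in data}


data = b"User-Agent: Mozilla/5.0\x00" + "curl/7.0".encode("utf-16-le")
print(sorted(find_present_strings(data, ["Mozilla/5.0", "curl/7.0", "Wget/1.0"])))
# ['Mozilla/5.0', 'curl/7.0']
```

This does one substring search per term; with thousands of terms, a multi-pattern matcher such as Aho-Corasick may scan faster, which would need to be measured.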

If it's not, then we can partially evaluate some rule logic (such as `and` statements) to see whether further logic can be pruned and/or rules skipped.
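The partial evaluation could look roughly like this. It is a sketch over hypothetical, simplified rule-tree types (not capa's real classes): features known to be absent from the file become `False`, an `and` with any `False` child collapses to `False`, an `or` drops its `False` children and collapses to `False` when none remain, and `not` of a decided child is decided too. Counted forms like `N or more` are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified rule-tree node types.
@dataclass(frozen=True)
class Feature:
    value: str


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


@dataclass(frozen=True)
class Not:
    child: object


Node = Union[Feature, And, Or, Not, bool]


def simplify(node: Node, absent: frozenset[str]) -> Node:
    """Partially evaluate `node`, treating features in `absent` as False.

    Returns True/False when the outcome is already decided, else a smaller tree.
    """
    if isinstance(node, bool):
        return node
    if isinstance(node, Feature):
        return False if node.value in absent else node
    if isinstance(node, Not):
        child = simplify(node.child, absent)
        return (not child) if isinstance(child, bool) else Not(child)

    children = [simplify(c, absent) for c in node.children]
    if isinstance(node, And):
        if any(c is False for c in children):
            return False  # one absent requirement sinks the whole `and`
        rest = tuple(c for c in children if c is not True)
        if not rest:
            return True
        return rest[0] if len(rest) == 1 else And(rest)

    # Or
    if any(c is True for c in children):
        return True
    rest = tuple(c for c in children if c is not False)
    if not rest:
        return False  # every alternative is absent
    return rest[0] if len(rest) == 1 else Or(rest)


rule = And((Feature("api:InternetOpen"), Or((Feature("Mozilla/4.0"), Feature("curl/7.0")))))
print(simplify(rule, frozenset({"Mozilla/4.0", "curl/7.0"})))  # False: skip the rule
```

A result of `False` means the rule can be skipped entirely; a smaller tree means fewer features to check at each scope.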

For example, we have HTTP User-Agent rules that contain tons of strings under a single `or`. If none of them are present in the file, we can skip the whole rule.
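For rules that are nothing more than an `or` of strings, like the User-Agent rules described above, skipping could be driven by an inverted index from each string to the rules that reference it. This is a sketch under that assumption; the rule names and helpers are hypothetical, and rules that mix in other features would still need partial evaluation.

```python
from __future__ import annotations

from collections import defaultdict


def build_string_index(rules: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each string to the names of the rules that reference it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for rule_name, strings in rules.items():
        for s in strings:
            index[s].add(rule_name)
    return dict(index)


def rules_worth_evaluating(index: dict[str, set[str]], present: set[str]) -> set[str]:
    """Rules with at least one string present in the file; the rest can be skipped."""
    candidates: set[str] = set()
    for s in present:
        candidates |= index.get(s, set())
    return candidates


rules = {
    "hypothetical UA rule A": ["Mozilla/4.0 (compatible; MSIE 6.0)", "Opera/9.80"],
    "hypothetical UA rule B": ["curl/7.0"],
}
index = build_string_index(rules)
print(rules_worth_evaluating(index, {"curl/7.0"}))  # {'hypothetical UA rule B'}
```

The index is built once per ruleset, so the per-file cost is one lookup per string found by the pre-filter scan.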

We'd want to ensure that the up-front scan for these terms is fast, so that its cost doesn't outweigh the time saved during matching. Remember that we may have hundreds or thousands of terms to look for. We can also use evaluation counts to show that less logic needs to be evaluated when some branches are pruned.
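One way to gather both numbers is to time the scan and count rule-tree node evaluations with and without pruning. This sketch uses a hypothetical tuple encoding of rules and a hypothetical counter (not capa's own instrumentation), and assumes every child of a node is evaluated, with no short-circuiting.

```python
from __future__ import annotations

import time

# Hypothetical rule encoding: ("feature", name) | ("and", [children]) | ("or", [children])


class EvalCounter:
    """Counts how many rule-tree nodes are evaluated."""

    def __init__(self) -> None:
        self.count = 0


def evaluate(node: tuple, features: set[str], counter: EvalCounter) -> bool:
    counter.count += 1
    kind, payload = node
    if kind == "feature":
        return payload in features
    # Assumption: no short-circuiting, so every child is evaluated.
    results = [evaluate(child, features, counter) for child in payload]
    return all(results) if kind == "and" else any(results)


def timed_scan(data: bytes, strings: list[str]) -> tuple[set[str], float]:
    """Run the pre-filter scan and return (present strings, seconds taken)."""
    start = time.perf_counter()
    present = {s for s in strings if s.encode("utf-8") in data}
    return present, time.perf_counter() - start


ua_strings = [f"hypothetical-UA/{i}" for i in range(1000)]
rule = ("or", [("feature", s) for s in ua_strings])
present, seconds = timed_scan(b"\x00" * 100_000, ua_strings)

baseline = EvalCounter()
evaluate(rule, set(), baseline)  # without pruning: 1 `or` node + 1000 features

pruned = EvalCounter()
if present:  # with pruning: only evaluate when some UA string is in the file
    evaluate(rule, set(), pruned)

print(f"scan took {seconds:.4f}s; evaluations per scope: {baseline.count} -> {pruned.count}")
```

The savings multiply by the number of scopes a rule is matched against, while the scan cost is paid once per file, so comparing the two gives a rough break-even estimate.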


williballenthin mentioned this issue Jun 6, 2024: investigate optimization of rule matching (May, 2024) #2063 (Closed)

williballenthin added the performance Related to capa's performance label Jun 6, 2024

