Early Profiling Data
We located a fairly complex-looking EC2-related policy.
The CEL expression looks like this.
resource["IamInstanceProfile"]["Arn"].matches("(.*)(?=Enterprise-Reserved-CloudCustodian.*)")
&& (resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"] != "CLOUDCUSTODIAN"
&& resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"] != "CLOUDCORESERVICES"
&& present(resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"]))
This is not, by itself, entirely useful: it is a mechanical translation of an actual Cloud Custodian policy (lightly edited).
This filter will break for resources with an ASSET tag of None. It will also break for resources with no ASSET tag at all.
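To see the failure concretely, here is a small plain-Python sketch (not celpy itself; the resource dicts and helper names are invented for illustration) of the tag lookup the CEL performs, together with the kind of guard that fixes it:

```python
def asset_value(resource):
    # Mirrors resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"]:
    # indexing [0] on an empty match list raises rather than yielding false.
    matches = [t for t in resource.get("Tags", []) if t["Key"] == "ASSET"]
    return matches[0]["Value"]

def asset_value_safe(resource):
    # The "small change": check that a match exists before indexing.
    matches = [t for t in resource.get("Tags", []) if t["Key"] == "ASSET"]
    return matches[0]["Value"] if matches else None

untagged = {"Tags": [{"Key": "Name", "Value": "web-1"}]}

try:
    asset_value(untagged)
except IndexError:
    print("no ASSET tag -> evaluation error, not a False filter result")

print(asset_value_safe(untagged))  # None: the resource simply doesn't match
```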
A small change to the CEL can fix some of these problems. The point, however, was to gather profile data.
filters:
- and:
  - key: IamInstanceProfile.Arn
    op: regex
    type: value
    value: (.*)(?=Enterprise-Reserved-CloudCustodian.*)
  - and:
    - key: tag:ASSET
      op: ne
      type: value
      value: CLOUDCUSTODIAN
    - key: tag:ASSET
      op: ne
      type: value
      value: CLOUDCORESERVICES
    - key: tag:ASSET
      type: value
      value: present
We created 1,000 dummy EC2 resource descriptions based on the AWS documentation, and filled in tag values randomly.
Note that the filter is poorly designed, and there are some cases for which it will not work. There are some issues (like Tags) that CEL does not handle as cleanly as C7N.
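The generation step can be sketched as follows. The specific field values and helper name are invented, but the shape follows the describe-instances examples in the AWS documentation, with the ASSET tag sometimes absent:

```python
import random

ASSET_CHOICES = ["CLOUDCUSTODIAN", "CLOUDCORESERVICES", "SOMETHING-ELSE", None]

def dummy_resource(i):
    # Minimal describe-instances-shaped record with a randomized ASSET tag.
    tags = [{"Key": "Name", "Value": f"instance-{i}"}]
    asset = random.choice(ASSET_CHOICES)
    if asset is not None:
        tags.append({"Key": "ASSET", "Value": asset})
    return {
        "InstanceId": f"i-{i:017d}",
        "IamInstanceProfile": {
            "Arn": "arn:aws:iam::123456789012:"
                   f"instance-profile/Enterprise-Reserved-CloudCustodian-{i}"
        },
        "Tags": tags,
    }

resources = [dummy_resource(i) for i in range(1000)]
print(len(resources))  # 1000
```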
This is a piece of the profile showing calls with a total time of (about) 100ms or more.
49036254 function calls (30102877 primitive calls) in 18.999 seconds
Ordered by: internal time
List reduced from 573 to 115 due to restriction <0.2>
ncalls tottime percall cumtime percall filename:lineno(function)
9313000/297000 6.521 0.000 6.932 0.000 tree.py:26(__repr__)
3996000/668000 2.264 0.000 4.596 0.000 typing.py:664(__hash__)
6060000 1.151 0.000 1.531 0.000 typing.py:479(__hash__)
10537148/7080021 1.126 0.000 1.805 0.000 {built-in method builtins.hash}
1835554 0.717 0.000 0.717 0.000 celtypes.py:915(__repr__)
297000/1000 0.709 0.000 17.684 0.018 evaluation.py:598(concrete_method)
996146/7298 0.571 0.000 1.507 0.000 visitors.py:203(visit)
209000/41000 0.493 0.000 1.223 0.000 celtypes.py:822(__repr__)
1088000 0.411 0.000 0.411 0.000 lexer.py:126(__repr__)
996418/996338 0.409 0.000 0.866 0.000 visitors.py:168(_call_userfunc)
272000/1000 0.395 0.000 17.356 0.017 evaluation.py:727(visit_children)
1669146 0.297 0.000 0.299 0.000 {built-in method builtins.getattr}
2713421/2630421 0.246 0.000 0.355 0.000 {built-in method builtins.isinstance}
302000/1000 0.204 0.000 17.686 0.018 visitors.py:241(visit)
347000 0.196 0.000 4.792 0.000 typing.py:248(inner)
272000/1000 0.184 0.000 17.337 0.017 visitors.py:250(<listcomp>)
612002 0.174 0.000 0.272 0.000 __init__.py:1368(info)
162000/1000 0.145 0.000 0.662 0.001 adapter.py:97(json_to_cel)
272000/1000 0.142 0.000 17.338 0.017 visitors.py:249(visit_children)
144767 0.117 0.000 0.190 0.000 celtypes.py:899(__new__)
618162 0.099 0.000 0.099 0.000 __init__.py:1614(isEnabledFor)
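A report in this shape comes straight from the standard-library profiler. A minimal sketch of how to produce one, where evaluate_all is only a stand-in for the real CEL-evaluation loop over the 1,000 resources:

```python
import cProfile
import io
import pstats

def evaluate_all():
    # Stand-in workload for evaluating the CEL filter over 1,000 resources.
    return [sum(i * i for i in range(1000)) for _ in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
evaluate_all()
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
# "Ordered by: internal time" and the "restriction <0.2>" in the report
# above correspond to these two calls: sort by tottime, keep the top 20%.
stats.sort_stats("tottime").print_stats(0.2)
print(report.getvalue().splitlines()[0].strip())
```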
The biggest offender is celtypes.py:915, which is the __repr__() method of the celtypes.StringType class. Clearly, logging is a costly part of this.
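One reason __repr__() dominates is that log calls format their arguments. With %-style lazy formatting, that cost vanishes when the level is disabled; eager formatting pays it regardless. A small illustration (the Expensive class is invented for the demo):

```python
import logging

logger = logging.getLogger("repr_cost_demo")
logger.setLevel(logging.WARNING)

class Expensive:
    """Stand-in for a CEL value whose __repr__() is costly."""
    calls = 0
    def __repr__(self):
        Expensive.calls += 1
        return "Expensive()"

obj = Expensive()

# %-style arguments are formatted only if the record is actually emitted,
# so this disabled INFO call never touches __repr__().
logger.info("value: %r", obj)
assert Expensive.calls == 0

# An eager f-string pays the __repr__() cost even when logging is off.
logger.info(f"value: {obj!r}")
assert Expensive.calls == 1
```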
The second biggest offender is evaluation.py:598, which is the @trace decorator used to capture logging details.
After that, celtypes.py:822 is the celtypes.MapType.__repr__() method.
evaluation.py:727 is a wrapper around the Lark Interpreter.visit_children(). This is where the bulk of the work is done. This accounts for 709 ms to process 1,000 resources, or about 0.7 ms per resource.
adapter.py:97 is the JSON-to-CEL conversion applied to native Python objects to wrap them as CEL data types. While this is invoked at least once for each incoming resource, it performs reasonably well compared to the cost of logging and tracing.
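The idea behind that conversion can be sketched in plain Python. The wrapper classes below are simplified stand-ins for the real celtypes classes, and the function name is invented:

```python
class StringType(str):
    """Simplified stand-in for celtypes.StringType."""

class IntType(int):
    """Simplified stand-in for celtypes.IntType."""

def json_to_cel_sketch(doc):
    # Recursively wrap parsed-JSON values in CEL-flavored types.
    if isinstance(doc, bool) or doc is None:
        return doc                      # bool must be checked before int
    if isinstance(doc, str):
        return StringType(doc)
    if isinstance(doc, int):
        return IntType(doc)
    if isinstance(doc, list):
        return [json_to_cel_sketch(item) for item in doc]
    if isinstance(doc, dict):
        return {json_to_cel_sketch(k): json_to_cel_sketch(v)
                for k, v in doc.items()}
    return doc

wrapped = json_to_cel_sketch({"Tags": [{"Key": "ASSET", "Value": "X"}]})
print(type(wrapped["Tags"][0]["Value"]).__name__)  # StringType
```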