Skip to content

Early Profiling Data

S.Lott edited this page Sep 3, 2020 · 4 revisions

Example

We located a fairly complex-looking EC2-related policy.

The CEL expression looks like this.

resource["IamInstanceProfile"]["Arn"].matches("(.*)(?=Enterprise-Reserved-CloudCustodian.*)") 
&& (resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"] != "CLOUDCUSTODIAN" 
&& resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"] != "CLOUDCORESERVICES" 
&& present(resource["Tags"].filter(x, x["Key"] == "ASSET")[0]["Value"]))

This is not optimal in any sense. It was a mechanical translation of a Cloud Custodian policy (lightly edited.)

        filters:
        - and:
          - key: IamInstanceProfile.Arn
            op: regex
            type: value
            value: (.*)(?=Enterprise-Reserved-CloudCustodian.*)
          - and:
            - key: tag:ASSET
              op: ne
              type: value
              value: CLOUDCUSTODIAN
            - key: tag:ASSET
              op: ne
              type: value
              value: CLOUDCORESERVICES
            - key: tag:ASSET
              type: value
              value: present

We created 1,000 EC2 dummy descriptions using the AWS documentation, and filled in Tag values randomly.

Note that the filter is poorly-design, and there are some cases for which it will not work. There are some issues (like Tags) that CEL does not handle as cleanly as C7N.

Raw Data

This is a piece of the profile showing calls with a total time of (about) 100ms or more.

        49036254 function calls (30102877 primitive calls) in 18.999 seconds

   Ordered by: internal time
   List reduced from 573 to 115 due to restriction <0.2>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
9313000/297000    6.521    0.000    6.932    0.000 tree.py:26(__repr__)
3996000/668000    2.264    0.000    4.596    0.000 typing.py:664(__hash__)
  6060000    1.151    0.000    1.531    0.000 typing.py:479(__hash__)
10537148/7080021    1.126    0.000    1.805    0.000 {built-in method builtins.hash}
  1835554    0.717    0.000    0.717    0.000 celtypes.py:915(__repr__)
297000/1000    0.709    0.000   17.684    0.018 evaluation.py:598(concrete_method)
996146/7298    0.571    0.000    1.507    0.000 visitors.py:203(visit)
209000/41000    0.493    0.000    1.223    0.000 celtypes.py:822(__repr__)
  1088000    0.411    0.000    0.411    0.000 lexer.py:126(__repr__)
996418/996338    0.409    0.000    0.866    0.000 visitors.py:168(_call_userfunc)
272000/1000    0.395    0.000   17.356    0.017 evaluation.py:727(visit_children)
  1669146    0.297    0.000    0.299    0.000 {built-in method builtins.getattr}
2713421/2630421    0.246    0.000    0.355    0.000 {built-in method builtins.isinstance}
302000/1000    0.204    0.000   17.686    0.018 visitors.py:241(visit)
   347000    0.196    0.000    4.792    0.000 typing.py:248(inner)
272000/1000    0.184    0.000   17.337    0.017 visitors.py:250(<listcomp>)
   612002    0.174    0.000    0.272    0.000 __init__.py:1368(info)
162000/1000    0.145    0.000    0.662    0.001 adapter.py:97(json_to_cel)
272000/1000    0.142    0.000   17.338    0.017 visitors.py:249(visit_children)
   144767    0.117    0.000    0.190    0.000 celtypes.py:899(__new__)
   618162    0.099    0.000    0.099    0.000 __init__.py:1614(isEnabledFor)

Analysis

The biggest offender is celtypes.py:915 which is the __repr__() method of the celtypes.StringType class. Clearly, logging is a costly part of this.

The second biggest offender is evaluation.py:598 which is the @trace decorator used to capture logging details.

After that, celtypes.py:822 is the celtypes.MapType.__repr__() method.

evaluation.py:727 is a wrapper around the Lark Interpreter.visit_children(). This is where the bulk of the work is done. This accounts for 709 ms to process 1,000 resources. That's 0.7 ms per resource.

adapter.py:97 is the JSON-to-CEL conversion done to native Python objects to wrap then as CEL data types. While this is invoked at least once for each incoming resource, it seems to perform reasonably well when compared to the cost of logging and tracing.

Clone this wiki locally