lambda-python: Exception while exporting Span batch - Read timed out. #2817

aidanmorgan · 2024-08-31T13:59:50Z

Sending spans from a Python 3.11 Lambda is slow to start (there is a significant pause after the collector starts), and the export to X-Ray then times out.

My Lambda has an attached EFS mount, runs in a VPC, and is fronted by an API Gateway. The role my Lambda executes under has the AWSXrayWriteOnlyAccess policy attached to it.

I have followed the instructions for instrumentation from the documentation.

The layer I am using is: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-25-0:1

This issue occurs when using a custom collector file, but also with the default that is provided in the layer. Configuration is set correctly according to the documentation (AWS_LAMBDA_EXEC_WRAPPER and OPENTELEMETRY_COLLECTOR_CONFIG_FILE are set).
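A quick way to confirm that both variables are actually visible inside the function is to check them at cold start. The following is only a minimal sketch: the helper name `missing_otel_settings` is hypothetical, and it checks presence only, not whether the values are correct for the layer.

```python
import os
from typing import Mapping

# Variable names are taken from the issue text above; the helper itself is
# hypothetical and only checks that each variable is set to a non-blank value.
REQUIRED_VARS = ("AWS_LAMBDA_EXEC_WRAPPER", "OPENTELEMETRY_COLLECTOR_CONFIG_FILE")


def missing_otel_settings(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


if __name__ == "__main__":
    # Logging this once per cold start makes a misconfiguration easy to spot.
    print("missing settings:", missing_otel_settings())
```

An empty list means both variables are set; it does not prove the collector picked up the config file.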

Following suggestions from other forums, I have tried increasing the memory available to the Lambda, with no luck.

I am accessing the trace context for my code as a global, using:

from opentelemetry import trace
tracer = trace.get_tracer(__name__)

My lambda does create some sub-spans for capturing specific hotspots I am interested in, using code similar to:

    with tracer.start_as_current_span('SPAN_NAME') as span:
        ...  # hotspot code being measured runs here

collector.yml (which is the same as the documentation):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "localhost:4317"
      http:
        endpoint: "localhost:4318"

exporters:
  logging:
  awsxray:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging]
  telemetry:
    metrics:
      address: localhost:8888

The full exception trace from the logs is:

Exception while exporting Span batch.
Traceback (most recent call last):
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/var/lang/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/http/client.py", line 286, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 801, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 469, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/var/lang/lib/python3.11/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python/opentelemetry/sdk/trace/export/__init__.py", line 367, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 169, in export
    return self._export_serialized_spans(serialized_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 139, in _export_serialized_spans
    resp = self._export(serialized_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 114, in _export
    return self._session.post(
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/python/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/opentelemetry/instrumentation/requests/__init__.py", line 152, in instrumented_send
    return wrapped_send(self, request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=4318): Read timed out. (read timeout=10)
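The trace shows the connection to localhost:4318 being opened and then the read timing out after 10 seconds, so the collector accepts the request but does not answer in time. To reproduce that outside the SDK, a standard-library probe along the following lines could be run from the handler. This is a sketch under assumptions: `probe_otlp_http` is a hypothetical helper, `/v1/traces` is the standard OTLP/HTTP traces path, and it assumes the collector treats a zero-byte protobuf body as an empty export request.

```python
import time
import urllib.error
import urllib.request


def probe_otlp_http(base_url: str = "http://localhost:4318", timeout: float = 10.0) -> dict:
    """POST an empty body to the OTLP/HTTP traces path and time the reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/traces",
        data=b"",  # zero bytes: assumed to decode as an empty export request
        headers={"Content-Type": "application/x-protobuf"},
        method="POST",
    )
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            outcome = f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The collector answered, but with a non-2xx status.
        outcome = f"HTTP {exc.code}"
    except TimeoutError:
        # Connected, but no response arrived within the timeout.
        outcome = "read timed out"
    except urllib.error.URLError as exc:
        # Could not connect at all (refused, DNS failure, connect timeout).
        outcome = f"error: {exc.reason}"
    return {"outcome": outcome, "seconds": round(time.monotonic() - started, 3)}
```

A "read timed out" outcome here would match the SDK's behaviour and point at the collector being blocked, rather than at the Python exporter.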


aidanmorgan · 2024-09-01T00:49:35Z

I have also tried changing the Lambda to use a function URL and invoking it directly, instead of using an async invocation from the API Gateway, but I get the same error.

Deploying the same code without a security group for attaching an EFS share appears to work, so I suspect there is some missing documentation about which permissions or ports are required when configuring the security groups.

Adding rules to the security group for ports 4317 and 4318 against the CIDR for my VPC does not resolve the timeout.

I have also tried setting OTEL_EXPORTER_OTLP_ENDPOINT to http://localhost:4318 (and http://127.0.0.1:4318) and updating the collector.yml file to bind to the same address, but the timeouts persist.
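Since the failure only appears with the VPC configuration, one possibility worth ruling out (not confirmed anywhere in this thread) is that the collector can be reached locally but cannot itself reach X-Ray, because a Lambda in a VPC has no route to public AWS endpoints unless the subnet has a NAT gateway or a VPC endpoint for the service. The sketch below checks plain TCP reachability from inside the function. The helper names are hypothetical, and the X-Ray hostname is assumed from the us-east-1 region in the layer ARN.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Collector ports come from collector.yml; the X-Ray hostname is an assumption
# based on the us-east-1 region in the layer ARN.
TARGETS = [
    ("localhost", 4317),
    ("localhost", 4318),
    ("xray.us-east-1.amazonaws.com", 443),
]


def reachability_report(targets=TARGETS) -> dict:
    """Map each "host:port" target to whether a TCP connection succeeded."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in targets}
```

If the two localhost entries report True and the X-Ray entry reports False, the security group rules for 4317 and 4318 are unlikely to be the problem, and outbound access on 443 would be the next thing to look at.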

github-actions · 2024-11-03T20:01:59Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label Nov 3, 2024
