Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request Content-Length fix for Apache HttpClients #6749

Open
wants to merge 63 commits into
base: main
Choose a base branch
from

Conversation

anuragagarwal561994
Copy link
Contributor

@anuragagarwal561994 anuragagarwal561994 commented Sep 25, 2022

Closes #6747

Changes Summary

  • Fetches content-length from request entity, instead of header as the header is set internally by the http client in a request copy, but not the original user request.
  • Best effort to find the content-length for chunked encoding, by reading bytes length read / written.
  • Converts apache-httpclient-4.0 unit tests in java. Version 4.0 was using GuardedBy annotation which was interfering with the test cases, hence the library version was shifted to 4.1. Help wanted! Convert Groovy tests to Java #7195
  • Moves common code for httpclient and httpclient-4.0 in a separate module
  • Adds new HttpProcessor instrumentation, to be able to fetch the final internal HttpRequest / HttpResponse. These are then bound to the current otel context and used for instrumenting. This allows us to instrument on the actual headers and attributes used in the request by the client and not just what is being sent by the user which is the minimal information. By using this in HttpClient 4.0 test case we were able to record 302 redirects which was not being recorded earlier. More such issues of request headers / response headers to attributes will also be resolved.

Implementation Summary

BytesTransferMetrics: This is used to record the actual bytes transferred when a request is sent or the response is received. The recorded metrics is used in case when the request is sent / received using chunked encoding, because in that case the content-length header is not present and we can't determine the content length without consuming the whole content. It fills up the content length attributes when the headers are not present.

CountingOutputStream: This is used to calculate the bytes written by the request when the entity is chunked. We keep a counter to add the buffer sizes. Implementation has picked up from Guava & Apache Libraries.

ApacheHttpClientProcessorInstrumentation
The internal request is actually present in HttpContext and the first approach was to fetch from the same, but there were couple of problems doing so.

  • request / response in not populated in context until version 4.3, thus we would have to break in 2 modules < 4.0 & >= 4.3
  • http classic client has 8 interface methods with some having http context while some doesn't have. We always needed a context to be able to fetch final request / response

To resolve the above limitation a different approach was incorporated. We instrumented the HttpProcessor which will work for both the http classic client / server and it will receive the request / response as parameters. We will store them with respect to otel context in a virtual field and use the value at the time of ending the instrumentation.
Note that here the context of the http classic client will be same as the context of the request, hence we can directly use currentContext.

For async client, the context will change but in async client we can ensure we always have a HttpContext or we create one if not provided by the user. Hence we add otel context in the http context and use it when this http context is passed by the client to the processor methods. We keep updating the latest request / response received because in case of retries or authentication logics, we might want to keep the latest.

Side Note: HttpProcessor can also run for Apache Http Server provided by the library and also for the suppressed spans. Hence we check for HTTP_CLIENT SpanKey to be part of the current context and only then we will execute the logic.

HttpOtelContext: This is just an adapter to make our life easier to use http context in a unified way, to store, fetch and remove otel related information.

ApacheHttpClientEntityStorage: To deal with storage of and within http client entities.

ApacheHttpClientAttributesHelper: Contains all attribute related helpers

ApacheHttpClientInstrumentationHelper: Contains instrumentation related helpers

Other instrumentation changes

opensearch-rest-client: added REQEUST_CONTENT_LENGTH attribute as 0 for GET requests
elasticsearch-rest-client: added REQEUST_CONTENT_LENGTH attribute as 0 for GET requests

twilio: here although we should add REQEUST_CONTENT_LENGTH & RESPONSE_CONTENT_LENGTH in the tests, but the tests are using mocked clients instead of actual once, hence they don't follow the entire process of instrumentation.

I tried implementing non mocked client, but since the urls for twilio are hardcoded, I would have had to handle that case as well. Moreover it would have required making adding a dummy server in the test cases as well. Left that implementation, because anyways the instrumentations guarantees might not be applicable on mocked entities as they are not following the whole process.

@anuragagarwal561994 anuragagarwal561994 requested a review from a team September 25, 2022 04:43
@anuragagarwal561994 anuragagarwal561994 marked this pull request as draft September 25, 2022 04:43
@anuragagarwal561994 anuragagarwal561994 marked this pull request as ready for review January 4, 2023 05:09
@anuragagarwal561994
Copy link
Contributor Author

anuragagarwal561994 commented Jan 4, 2023

@trask @mateuszrzeszutek the change is ready and I have been able to cover all possible scenarios to be able to provide these features till the oldest version of http client 4.x 😌

I would be grateful if you could find some time from your busy schedules to review these changes. Thanks.

@anuragagarwal561994
Copy link
Contributor Author

@mateuszrzeszutek if you get time can you also start review of this PR?

@anuragagarwal561994
Copy link
Contributor Author

@trask this seems like a big change, but can you find some time to look into this if possible.

@anuragagarwal561994 anuragagarwal561994 force-pushed the request-content-length-apache-httpclient branch from d64d59d to 7760022 Compare January 28, 2023 21:15
@anuragagarwal561994 anuragagarwal561994 force-pushed the request-content-length-apache-httpclient branch from b0cfe98 to a106ca4 Compare January 28, 2023 22:05
@anuragagarwal561994
Copy link
Contributor Author

@mateuszrzeszutek @trask can you please look into this MR, I had completed the changes almost a month ago and had to rebase the changes all over again because of few changes being pushed to master recently.

Since this is a big change, rebasing becomes very difficult and eats up a lot of time.

@anuragagarwal561994
Copy link
Contributor Author

@trask any ETA when this can be looked into?

@trask
Copy link
Member

trask commented Feb 24, 2023

hi @anuragagarwal561994, sorry for the long delay. do you think you could find some ways to split this out into a few smaller PRs that would make it easier to review? thx!

@anuragagarwal561994
Copy link
Contributor Author

Hello @trask thanks for taking time to respond back.

This MR involves a lot of small improvements.

I will try my best to break it down in smaller pieces but I will be needing constant advice on the same.

Major change here have been because common code was extracted from all the modules and put in a common place to avoid repetitions

@trask
Copy link
Member

trask commented Feb 24, 2023

Major change here have been because common code was extracted from all the modules and put in a common place to avoid repetitions

maybe a first PR that just moves things around (in preparation for next PR(s))?

@anuragagarwal561994
Copy link
Contributor Author

So I was thinking that I can rebase merge few commits here and there first.

Each commit representing the problem that is solved as stated in the description.

There you can either review commit by commit or I can break those commits as separate merge requests.

@anuragagarwal561994
Copy link
Contributor Author

anuragagarwal561994 commented Feb 26, 2023

I have changed my approach as rebasing was slowing me down and requiring more efforts.

I will be submitting a list of small PRs with and will link the issue on this thread.
I will keep merging them in this PR, as soon as all of them gets merged this will become more streamlined and will only have 1 change.

@anuragagarwal561994
Copy link
Contributor Author

@trask let's start to review the above 2 PRs and then we I will create more, because they are kind of dependent. If there are more review comments, I will have to update all of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request Content-Length not recorded for Apache Http Clients
2 participants