Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework HTTP message forwarning #1955

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Rework HTTP message forwarning #1955

wants to merge 33 commits into from

Conversation

const-t
Copy link
Contributor

@const-t const-t commented Aug 16, 2023

HTTP part of #1103.

fw/http.h Outdated Show resolved Hide resolved
@const-t const-t marked this pull request as ready for review August 16, 2023 12:47
@const-t const-t marked this pull request as draft August 16, 2023 12:49
fw/http.h Outdated Show resolved Hide resolved
@const-t const-t marked this pull request as ready for review August 16, 2023 14:32
fw/http_msg.c Outdated Show resolved Hide resolved
fw/http_sess.c Outdated Show resolved Hide resolved
fw/http_sess.c Outdated Show resolved Hide resolved
fw/http.c Outdated Show resolved Hide resolved
fw/http_msg.c Outdated Show resolved Hide resolved
Copy link
Contributor

@EvgeniiMekhanik EvgeniiMekhanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need several small fixes

fw/http.c Outdated Show resolved Hide resolved
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just couple of comments

fw/http.h Outdated Show resolved Hide resolved
fw/http.h Outdated Show resolved Hide resolved
fw/http.c Outdated Show resolved Hide resolved
fw/http.c Show resolved Hide resolved
fw/http.c Outdated Show resolved Hide resolved
fw/http.c Show resolved Hide resolved
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request is an impressive amount of work, but I think there are still things to optimize, maybe we just need some discussions - please have a look onto my comments and let's discuss them.

I reviewed https://tempesta-tech.com/knowledge-base/Modify-HTTP-Messages/ regarding headers appending and it seems now we don't have ,-appending and there is no actual logic in the code for appending. Again - let's discuss.

fw/cache.c Outdated
Comment on lines 2583 to 2584
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn);
const bool sh_frag = h2 ? false : (true & tls);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as

Suggested change
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn);
const bool sh_frag = h2 ? false : (true & tls);
const bool sh_frag = !TFW_MSG_H2(req) && TFW_CONN_TLS(req->conn);

?

fw/http_parser.c Show resolved Hide resolved

it->hdrs_len += h_app.len;
return tfw_strcat(req->pool, orig_hdr, &h_app);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now append is used for the -ENOENT check only, i.e. it does not affect any logic, so __h2_req_hdrs() should not accept append.

Moreover, now it seems nobody actually use TfwHdrModsDesc->append, so the whole logic in http.c and http_msg.c could be simplified by removing the flag processing. tfw_http_msg_hdr_xfrm() is also called in only one place with append = 0.

In my understanding now the wiki page is incorrect in (@RomanBelozerov could you also please confirm with tests?): for resp_hdr_add Cache-Control "no-cache" it seems now we get

Cache-Control: no-store
Cache-Control: no-cache

Copy link
Contributor Author

@const-t const-t May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking append before calling __h2_req_hdrs() is reasonable and I will do it. But I can't remove TfwHdrModsDesc->append, this flag indicates type of operation: resp_hdr_add or resp_header_set. If append == 0 we use resp_header_set otherwise resp_hdr_add. append means: Header must be appended to message. We can choose another name for the flag if it confusing.

tfw_http_msg_hdr_xfrm_str() has a little bit of garbage code, I'll rewrite it.

fw/http_msg.h Outdated Show resolved Hide resolved
const TfwStr *c, *end;
unsigned int room, skb_room, n_copy, rlen, off, acc = 0;
TfwMsgIter *it = &hm->iter;
TfwPool* pool = hm->pool;

BUG_ON(it->skb->len > SS_SKB_MAX_DATA_LEN);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the function is called a lot of times, so it should be well optimized. I think it makes sense to make if (unlikely(skb_headlen(it->skb))) since even if we have a lot of such skbs, then only the first call hits the condition and all following calls go further. The inner if can eliminate unliekly - it's no sense to have 2 nested unlikely statements.

Also probably it makes sense to add if (likely(TFW_STR_PLAIN(s))) to TFW_STR_FOR_EACH_CHUNK_INIT() and unlikely(room == 0) in the while loop at the below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment will be addressed in #2136

fw/http_msg.c Outdated
@@ -1336,32 +1287,30 @@ tfw_http_msg_expand_data(TfwMsgIter *it, struct sk_buff **skb_head,
return 0;
}

static int
tfw_http_msg_alloc_from_pool(TfwHttpTransIter *mit, TfwPool* pool, size_t size)
static char *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function returns an address in memory, not an array of chars, so I'd propose to make it void *

fw/http.c Outdated
Comment on lines 2532 to 2534
if (req->cleanup) {
__tfw_http_msg_cleanup(req->cleanup);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (req->cleanup) {
__tfw_http_msg_cleanup(req->cleanup);
}
if (req->cleanup)
__tfw_http_msg_cleanup(req->cleanup);

fw/http.c Outdated
req->cleanup = tfw_pool_alloc(hm->pool, sizeof(TfwHttpMsgCleanup));
if (unlikely(!req->cleanup))
return -ENOMEM;
memset(req->cleanup, 0, sizeof(TfwHttpMsgCleanup));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have pages_sz to check whether we have any pages in the data structures, so why do we do memset() instead of just to zeroize pages_sz and maybe skb_head?

if (!cache)
r = tfw_http_msg_expand_from_pool((TfwHttpMsg *)resp, &hdr);
else
r = tfw_http_msg_expand_data(&resp->iter, skb_head, &hdr, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to add a TODO #634 comment. The function is relatively heavyweight operating with skbs and probably it makes sense to reconstruct all the headers from the cache also into TfwPool allocated area.

Please review #634 (comment) and the whole issue - maybe you have more thoughts after the pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment will be addressed in #2136

}

if (tfw_http_hdr_sub(hid, hdr, h_mods))
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we make the headers substitution here? The function is just traverses the list of the headers to be substituted, but doesn't do anything useful. The same for tfw_http_adjust_resp(). If it looks too big, then let's add a comment in the code and in #634 to do it there.

Copy link
Contributor Author

@const-t const-t May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Headers substitution splitted into two parts: first tfw_http_hdr_sub() where we just skip header and don't add it to message, because we iterating over headers not over list of modifications. Second tfw_h1_add_loc_hdrs() where we adding all headers. We don't do it in tfw_http_hdr_sub() because in this case we need add additional if condition into tfw_h1_add_loc_hdrs() to skip headers if append == 0 - resp/req_header_set case.

Name tfw_http_hdr_sub() looks little bit confusing, I will rename it.

Also fix comments.
Parse HTTP 1.1 method and store it in `msg->h_tbl[TFW_HTTP_METHOD]`.
Method will be used during http request forwarding to rebuild the
request.
An ability to append values to headers was removed for HTTP2 requests
as well as earlier for whole HTTP1 and HTTP2 responses. It was removed
because we can't control correctness of appending accordingly to RFC.
Values could be appended to headers that doesn't have comma-separated
list syntax.
`curr_ptr` is not used because we expanding the last fragment of skb
from pool or adding new fragment from pool as well.
TFW_HTTP_B_NEED_STRIP_LEADING_LF and TFW_HTTP_B_NEED_STRIP_LEADING_CR
was used during HTTP1 request forwarding to cutoff first CR LF in skb.
AUTO_SEGS_N not changed after #1103 fix, because
most of responses has large size and with this
value (8) we can cover higher percentage of
responses
Now we use cleanup field instead of `old_head`, cleanup also used
during http1 request adjusting.
Allowed to use unknown method for headers that stored in dynamic table.
Before if tempesta received such request it was dropped.

Unknown method is method that processed as _TFW_HTTP_METH_UNKNOWN.
`tfw_http_add_hdr_clen()` replaced by new function
`tfw_http_resp_term_add_hdr_clen()`. Now it constructs
content-length header into headers table, without
skb modification as it done before. The headers
copies to response in the tfw_http_adjust_resp() along
with other headers.

`tfw_http_msg_hdr_xfrm()` - deleted, because it's un-
used. Some of functions that were used only by
`tfw_http_msg_hdr_xfrm()/tfw_http_msg_hdr_xfrm_str()`
not deleted, because we consider them as library
functions. For instance: `tfw_strcpy_comp_ext()`.

`ss_skb_get_room()` could be lib function, but it's
just a wrapper for `ss_skb_get_room_w_frag()`.

`__hdr_is_singular()` was used during adding headers
configured by administrator, but we treat adding
singular RAW headers via `resp_hdr_add` as
misconfiguration, there is no sense to check it for
each header.
In this patch we optimize header skipping logic that
applies in header modification for HTTP1 request/re-
sponse and HTTP2 response including cache.

       Overview of current behavior.
It consists of two parts: skipping and addition.
During response transformation or copying we
skipping headers that must be substituted or deleted
using `tfw_http_hdr_skip()`, when all headers from
original response are copied we add new headers
(directive (resp/req)_hdr_(add/set)) and add headers
that must be substituted ((resp/req)_hdr_set).

What have done:
1. `tfw_hdr_mods_t::spec_hdrs` turned into bitmap.
2. `tfw_hdr_mods_t::spec_hdrs` contains used only
    for *_hdr_set directives, headers related to
    *_hdr_add must not be skipped, thus excluded from
    `spec_hdrs`.
2. `tfw_http_hdr_skip()` only iterates over *_hdr_set
    headers and doesn't check `append` flag.
3. added `tfw_hdr_mods_t::scan_off`, this is offset
   where we start itarating headers and comparing the
   name. It's the case when modification not found by
   special header index or hpack index.
4. changed the layout of `tfw_hdr_mods_t::hdrs`,
   now we storing headers in following way:

   .----------------------------------------.
   |          `tfw_hdr_mods_t::hdrs`        |
   :----------------------------------------:
   | Special headers *hdr_set               |
   :----------------------------------------:
   | Hpack indexed raw headers *hdr_set     |
   :----------------------------------------:
   | Non-indexed raw headers *hdr_set       |
   :----------------------------------------:
   | All headers *hdr_add                   |
   '----------------------------------------'

   due this structure we can iterate only over
   unindexed raw headers.

Note: Looks like iterating over headers modification
table especially for removing is more profit approach,
because if modification applied we will not iterate
over this opearation again, we using it during http2
request headers modification, but it has disadvantages.
For instance we must realloc header table or split
skipping and adition(similar with current approach) and
such approach can't be used with current cache
architecture.
Just fix to commit with logic unifying, this patch
adds abillity to adding headers to HTTP2 request,
without this header will be substituted even if
req_hdr_add directive was used.
Add a new method intended to close parsed HTTP1.1
method. `tfw_http_msg_hdr_close()` is not suitable
in this case, it has a lot of logic that unnecessary
for method.
fix 1:
Skip trailer headers during copying regular headers
to not duplicate them in headers and in trailer.

fix 2:
Call `mark_spec_hbh` for headers with only LF in the
end of a message.

fix 3:
Headers `connection` and `keep-alive` marked as
hop-by-hop during parsing, it's allowed to remove
following conditions:
```
if (hid == TFW_HTTP_HDR_KEEP_ALIVE
     || hid == TFW_HTTP_HDR_CONNECTION)
```
do only simple `if (tgt->flags & TFW_STR_HBH_HDR)`.

Before this patch `keep-alive` was marked as HBH, but
only if `keep-alive` specified in `connection` header.
RFC 9110 7.6.1 say that intermediaries SHOULD remove or
replace such headers even if the not listed in
Connection.

fix 4:
Add Host header only once: Host header must be added only
once, if it modified from config, therefore during headers
modification if not then as first header.
We had a bug during the suffix match in function
`tfw_http_search_cookie()`, before comparing
the name we store current chunk `t = *chunk`, then
modify and compare the chunk, after we rostore the
chunk `*chunk = t;` that leads to header corruption,
because macros `TFW_STR_EQ_CSTR` may assign new
address to `chunk`, therefore restoration happens to
a newly assigned adress instead of original adress in
`chunk`. Fixed by storing address to temporary
variable and then restore `data` and `len` by this
adress.

`__tfw_str_eq_cstr` rewritten to inline function.
Inline functions is more clear, less error prone and
suitable in this case, because we don't need to define
tmp vars and explicitly save values, only move `chunk`
pointer forward. Generated assembly will be the same.
We must copy a mark from old skb head to new, otherwise
we will lose the mark.
Call `mark_spec_hbh` only for parsed special header
and do not iterate over headers table.
When absolute URI is specified in message, we
completely ignore `Host` header, therefore there is no
sense to do comparision between authority from URI and
the `Host` header. Authority from URI will be used for
message forwarding. However we still validating a
syntax of the `Host` header, to catch protocol
violations.
 - return replaced with `goto clean`
   in `tfw_h1_adjust_req()`
@EvgeniiMekhanik
Copy link
Contributor

Please check that #2266 is ok after this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants