-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework HTTP message forwarning #1955
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need several small fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just couple of comments
4933987
to
97db973
Compare
76aeaa3
to
2ce9209
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request is an impressive amount of work, but I think there are still things to optimize, maybe we just need some discussions - please have a look onto my comments and let's discuss them.
I reviewed https://tempesta-tech.com/knowledge-base/Modify-HTTP-Messages/ regarding headers appending and it seems now we don't have ,
-appending and there is no actual logic in the code for appending. Again - let's discuss.
fw/cache.c
Outdated
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn); | ||
const bool sh_frag = h2 ? false : (true & tls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn); | |
const bool sh_frag = h2 ? false : (true & tls); | |
const bool sh_frag = !TFW_MSG_H2(req) && TFW_CONN_TLS(req->conn); |
?
|
||
it->hdrs_len += h_app.len; | ||
return tfw_strcat(req->pool, orig_hdr, &h_app); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now append
is used for the -ENOENT
check only, i.e. it does not affect any logic, so __h2_req_hdrs()
should not accept append
.
Moreover, now it seems nobody actually use TfwHdrModsDesc->append
, so the whole logic in http.c and http_msg.c could be simplified by removing the flag processing. tfw_http_msg_hdr_xfrm()
is also called in only one place with append = 0
.
In my understanding now the wiki page is incorrect in (@RomanBelozerov could you also please confirm with tests?): for resp_hdr_add Cache-Control "no-cache"
it seems now we get
Cache-Control: no-store
Cache-Control: no-cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking append before calling __h2_req_hdrs()
is reasonable and I will do it. But I can't remove TfwHdrModsDesc->append
, this flag indicates type of operation: resp_hdr_add or resp_header_set. If append == 0
we use resp_header_set otherwise resp_hdr_add. append
means: Header must be appended to message. We can choose another name for the flag if it confusing.
tfw_http_msg_hdr_xfrm_str()
has a little bit of garbage code, I'll rewrite it.
const TfwStr *c, *end; | ||
unsigned int room, skb_room, n_copy, rlen, off, acc = 0; | ||
TfwMsgIter *it = &hm->iter; | ||
TfwPool* pool = hm->pool; | ||
|
||
BUG_ON(it->skb->len > SS_SKB_MAX_DATA_LEN); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the function is called a lot of times, so it should be well optimized. I think it makes sense to make if (unlikely(skb_headlen(it->skb)))
since even if we have a lot of such skbs, then only the first call hits the condition and all following calls go further. The inner if
can eliminate unliekly
- it's no sense to have 2 nested unlikely statements.
Also probably it makes sense to add if (likely(TFW_STR_PLAIN(s)))
to TFW_STR_FOR_EACH_CHUNK_INIT()
and unlikely(room == 0)
in the while
loop at the below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment will be addressed in #2136
fw/http_msg.c
Outdated
@@ -1336,32 +1287,30 @@ tfw_http_msg_expand_data(TfwMsgIter *it, struct sk_buff **skb_head, | |||
return 0; | |||
} | |||
|
|||
static int | |||
tfw_http_msg_alloc_from_pool(TfwHttpTransIter *mit, TfwPool* pool, size_t size) | |||
static char * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns an address in memory, not an array of chars, so I'd propose to make it void *
fw/http.c
Outdated
if (req->cleanup) { | ||
__tfw_http_msg_cleanup(req->cleanup); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (req->cleanup) { | |
__tfw_http_msg_cleanup(req->cleanup); | |
} | |
if (req->cleanup) | |
__tfw_http_msg_cleanup(req->cleanup); |
fw/http.c
Outdated
req->cleanup = tfw_pool_alloc(hm->pool, sizeof(TfwHttpMsgCleanup)); | ||
if (unlikely(!req->cleanup)) | ||
return -ENOMEM; | ||
memset(req->cleanup, 0, sizeof(TfwHttpMsgCleanup)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have pages_sz
to check whether we have any pages in the data structures, so why do we do memset()
instead of just to zeroize pages_sz
and maybe skb_head
?
if (!cache) | ||
r = tfw_http_msg_expand_from_pool((TfwHttpMsg *)resp, &hdr); | ||
else | ||
r = tfw_http_msg_expand_data(&resp->iter, skb_head, &hdr, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add a TODO #634
comment. The function is relatively heavyweight operating with skbs and probably it makes sense to reconstruct all the headers from the cache also into TfwPool allocated area.
Please review #634 (comment) and the whole issue - maybe you have more thoughts after the pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment will be addressed in #2136
} | ||
|
||
if (tfw_http_hdr_sub(hid, hdr, h_mods)) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we make the headers substitution here? The function is just traverses the list of the headers to be substituted, but doesn't do anything useful. The same for tfw_http_adjust_resp()
. If it looks too big, then let's add a comment in the code and in #634 to do it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers substitution splitted into two parts: first tfw_http_hdr_sub()
where we just skip header and don't add it to message, because we iterating over headers not over list of modifications. Second tfw_h1_add_loc_hdrs()
where we adding all headers. We don't do it in tfw_http_hdr_sub()
because in this case we need add additional if condition into tfw_h1_add_loc_hdrs()
to skip headers if append == 0
- resp/req_header_set case.
Name tfw_http_hdr_sub()
looks little bit confusing, I will rename it.
Also fix comments.
Parse HTTP 1.1 method and store it in `msg->h_tbl[TFW_HTTP_METHOD]`. Method will be used during http request forwarding to rebuild the request.
An ability to append values to headers was removed for HTTP2 requests as well as earlier for whole HTTP1 and HTTP2 responses. It was removed because we can't control correctness of appending accordingly to RFC. Values could be appended to headers that doesn't have comma-separated list syntax.
`curr_ptr` is not used because we expanding the last fragment of skb from pool or adding new fragment from pool as well.
TFW_HTTP_B_NEED_STRIP_LEADING_LF and TFW_HTTP_B_NEED_STRIP_LEADING_CR was used during HTTP1 request forwarding to cutoff first CR LF in skb.
AUTO_SEGS_N not changed after #1103 fix, because most of responses has large size and with this value (8) we can cover higher percentage of responses
Now we use cleanup field instead of `old_head`, cleanup also used during http1 request adjusting.
Allowed to use unknown method for headers that stored in dynamic table. Before if tempesta received such request it was dropped. Unknown method is method that processed as _TFW_HTTP_METH_UNKNOWN.
`tfw_http_add_hdr_clen()` replaced by new function `tfw_http_resp_term_add_hdr_clen()`. Now it constructs content-length header into headers table, without skb modification as it done before. The headers copies to response in the tfw_http_adjust_resp() along with other headers. `tfw_http_msg_hdr_xfrm()` - deleted, because it's un- used. Some of functions that were used only by `tfw_http_msg_hdr_xfrm()/tfw_http_msg_hdr_xfrm_str()` not deleted, because we consider them as library functions. For instance: `tfw_strcpy_comp_ext()`. `ss_skb_get_room()` could be lib function, but it's just a wrapper for `ss_skb_get_room_w_frag()`. `__hdr_is_singular()` was used during adding headers configured by administrator, but we treat adding singular RAW headers via `resp_hdr_add` as misconfiguration, there is no sense to check it for each header.
In this patch we optimize header skipping logic that applies in header modification for HTTP1 request/re- sponse and HTTP2 response including cache. Overview of current behavior. It consists of two parts: skipping and addition. During response transformation or copying we skipping headers that must be substituted or deleted using `tfw_http_hdr_skip()`, when all headers from original response are copied we add new headers (directive (resp/req)_hdr_(add/set)) and add headers that must be substituted ((resp/req)_hdr_set). What have done: 1. `tfw_hdr_mods_t::spec_hdrs` turned into bitmap. 2. `tfw_hdr_mods_t::spec_hdrs` contains used only for *_hdr_set directives, headers related to *_hdr_add must not be skipped, thus excluded from `spec_hdrs`. 2. `tfw_http_hdr_skip()` only iterates over *_hdr_set headers and doesn't check `append` flag. 3. added `tfw_hdr_mods_t::scan_off`, this is offset where we start itarating headers and comparing the name. It's the case when modification not found by special header index or hpack index. 4. changed the layout of `tfw_hdr_mods_t::hdrs`, now we storing headers in following way: .----------------------------------------. | `tfw_hdr_mods_t::hdrs` | :----------------------------------------: | Special headers *hdr_set | :----------------------------------------: | Hpack indexed raw headers *hdr_set | :----------------------------------------: | Non-indexed raw headers *hdr_set | :----------------------------------------: | All headers *hdr_add | '----------------------------------------' due this structure we can iterate only over unindexed raw headers. Note: Looks like iterating over headers modification table especially for removing is more profit approach, because if modification applied we will not iterate over this opearation again, we using it during http2 request headers modification, but it has disadvantages. For instance we must realloc header table or split skipping and adition(similar with current approach) and such approach can't be used with current cache architecture.
Just fix to commit with logic unifying, this patch adds abillity to adding headers to HTTP2 request, without this header will be substituted even if req_hdr_add directive was used.
Add a new method intended to close parsed HTTP1.1 method. `tfw_http_msg_hdr_close()` is not suitable in this case, it has a lot of logic that unnecessary for method.
fix 1: Skip trailer headers during copying regular headers to not duplicate them in headers and in trailer. fix 2: Call `mark_spec_hbh` for headers with only LF in the end of a message. fix 3: Headers `connection` and `keep-alive` marked as hop-by-hop during parsing, it's allowed to remove following conditions: ``` if (hid == TFW_HTTP_HDR_KEEP_ALIVE || hid == TFW_HTTP_HDR_CONNECTION) ``` do only simple `if (tgt->flags & TFW_STR_HBH_HDR)`. Before this patch `keep-alive` was marked as HBH, but only if `keep-alive` specified in `connection` header. RFC 9110 7.6.1 say that intermediaries SHOULD remove or replace such headers even if the not listed in Connection. fix 4: Add Host header only once: Host header must be added only once, if it modified from config, therefore during headers modification if not then as first header.
We had a bug during the suffix match in function `tfw_http_search_cookie()`, before comparing the name we store current chunk `t = *chunk`, then modify and compare the chunk, after we rostore the chunk `*chunk = t;` that leads to header corruption, because macros `TFW_STR_EQ_CSTR` may assign new address to `chunk`, therefore restoration happens to a newly assigned adress instead of original adress in `chunk`. Fixed by storing address to temporary variable and then restore `data` and `len` by this adress. `__tfw_str_eq_cstr` rewritten to inline function. Inline functions is more clear, less error prone and suitable in this case, because we don't need to define tmp vars and explicitly save values, only move `chunk` pointer forward. Generated assembly will be the same.
We must copy a mark from old skb head to new, otherwise we will lose the mark.
Call `mark_spec_hbh` only for parsed special header and do not iterate over headers table.
When absolute URI is specified in message, we completely ignore `Host` header, therefore there is no sense to do comparision between authority from URI and the `Host` header. Authority from URI will be used for message forwarding. However we still validating a syntax of the `Host` header, to catch protocol violations.
- return replaced with `goto clean` in `tfw_h1_adjust_req()`
Please check that #2266 is ok after this PR |
HTTP part of #1103.