-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAI Proposal TAM stream telemetry #2089
base: master
Are you sure you want to change the base?
SAI Proposal TAM stream telemetry #2089
Conversation
4098992
to
e62f676
Compare
515bbf2
to
b033dfa
Compare
inc/saitam.h
Outdated
* @brief Tam telemetry reporting type | ||
* | ||
* @type sai_tam_reporting_type_t | ||
* @flags CREATE_AND_SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the CREATE_AND_SET attributes should all be CREATE_ONLY
inc/saitam.h
Outdated
* @type sai_u8_list_t | ||
* @flags READ_ONLY | ||
*/ | ||
SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for sessions that exceeds the max length of template or data, we need to split the template into multiple templates:
- Modify the template type to sai_u8_list_list_t
- Rename SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE to SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE_LIST
- Rename SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID to SAI_TAM_REPORT_ATTR_REPORT_IPFIX_BASE_TEMPLATE_ID
Each template data has the template id in it, so we don't need another attribute or field to specify the template id list.
inc/saitam.h
Outdated
* @flags CREATE_AND_SET | ||
* @default 0 | ||
*/ | ||
SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the ring buffer is created on driver load, we need to move it to sai switch attribute.
/** | ||
* @brief Report type is count based | ||
*/ | ||
SAI_TAM_REPORTING_TYPE_COUNT_BASED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to SIZE_BASED to be more explicit.
inc/saitam.h
Outdated
* @default 1 | ||
* @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED | ||
*/ | ||
SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHUNK_BYTES
sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; | ||
|
||
sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; | ||
sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add explicit capability check here instead of leveraging the return value.
for capability query, we need to add a new API in saiobject.h:
sai_status_t sai_query_stats_hft_capability(
_In_ sai_object_id_t switch_id,
_In_ sai_object_type_t object_type,
_Inout_ sai_stat_hft_capability_list_t *stats_capability);
Then in saitypes.h, add the sai_stat_hft_capability_list_t following the sai_stat_capability_list_t, and we need to add the minimum polling interval in it.
sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; | ||
sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template | ||
|
||
attr_count = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to support the get-and-clear operation on queue watermark counters, we need another attribute to specify the mode.
sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID; | ||
sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS; | ||
|
||
sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update label comment on sai.h
- The number of stats types for a single SAI object type should not exceed 32,768. | ||
- The number of extension stats types for a single SAI object type should not exceed 32,768. | ||
- The number of SAI objects of the same type should not exceed 32,768. | ||
- The vendor SDK should support publishing stats in IPFIX format and its IPFIX template. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPFIX template is created using SAI APIs. Based on that NOS can derive the template and there is no need for SAI or SDK to publish IPFIX template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the IPFIX recording is published by the vendor, So, according to the self-consistency principle, the IPFIX template will also be published from the vendor side as well. For example, we don't limit the order of data in the IPFIX , it's acceptable once the order of date is consistent in the IPFIX template or recording.
We expect the generation of all IPFIX template or reocrding is maintained by one side.
- The number of extension stats types for a single SAI object type should not exceed 32,768. | ||
- The number of SAI objects of the same type should not exceed 32,768. | ||
- The vendor SDK should support publishing stats in IPFIX format and its IPFIX template. | ||
- If a polling frequency for stats cannot be supported, the vendor's SDK should report this error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supported polling frequency should be part of the SAI capability query. Based on this if NOS configures a unsupported poll frequency SAI will return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, agree. Will provide a new API to query this capability
* | ||
* @return #SAI_STATUS_SUCCESS on success, #SAI_STATUS_BUFFER_OVERFLOW if lists size insufficient, failure status code on error | ||
*/ | ||
sai_status_t sai_query_stats_st_capability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should make this more generic ? since very similar function already exists and this is adding only 1 field to structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi Kamil, we discussed internally using the sai_query_stats_capability directly and add the polling interval in it, but the reason we are not using that API is because of 2 reasons:
- Not all counters support using this way to retrieve, so if the counters are not returned in the list, it means it is not supported.
- Adding another field in the current stats capability breaks the ABI and making this frequently used function not backward compatible.
So after discussing it, we ended up deciding creating a new API for it.
db5391e
to
e77e917
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAI already supports streaming telemetry at sub-second intervals. Please refer: https://github.com/opencomputeproject/SAI/pull/1670/files
SAI also supports sending packets to local collector over GENETLINK. Please refer: https://github.com/opencomputeproject/SAI/pull/1986/files
-
Title
The current PR is adding some enhancements to the already supported TAM streaming telemetry to local collector. The title is misleading, it is incorrectly suggesting that streaming telemetry is so far not supported in SAI. Maybe rename to "TAM Streaming Telemetry Count mode". -
List of enhancements
It would be good to clearly call out the new SAI attributes and enums being defined, their meaning and why they are needed.
The new attributes/enums and APIs are limited to just the below functionality but this is not clearly called out at the beginning of the document like other SAI specs.
- Bulking based on count
- SAI vendor to publish IPFIX templates
- Switch TAM Cache (motivation and semantics-FIFO/LIFO are unclear)
- Streaming Stats Capability query
nla_append(skb_out, msg->data, msg->len); | ||
} | ||
|
||
genlmsg_multicast(&genl_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genlmsg_multicast(&nl_bench_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment. It's a typo. Change it to stel_family.
}; | ||
|
||
// Family definition | ||
static struct genl_family nl_bench_family = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add comments to explain the variable name "nl_bench_family".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be stel_family
that stands for stream telemetry family
. nl_bench_family is a typo that I copied from a sample code.
|
||
#### Netlink message | ||
|
||
We expect all control messages and out-of-band information to be transmitted by the SAI. Therefore, it is unnecessary to read the attribute header of netlink and the message header of Genetlink from the socket. Instead, we can insert a bulk of IPFIX recordings as the payload of the netlink message. The sample code for building the message from the kernel side is as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain more on this? When defining new Generic Netlink message formats it is recommended to make use of the Netlink attributes wherever possible. The Netlink attribute mechanism has been carefully designed to allow for future message expansion while preserving backward compatibility. In order to follow the GENETLINK semantics, we should at the least have one GENETLINK attribute say "STEL_ATTRIBUTE_IPFIX_ENTRIES" and define its type, say 'nested'. The example code is adding the IPFIX entries without using any attributes.
Standardizing the attribute name and type allows the NOS code to operate across different SAI vendors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's intentionally designed. Because this genetlink is specific for stream telemetry with IPFIX, we don't want to pass any other information, including IPFIX template, on this Channel. We expect other information should only pass by SAI. If this channel is designed more flexible, we have to define/design another kind of schema out of SAI for it. That's what we want to avoid.
I don't feel it's not standardizing. Because this API is just provided by the netlink library. Why you said it's not standardizing?
* @flags CREATE_ONLY | ||
* @default 0 | ||
*/ | ||
SAI_SWITCH_ATTR_TAM_CACHE_COUNT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the caching semantics? Say the collector is not ready, and more reports are generated than the cache capacity, are the first 'n' or the last 'n' cached?
Please also explain why this is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments. Update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hope it's possible to collect the counters during the collector restarts after the collector crash.
``` c++ | ||
|
||
sai_attr_list[0].id = SAI_TAM_REPORT_ATTR_TYPE; | ||
sai_attr_list[0].value.s32 = SAI_TAM_REPORT_TYPE_IPFIX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The report type should be GENETLINK, and the encoding within the GENETLINK is IPFIX. The latter part is already specified using SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I didn't see an attribute for encoding format if not specify at here.
Make sense, Thanks for your suggestion. I will change it. |
52c9a5f
to
06a86bd
Compare
Signed-off-by: Ze Gan <[email protected]>
06a86bd
to
72be662
Compare
|
||
- For high-frequency counters, the native IPFIX timestamp unit of seconds is insufficient. Therefore, we introduce an additional element, `observationTimeNanoseconds`, for each record to meet our requirements. | ||
- The enterprise bit is always set to 1 for stats records. | ||
- The element ID of IPFIX is derived from the object index. For example, for `Ethernet5`, the element ID will be `0x5 | 0x8000 = 0x8005`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain where the 0x8000 comes from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It indicates the enterprise bit is set to 1. Thanks for your pointing this unclear part. Polish it.
|
||
The existing telemetry solution relies on a process to proactively query stats and counters via the SAI API. This approach causes the process to spend excessive time on SAI communication. The stream telemetry described in this document aims to provide a more efficient method for collecting object stats. The main idea is that selected stats will be proactively pushed from the vendor's driver to the collector via netlink. | ||
|
||
## Requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, these are all the constraints (not requirements) that arise due to the data encoding format described later in this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. This template of document is just following the SONiC hld . As it said, the requirements section includes exemptions (not supported). But I will update the constraints title for this section for more clear.
Signed-off-by: Ze Gan <[email protected]>
Proposal a new streaming method for telemetry