Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAI Proposal TAM stream telemetry #2089

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Pterosaur
Copy link
Contributor

Proposal a new streaming method for telemetry

@Pterosaur Pterosaur force-pushed the ipfix_stream_telemetry branch 7 times, most recently from 4098992 to e62f676 Compare October 12, 2024 02:41
inc/saitam.h Outdated
* @brief Tam telemetry reporting type
*
* @type sai_tam_reporting_type_t
* @flags CREATE_AND_SET
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CREATE_AND_SET attributes should all be CREATE_ONLY

inc/saitam.h Outdated
* @type sai_u8_list_t
* @flags READ_ONLY
*/
SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sessions that exceeds the max length of template or data, we need to split the template into multiple templates:

  • Modify the template type to sai_u8_list_list_t
  • Rename SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE to SAI_TAM_REPORT_ATTR_IPFIX_TEMPLATE_LIST
  • Rename SAI_TAM_REPORT_ATTR_REPORT_IPFIX_TEMPLATE_ID to SAI_TAM_REPORT_ATTR_REPORT_IPFIX_BASE_TEMPLATE_ID

Each template data has the template id in it, so we don't need another attribute or field to specify the template id list.

inc/saitam.h Outdated
* @flags CREATE_AND_SET
* @default 0
*/
SAI_TAM_TELEMETRY_ATTR_CACHE_SIZE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the ring buffer is created on driver load, we need to move it to sai switch attribute.

/**
* @brief Report type is count based
*/
SAI_TAM_REPORTING_TYPE_COUNT_BASED,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to SIZE_BASED to be more explicit.

inc/saitam.h Outdated
* @default 1
* @validonly SAI_TAM_TELEMETRY_ATTR_TAM_REPORTING_TYPE == SAI_TAM_REPORTING_TYPE_COUNT_BASED
*/
SAI_TAM_TELEMETRY_ATTR_REPORTING_CHUNK_SIZE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHUNK_BYTES

sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS;

sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL;
sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add explicit capability check here instead of leveraging the return value.

for capability query, we need to add a new API in saiobject.h:

sai_status_t sai_query_stats_hft_capability(
        _In_ sai_object_id_t switch_id,
        _In_ sai_object_type_t object_type,
        _Inout_ sai_stat_hft_capability_list_t *stats_capability);

Then in saitypes.h, add the sai_stat_hft_capability_list_t following the sai_stat_capability_list_t, and we need to add the minimum polling interval in it.

sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL;
sai_attr_list[3].value.oid = index; // Element ID of the object in the IPFIX template

attr_count = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to support the get-and-clear operation on queue watermark counters, we need another attribute to specify the mode.

sai_attr_list[2].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID;
sai_attr_list[2].value.oid = SAI_PORT_STAT_IF_IN_OCTETS;

sai_attr_list[3].id = SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update label comment on sai.h

- The number of stats types for a single SAI object type should not exceed 32,768.
- The number of extension stats types for a single SAI object type should not exceed 32,768.
- The number of SAI objects of the same type should not exceed 32,768.
- The vendor SDK should support publishing stats in IPFIX format and its IPFIX template.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPFIX template is created using SAI APIs. Based on that NOS can derive the template and there is no need for SAI or SDK to publish IPFIX template

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the IPFIX recording is published by the vendor, So, according to the self-consistency principle, the IPFIX template will also be published from the vendor side as well. For example, we don't limit the order of data in the IPFIX , it's acceptable once the order of date is consistent in the IPFIX template or recording.
We expect the generation of all IPFIX template or reocrding is maintained by one side.

- The number of extension stats types for a single SAI object type should not exceed 32,768.
- The number of SAI objects of the same type should not exceed 32,768.
- The vendor SDK should support publishing stats in IPFIX format and its IPFIX template.
- If a polling frequency for stats cannot be supported, the vendor's SDK should report this error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supported polling frequency should be part of the SAI capability query. Based on this if NOS configures a unsupported poll frequency SAI will return error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, agree. Will provide a new API to query this capability

*
* @return #SAI_STATUS_SUCCESS on success, #SAI_STATUS_BUFFER_OVERFLOW if lists size insufficient, failure status code on error
*/
sai_status_t sai_query_stats_st_capability(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should make this more generic ? since very similar function already exists and this is adding only 1 field to structure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi Kamil, we discussed internally using the sai_query_stats_capability directly and add the polling interval in it, but the reason we are not using that API is because of 2 reasons:

  1. Not all counters support using this way to retrieve, so if the counters are not returned in the list, it means it is not supported.
  2. Adding another field in the current stats capability breaks the ABI and making this frequently used function not backward compatible.

So after discussing it, we ended up deciding creating a new API for it.

@Pterosaur Pterosaur force-pushed the ipfix_stream_telemetry branch 17 times, most recently from db5391e to e77e917 Compare October 28, 2024 12:00
Copy link
Contributor

@rck-innovium rck-innovium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAI already supports streaming telemetry at sub-second intervals. Please refer: https://github.com/opencomputeproject/SAI/pull/1670/files

SAI also supports sending packets to local collector over GENETLINK. Please refer: https://github.com/opencomputeproject/SAI/pull/1986/files

  1. Title
    The current PR is adding some enhancements to the already supported TAM streaming telemetry to local collector. The title is misleading, it is incorrectly suggesting that streaming telemetry is so far not supported in SAI. Maybe rename to "TAM Streaming Telemetry Count mode".

  2. List of enhancements
    It would be good to clearly call out the new SAI attributes and enums being defined, their meaning and why they are needed.
    The new attributes/enums and APIs are limited to just the below functionality but this is not clearly called out at the beginning of the document like other SAI specs.

  • Bulking based on count
  • SAI vendor to publish IPFIX templates
  • Switch TAM Cache (motivation and semantics-FIFO/LIFO are unclear)
  • Streaming Stats Capability query

nla_append(skb_out, msg->data, msg->len);
}

genlmsg_multicast(&genl_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genlmsg_multicast(&nl_bench_family, skb_out, 0, 0/* group_id to ipfix group */, GFP_KERNEL);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. It's a typo. Change it to stel_family.

};

// Family definition
static struct genl_family nl_bench_family = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add comments to explain the variable name "nl_bench_family".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be stel_family that stands for stream telemetry family. nl_bench_family is a typo that I copied from a sample code.


#### Netlink message

We expect all control messages and out-of-band information to be transmitted by the SAI. Therefore, it is unnecessary to read the attribute header of netlink and the message header of Genetlink from the socket. Instead, we can insert a bulk of IPFIX recordings as the payload of the netlink message. The sample code for building the message from the kernel side is as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more on this? When defining new Generic Netlink message formats it is recommended to make use of the Netlink attributes wherever possible. The Netlink attribute mechanism has been carefully designed to allow for future message expansion while preserving backward compatibility. In order to follow the GENETLINK semantics, we should at the least have one GENETLINK attribute say "STEL_ATTRIBUTE_IPFIX_ENTRIES" and define its type, say 'nested'.  The example code is adding the IPFIX entries without using any attributes.

Standardizing the attribute name and type allows the NOS code to operate across different SAI vendors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's intentionally designed. Because this genetlink is specific for stream telemetry with IPFIX, we don't want to pass any other information, including IPFIX template, on this Channel. We expect other information should only pass by SAI. If this channel is designed more flexible, we have to define/design another kind of schema out of SAI for it. That's what we want to avoid.

I don't feel it's not standardizing. Because this API is just provided by the netlink library. Why you said it's not standardizing?

* @flags CREATE_ONLY
* @default 0
*/
SAI_SWITCH_ATTR_TAM_CACHE_COUNT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the caching semantics? Say the collector is not ready, and more reports are generated than the cache capacity, are the first 'n' or the last 'n' cached?

Please also explain why this is required?

Copy link
Contributor Author

@Pterosaur Pterosaur Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments. Update it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hope it's possible to collect the counters during the collector restarts after the collector crash.

``` c++

sai_attr_list[0].id = SAI_TAM_REPORT_ATTR_TYPE;
sai_attr_list[0].value.s32 = SAI_TAM_REPORT_TYPE_IPFIX;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The report type should be GENETLINK, and the encoding within the GENETLINK is IPFIX. The latter part is already specified using SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this attribute is for specifying the format of message, Other values of this attribute is more like the format instead of channel.
image

Copy link
Contributor Author

@Pterosaur Pterosaur Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I didn't see an attribute for encoding format if not specify at here.

@Pterosaur
Copy link
Contributor Author

SAI already supports streaming telemetry at sub-second intervals. Please refer: https://github.com/opencomputeproject/SAI/pull/1670/files

SAI also supports sending packets to local collector over GENETLINK. Please refer: https://github.com/opencomputeproject/SAI/pull/1986/files

  1. Title
    The current PR is adding some enhancements to the already supported TAM streaming telemetry to local collector. The title is misleading, it is incorrectly suggesting that streaming telemetry is so far not supported in SAI. Maybe rename to "TAM Streaming Telemetry Count mode".
  2. List of enhancements
    It would be good to clearly call out the new SAI attributes and enums being defined, their meaning and why they are needed.
    The new attributes/enums and APIs are limited to just the below functionality but this is not clearly called out at the beginning of the document like other SAI specs.
  • Bulking based on count
  • SAI vendor to publish IPFIX templates
  • Switch TAM Cache (motivation and semantics-FIFO/LIFO are unclear)
  • Streaming Stats Capability query

Make sense, Thanks for your suggestion. I will change it.

@Pterosaur Pterosaur force-pushed the ipfix_stream_telemetry branch 3 times, most recently from 52c9a5f to 06a86bd Compare October 31, 2024 01:46
Signed-off-by: Ze Gan <[email protected]>

- For high-frequency counters, the native IPFIX timestamp unit of seconds is insufficient. Therefore, we introduce an additional element, `observationTimeNanoseconds`, for each record to meet our requirements.
- The enterprise bit is always set to 1 for stats records.
- The element ID of IPFIX is derived from the object index. For example, for `Ethernet5`, the element ID will be `0x5 | 0x8000 = 0x8005`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain where the 0x8000 comes from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It indicates the enterprise bit is set to 1. Thanks for your pointing this unclear part. Polish it.


The existing telemetry solution relies on a process to proactively query stats and counters via the SAI API. This approach causes the process to spend excessive time on SAI communication. The stream telemetry described in this document aims to provide a more efficient method for collecting object stats. The main idea is that selected stats will be proactively pushed from the vendor's driver to the collector via netlink.

## Requirements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, these are all the constraints (not requirements) that arise due to the data encoding format described later in this proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. This template of document is just following the SONiC hld . As it said, the requirements section includes exemptions (not supported). But I will update the constraints title for this section for more clear.

Signed-off-by: Ze Gan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants