Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add client statistics to the connection spec #308

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

Jarema
Copy link
Member

@Jarema Jarema commented Sep 30, 2024

Signed-off-by: Tomasz Pietrek [email protected]

adr/ADR-40.md Outdated
#### Messages out
Every message (`PUB` and `HPUB`) sent to the server.

#### Connets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reconnects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant connects.
I feel that counting only reconnects gets messy when you have allow initial reconnect, which starts with 0 reconnects and 0 connets. Then, after initial connection is established, it would still be 0 reconnects, but 1 connect. Hence I feel it's clearer if we count every successful connect, including the initial one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is Reconnects in the nats.go client already, so would try to model that rather than changing it at this point: https://github.com/nats-io/nats.go/blob/f0c0194ad0ca716dfa162baac7e94d900a53f2c9/nats.go#L780

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not model the Statistics after Go client, or any other client, as each is counting bytes in/out in their own way. That's because the feature predates ADR and parity concepts.

We agreed that we should do what Server does - count all bytes in and out. Go counts just headears + payloads.

Regarding the connect versus reconnect - maybe we could allow this to be client specific? It can make sense depending on how retry_on_connect is implemented in given language after all.

Copy link
Member

@wallyqs wallyqs Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think first pass at least should try to converge on what the nats.go client does which is the practical spec already used, trying to add new behavior on top via the ADR that is not implemented is not helpful. It could be a different field for example or a revised Statistics v2 implementation. If you want to stamp out the behavior for this ADR it has to be what the Go client does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not new behaviour that is not align with clients. Every client counts things diferently:

  • counting just payloads as bytes in/out
  • counting headers + payloads (Go does that)
  • counting subjects + headers + payloads (rust)
  • counting everything that goes through the socket (server & Javascript)

additionally, some languages have more fields than others.

As you can see, its impossible to make consistent ADR without introducing changes everywhere.
Because of that, we decided to count bytes as server does, which also seems to make most sense.

I agree that the connected vs reconnected might be a change we should avoid in some clients, depending how they handle retry_initial_connect, so I'll update the ADR that both are ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wallyqs I think that Piotr's idea is the best here:

he proposed that in clients that count data differently, we could add total_bytes_in total_bytes_out etc.
Check the upates to this PR.

@scottf
Copy link
Collaborator

scottf commented Sep 30, 2024

@Jarema This is what the java client currently collects:

  • Pings
  • Reconnects
  • DroppedCount
  • OKs
  • Errs
  • Exceptions
  • RequestsSent
  • RepliesReceived
  • InMsgs
  • OutMsgs
  • InBytes
  • OutBytes
  • FlushCounter
  • OutstandingRequests

Advanced Stats

  • DuplicateRepliesReceived
  • OrphanRepliesReceived

Copy link
Contributor

@piotrpio piotrpio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@aricart aricart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of what we should be doing doesn't seem to align with what the server is reporting.

Clients should implement statistics gathered throughout the lifetime of the client (persisted between connections):

#### Bytes In
Should account for everything received from the server, a record of bytes received on the socket.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jarema - so went looking augment with what is discussed here, but the discussion on the alignment with server is incorrect - Server accounting seems to only include headers/payload for published messages - it doesn't include other protocol messages, subjects, etc. We could add a raw metric for in/out bytes, but right now at least for JavaScript, it lines up with what the server does.

Deno.test("basics - stats", async () => {
  const { ns, nc } = await _setup(connect);

  const cid = nc.info?.client_id || -1;
  if (cid === -1) {
    fail("client_id not found");
  }

  await nc.flush();
  console.log(nc.stats())

  console.log(await ns.connz(cid, "detail"));

  nc.publish("hello", "world");
  await nc.flush();

  console.log(nc.stats())
  console.log(await ns.connz(cid, "detail"));


  nc.subscribe("hello", {callback: () => {}});
  nc.publish("hello", "hi");
  await nc.flush();

  console.log(nc.stats())
  console.log(await ns.connz(cid, "detail"));

  await cleanup(ns, nc);
});
{ inBytes: 0, outBytes: 0, inMsgs: 0, outMsgs: 0 }
{
  server_id: "NBN7DZNKVGIGZBAVMGYTK7RHZZITVELFTCJI5BKOHORHTTO2TOS4OHTL",
  now: "2024-10-15T23:32:44.636996Z",
  num_connections: 1,
  total: 1,
  offset: 0,
  limit: 1,
  connections: [
    {
      cid: 6,
      kind: "Client",
      type: "nats",
      ip: "127.0.0.1",
      port: 59457,
      start: "2024-10-15T18:32:44.633103-05:00",
      last_activity: "2024-10-15T23:32:44.634232Z",
      rtt: "1.129333ms",
      uptime: "0s",
      idle: "0s",
      pending_bytes: 0,
      in_msgs: 0,
      out_msgs: 0,
      in_bytes: 0,
      out_bytes: 0,
      subscriptions: 0,
      lang: "nats.deno",
      version: "3.0.0-6"
    }
  ]
}
{ inBytes: 0, outBytes: 5, inMsgs: 0, outMsgs: 1 }
{
  server_id: "NBN7DZNKVGIGZBAVMGYTK7RHZZITVELFTCJI5BKOHORHTTO2TOS4OHTL",
  now: "2024-10-15T23:32:44.638501Z",
  num_connections: 1,
  total: 1,
  offset: 0,
  limit: 1,
  connections: [
    {
      cid: 6,
      kind: "Client",
      type: "nats",
      ip: "127.0.0.1",
      port: 59457,
      start: "2024-10-15T18:32:44.633103-05:00",
      last_activity: "2024-10-15T18:32:44.638146-05:00",
      rtt: "1.129333ms",
      uptime: "0s",
      idle: "0s",
      pending_bytes: 0,
      in_msgs: 1,
      out_msgs: 0,
      in_bytes: 5,
      out_bytes: 0,
      subscriptions: 0,
      lang: "nats.deno",
      version: "3.0.0-6"
    }
  ]
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I probably made a mistake checking the server code.
Though I still agree with your point that we should count everything in and out.
We for sure can have more stats - current one and raw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be the correct thing to do. The other nuance is we should reset on reconnect the current counters this way the reporting from the client will always match the current session and the server. We can add a total_in/out which accounts for all the in/out but these are always all session - Not sure what the benefit is to say that a current client over all its reconnects has sent XX amount of data, so preserving that session data is possibly not useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants