tracing working group #671

Closed
Qard opened this issue Jan 30, 2015 · 63 comments · Fixed by ryan-ally/node#20 or B020239/node#13 · May be fixed by ryan-ally/node#18, Mement-Mori/node#8 or Iceymann18777/node#10

@Qard
Member

Qard commented Jan 30, 2015

@bnoordhuis @sam-github @othiym23 @wraithan @groundwater @brycebaril @trevnorris

We should revisit the tracing situation. Perhaps a working group is in order? Some of you have moved on from APM since our last discussion, some were not present, but your input is valuable.

@Qard Qard added the ideas label Jan 30, 2015
@mikeal
Contributor

mikeal commented Jan 30, 2015

quick question: is tracing part of the solution to a larger problem, or is it totally separate?

could tracing, DTrace support and SystemTap support be classified as part of a larger effort the WG would be solving?

@mikeal
Contributor

mikeal commented Jan 30, 2015

also @thlorenz who is working on some related stuff.

@Qard
Member Author

Qard commented Jan 30, 2015

I think they are all related. At the time of our previous discussion, AsyncListener was being removed from core. Now we have async_wrap, so I think we should reconvene to discuss what we can do with that to make tracing more pleasant. DTrace and SystemTap are just other destinations for trace data, so I think that could be part of the discussion.

The topic of the working group could potentially be a bit broader, reaching into other debugging issues, like what to do with domains.

@trevnorris
Contributor

@Qard IMO the working group should discuss what tracing functionality is more generally wanted, and then the minimalist hooks to allow that functionality to live in user-land should be implemented here. There is a never-ending list of features people want, and adding/maintaining all of those is the wrong decision. Extending the hooks that currently live in async_wrap sounds great. I'd love to get more feedback on what more is needed.

@mikeal
Contributor

mikeal commented Jan 30, 2015

I'm working on outreach right now to get wide input into the roadmap. @trevnorris can you think of a good question to ask individuals and companies that could inform what they need from tracing? Maybe something like "what do you wish you knew about a running Node application that you don't know now?"

@Qard
Member Author

Qard commented Jan 30, 2015

In our previous meeting, we came to a rough consensus on what our problems were as APM providers. Knowing more about what the customers want to see would certainly be valuable too.

I agree that a lot of the applications of this would likely be userland stuff, but we can figure out what that is when we get to it.

@trevnorris
Contributor

@mikeal Would it be possible for devs to provide pseudo examples of what they want? Meaning, they provide some example code and can explain what it is they want traced. Seeing usage cases, imho, would be the most beneficial.

@sam-github
Contributor

@Qard do you have a link to the write-up for that last meeting? I'm having trouble finding it, and it's pretty relevant here.

@Qard
Member Author

Qard commented Jan 30, 2015

@sam-github Yep, here it is.

https://gist.github.com/groundwater/942dad5c0c4cfae21af9

These are the trace meeting notes compiled by @groundwater after our previous meeting, for anyone else that wants to have a look.

@brycebaril
Contributor

@Qard yes! Definitely still interested

@hayes

hayes commented Jan 31, 2015

I am definitely interested as well

@othiym23
Contributor

I am most definitely interested, both in following up on the conversation we started last year and in whatever broader working-group discussion that might result from Mikeal's canvassing.

@thlorenz
Contributor

I feel there are two goals here which are solved differently:

    1. improve tracing of the node process from within, which in most cases requires some sort of hooks, monkey patching, addons or changes to core
    2. simplify integration of system tool tracing, which can be solved in user land in most cases

I created an issue to collect info about existing user land tools, some of which are addons that interface with v8 in order to pull out profiling info.

I'm mostly focusing on 2) ATM, which covers converters and parsers that consume the output of perf, DTrace, SystemTap, etc., and plugins into tools like debuggers.

At the moment we have the following (some rather new): I'm most likely missing lots, so please add some you know of

For those interested, some of us are gathering in #ngin8 on IRC to discuss some of these efforts.
@paulirish and I were talking there about sunbursts (think flamegraphs+) and how to get them integrated with system tools.

So there's a wide scope to cover here. I'm not sure whether it makes sense to have this all in one group or whether we should split it into one group for each section, as outlined in 1) and 2).

@Qard
Member Author

Qard commented Jan 31, 2015

I see the working group mostly focusing on goal 1, but goal 2 ties into it in many ways.

Being able to smoothly correlate trace data across JS and C++ boundaries is one thing that comes to mind. Visibility beyond the nebulous "it went into libuv somewhere, it'll probably call back at some point" would be great.

Buffer usage in native modules also seems like something that'd be good to get some visibility into.

I'm sure there's plenty of areas you all can think of where integrating with the native side could provide some very valuable data.

@rvagg
Member

rvagg commented Jan 31, 2015

I'm very interested in helping this WG get off the ground. This topic is at the core of the "completeness" story for Node IMO. Whenever we interact with companies shifting from more mature platforms, the lack of insight into what their programs are doing is one of the key gaps they find in Node. It could be part of a larger "debugging" topic, but given that this particular group is so focused on tracing and its members have all been working in this area, I think a tracing-focused WG would be a good start.

How about y'all make sure that you've collected all of the relevant people with an interest in this area and a history of actually tackling this problem and then we'll try and find a time for you to have a kick-off Hangout to discuss how best to proceed.

@Qard
Member Author

Qard commented Jan 31, 2015

I would like to see the greater debugging issue getting tackled, but I think we should focus on one thing at a time. Given most of us have a background in APM, I think tracing is the obvious first topic.

Also, do we have any connections to V8 core people working on the profiling tools? It would be good to coordinate with them. Perhaps @domenic can help with that?

@bnoordhuis
Member

I believe most of the debugger and profiler work was done by non-Googlers. If you restrict it to just people from the V8 team, Yang is probably the most active in that area.

Apropos the tracing WG, sign me on. StrongLoop would be most interested in (async) tracing and getting more metrics out of io.js and libuv (and V8, but that's a separate story.)

@othiym23
Contributor

I think a WG that doesn't have a strong...StrongLoop presence would be incomplete, because @sam-github, @rmg, and @piscisaureus have all done significant work in this area, in addition to you, @bnoordhuis. I also think focusing on APM / production performance analysis is a sensible move. "Debugging" is a huge and broad area.

@groundwater
Contributor

The consensus from our meeting was roughly "we want to collect arbitrary data, at arbitrary places" which is probably too vague to anyone who did not attend. I would suggest we come up with a dozen well-defined user stories (sorry for the product manager speak) before we all jump into solution land.

At a high level, I think the stories should cover at least the following:

  1. one or more APM use cases
  2. helping a poor user who uses console.log as their debugging tool
  3. getting detailed low-level GC/mem/libuv info
  4. stitching together async transactions (i.e. continuation-local-storage)
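Story 4 is the least obvious one for people outside APM, so here is a toy, in-process sketch of what continuation-local storage boils down to: capture the active context when a callback is scheduled and restore it when that callback finally runs. `runWithContext`, `getContext`, `bindToContext` and the `schedule`/`drain` queue are hypothetical names made up for illustration; they are not the API of the continuation-local-storage module or of io.js. In a real runtime the wrapping would have to happen automatically at every async boundary, which is the kind of thing the core hooks discussed in this thread would need to make possible.

```typescript
// Toy continuation-local storage (hypothetical API, illustration only).
type Context = Map<string, unknown>;

// The context that is active for the code currently running.
let active: Context | undefined;

// Run fn with ctx active, then restore whatever was active before.
function runWithContext<T>(ctx: Context | undefined, fn: () => T): T {
  const previous = active;
  active = ctx;
  try {
    return fn();
  } finally {
    active = previous;
  }
}

function getContext(): Context | undefined {
  return active;
}

// Capture the context at bind (schedule) time; restore it at call time.
function bindToContext<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
  const captured = active;
  return (...args: A): R => runWithContext(captured, () => fn(...args));
}

// A toy task queue standing in for the event loop.
const queue: Array<() => void> = [];

function schedule(cb: () => void): void {
  queue.push(bindToContext(cb));
}

function drain(): void {
  while (queue.length > 0) {
    const task = queue.shift()!;
    task();
  }
}

// Two "requests" schedule work; each callback should still see its own id.
const seen: string[] = [];
for (const id of ["req-1", "req-2"]) {
  runWithContext(new Map<string, unknown>([["requestId", id]]), () => {
    schedule(() => {
      seen.push(String(getContext()?.get("requestId")));
    });
  });
}
drain(); // runs after both runWithContext calls have returned
console.log(seen); // [ 'req-1', 'req-2' ]
```

The point of the toy is that the callbacks run after their request's `runWithContext` call has already returned, yet each one still observes its own `requestId`; that is exactly the "stitching" an APM tool needs.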

Other than that, we probably want to discuss constraints the solution must meet:

  1. does not negatively impact performance when not in use
  2. impacts performance in a production-safe way when in use
  3. requires only the minimum changes necessary to implement solutions in npm-land

@Qard
Member Author

Qard commented Jan 31, 2015

So we have clear interest in being involved in a WG expressed by myself, along with:

@brycebaril
@hayes
@othiym23
@bnoordhuis
@thlorenz

If anyone else commenting here wants in, please say so.

Thanks @rvagg for the offer to help get this going. We should figure out how this fits into the Hangouts calendar and get a doodle started.

@thlorenz
Contributor

@Qard please add me to this list as I'm also interested in being involved.

@othiym23
Contributor

othiym23 commented Feb 1, 2015

This may be of interest to @AndreasMadsen as well, given the existence of @AndreasMadsen/trace and the work he's doing with async_wrap.

@rmg
Contributor

rmg commented Feb 1, 2015

@othiym23 thanks for the mention.

+1 on this being part of the completeness story, @rvagg. I think it would be beneficial to pretty much everyone in node land if we could raise the bar for the level of VM and libuv inspection available without having to rely on native addons.

@domenic
Contributor

domenic commented Feb 1, 2015

> Also, do we have any connections to V8 core people working on the profiling tools? It would be good to coordinate with them. Perhaps @domenic can help with that?

If you guys have specific things you're interested in, I can try to reach out. @paulirish might also be a good contact, as he's working on dev tools specifically.

One thing I'd personally be interested in seeing out of this group is some idea of what hooks into the VM or event loop or runtime environment are necessary for this kind of work. Then maybe we can standardize those and put them in V8 and in browsers after io.js proves them out in the real world. Cf. node-forward/discussions#28. So basically when defining the solution and "polyfilling" it in io.js, give some thought to how generalizable it would be.

@dberesford

We at @nearform have been dabbling quite a lot with LTTng and loving it. We have a fork of io.js on the go which adds LTTng tracepoints in a similar manner to DTrace and ETW: https://github.com/nearform/io.js/tree/tracing. Can we be included in this working group?

@bnoordhuis
Member

@dberesford I don't see why not. I suspect the focus will be more on dynamic tracepoints than static ones, though. If you want, the LTTng support can (with a bit of rework) land upstream.

@dberesford

@bnoordhuis that would be great, we'll get a PR ready for review

@mikeal
Contributor

mikeal commented Feb 1, 2015

Preliminary feedback from company interviews I've conducted in the past surfaced a strong need for "Linux debugging/tracing", so I'll volunteer to do the grunt work to get the group off the ground (schedule the first meeting, arrange the agenda, write the charter), but in the first meeting I'll probably ask someone to take on the role of facilitator moving forward.

Here's a doodle for the first meeting, scheduled for this Wednesday/Thursday.

http://doodle.com/x53auvrtffeia2in

For building the initial agenda, please propose topics here and I'll put together a list.

@sam-github
Contributor

@natduca, re:

> in other libraries like Blink, Skia or V8, we copy the trace_event.h header file over and then apply a small tweak to it that causes it to trampoline back over to chrome via a local singleton.

I can't find any sign of trace_event.h in https://chromium.googlesource.com/v8/v8 ... am I looking in the wrong place, or are you describing speculative future work?

@sam-github
Contributor

Found it linked from https://codereview.chromium.org/827993003

@natduca

natduca commented Feb 17, 2015

Yeah, this patch is stalled because the author has been busy. But it's still coming. :)

@sam-github
Contributor

@natduca can you comment on the differences and similarities between the Timeline view in Dev Tools and the chrome://tracing view? Do they both display trace_event data, or does only chrome://tracing show trace event data? It's a bit confusing; they seem very similar. Perhaps one is destined to replace the other?

@thlorenz
Contributor

@sam-github Timeline in DevTools shows sampled data (.cpuprofile), whereas chrome://tracing shows traced (structural) data.
traceviewify actually emulates function entries/exits to convert .cpuprofiles to trace-viewer format.
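To make "emulates function entries/exits" concrete: one plausible way to turn sampled stacks into traced-style data is to diff each sample's stack against the previous one, emitting an end event for every frame that disappeared and a begin event for every frame that appeared. The sketch below assumes the .cpuprofile has already been resolved into root-first stacks of function names with microsecond timestamps; `Sample`, `TraceEvent` and `samplesToTraceEvents` are made up for illustration, and this is not traceviewify's actual code. The output uses the `"B"`/`"E"` duration events of Chrome's Trace Event Format.

```typescript
// Sketch: emulate function entry/exit events from sampled call stacks.
// Assumption: samples are already resolved to root-first stacks of names.
interface Sample {
  ts: number;      // timestamp in microseconds
  stack: string[]; // root-first call stack observed at this sample
}

// Subset of a Trace Event Format duration event ("B" = begin, "E" = end).
interface TraceEvent {
  name: string;
  ph: "B" | "E";
  ts: number;
  pid: number;
  tid: number;
}

function samplesToTraceEvents(samples: Sample[], pid = 1, tid = 1): TraceEvent[] {
  const events: TraceEvent[] = [];
  const emit = (ph: "B" | "E", name: string, ts: number): void => {
    events.push({ name, ph, ts, pid, tid });
  };

  let current: string[] = [];
  for (const { ts, stack } of samples) {
    // Length of the prefix shared by the previous and the new stack.
    let common = 0;
    while (
      common < current.length &&
      common < stack.length &&
      current[common] === stack[common]
    ) {
      common++;
    }
    // Frames that vanished are treated as exited, innermost first.
    for (let i = current.length - 1; i >= common; i--) emit("E", current[i], ts);
    // Frames that appeared are treated as entered, outermost first.
    for (let i = common; i < stack.length; i++) emit("B", stack[i], ts);
    current = stack;
  }

  // Close any frames still open at the timestamp of the last sample.
  const lastTs = samples.length > 0 ? samples[samples.length - 1].ts : 0;
  for (let i = current.length - 1; i >= 0; i--) emit("E", current[i], lastTs);
  return events;
}

// Usage: three samples taken 10 microseconds apart.
const events = samplesToTraceEvents([
  { ts: 0, stack: ["main", "a"] },
  { ts: 10, stack: ["main", "b"] },
  { ts: 20, stack: ["main"] },
]);
// A {"traceEvents": [...]} JSON object is one of the layouts chrome://tracing loads.
console.log(JSON.stringify({ traceEvents: events }));
```

Because frames are matched only by name and position in the stack, a function that returns and is called again between two samples looks like one long call; a real converter would also compare profile node ids. Entry and exit times can likewise only ever land on sample timestamps, so the precision is bounded by the sampling interval.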

@natduca

natduca commented Feb 17, 2015

@sam-github @thlorenz as in all big software, there's a lot of nuance because we're in between the old thing that is deprecated and the new thing that doesn't work quite right. :)

Timeline view in devtools is almost completely based on chrome://tracing data these days. The corner cases are cpuprofile but we're working on moving that over completely. Same for network panel. The future we're working toward is that all performance data you see in devtools came through the tracing data stream.

But the UIs are different: the DevTools UI is focused on ease of use for a web developer, whereas chrome://tracing is just the raw "good enough for Chrome hackers" view. DevTools' timeline won't move to the chrome://tracing UI, though it may pick up ideas from that UI, or vice versa.

@paulirish

Pretty close, Thorsten.

> differences and similarities between the Timeline view in Dev Tools, and the chrome://tracing view

There are at least three UIs here:

  1. chrome://tracing aka trace-viewer
  2. devtools timeline (flame chart)
  3. devtools sampling profiler & flame chart

Once the above commit lands, all of these will be powered by the trace event data stream (V8 sampling and network being the remaining parts). After that, all three UIs will allow import/export of .trace files.
