Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export a lockfile in an ad-hoc JSON graph format. #2047

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

benjyw
Copy link
Collaborator

@benjyw benjyw commented Feb 5, 2023

It's useful to see the lockfile in graph form, for visualization, reasoning about how some transitive dep gets pulled in, etc.

This change emits a very simple JSON graph that other tools can process.

For example, this will turn a lockfile into a .png of a DOT graph:

echo "digraph {\
  $(pex lock export --format=graph lockfile.json  | \
  jq -r '.edges | to_entries[] | .key as $src | .value[] | "\"" + ($src) + "\" -> \"" + (.) + "\""') \
}"  | dot -Tpng

One use case is to allow Pants to display this data in the dependencies goal, which users have asked for, e.g., in pantsbuild/pants#12733

@benjyw benjyw marked this pull request as draft February 5, 2023 00:43
@benjyw
Copy link
Collaborator Author

benjyw commented Feb 5, 2023

This is a speculative change. I'm not convinced this captures all the nuances of lockfiles (e.g., right now it ignores extras, at the least it should probably list them in the vertex name). I want some feedback on the approach before beefing it up and writing tests.

@benjyw benjyw requested a review from jsirois February 5, 2023 01:09
Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think Pants should define a model if not a format. Presumably, internally, it will map all resolve graph information into that model before rendering, allowing JVM, Python, etc backends to map to a model and the core graphing goal to do the rendering in 1 spot in a consistent way that even in the future allows for mixed-language graphs. Definition of that model would help ensure the Pex format defined here need not thrash, since Pants needs have been thought through and worked out as evidenced by the model.

)
)
else:
vertices = set()
Copy link
Member

@jsirois jsirois Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: The current bit of Pex producing graphs is here: https://github.com/pantsbuild/pex/blob/97a2497e0938ece709310a03e7e41b5c26992952/pex/tools/commands/graph.py#L30-L83
That does take markers into account to good effect in the rendered graph, showing which requirements are not active in the present PEX being run by the present interpreter on the present machine. Of course the format is the dot adjacency list format and the side-band information about activation is stored in labels and colors.

If it's still the case that Pants doesn't want to define a format and Pex must my requirements are:

  • Embedding a version for the format: allow an escape hatch for backwards incompatible evolution (but with the intent to try hard to never actually use it)
  • Eventually merge the existing model / logic driving the tool example above with this: one kicks out dot, the other JSON.

@@ -345,6 +345,7 @@ def create(
target, # type: Target
direct_requirements, # type: Iterable[Requirement]
resolved_artifacts, # type: Iterable[_ResolvedArtifact]
adjacency_list, # type: Dict[Pin, Set[Pin]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A LockedResolve already has all the information needed. There should be no need to add graph rendering-specific data or methods to it. I think this is just a sign a (stripped-down) model for exported graph data is needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, we could derive the graph entirely in Pants, no? But it seemed to me (naively?) that this logic had to be here because request_resolve() is where we know who actually requires who, in the context of a target.

Copy link
Member

@jsirois jsirois Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, we could derive the graph entirely in Pants, no?

Yup. Just be ready to break / fix when Pex changes its private, undocumented format.

But it seemed to me (naively?) that this logic had to be here because request_resolve() is where we know who actually requires who, in the context of a target.

I think naively - maybe. So how will Pants use the graph info? If the lock is universal, which it is in Pants case, will ./pants dependencies ... include this info now and if so, what the heck will that pick for a target? Will it solve for a local interpreter that matches the constraints of all targets requested and use that if success and fail otherwise? If so and success, you already know the target up in Pants and can annotate the graph display with that info; i.e. this is the dependency graph when Python 3.7 is used, but the results may be different for 3.8 and 3.9 (and the repo is like Pants itself and supports >=3.7,<3.10).

I think Pants really ought to have what it wants to do sorted. The sorting can probably best be done today using the lock file json directly. Then, once sorted, a solid idea of the requirements for a stable PEX graph export format will be in hand and we can make sure it works for the not-pants cases, like non-universal locks with multiple resolves contained within and not just 1 universal resolve, etc.

vertices.add(str(src_pin))
vertices.update(dst_pin_strs)
edges[str(src_pin)] = dst_pin_strs
graph = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single target selected above on 508-510 is controlled by command line flags like --python, --python-path, --interpreter-constraint, --platform and --complete-platform, but none of these need be present either. It seems like some node to store information about the rendered graph context would make sense; i.e.: for a universal lock, this would give the context for environment marker activation / deactivation assuming you include activation information in the graph and for a multi-lock lock file, it also indicates which lock is being talked about (say the CPython 3.7 locked resolve from a lock file containing 3 locks, 1 for 3.7, 1 for 3.8 and 1 for 3.9.

@benjyw
Copy link
Collaborator Author

benjyw commented Feb 10, 2023

Yeah, not a bad idea to make progress on Pants ingesting the data, to validate the format.

@cosmicexplorer
Copy link
Contributor

Are you familiar with the pip install --report json output? I worked on that with @sbidoul (see pypa/pip#10748 and pypa/pip#10771). I have been planning to add that to pex for a while now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants