New Source File and Lock Specification Approach #316

srilman · 2023-01-16T00:58:33Z

This PR implements the new method of parsing source files, combining them, and representing lock specifications discussed in #278 (comment). The new approach is as follows:

First, we parse each source file individually, in a platform-agnostic way, into a SourceFile object.
For example, we now parse environment.yaml files using ruamel.yaml instead of pyyaml because it preserves comments (and thus selectors).

A SourceFile object is defined as containing:

class SourceFile(StrictModel):
    file: pathlib.Path
    dependencies: List[SourceDependency]
    # TODO: Should we store the auth info in here?
    channels: List[Channel]
    platforms: Set[str]

We determine the list of platforms to render for, taken from the passed-in arguments, the source files, or the default list.
Then, for each platform, we render each SourceFile object into the list of deps for that platform.
We combine the lists for each platform using aggregate_deps, giving us a dictionary mapping each platform to a list of unique dependencies (unique by name and manager).
We combine all this info to construct the new LockSpecification object, now defined as:

class LockSpecification(BaseModel):
    dependencies: Dict[str, List[Dependency]]  # Dict mapping platform to deps
    # TODO: Should we store the auth info in here?
    channels: List[Channel]
    sources: List[pathlib.Path]
    virtual_package_repo: Optional[FakeRepoData] = None
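
The steps above can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the Dep dataclass, its platforms field (a stand-in for selectors), the function names, and the rule that a later source file overrides an earlier one are all assumptions. Only the end result, a dictionary from platform to dependencies that are unique by name and manager, comes from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Dep:
    """Hypothetical stand-in for a parsed dependency."""

    name: str
    manager: str  # e.g. "conda" or "pip"
    version: str = "*"
    # Assumed stand-in for selectors: None means "applies to every platform".
    platforms: Optional[Set[str]] = None


def render(deps: List[Dep], platform: str) -> List[Dep]:
    """Keep only the dependencies that apply to the given platform."""
    return [d for d in deps if d.platforms is None or platform in d.platforms]


def aggregate(per_file: List[List[Dep]]) -> List[Dep]:
    """Merge rendered lists, keeping one entry per (name, manager).

    Assumption: when two files name the same dependency, the later file wins.
    """
    unique: Dict[Tuple[str, str], Dep] = {}
    for deps in per_file:
        for dep in deps:
            unique[(dep.name, dep.manager)] = dep
    return list(unique.values())


def build_dependencies(
    files: List[List[Dep]], platforms: List[str]
) -> Dict[str, List[Dep]]:
    """Render every file for every platform, then aggregate per platform."""
    return {p: aggregate([render(f, p) for f in files]) for p in platforms}
```

Keying the intermediate dictionary on (name, manager) is what lets a conda package and a pip package with the same name coexist in one platform's list.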

netlify · 2023-01-16T00:58:37Z

✅ Deploy Preview for conda-lock ready!

Name	Link
🔨 Latest commit	`3fd7106`
🔍 Latest deploy log	https://app.netlify.com/sites/conda-lock/deploys/63d58efacacd190009b709d8
😎 Deploy Preview	https://deploy-preview-316--conda-lock.netlify.app

mariusvniekerk · 2023-01-16T15:23:53Z

Can you break this into a few smaller PRs, doing the model class moves first? The way this is now is very hard to review.

srilman · 2023-01-21T19:00:57Z

#320 contains some of the refactoring I did in this PR. It's about 500 lines of changes, but most of it is moving classes and functions around with no content changes.

maresb · 2023-01-22T04:16:00Z

Wonderful. Now we need a rebase here, whenever you get the chance, and then we can review.

srilman · 2023-01-22T15:31:42Z

conda_lock/conda_lock.py

@@ -243,44 +236,6 @@ def fn_to_dist_name(fn: str) -> str:
    return fn


-def make_lock_spec(


Both make_lock_spec and parse_source_files are specifically related to parsing source files, and don't have a big effect on the rest of the program. Thus, I moved them to conda_lock/src_parser/__init__.py to reduce the amount of code in this file (since it's about 1500 lines of code).

Thanks a lot for all this work!!! It would still really help for reviewing to have these refactor steps in separate commits so that I could view the substantive changes separately. (In general it's much easier to follow several smaller logical commits than one massive one.)

I'm sincerely very eager to see this through quickly, but my schedule looks difficult at the moment. I'll see what I can do, but apologies in advance if I'm slow to respond.

@maresb I tried to break this PR down into a couple of commits to make it a bit easier. I had some trouble breaking down the last couple of commits since their contents are very much tied together. But if it is still difficult to look through, let me know.

Thanks for the additional commits! This is much better for review.

It could still be even better... The best would be one single logical change per commit. Please don't change this particular commit now, but to explain what I mean, your first commit "Move Function Sub PR" could be further broken down into:

Add pip_support as argument to make_lock_spec and parse_source_files

Move parse_source_files to src_parser

Move make_lock_spec to src_parser

because this is the level to which I need to deconstruct the changes to see what's going on. (Currently I have to diff each function removed from conda_lock.py with each function added to __init__.py in order to see exactly what changed, so a verbatim cut-from-one-file and paste-in-the-other is easier to process as a single logical change.)

I see, good point. I'm used to writing large PRs in general. In the future, I can definitely break down my commits even further. Would you like me to modify the last large commit of this PR? My concern with modifying that commit is that I'm not sure how to break it down without having some commit be broken. I normally try to ensure that every commit exposed to master is a somewhat-working impl of the app or library.

Large PRs are fine, it's just large commits which are difficult to understand.

Your commits don't need to be perfect, and I'm asking for a fairly high standard. I can explain how I try to write commits:

> I'm not sure how to break it down without having some commit be broken.

This is a good rule in general. But in some situations I think it's fine to break something in one commit and fix it in a subsequent one. (For example, in one commit I might remove functionality X, and then in the next commit I add functionality Y which replaces X. This way the new details of Y aren't confused with the old details of X.)

What also helps is to stage partial changes. For example, in the process of implementing X, I may modify some type hints in another part of the code. In this case, I can stage and commit the type hints in a separate commit, so that my implementation of X remains focused.

The most complicated technique is to rebase code you've already committed, but that is really a lot of work.

A few concrete suggestions for how you could break up the main commit:

Switch to ruamel

Define new classes

Implement core logic using the new classes

I think I can handle the large commit as-is, but I would need to find an uninterrupted block of time where I could work through the whole thing at once. If you manage to break up the commit, then I can probably finish reviewing it sooner.

Have you spent a lot of time looking at the last 2 commits, Initial Version of SourceFile Approach and Adding Test Yaml Files? If not, I can try to split those further.

I think you already looked through the first 2 commits thoroughly, so I will leave them alone.

> I think you already looked through the first 2 commits thoroughly, so I will leave them alone.

Yes, in fact, if you create a separate PR for those I think we can already merge them since they are minor refactoring changes.

For the big commit I was thinking: since this is a major change, Marius will have to review it after me. Thus it may be worthwhile to invest extra time to make it readable.

srilman · 2023-01-22T19:41:07Z

I rewrote #300 to work on top of this PR, so we would need this PR merged first.

maresb

I've started review. This looks like some really great work. However, I still have a ways to go before I get a grasp on the main commit 0ad6f7b. For now I have included some initial comments.

maresb · 2023-02-05T13:46:37Z

conda_lock/src_parser/__init__.py

+    from conda_lock.src_parser.environment_yaml import parse_environment_file
+    from conda_lock.src_parser.meta_yaml import parse_meta_yaml_file
+    from conda_lock.src_parser.pyproject_toml import parse_pyproject_toml


Is there any particular reason to put these imports inside the function? For more standardized code, I'd prefer to have imports at the top of the file unless there's a good reason.

Those modules import conda_lock/src_parser/__init__.py. When I included them at the top, I was getting a circular dependency error. Those files use the SourceDependency, SourceFile, VersionedDependency, and URLDependency classes. If I move these classes to a new file like conda_lock/src_parser/models.py, then I can get rid of the circular dependency and have these be top level imports. Thoughts?

Ah, yes, circular dependencies in Python are really annoying.

> Those files use the SourceDependency, SourceFile, VersionedDependency, and URLDependency classes.

Are they using them just for type hints? If so, then they are not true import cycles. In that case, you can do

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ... import SourceDependency, ...

and the types won't be imported at runtime, avoiding the cycle. (Very nice, I see that you already know this trick! 😄)
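
A self-contained sketch of that trick is below. Here fractions.Fraction is only a stand-in for a class such as SourceDependency whose import would otherwise create the cycle, and the function name is made up for illustration. Note that the annotation has to be a string (or the module has to use postponed annotation evaluation), because the name does not exist at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers evaluate this branch. At runtime
    # TYPE_CHECKING is False, so this import never executes and
    # cannot take part in an import cycle.
    from fractions import Fraction  # stand-in for SourceDependency


def numerator_of(value: "Fraction") -> int:
    """Use the type purely as an annotation; no runtime import is needed."""
    return value.numerator
```

This only helps when the imported names are used for annotations alone. As soon as the code instantiates or subclasses the imported class, the import has to happen at runtime again.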

If it's not just for type hints, then there may be some genuinely circular logic occurring. For instance, if a imports B from c and c imports D from a, then you may need to make a new module e which contains B and D. (Then a imports B from e and c imports D from e, and all is well.)

It's not just for type hints. I will move them to a new module.

I forgot to mention in my previous comment that Python does permit certain types of circular imports.

Running from c import B is essentially equivalent to import c; B = c.B. Python will allow you to import c from a while also importing a from c, as long as you don't access the module attributes (i.e. c.B) before the module has been fully loaded. (This works when all c.B accesses occur within function definitions.)

So there is another strategy: rather than fix the cycles, try to import modules lazily, i.e. replace from c import B → import c and B → c.B. But I have the impression that this leads to very fragile code. I think it's generally much more robust to remove all non-typing-related circular imports when possible.
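
To illustrate the difference between the two import styles, here is a small self-contained demo. It writes four throwaway modules to a temporary directory and tries to import them. The module names (eager_a, eager_c, lazy_a, lazy_c) and helper functions are invented for this sketch and have nothing to do with conda-lock's layout.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from typing import Dict

# Hypothetical throwaway modules; the names are invented for this demo.
SOURCES = {
    # Eager style: each module pulls a name out of the other at import time.
    "eager_a": """
        from eager_c import B
        class D: ...
    """,
    "eager_c": """
        from eager_a import D
        class B: ...
    """,
    # Lazy style: import the module object, touch attributes only in functions.
    "lazy_a": """
        import lazy_c
        class D: ...
        def make_b():
            return lazy_c.B()
    """,
    "lazy_c": """
        import lazy_a
        class B:
            def make_d(self):
                return lazy_a.D()
    """,
}


def try_import(name: str) -> str:
    """Import a module by name and report whether the import succeeded."""
    try:
        importlib.import_module(name)
    except ImportError:
        return "ImportError"
    return "ok"


def demo() -> Dict[str, str]:
    """Write the modules to a temp dir and compare the two import styles."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in SOURCES.items():
            Path(tmp, f"{name}.py").write_text(textwrap.dedent(source))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            results = {name: try_import(name) for name in ("eager_a", "lazy_a")}
            if results["lazy_a"] == "ok":
                lazy_a = sys.modules["lazy_a"]
                # Attribute access happens at call time, after both modules loaded.
                results["round_trip"] = type(lazy_a.make_b().make_d()).__name__
            return results
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())
```

The eager pair fails because eager_c asks for D while eager_a is still only partially initialized. The lazy pair loads, but it keeps working only as long as nobody adds a module-level attribute access, which is the fragility mentioned above.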

maresb · 2023-02-05T13:48:11Z

conda_lock/src_parser/pyproject_toml.py

@@ -380,3 +342,41 @@ def parse_pdm_pyproject_toml(
    res.dependencies.extend(dev_reqs)

    return res
+
+
+def parse_pyproject_toml(


Out of curiosity, was there any particular reason to move parse_pyproject_toml to the end in da18f01?

Was it for type annotations? (I just opened #329, which would help a lot with this.)

Mainly for code-discoverability. The general pattern in most of the modules is that the last function in a module is the main function exposed by that module and used by the rest of the program. Since parse_pyproject_toml is the main exported function, I thought it should be at the end to fit that pattern.

maresb · 2023-02-05T14:27:23Z

conda_lock/src_parser/pyproject_toml.py

+            if dep.dep.name in force_pypi:
+                dep.dep.manager = "pip"


Having dep.dep and .to_source() seems a bit awkward. I wonder if there's a more natural approach to SourceDependency?

For the dep.dep case, we can use properties to expose common attributes of Dependency objects like name, manager, etc.

Would it make sense to have SourceDependency as a subclass of Dependency?

We'd need a SourceURLDependency and a SourceVersionedDependency in that case, right?
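
As a rough sketch of the properties idea (not the PR's code): the classes below are simplified dataclass stand-ins for the real pydantic models, selectors is reduced to a frozenset of strings, and force_to_pip is a hypothetical helper that mirrors the force_pypi loop in the diff above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass
class Dependency:
    """Simplified stand-in for the real Dependency model."""

    name: str
    manager: str = "conda"


@dataclass
class SourceDependency:
    """Wraps a Dependency and forwards its commonly used attributes."""

    dep: Dependency
    selectors: FrozenSet[str] = frozenset()  # assumed simplification

    @property
    def name(self) -> str:
        return self.dep.name

    @property
    def manager(self) -> str:
        return self.dep.manager

    @manager.setter
    def manager(self, value: str) -> None:
        # Writes go through to the wrapped Dependency.
        self.dep.manager = value


def force_to_pip(deps: Iterable[SourceDependency], names: Set[str]) -> None:
    """Hypothetical helper: switch the named dependencies to the pip manager."""
    for dep in deps:
        if dep.name in names:  # instead of dep.dep.name
            dep.manager = "pip"  # instead of dep.dep.manager = "pip"
```

With forwarding properties the call sites lose the dep.dep chain, and no SourceURLDependency or SourceVersionedDependency subclasses are needed. The cost is that every forwarded attribute has to be listed by hand.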

maresb · 2023-02-05T14:28:06Z

conda_lock/src_parser/__init__.py

+class SourceDependency(StrictModel):
+    dep: Dependency
+    selectors: Selectors = Selectors()


Can you explain in words what's the idea behind a SourceDependency?

Sure (I can also add a comment). We want the LockSpecification to be a mapping from environment (platform right now, but potentially other attributes in the future) to a list of dependencies. Those dependencies don't need selectors, because selectors are only used while constructing the LockSpecification, to determine whether a dep is required for an environment or which of multiple versions to use.

Thus, a SourceDependency represents a dependency with any additional info associated with it from a source file. Right now, it's just selectors, but in the future, we may have other limiters like min or max Python version.

Would something like this make sense as a docstring?

A SourceDependency represents information about a particular dependency specification which has been extracted from a SourceFile (e.g. environment.yml or pyproject.toml). Generally, information about the target environment (e.g. platform) is not included, but it might be specified by including selectors.

Please rewrite in case I misunderstood the details.

That's perfect! Will add

srilman marked this pull request as ready for review January 16, 2023 15:15

srilman mentioned this pull request Jan 21, 2023

Lockfile-Related Code Refactored Before Sourcefile PR #320

Merged

srilman force-pushed the new-sourcefile-approach branch from 0b6d2ee to e0e6412 Compare January 22, 2023 15:28

srilman mentioned this pull request Jan 22, 2023

Merge Version Constraints from Multiple Input Files #300

Open

srilman added 5 commits January 28, 2023 14:13

Move Functions Sub PR

dafb50d

Move parse_pyproject_toml subfunction to end

da18f01

Initial Version of SourceFile Approach

0ad6f7b

Adding Test Yaml Files

0e9b2a6

Retry CI

48df28e

srilman force-pushed the new-sourcefile-approach branch from 97ae704 to 48df28e Compare January 28, 2023 19:26

Retry CI

3fd7106

maresb reviewed Feb 5, 2023

This was referenced Feb 9, 2023

Default platforms are unexpectedly added from multiple sources #337

Closed

[WIP] Refactoring on Source Parsing Related Functions from #316 #341

Closed

Refactoring on Source Parsing Related Functions #347

Merged

srilman mentioned this pull request Mar 5, 2023

Refactor LockSpecification as a Dictionary from Platforms to List of Deps #383

Merged
