Add a shallow-clone option for git packages #3556

Open
OgieBen opened this issue Sep 6, 2022 · 10 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@OgieBen

OgieBen commented Sep 6, 2022

At my company we are currently combining Unity with Flutter through https://pub.dev/packages/flutter_unity_widget, and we host the Unity widget dependency in a git repository. It gets big really quickly, and we have to keep deleting tags.

I would like to submit a PR that allows a user to tell pub to make a shallow clone of a Git repository instead of a mirror clone of the remote repository.

For example, a user could run the following command:

dart pub add http --git-url=https://github.com/my/http.git --git-ref=tmpfixes --git-shallow-clone=true

or, inside the pubspec.yaml file:

dependencies:
  vm_service:
    git:
      url: https://dart.googlesource.com/sdk
      ref: refs/changes/80/156980/3
      path: pkg/vm_service
      shallow-clone: true

Alternatively, instead of passing a boolean flag, the user could specify the depth of the shallow clone:

dart pub add http --git-url=https://github.com/my/http.git --git-ref=tmpfixes --git-shallow-clone=1

or, inside the pubspec.yaml file:

dependencies:
  vm_service:
    git:
      url: https://dart.googlesource.com/sdk
      ref: refs/changes/80/156980/3
      path: pkg/vm_service
      shallow-clone: 1
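Since the proposal lets shallow-clone be either a boolean or a depth, pub would need to normalize both forms. A minimal Python sketch of that step, assuming (hypothetically) that true means depth 1 and that an absent or false value keeps the mirror-clone default; the function name is illustrative, not pub's code:

```python
from typing import Optional, Union


def resolve_shallow_depth(value: Union[bool, int, None]) -> Optional[int]:
    """Map a hypothetical `shallow-clone` pubspec value to a git --depth.

    Returns None when a full (mirror) clone should be made.
    """
    if value is None or value is False:
        return None  # option absent or disabled: keep the mirror-clone default
    if value is True:
        return 1  # assumption: a bare `true` means "latest commit only"
    # bool is a subclass of int, so booleans must be handled before this check.
    if isinstance(value, int) and value >= 1:
        return value
    raise ValueError(
        f"shallow-clone must be true, false, or a positive integer, got {value!r}"
    )
```

Rejecting 0 and negative numbers early keeps an invalid pubspec from reaching git as a nonsensical --depth argument.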

There is a similar issue here: #2686.

@jonasfj
Member

jonasfj commented Sep 8, 2022

This is a private package, right? And the issue is that it's large, so a git dependency with a full clone takes up a lot of space and bandwidth.

So is it correct that the possible solutions might be:

  • Use a private package repository? (Probably less preferable, because you can't piggyback off the authentication you already have for git.)
  • git LFS (maybe?), if we tweaked pub to allow it?
  • shallow git clones?

I'm curious: if you have multiple Dart SDKs installed, how do shallow clones affect the PUB_CACHE, and how will an old Dart SDK interact with it?
(There is possibly a solution; I'm just saying we need to figure this out.)

Also, how do git shallow clones actually work? How shallow are they? What does the depth mean, and when is that sensible? Are they supported by all git versions, or will we need feature detection?

Should we migrate to only use shallow clones? Or are full clones still sensible in some scenarios?

Sorry for the dumb questions; I'm not fully versed in all the details of modern git. And anything that changes the layout of PUB_CACHE requires care to ensure it works when users upgrade or downgrade SDKs.
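On the feature-detection question, one cheap option is to parse the output of git --version and compare it against a minimum. A minimal Python sketch; the function names are hypothetical, and which minimum version any particular feature needs is left open here rather than asserted:

```python
import re
from typing import Tuple


def parse_git_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `git --version` output.

    Tolerates vendor suffixes such as "2.39.2.windows.1" or "(Apple Git-137.1)".
    """
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized git --version output: {output!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def supports(output: str, minimum: Tuple[int, int, int]) -> bool:
    """True if the installed git is at least `minimum` (tuple comparison)."""
    return parse_git_version(output) >= minimum
```

For example, supports(output, (2, 20, 0)) with output captured from git --version; the (2, 20, 0) threshold is a placeholder, not a verified requirement.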

@OgieBen
Author

OgieBen commented Sep 9, 2022

Hi @jonasfj, I think most of your questions are valid.

I am not sure the following options you suggested will resolve the issue, because we would still need to pull a large history of our repository.

> Use a private package repository? (Probably less preferable, because you can't piggyback off the authentication you already have for git.)
> git LFS (maybe?), if we tweaked pub to allow it?

About this question:

> I'm curious: if you have multiple Dart SDKs installed, how do shallow clones affect the PUB_CACHE, and how will an old Dart SDK interact with it?
> (There is possibly a solution; I'm just saying we need to figure this out.)

A project using an older version of Dart will use the same version of the package cached in PUB_CACHE. The only difference between the mirror-cloned and the shallow-cloned version is that the shallow-cloned package will have a shorter history (fewer commits) than the mirror clone.
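Whether an older SDK copes with a shallow cache entry depends on how it uses that repository, but detecting one is cheap: git records the shallow boundary commits in a file named shallow inside the git directory, and a full clone has no such file (newer git versions also offer git rev-parse --is-shallow-repository). A minimal Python sketch of the file-based check, assuming each cache entry is a bare repository; the function name is hypothetical:

```python
from pathlib import Path


def is_shallow_git_dir(git_dir: str) -> bool:
    """Return True if the bare repository at `git_dir` is a shallow clone.

    Git lists the shallow-boundary commits in `<git_dir>/shallow`;
    a full (e.g. mirror) clone has no such file.
    """
    return (Path(git_dir) / "shallow").is_file()
```

An SDK that understands shallow entries could run this before reusing a cached repository and deepen or re-clone it when it needs history the entry lacks.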

This is how the depth option works, according to the git clone documentation:

> --depth <depth>
> Create a shallow clone with a history truncated to the specified number of commits. Implies --single-branch unless --no-single-branch is given to fetch the histories near the tips of all branches. If you want to clone submodules shallowly, also pass --shallow-submodules.

It basically allows us to pull a specific number of commits instead of fetching the entire history of the git repository.
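To make "truncated to the specified number of commits" concrete: a depth-N clone keeps the commits fewer than N parent-hops from the fetched tip, so a history with merges can yield more than N commits. A simplified Python model of that rule over a made-up commit graph (the graph and names are illustrative only, and real git has edge cases this ignores):

```python
from collections import deque
from typing import Dict, List, Set


def commits_within_depth(parents: Dict[str, List[str]], tip: str, depth: int) -> Set[str]:
    """Simplified model of `git clone --depth <depth>`: keep every commit
    fewer than `depth` parent-hops away from `tip` (breadth-first)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    kept = {tip}
    frontier = deque([(tip, 1)])  # (commit, generation counted from the tip)
    while frontier:
        commit, generation = frontier.popleft()
        if generation == depth:
            continue  # this commit sits on the shallow boundary
        for parent in parents.get(commit, []):
            if parent not in kept:
                kept.add(parent)
                frontier.append((parent, generation + 1))
    return kept


# Toy history: merge commit "m" joins main-line "c3" and feature commit "f1".
HISTORY = {"m": ["c3", "f1"], "c3": ["c2"], "f1": ["c2"], "c2": ["c1"], "c1": []}
```

With depth=2 the model keeps three commits (m, c3 and f1), not two, because the merge commit has two parents.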

> Should we migrate to only use shallow clones? Or are full clones still sensible in some scenarios?

Basically, the idea is to make a mirror clone when the shallow-clone option is not provided, i.e. mirror cloning stays the default strategy for fetching git packages, and a shallow clone is made only when the shallow-clone option is provided.
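A minimal Python sketch of that strategy, building the git argument list without running it. Assumptions: the cache entry is a bare repository, and the shallow path uses --bare with --depth and --no-single-branch (per the documentation quoted above) so the tips of all branches remain available. The helper name is hypothetical; this is not pub's actual code:

```python
from typing import List, Optional


def cache_clone_args(url: str, cache_dir: str, depth: Optional[int] = None) -> List[str]:
    """Build the `git clone` command for populating a hypothetical cache entry.

    depth=None keeps the current default, a full mirror clone; a positive
    depth switches to a shallow bare clone.
    """
    if depth is None:
        return ["git", "clone", "--mirror", url, cache_dir]
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [
        "git", "clone", "--bare",
        f"--depth={depth}",
        "--no-single-branch",  # keep the tips of all branches, not just HEAD
        url, cache_dir,
    ]
```

Running it would then be subprocess.run(cache_clone_args(url, path, depth), check=True); keeping command construction separate from execution makes both strategies easy to unit-test.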

@jonasfj
Member

jonasfj commented Sep 13, 2022

> Basically, the idea is to make a mirror clone when the shallow-clone option is not provided, i.e. mirror cloning stays the default strategy for fetching git packages, and a shallow clone is made only when the shallow-clone option is provided.

I get that; my question is whether it's better to always make a shallow clone.

@jonasfj
Member

jonasfj commented Sep 13, 2022

> Use a private package repository?

Would certainly alleviate concerns about having a huge git history.

@OgieBen
Author

OgieBen commented Sep 13, 2022

> I get that; my question is whether it's better to always make a shallow clone.

I am not sure whether it is best to always make a shallow clone, but I think it would be good to have an option to make a shallow clone when making a mirror clone becomes infeasible.

@2shrestha22

Any update on this?

@sigurdm
Contributor

sigurdm commented Jun 9, 2023

Reading this: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ made me think that partial blobless clones, or maybe even partial treeless clones, might work well for pub. That would save a lot of bandwidth while working well with how e.g. GitHub serves repositories.

I guess there are still a lot of questions to answer before attempting this.

  • Can we git fetch in a treeless fashion?
  • Will this interact well with existing pub caches with full checkouts?
  • Is this too breaking to do always (you could no longer rely on the past history of your dependencies being available offline)?
  • Are there any other unintended side effects?
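The two partial-clone flavours from the linked post correspond to git's --filter option: blob:none for a blobless clone and tree:0 for a treeless one. A minimal Python sketch mapping hypothetical strategy names to clone arguments; it assumes the server supports partial clone, and whether this fits pub's cache layout is exactly the open question above:

```python
from typing import List

# Hypothetical strategy names mapped to git's partial-clone filter specs.
PARTIAL_CLONE_FILTERS = {
    "blobless": "blob:none",  # fetch all commits and trees; blobs on demand
    "treeless": "tree:0",     # fetch commits only; trees and blobs on demand
}


def partial_clone_args(url: str, dest: str, strategy: str) -> List[str]:
    """Build a `git clone` command for a blobless or treeless partial clone."""
    try:
        spec = PARTIAL_CLONE_FILTERS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy {strategy!r}") from None
    return ["git", "clone", "--bare", f"--filter={spec}", url, dest]
```

Unlike --depth, both filters keep the full commit graph; the missing objects are downloaded lazily when a later checkout needs them, which is why the offline-history concern above still applies.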

@mahmuttaskiran
Copy link

We have the same issue where we have a couple of private dependencies hosted on GitHub, and one of them has a huge history, which makes pub get slow. It would be great if we had a shallow clone option.

If I'm not mistaken, git source dependencies are currently cloned with their entire history, and then the specific ref is checked out. Would there be any issues if we cloned the ref using --depth 1? I know that's a stupid question (it has been discussed above), but I need help identifying the test cases if I propose this solution as an optional parameter.
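One detail that affects the test cases: git clone --branch accepts only branch and tag names, while a pub ref can also be a commit hash or a full ref such as refs/changes/80/156980/3 from the example at the top. A minimal Python sketch of a uniform alternative (hypothetical helper, not pub's code): initialize an empty repository, fetch only the requested ref with --depth=1, and check out FETCH_HEAD. Fetching a bare commit hash this way works only if the server permits it, so that case deserves its own test:

```python
from typing import List


def shallow_ref_checkout_commands(url: str, ref: str, dest: str) -> List[List[str]]:
    """Commands to materialize `ref` (branch, tag, full ref or commit hash)
    from `url` into `dest`, downloading only that single commit's snapshot."""
    return [
        ["git", "init", dest],
        # Fetch one ref, truncated to a single commit.
        ["git", "-C", dest, "fetch", "--depth=1", url, ref],
        # FETCH_HEAD points at whatever the fetch above retrieved.
        ["git", "-C", dest, "checkout", "--detach", "FETCH_HEAD"],
    ]
```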

@sigurdm
Copy link
Contributor

sigurdm commented Nov 18, 2024

> Would there be any issues if we cloned with the ref using --depth 1?

As I understand it, --depth 1 would risk putting extra stress on the server when switching between commits.

https://github.blog/open-source/git/get-up-to-speed-with-partial-clone-and-shallow-clone/#:~:text=the%20Git%20client.-,Shallow%20clones,-Partial%20clones%20are

@sigurdm
Copy link
Contributor

sigurdm commented Nov 18, 2024

I looked a bit into how Go does it. We can probably learn something!

Instead of running git clone, they run git init --bare:

https://github.com/golang/go/blob/db8c208cbd5e20c80c1587b0d9d4166d8238089d/src/cmd/go/internal/modfetch/codehost/git.go#L84

and then fetch heads and tags with --depth=1:

https://github.com/golang/go/blob/db8c208cbd5e20c80c1587b0d9d4166d8238089d/src/cmd/go/internal/modfetch/codehost/git.go#L541

Maybe that is the right way to go. Combined with #3806 ...
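Translated to pub's setting, the approach described above might look roughly like the following command sequence. This is a sketch of the idea, not a transcription of Go's code: the helper name, the remote name origin and the exact refspecs are assumptions (the leading + lets fetch force-update refs, so the cache can follow the remote):

```python
from typing import List


def go_style_fetch_commands(url: str, git_dir: str) -> List[List[str]]:
    """Bare-init a cache repository, then fetch branch heads and tags shallowly."""
    return [
        ["git", "init", "--bare", git_dir],
        ["git", "-C", git_dir, "remote", "add", "origin", url],
        [
            "git", "-C", git_dir, "fetch", "--depth=1", "origin",
            "+refs/heads/*:refs/heads/*",  # every branch tip, one commit deep
            "+refs/tags/*:refs/tags/*",    # every tag, one commit deep
        ],
    ]
```

Starting from an empty bare repository rather than git clone gives pub full control over which refs are fetched and how deep, which is what would let it later fetch a specific ref on demand.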


5 participants