Upstream changed #3

Open
JayFoxRox opened this issue Jun 14, 2020 · 8 comments

Comments

@JayFoxRox
Member

https://github.com/llvm-mirror/libcxx is no longer being updated. Instead, https://github.com/llvm/llvm-project should be used.

Unfortunately, this new repository contains not just libcxx but all of LLVM, so it is probably very large.
I'm not sure how to address this while keeping it maintainable and without requiring users to download a lot of things they won't need.

@thrimbor
Member

thrimbor commented Jun 14, 2020

Hmm, I hoped they'd keep the split repo. So far the options I see are:

  • Include the full LLVM repo as a submodule (which is 2.2GiB in size)
  • Use git filter-branch to update our libcxx repo out of the LLVM repo (I have no experience with git filter-branch and therefore don't know how feasible this is - there may be conflicts whenever we try to update)
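A minimal sketch of the filter-branch route, run against a throwaway local toy repository rather than llvm-project (the repository, directory, and file names are all made up for the demo). `--subdirectory-filter libcxx` makes `libcxx/` the new root and drops commits that never touched it, and as a side effect every rewritten commit gets a new hash.

```shell
#!/bin/sh
# Sketch: extract libcxx/ from a monorepo with git filter-branch.
# The toy repository below is a hypothetical stand-in for llvm-project.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

# Build a tiny "monorepo" with one commit that does not touch libcxx/.
git init -q "$work/mono"
cd "$work/mono"
mkdir libcxx llvm
echo 'int a;' > libcxx/a.cpp
echo 'int b;' > llvm/b.cpp
git add . && git commit -q -m 'initial import'
echo 'int c;' > llvm/c.cpp
git add . && git commit -q -m 'llvm-only change'
echo 'int d;' > libcxx/d.cpp
git add . && git commit -q -m 'libcxx change'
mono_head=$(git rev-parse HEAD)

# Rewrite a separate clone so the original history stays untouched.
git clone -q "$work/mono" "$work/libcxx-split"
cd "$work/libcxx-split"
# libcxx/ becomes the root; commits that never touch libcxx/ are dropped.
git filter-branch --subdirectory-filter libcxx -- HEAD >/dev/null
split_head=$(git rev-parse HEAD)

echo "monorepo HEAD: $mono_head"
echo "split HEAD:    $split_head"
git log --oneline
```

Against the real repository, the same call would run inside a full clone of llvm-project, which is slow at that size. Because filter-branch keeps each original commit's author, committer, and dates, re-running it on the same upstream history should reproduce the same rewritten hashes, but they never match upstream's. The git-filter-branch manual itself points to git filter-repo as a faster alternative.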

@JayFoxRox
Member Author

> Include the full LLVM repo as a submodule (which is 2.2GiB in size)

Including 2.2GiB is not an option for me.

I assume it would also be a huge issue for people trying to get into nxdk development if they had to spend a very long time cloning the repo (and waste so much disk space).
I don't have that much space available, and it would actually mean that I'd have to stop my involvement with nxdk.

It would only be an option if we decentralized nxdk and made binary releases (so only people who want to work on libcxx would have to clone it).

> Use git filter-branch to update our libcxx repo out of the LLVM repo

Yes, I also considered this. It is also really bad, because the revisions would get different hashes.

So we'd have to keep track of upstream revisions manually (GitHub won't offer automatic comparisons, for example). Rebasing might also become an issue, because we need the underlying revisions to stay stable.

There's also git subtree; I'm not sure how it differs from git filter-branch.


I guess another option is to only do very shallow clones, but even at --depth=1 it's still 140MiB in .git and 900MiB combined with the checked-out master.
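A small sketch of what a `--depth=1` clone keeps, using a hypothetical three-commit local repository instead of llvm-project. It uses a `file://` URL because git ignores `--depth` for plain local paths (it warns and does a full local clone instead).

```shell
#!/bin/sh
# Sketch: what a --depth=1 clone keeps, on a hypothetical three-commit repo.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
for n in 1 2 3; do
  echo "rev $n" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" commit -q -m "commit $n"
done

# file:// is required here: for plain local paths git ignores --depth.
git clone -q --depth=1 "file://$work/upstream" "$work/shallow"

is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "shallow: $is_shallow, commits in history: $commits"

# Size of the .git directory, the kind of figure quoted in the comment above.
du -sh "$work/shallow/.git"
```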

@GXTX

GXTX commented Aug 8, 2020

> Including 2.2GiB is not an option for me.

I really don't see this as a large issue.
Take Visual Studio and the required SDKs for .NET development, for example: it takes 5.1GB just for .NET Core and a couple of targeting packs.

> I don't have that much space available, and it would actually mean that I'd have to stop my involvement with nxdk.

Sorry to hear this, really. Can I offer to gift you a 16GB USB drive? You can get them for pennies.

@Teufelchen1

> Take Visual Studio and the required SDKs for .NET development, for example: it takes 5.1GB just for .NET Core and a couple of targeting packs.

I strongly disagree. We don't offer an app like Visual Studio, so why would our source be as big as that? That comparison isn't really useful at all.
But I agree that the personal disk space of individual developers should not be part of the argument. However, generally speaking, an insanely large repo might indeed turn off people who are not lucky enough to own state-of-the-art hardware. We should at least keep that in mind.

I would vote for the git subtree / git filter-branch solution. It looks a bit messy to set up and maintain, but I think it's our best bet? Personally, I would probably be very annoyed by a repo size >1GiB, which would be the result of the suggested --depth=1 solution.

@JayFoxRox
Member Author

JayFoxRox commented May 8, 2021

This sounds interesting https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/; especially when considering https://stackoverflow.com/questions/6238590/set-git-submodule-to-shallow-clone-sparse-checkout.
Also see https://stackoverflow.com/questions/600079/how-do-i-clone-a-subdirectory-only-of-a-git-repository/52269934#52269934

So it might be possible to use the full monorepo as the submodule's remote, but only check out the individual submodule directory.

@thrimbor
Member

thrimbor commented May 8, 2021

Hm, this would save some space for the checkout, but it appears that it still has to clone the full repo. That can be fixed with the sparse cloning options, but those come with the usual drawback of requiring extra steps if you want to be able to actually work on the code in the submodule.

@JayFoxRox
Member Author

JayFoxRox commented May 9, 2021

> if you want to be able to actually work on the code in the submodule.

Yes, but at least you can still make smaller changes. Just don't attempt to use an IDE, do a full blame, or similar (personally, I do most of that in the GitHub web UI anyway).


I'll have to do some more testing; my favorite so far is this:

git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx
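A sketch of the treeless sparse clone above, run against a throwaway local toy repository instead of llvm-project so it can be tried offline. Assumptions: the directory and file names are made up, and the two uploadpack settings exist only because a local `file://` "server" ignores partial-clone filters by default (GitHub already accepts them).

```shell
#!/bin/sh
# Sketch: treeless (--filter=tree:0) sparse clone of one directory.
# The toy "monorepo" is a hypothetical stand-in for llvm-project.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/mono"
mkdir -p "$work/mono/libcxx/include" "$work/mono/llvm/lib"
echo '// vector' > "$work/mono/libcxx/include/vector"
echo '// IR' > "$work/mono/llvm/lib/IR.cpp"
echo 'top-level readme' > "$work/mono/README.md"
git -C "$work/mono" add .
git -C "$work/mono" commit -q -m 'initial import'

# Only needed for the local demo: let upload-pack honour --filter and
# serve the objects that the partial clone fetches lazily later on.
git -C "$work/mono" config uploadpack.allowFilter true
git -C "$work/mono" config uploadpack.allowAnySHA1InWant true

# Same commands as in the comment, pointed at the toy repo via file://.
git clone -q --sparse --filter=tree:0 "file://$work/mono" "$work/libcxx"
cd "$work/libcxx"
git sparse-checkout init --cone
git sparse-checkout set libcxx

# Top-level files plus libcxx/ are checked out; llvm/ is not.
ls
```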

It is ~200MiB when initialized, but contains the entire commit history for libcxx. Problems only arise when you run git log -p and scroll to the end, because then it downloads all the blobs and trees, so your repo suddenly explodes in size (without warning you). Annoyingly, this also happens when viewing the log of a specific file or folder, so you'd have to avoid that.
Note that https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/ also strongly discourages this workflow; however, the next best option is to keep a repository of about 500MiB (by keeping the trees, but removing blobs):

git clone https://github.com/llvm/llvm-project --sparse --filter=blob:none libcxx
cd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx

Another way to keep the repo size from exploding could be to also make the clone shallow, either since a date (--shallow-since) or to a fixed depth (a couple thousand commits should be good enough). For some reason there doesn't seem to be a way to clone shallow-since a specific commit?
However, I was unable to make this work properly, because the shallow clone by date caused all trees and blobs to be fetched, thereby negating the filter. Fetching by explicit depth seems to work, but it's hard to control.
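One way to make the explicit-depth route a little more controllable is `git fetch --deepen=<n>`, which moves an existing shallow boundary back by n commits instead of recomputing the depth from the tip. A minimal sketch on a hypothetical five-commit local repository (not llvm-project); it leaves out `--filter`, so it does not show how deepening combines with a partial clone.

```shell
#!/bin/sh
# Sketch: start shallow, then deepen by an explicit number of commits.
# Uses a hypothetical five-commit repository instead of llvm-project.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
for n in 1 2 3 4 5; do
  echo "rev $n" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" commit -q -m "commit $n"
done

# Initial clone: only the two newest commits (file:// so --depth applies).
git clone -q --depth=2 "file://$work/upstream" "$work/clone"
before=$(git -C "$work/clone" rev-list --count HEAD)

# Move the shallow boundary back by two more commits.
git -C "$work/clone" fetch -q --deepen=2
after=$(git -C "$work/clone" rev-list --count HEAD)

echo "commits before: $before, after deepening: $after"
```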

For users, we can do this, which is ~50MiB, but obviously shallow:

git clone https://github.com/llvm/llvm-project --sparse --filter=tree:0 libcxx --depth=1
pushd libcxx
git sparse-checkout init --cone
git sparse-checkout set libcxx

In size, this is comparable to git clone https://github.com/llvm-mirror/libcxx.git, which is ~55MiB (that is: the old repo, non-shallow, fully ready for development).
So we are still losing a lot of functionality in the migration, even with cutting-edge git features.


However, regardless of which you prefer, all of the above is inherently incompatible with submodules.

Currently, we use .gitmodules in nxdk, but it's not possible to force sparse checkout or filters for a submodule. We could only request a shallow clone from .gitmodules, but that's still large and would mean extra steps for setting up a development tree.
Overriding the submodule update command with a custom script is only possible locally (in ".git/" or gitconfig); it's not allowed in ".gitmodules".
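To make that split concrete, here is a sketch in a scratch repository (not nxdk) of the one clone-size setting .gitmodules does accept, `shallow = true`, next to a custom update command that has to live in the local `.git/config`. The submodule name `libcxx`, the path `lib/libcxx`, and the `./scripts/update-libcxx.sh` script are hypothetical.

```shell
#!/bin/sh
# Sketch: which submodule settings can be shared via .gitmodules and which
# must stay local. Submodule name, path, and script are hypothetical.
set -eu

work=$(mktemp -d)
git init -q "$work/superproject"
cd "$work/superproject"

# Committed, shared settings: a shallow clone can be requested here.
git config -f .gitmodules submodule.libcxx.path lib/libcxx
git config -f .gitmodules submodule.libcxx.url https://github.com/llvm/llvm-project
git config -f .gitmodules submodule.libcxx.shallow true

# A custom update command ("!command") is only honoured from local config
# such as .git/config; .gitmodules may not use this form.
git config submodule.libcxx.update '!./scripts/update-libcxx.sh'

cat .gitmodules
```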

I think it's still worth considering migrating to the monorepo. Personally, I'd like to slim down nxdk anyway. If we have a simple shell script to initialize / install the packages, that's probably fine. We could have different settings for setting up a user tree and a development tree.
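A sketch of what such a script could look like. Nothing here exists in nxdk: the function names, the `user`/`dev` modes, and the `LIBCXX_URL` variable are hypothetical. The flag sets are taken from the commands earlier in this thread (the ~50MiB shallow treeless clone for users, the ~500MiB blobless clone for development).

```shell
#!/bin/sh
# Hypothetical nxdk setup helper; the modes and flag choices are assumptions
# based on the clone commands discussed in this issue.
set -eu

LIBCXX_URL=${LIBCXX_URL:-https://github.com/llvm/llvm-project}

# Print the git clone flags for a given setup mode.
clone_flags() {
  case "$1" in
    user) echo "--sparse --filter=tree:0 --depth=1" ;;  # small, shallow
    dev)  echo "--sparse --filter=blob:none" ;;         # full history, blobs on demand
    *)    echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Clone only the libcxx/ directory of the monorepo into $2.
setup_libcxx() {
  mode=$1
  dest=$2
  flags=$(clone_flags "$mode")
  # $flags is intentionally unquoted so it splits into separate arguments.
  git clone $flags "$LIBCXX_URL" "$dest"
  git -C "$dest" sparse-checkout init --cone
  git -C "$dest" sparse-checkout set libcxx
}
```

For example, `setup_libcxx user lib/libcxx` would produce the small shallow user tree, and `setup_libcxx dev lib/libcxx` the blobless development tree with full history.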

As a user, I'd just shallow clone and partial clone to use nxdk anyway; heck, I'd probably even delete the git repositories after building the binary libs. I'd only clone those repositories if I needed to make actual changes to them.
Even then, I'd probably use a partial clone or a shallow clone to some degree too (with a very high depth, but not back to 2001; libcxx was imported in 2010).


So the potential options I see are:

  1. Migrate to the monorepo
    1. Drop submodules from nxdk, add some scripts, and take the risky route of experimental git partial clones / dangerous shallow clones.
    2. ...or buy larger hard drives.
  2. Set up a split LLVM / libcxx mirror (which also means breaking the commit chain, so we can't send patches upstream, we can't easily pull from upstream, and we have trouble tracking bugfixes across the upstream and split-repo mirror histories).
    1. Pressure LLVM into doing it.
    2. ...or set up our own; potentially find other communities to maintain it with us / potentially a GitHub org dedicated to mirroring monorepos as split repos.
  3. Stick to (soon) ancient libcxx versions and the old repository, waiting to see how the situation develops; potentially adding pressure on git / GitHub to support our use case.

I think all of those options suck.

However, I think migrating to the monorepo is the best option:

  • We rarely touch libcxx, so most updates would likely be rebasing onto or pulling upstream updates. The entirety of the nxdk-libcxx changes is 4 commits by 1 author (excluding current PRs).
  • Most users don't care about having a full development tree; the nxdk docs even encourage shallow clones.
  • Not having 1GiB of hard-disk space available is a niche issue in 2021, and it's only needed for development (of libcxx).
  • Needing 55MiB of hard-disk space to install libcxx as a user is a non-issue; it's comparable to the current solution.
  • The situation in git is likely to improve in the future, so this will hopefully become less painful over time.
  • Even if it's too horrible for some reason, we could probably roll back relatively easily (we'd just have to spend some time rewriting history, or squash our Xbox changes).
  • Falling out of date would be bad for nxdk, as being up to date is one of its key benefits over XDK. It also means potential for bitrot.
  • Refactoring nxdk to avoid submodules is something I'd like to see anyway.

@glebm

glebm commented Jan 16, 2023

Is there consensus on migrating to the monorepo?
A large repo is a small issue in 2023 compared to the time cost of a more complicated solution.

Labels
None yet
Development

No branches or pull requests

5 participants