Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

topherbuckley
Copy link

Summary

updating attempt_download_from_hub to make use of newly uploaded ultralytics yolov5 models on huggingface.

Why Draft?

Hello,

Been using this for a few months now, so thanks for creating it! I could not find any contribution guidelines, so was weary to submit any sort of PR, but saw a few recent external contributor merges, so thought I'd give it a shot. I'm happy to rework this as necessary.

I also note that I have not accounted for how this will affect the current usage of attempt_download_from_hub, so would have to consider how to make it back-compatible if that is desired. I know this does not pass unittests as is so expect to make some changes after discussing with you.

grep -R attempt_download_from_hub **/*.py
classify/train.py:from yolov5.utils.downloads import attempt_download_from_hub
classify/train.py:    result = attempt_download_from_hub(opt.model, hf_token=None)
models/common.py:from yolov5.utils.downloads import attempt_download_from_hub
models/common.py:        result = attempt_download_from_hub(w, hf_token=hf_token, revision=revision)
segment/train.py:from yolov5.utils.downloads import attempt_download, is_url, attempt_download_from_hub
segment/train.py:    result = attempt_download_from_hub(weights, hf_token=None)
train.py:from yolov5.utils.downloads import attempt_download, is_url, attempt_download_from_hub
train.py:    result = attempt_download_from_hub(weights, hf_token=None)
utils/downloads.py:    result = attempt_download_from_hub(repo, hf_token=hf_token, model_file=file)
utils/downloads.py:def attempt_download_from_hub(repo_id, hf_token=None, revision=None, model_file=None):

I need to clarify a few things before I can make any meaningful refactors though. Is the current implementation regarding passing file as repo_id intentional? And if so why? I would like to align this PR with your design goals:

In attempt_download

result = attempt_download_from_hub(file, hf_token=hf_token)

it seems to pass a file, which according to your README should be a filename like yolov5.pt,

# load pretrained model
model = yolov5.load('yolov5s.pt')

despite the attempt_download_from_hub function definition specifying the first arg as repo_id rather than a file:

def attempt_download_from_hub(repo_id, hf_token=None, revision=None):

Furthermore a few lines below this file is used in list_repo_files as the repo_id:

repo_files = list_repo_files(repo_id=repo_id, repo_type='model', token=hf_token)

despite the HfApi calling for the actual repo_id not a filename.

Ideally, for my own purposes I'd like to see:
model_file = [f for f in repo_files if f.endswith('.pt')][0]
replaced by something that can specify the actual file, rather than just the first in a list of many, but replacing this completely would likely cause further issues with refactoring existing usage, thus why I made some attempt to retain this in this PR.

I see in your tests that you use

    YOLOV5N_HUB_ID = "fcakyon/yolov5n-v7.0"
    YOLOV5S_HUB_ID = "fcakyon/yolov5s-v7.0"
    YOLOV5N_MODEL_PATH = "yolov5/weights/yolov5n.pt"
    YOLOV5S_MODEL_PATH = "yolov5/weights/yolov5s.pt"
    YOLOV5L_MODEL_PATH = "yolov5/weights/yolov5l.pt"

Setting a breakpoint above result = attempt_download_from_hub(file, hf_token=hf_token) here it appears for all TestConstants this returns None, so I'm guessing this is simply an error, but let me know if it has an otherwise purpose.

Looking at how you set up the unittests, it seems you are trying to handle the input to yolov5.load() using either a repo_id or a weights file as a single str arg. This seems quite complicated to me. Why not just align the input args with the hf_hub_download that you are wrapping, so as to take in repo_id and filename separately? If not for aligning with hf, the ultralytics yolov5 README also seems to separate them with their example of loading a model in their README with:

torch.hub.load("ultralytics/yolov5", "yolov5s")

If you provided a similar interface, this would seem to work for both github and hf in a way that supports the multi-model in one repo way provided in the ultralytics huggingface repo. I know hf has a dogma of one-repo one-model, but if you are already implementing the search/match for the github releases, I'm not sure why you wouldn't use the existing hf_hub_download's ability to do the same already, especially since you already have both file and repo_id in the scope at this point of the code.

After I understand your design goals surrounding this I can update this draft PR to include some ideas for refactoring in ways that should keep existing usages from breaking.

@fcakyon
Copy link
Owner

fcakyon commented Dec 6, 2024

@topherbuckley ill look into it once i found the time 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants