attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

topherbuckley · 2024-12-05T13:49:31Z

Summary

updating attempt_download_from_hub to make use of newly uploaded ultralytics yolov5 models on huggingface.

Why Draft?

Hello,

Been using this for a few months now, so thanks for creating it! I could not find any contribution guidelines, so was weary to submit any sort of PR, but saw a few recent external contributor merges, so thought I'd give it a shot. I'm happy to rework this as necessary.

I also note that I have not accounted for how this will affect the current usage of attempt_download_from_hub, so would have to consider how to make it back-compatible if that is desired. I know this does not pass unittests as is so expect to make some changes after discussing with you.

grep -R attempt_download_from_hub **/*.py
classify/train.py:from yolov5.utils.downloads import attempt_download_from_hub
classify/train.py:    result = attempt_download_from_hub(opt.model, hf_token=None)
models/common.py:from yolov5.utils.downloads import attempt_download_from_hub
models/common.py:        result = attempt_download_from_hub(w, hf_token=hf_token, revision=revision)
segment/train.py:from yolov5.utils.downloads import attempt_download, is_url, attempt_download_from_hub
segment/train.py:    result = attempt_download_from_hub(weights, hf_token=None)
train.py:from yolov5.utils.downloads import attempt_download, is_url, attempt_download_from_hub
train.py:    result = attempt_download_from_hub(weights, hf_token=None)
utils/downloads.py:    result = attempt_download_from_hub(repo, hf_token=hf_token, model_file=file)
utils/downloads.py:def attempt_download_from_hub(repo_id, hf_token=None, revision=None, model_file=None):

I need to clarify a few things before I can make any meaningful refactors though. Is the current implementation regarding passing file as repo_id intentional? And if so why? I would like to align this PR with your design goals:

In attempt_download

result = attempt_download_from_hub(file, hf_token=hf_token)

it seems to pass a file, which according to your README should be a filename like yolov5.pt,

# load pretrained model
model = yolov5.load('yolov5s.pt')

despite the attempt_download_from_hub function definition specifying the first arg as repo_id rather than a file:

def attempt_download_from_hub(repo_id, hf_token=None, revision=None):

Furthermore a few lines below this file is used in list_repo_files as the repo_id:

repo_files = list_repo_files(repo_id=repo_id, repo_type='model', token=hf_token)

despite the HfApi calling for the actual repo_id not a filename.

Ideally, for my own purposes I'd like to see:
model_file = [f for f in repo_files if f.endswith('.pt')][0]
replaced by something that can specify the actual file, rather than just the first in a list of many, but replacing this completely would likely cause further issues with refactoring existing usage, thus why I made some attempt to retain this in this PR.

I see in your tests that you use

    YOLOV5N_HUB_ID = "fcakyon/yolov5n-v7.0"
    YOLOV5S_HUB_ID = "fcakyon/yolov5s-v7.0"
    YOLOV5N_MODEL_PATH = "yolov5/weights/yolov5n.pt"
    YOLOV5S_MODEL_PATH = "yolov5/weights/yolov5s.pt"
    YOLOV5L_MODEL_PATH = "yolov5/weights/yolov5l.pt"

Setting a breakpoint above result = attempt_download_from_hub(file, hf_token=hf_token) here it appears for all TestConstants this returns None, so I'm guessing this is simply an error, but let me know if it has an otherwise purpose.

Looking at how you set up the unittests, it seems you are trying to handle the input to yolov5.load() using either a repo_id or a weights file as a single str arg. This seems quite complicated to me. Why not just align the input args with the hf_hub_download that you are wrapping, so as to take in repo_id and filename separately? If not for aligning with hf, the ultralytics yolov5 README also seems to separate them with their example of loading a model in their README with:

torch.hub.load("ultralytics/yolov5", "yolov5s")

If you provided a similar interface, this would seem to work for both github and hf in a way that supports the multi-model in one repo way provided in the ultralytics huggingface repo. I know hf has a dogma of one-repo one-model, but if you are already implementing the search/match for the github releases, I'm not sure why you wouldn't use the existing hf_hub_download's ability to do the same already, especially since you already have both file and repo_id in the scope at this point of the code.

After I understand your design goals surrounding this I can update this draft PR to include some ideas for refactoring in ways that should keep existing usages from breaking.

fcakyon · 2024-12-06T13:02:41Z

@topherbuckley ill look into it once i found the time 🚀

Initial idea

8afa2cb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

topherbuckley commented Dec 5, 2024

fcakyon commented Dec 6, 2024

attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

Are you sure you want to change the base?

attempt_download_from_hub, using newly uploaded https://huggingface.co/Ultralytics/YOLOv5/tree/main #261

Conversation

topherbuckley commented Dec 5, 2024

Summary

Why Draft?

fcakyon commented Dec 6, 2024