option: reduce connecting to huggingface #16298

Open
wants to merge 1 commit into
base: dev


w-e-w
Collaborator

@w-e-w w-e-w commented Jul 30, 2024

Description

add option: reduce connecting to huggingface
for assets if local cache is available
note: enabling this will prevent the assets from being updated

option is disabled by default because it can prevent updates to those assets if updates are necessary


some users are not too happy with webui connecting to hugging face to fetch metadata for assets

like when loading SD3Tokenizer

class SD3Tokenizer:
    def __init__(self):
        clip_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
        self.clip_l = SDTokenizer(tokenizer=clip_tokenizer)
        self.clip_g = SDXLClipGTokenizer(clip_tokenizer)
        self.t5xxl = T5XXLTokenizer()

the .from_pretrained call internally calls hf_hub_download(), which connects to hugging face (basically checking whether there are any new revisions of the files) and downloads the files if they are not available locally

one way for those sensitive users who really don't want it to connect to hugging face is to set the environment variable TRANSFORMERS_OFFLINE to 1, which disables connecting to hugging face
when local_files_only=True it doesn't contact hugging face to check for updates and just uses the newest revision in the local cache
if the cache isn't found then it raises LocalEntryNotFoundError

note TRANSFORMERS_OFFLINE=1 sets local_files_only to True

the problem with using TRANSFORMERS_OFFLINE is that if those files haven't been downloaded yet, it won't be able to download them at all, which is not very user friendly

even sensitive users should understand that it will have to at least make the connection once in order to download the necessary files

for these users, ideally hugging face hub would have a "download once and don't check for updates" option
but since there isn't one, the easiest way to add this functionality is to patch file_download.hf_hub_download and let it first try with local_files_only=True
and if that raises LocalEntryNotFoundError, try again with local_files_only=False
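
The local-first fallback described above could be sketched roughly as follows. This is an illustration of the control flow only, not the code in this PR: `local_first` and `fake_download` are hypothetical names, and the `LocalEntryNotFoundError` class here is a stand-in so the snippet runs without huggingface_hub installed or any network access.

```python
import functools


class LocalEntryNotFoundError(Exception):
    """Stand-in for the huggingface_hub exception of the same name."""


def local_first(download_fn, not_found_exc=LocalEntryNotFoundError):
    """Wrap download_fn so it tries the local cache before the network."""

    @functools.wraps(download_fn)
    def wrapper(*args, **kwargs):
        if kwargs.get("local_files_only"):
            # Caller already asked for offline mode; nothing to add.
            return download_fn(*args, **kwargs)
        try:
            # First attempt: local cache only, no connection.
            return download_fn(*args, **{**kwargs, "local_files_only": True})
        except not_found_exc:
            # Not cached yet: retry with the caller's original arguments.
            return download_fn(*args, **kwargs)

    return wrapper


# Hypothetical stand-in for hf_hub_download, used only to show the flow.
calls = []      # records the local_files_only value of every attempt
cache = set()   # pretend on-disk cache


def fake_download(repo_id, filename, *, local_files_only=False):
    calls.append(local_files_only)
    key = (repo_id, filename)
    if local_files_only and key not in cache:
        raise LocalEntryNotFoundError(filename)
    cache.add(key)
    return f"/cache/{repo_id}/{filename}"


patched = local_first(fake_download)
patched("openai/clip-vit-large-patch14", "vocab.json")  # cache miss, then "network"
patched("openai/clip-vit-large-patch14", "vocab.json")  # served from cache only
```

With a wrapper like this, the second call never reaches the "network" branch, which is the behaviour the option is meant to provide once the files have been downloaded.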


to prevent crashing if there are updates to huggingface_hub

  • this entire patch is wrapped inside a try block
  • check if keyword-only arg local_files_only is a parameter of file_download.hf_hub_download

this should mean that as long as huggingface_hub does not completely change how the local_files_only arg of file_download.hf_hub_download works, nothing should break; if it does change, the patch should be disabled and shouldn't cause issues
if all safety checks fail and the patched call raises an unexpected exception, it just tries the function again without the patch
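
The two guards listed above (a surrounding try block and a check that local_files_only is a keyword-only parameter) might look something like the sketch below. `has_keyword_only_param` and `try_patch` are hypothetical helper names, and a `SimpleNamespace` stands in for the huggingface_hub `file_download` module so the example runs on its own; the retry-without-patch fallback is not shown here.

```python
import inspect
import types


def has_keyword_only_param(fn, name):
    """Return True if fn declares `name` as a keyword-only parameter."""
    param = inspect.signature(fn).parameters.get(name)
    return param is not None and param.kind is inspect.Parameter.KEYWORD_ONLY


def try_patch(module, attr, make_wrapper, required_kwarg="local_files_only"):
    """Replace module.attr with a wrapped version; leave it untouched on any problem."""
    try:
        original = getattr(module, attr)
        if not has_keyword_only_param(original, required_kwarg):
            # Signature changed upstream: skip the patch instead of guessing.
            return False
        setattr(module, attr, make_wrapper(original))
        return True
    except Exception:
        # Anything unexpected: behave as if the option were off.
        return False


def download(repo_id, filename, *, local_files_only=False):
    return (repo_id, filename, local_files_only)


def changed_download(repo_id, filename, offline=False):
    # Imagined future signature without a keyword-only local_files_only.
    return (repo_id, filename, offline)


def passthrough(fn):
    # Trivial wrapper factory, only here to make the demo observable.
    return lambda *args, **kwargs: fn(*args, **kwargs)


current = types.SimpleNamespace(hf_hub_download=download)
future = types.SimpleNamespace(hf_hub_download=changed_download)

patched_current = try_patch(current, "hf_hub_download", passthrough)
patched_future = try_patch(future, "hf_hub_download", passthrough)
```

The point of returning a flag rather than raising is that a failed patch leaves the original function in place, so startup continues as if the option had never been enabled.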


not too happy with my description of this option

"hd_dl_local_first": OptionInfo(False, "Prevent connecting to huggingface for assets if cache is available").info('this will also prevent assets from being updated'),


normally I would prefer to introduce this type of thing as an extension
but this happens way too early, even before preload
unless something like a pre-initialize callback is added, I don't think this is possible as an extension

Checklist:

@w-e-w w-e-w force-pushed the option-reduce-connecting-to-huggingface branch from 8574528 to 842dd5e Compare July 30, 2024 11:32