
Error when using the --untie_encoder param for DPR: config.json not in the correct path #69

Open
lboesen opened this issue Nov 28, 2022 · 4 comments



lboesen commented Nov 28, 2022

Hi,

I ran into the following issue today:

When using the --untie_encoder param from this guide: https://github.com/texttron/tevatron/blob/main/examples/example_dpr.md
After training the DPR model, no config.json is written to the --output_dir.

This causes issues when trying to encode the corpus and queries.
As a workaround, I moved the config.json from either the passage_model/ or the query_model/ folder to the path given in --output_dir:

[Screenshot 2022-11-28 at 17:28:22]
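For reference, the move itself is just a one-liner (passage_model/config.json and query_model/config.json turn out to be identical, so either one works; /home/model_runs/DPR is my training --output_dir):

```sh
# copy (or move) one of the two identical config.json files up to the
# top-level output dir, next to the tokenizer files
cp /home/model_runs/DPR/passage_model/config.json /home/model_runs/DPR/
```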

MXueguang (Contributor) commented

Hi @lboesen,
--untie_encoder needs to be added during encoding as well. I forgot to mention that in the doc.
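For example, for corpus encoding it would look something like this (a sketch based on the encode command in the guide; everything except the added --untie_encoder follows the example there and may differ by version, and $MODEL_DIR is a placeholder for your training --output_dir):

```sh
python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path $MODEL_DIR \
  --untie_encoder \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq-corpus \
  --encoded_save_path corpus_emb.pkl
```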


lboesen commented Nov 28, 2022

Hi @MXueguang,
Thank you for the quick reply!

lboesen closed this as completed Nov 28, 2022
lboesen reopened this Nov 28, 2022

lboesen commented Nov 28, 2022

Hi @MXueguang,

(I was a little too quick closing the issue :))

I tried including the --untie_encoder param when encoding the corpus and queries, but I get the following error message:

OSError: /home/model_runs/DPR does not appear to have a file named config.json. Checkout 'https://huggingface.co//home/model_runs/DPR/None' for available files.

Since passage_model/config.json and query_model/config.json are identical, I moved the config.json file from one of them up to the path containing all the relevant files for the BertTokenizerFast tokenizer.
It looks like it's loading the correct weights:
11/28/2022 22:20:23 - INFO - tevatron.modeling.encoder - found separate weight for query/passage encoders
11/28/2022 22:20:23 - INFO - tevatron.modeling.encoder - loading query model weight from /home/model_runs/DPR/query_model
11/28/2022 22:20:24 - INFO - tevatron.modeling.encoder - loading passage model weight from /home/model_runs/DPR/passage_model


I also checked whether I needed to point the path directly at the passage_model folder:

If I set --model_name_or_path to .../passage_model (for corpus encoding, and vice versa for query encoding), it gives the following error:

OSError: Can't load tokenizer for '../passage_model'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '../passage_model' is the correct path to a directory containing all relevant files for a BertTokenizerFast tokenizer.

So currently I can only get it to work if I include the --untie_encoder param and move the config.json out of either the passage_model or query_model folder into the path containing the tokenizer.
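For completeness, this is roughly the query-side encode command that works for me after moving config.json (a sketch: flags other than --untie_encoder follow the guide's example, and --encode_is_qry is the query switch in the tevatron version I'm using; both may differ by version):

```sh
python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path /home/model_runs/DPR \
  --untie_encoder \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq/dev/query \
  --encoded_save_path query_emb.pkl \
  --encode_is_qry
```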

Bauhinia-bloom commented


Same problem
