
Error when using the --untie_encoder param for DPR: config.json not in the correct path #69

Open
lboesen opened this issue Nov 28, 2022 · 4 comments



lboesen commented Nov 28, 2022

Hi,

I ran into the following issue today:

When using the --untie_encoder param from this guide: https://github.com/texttron/tevatron/blob/main/examples/example_dpr.md
After training the DPR model, no config.json is written to the --output_dir.

This causes issues when trying to encode the corpus and queries.
As a workaround, I moved the config.json from either the passage_model/ or the query_model/ folder to the path given in --output_dir:

[Screenshot 2022-11-28 at 17:28:22]
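For reference, the move itself is just a one-liner (passage_model/config.json and query_model/config.json turn out to be identical, so either one works; /home/model_runs/DPR is my training --output_dir):

```sh
# copy (or move) one of the two identical config.json files up to the
# top-level output dir, next to the tokenizer files
cp /home/model_runs/DPR/passage_model/config.json /home/model_runs/DPR/
```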

MXueguang (Contributor) commented

Hi @lboesen,
--untie_encoder needs to be added during encoding as well. I forgot to mention that in the doc.
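For example, for corpus encoding it would look something like this (a sketch based on the encode command in the guide; everything except the added --untie_encoder follows the example there and may differ by version, and $MODEL_DIR is a placeholder for your training --output_dir):

```sh
python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path $MODEL_DIR \
  --untie_encoder \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq-corpus \
  --encoded_save_path corpus_emb.pkl
```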


lboesen commented Nov 28, 2022

Hi @MXueguang,
Thank you for the quick reply!

lboesen closed this as completed Nov 28, 2022
lboesen reopened this Nov 28, 2022

lboesen commented Nov 28, 2022

Hi @MXueguang,

(I was a little too quick closing the issue :))

I tried including the --untie_encoder param when encoding the corpus and queries, but I get the following error message:

OSError: /home/model_runs/DPR does not appear to have a file named config.json. Checkout 'https://huggingface.co//home/model_runs/DPR/None' for available files.

Since passage_model/config.json and query_model/config.json are identical, I moved the config.json file from one of them up to the path containing all the relevant files for the BertTokenizerFast tokenizer.
It looks like it's loading the correct weights:
11/28/2022 22:20:23 - INFO - tevatron.modeling.encoder - found separate weight for query/passage encoders
11/28/2022 22:20:23 - INFO - tevatron.modeling.encoder - loading query model weight from /home/model_runs/DPR/query_model
11/28/2022 22:20:24 - INFO - tevatron.modeling.encoder - loading passage model weight from /home/model_runs/DPR/passage_model


I also checked whether I needed to point the path directly at the passage_model folder:

If I set --model_name_or_path to .../passage_model (for corpus encoding, and vice versa for query encoding), it gives the following error:

OSError: Can't load tokenizer for '../passage_model'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '../passage_model' is the correct path to a directory containing all relevant files for a BertTokenizerFast tokenizer.

So currently I can only get it to work if I include the --untie_encoder param and move the config.json out of either the passage_model or query_model folder into the path containing the tokenizer.
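For completeness, this is roughly the query-side encode command that works for me after moving config.json (a sketch: flags other than --untie_encoder follow the guide's example, and --encode_is_qry is the query switch in the tevatron version I'm using; both may differ by version):

```sh
python -m tevatron.driver.encode \
  --output_dir=temp \
  --model_name_or_path /home/model_runs/DPR \
  --untie_encoder \
  --fp16 \
  --per_device_eval_batch_size 156 \
  --dataset_name Tevatron/wikipedia-nq/dev/query \
  --encoded_save_path query_emb.pkl \
  --encode_is_qry
```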

Bauhinia-bloom commented


Same problem
