What should be the right data format for fine-tuning and inference? #5

Open
nonstopfor opened this issue Aug 16, 2020 · 7 comments

@nonstopfor

I want to fine-tune MaUde on my own data and use the fine-tuned model for inference, but I don't know the required data format (for both training and test data). Does anyone know it?

@koustuvsinha (Contributor)

Currently, the data is extracted through ParlAI using ParlAIExtractor. If you run the standalone script (online_dialog_eval/data.py), you'll see the data format the code expects.
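
For orientation, one quick way to see the raw dialogues that the extractor consumes is ParlAI's own display_data script. This is only a sketch: it assumes a reasonably recent ParlAI install (the repo may pin an older version where this is only a command-line script) and uses the personachat task as an example.

```python
# Sketch: print a few PersonaChat episodes in ParlAI's internal message format.
# Assumes a recent ParlAI; older versions expose display_data only on the command line.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task="personachat", num_examples=5)
```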

@nonstopfor (Author)

Could you give a complete pipeline? Suppose I have original train, valid, and test data (just some plain dialogues). What steps do I need to follow to fine-tune MaUde and run inference?

@nonstopfor (Author)

Could you tell me which function in ParlAIExtractor is used to read data from the file? Because working out the data format from more than 1,000 lines of code in data.py is really hard work...

@nonstopfor (Author)

Also, when computing the backtranslation and corruption files, what should the data format be?

@koustuvsinha (Contributor)

The extract_interactions function from ParlAIExtractor is used to build the data. I would suggest reading the ParlAI docs to understand how the data is internally represented, as this repo is heavily dependent on it (in the sense that we don't have a standard input/output file format).
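
For reference, a ParlAI episode is internally a sequence of per-turn message dicts. The sketch below shows the generic shape ('text', 'labels', 'episode_done'); whether extract_interactions reads exactly these keys is best verified in data.py.

```python
# Sketch of ParlAI's generic message format: an episode is a list of turns,
# each a dict carrying the context in 'text', the gold response in 'labels',
# and an 'episode_done' flag marking the end of the conversation.
episode = [
    {"text": "hi, how are you doing today?",
     "labels": ["i am great, i just got back from the gym."],
     "episode_done": False},
    {"text": "nice, do you go often?",
     "labels": ["almost every day, it keeps me sane."],
     "episode_done": True},
]

for turn in episode:
    print(turn["text"], "->", turn["labels"][0])
```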

@koustuvsinha (Contributor)

@nonstopfor I just released the entire dataset used and processed for PersonaChat dialogues (backtranslation / corruption), which is linked in the readme. You can see the data format from these files.
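
If you only need the column layout of the released files, inspecting one of them with pandas is enough. The filename below is a placeholder; substitute any of the .csv files linked in the readme.

```python
# Sketch: peek at one of the released backtranslation / corruption CSVs.
# "personachat_corruption.csv" is a placeholder, not an actual filename from the release.
import pandas as pd

df = pd.read_csv("personachat_corruption.csv")
print(df.columns.tolist())  # the column names reveal which fields are expected
print(df.head())            # the first rows show how contexts and responses are stored
```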

@nonstopfor (Author)

In this directory, which file is the original data file? And where do these .csv files come from?
