What should be the right data format for fine-tuning and inference? #5
Comments
Currently the data is extracted through ParlAI using ParlAIExtractor. If you run the standalone script (online_dialog_eval/data.py), you'll see the data format the code expects.
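One quick way to learn the expected format is to inspect the column headers of a processed data file. The sketch below is illustrative only: the column names (`dialog_id`, `context`, `response`, `label`) are placeholders I invented, not the repo's actual schema; substitute one of the released .csv files for the in-memory sample.

```python
# Hedged sketch: inspect the columns of a processed data CSV to learn the
# format the code expects. The columns below are illustrative placeholders,
# not MaUde's actual schema.
import csv
import io

# Stand-in for one of the released .csv files (placeholder contents).
sample = io.StringIO(
    "dialog_id,context,response,label\n"
    "0,hello how are you,i am fine thanks,1\n"
)

reader = csv.DictReader(sample)
rows = list(reader)
print(reader.fieldnames)   # the header row reveals the expected columns
print(rows[0]["response"])
```

To inspect a real file, replace the `io.StringIO` sample with `open("path/to/file.csv", newline="")`.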
Could you give a complete pipeline? Suppose I have original train, valid, and test data (just some native dialogs). What steps do I need to fine-tune MaUde and run inference?
Could you tell me which function in ParlAIExtractor is used to read data from the file? Finding the data format in more than 1000 lines of code in data.py is really hard work.
Also, when computing backtranslation and corruption files, what should the data format be?
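For intuition, corruption files generally hold negative (mismatched) context–response pairs. The sketch below shows one common corruption strategy, swapping responses across dialogs; this is a generic illustration of the idea, not necessarily the corruption scheme MaUde's scripts actually use, and the function name is my own.

```python
# Hedged sketch of a common corruption strategy for building negative
# dialog pairs: match each context with a response from a different
# dialog. Illustrative only; MaUde's actual corruption code may differ.
import random

def corrupt_by_response_swap(pairs, seed=0):
    """pairs: list of (context, response) tuples.
    Returns negatives where each context gets another dialog's response."""
    rng = random.Random(seed)
    responses = [r for _, r in pairs]
    negatives = []
    for i, (context, _) in enumerate(pairs):
        # pick any index other than the original pair's own response
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        negatives.append((context, responses[j]))
    return negatives

data = [("hi there", "hello!"),
        ("how are you", "great, thanks"),
        ("what's up", "not much")]
print(corrupt_by_response_swap(data))
```

Each negative keeps the original context but pairs it with an out-of-dialog response, which is what a discriminative dialog-evaluation model is typically trained to score low.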
@nonstopfor I just released the entire data used and processed for PersonaChat dialogs (backtranslation / corruption), which is linked in this readme. You can see the data format from these files.
In this directory, which is the original data file? And where do these .csv files come from?
I want to fine-tune MaUde on my own data and use the fine-tuned model for inference, but I don't know the right data format (including train data and test data). Does anyone know?