Fine tuning araBert for sentiment analysis #135

sin0x0 · 2021-09-18T11:29:23Z


Hi,

I am trying to fine-tune AraBERT for sentiment analysis, following the Jupyter notebook here.

However, when I train my Trainer object, I get this error:
RuntimeError: cuda runtime error (711) : peer mapping resources exhausted at /pytorch/aten/src/THC/THCGeneral.cpp:139

Thanks in advance for your help.

WissamAntoun · 2021-09-18T12:44:53Z

WissamAntoun
Sep 18, 2021
Maintainer

Lower the batch size and increase the gradient accumulation steps to compensate for the change. You can also reduce the maximum sequence length in the tokenizer to lower memory usage.
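
As a rough illustration of the trade-off described above, the sketch below computes how many samples contribute to each optimizer step. It assumes the usual relationship (per-device batch size × accumulation steps × number of devices); the helper name `effective_batch_size` and the starting batch size of 16 are hypothetical and not taken from the notebook.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Hypothetical starting point: a batch size of 16 on a single GPU.
original = effective_batch_size(per_device_batch_size=16, gradient_accumulation_steps=1)

# Smaller per-device batch, compensated by accumulating gradients over 8 steps.
reduced = effective_batch_size(per_device_batch_size=2, gradient_accumulation_steps=8)

print(original, reduced)  # 16 16
```

Peak memory per step depends mostly on the per-device batch size and the sequence length, so the second configuration should need less GPU memory while keeping the same number of samples per update.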

5 replies

sin0x0 Sep 18, 2021
Author

I tried to change these values but I am still getting the same error:

training_args.per_device_train_batch_size = 2
training_args.per_device_eval_batch_size = 2
training_args.gradient_accumulation_steps = 8

I am also getting this warning. Does it mean that I am missing something here?

Here is the warning if the font size is too small to read:

`Some weights of the model checkpoint at bert-base-arabertv02 were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']

This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-arabertv02 and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.`

WissamAntoun Sep 18, 2021
Maintainer

The warning is fine. No problem there.
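
One way to see why the warning is harmless is to compare parameter names. The sketch below uses a few names from the warning plus one stand-in encoder weight (`bert.embeddings.word_embeddings.weight`, added here for illustration); the sets are abbreviated and hand-written, not read from the actual checkpoint.

```python
# Parameter names stored in the pretraining checkpoint (abbreviated).
checkpoint_keys = {
    "bert.embeddings.word_embeddings.weight",  # stand-in for the shared encoder weights
    "cls.predictions.bias",                    # masked-language-model head
    "cls.seq_relationship.weight",             # next-sentence-prediction head
}

# Parameter names the classification model expects (abbreviated).
model_keys = {
    "bert.embeddings.word_embeddings.weight",
    "classifier.weight",
    "classifier.bias",
}

# In the checkpoint but not in the model: reported as "not used".
unused = sorted(checkpoint_keys - model_keys)

# In the model but not in the checkpoint: reported as "newly initialized".
newly_initialized = sorted(model_keys - checkpoint_keys)

print(unused)
print(newly_initialized)
```

The pretraining heads are dropped and the classifier head starts from random weights, which is exactly what fine-tuning on a downstream task is meant to train.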

Are you using multiple GPUs?

sin0x0 Sep 18, 2021
Author

Yes, I am using 10!

WissamAntoun Sep 18, 2021
Maintainer

I think it has to do with this: https://forums.developer.nvidia.com/t/cuda-peer-resources-error-when-running-on-more-than-8-k80s-aws-p2-16xlarge/45351

I'm not sure how to fix it, though. You should check the Trainer arguments; maybe there is a flag there that should be enabled.

sin0x0 Sep 18, 2021
Author

For anyone who wants to use multiple GPUs and runs into the issue above, this code made it work for me:

import os

# Set these before CUDA is initialized (safest: before importing torch).
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order GPUs by PCI bus ID
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"  # GPUs to expose; I am only using 3 here

import torch

@WissamAntoun Thanks man for your help
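
For machines with many GPUs, the same workaround can be wrapped in a small helper. This is a minimal sketch: the function name `expose_gpus` is hypothetical, and the default cap of 8 is an assumption based on the NVIDIA forum thread linked above, which reports the error when running on more than 8 GPUs.

```python
import os


def expose_gpus(device_ids, max_devices=8):
    """Set CUDA_VISIBLE_DEVICES to at most `max_devices` of the given GPU indices."""
    selected = list(device_ids)[:max_devices]
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in selected)
    return selected


# Call this before CUDA is initialized (safest: before importing torch).
# On a 10-GPU machine, as in this thread, only GPUs 0-7 stay visible.
visible = expose_gpus(range(10))
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Whether 8 visible devices is actually safe on a given machine depends on the hardware, so the cap is worth lowering if the error persists.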
