Config validation error #12304
-
Hi! In cases like this, it makes sense to run …

PS: for multi-line code blocks or config files, you can use 3 backticks on a line before and a line after the code block ;-)
-
Hi @johngrey0324: did you run …? If you look at the docs, you'll see that …
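For reference, spaCy's CLI can validate a config against the installed version and will surface the same error locally without starting a training run. A minimal sketch (both commands are standard spaCy CLI; the path is the one from the question below):

```
# Validate config.cfg against the installed spaCy version; schema errors
# like "extra fields not permitted" show up here before any training starts.
python -m spacy debug config /content/gdrive/MyDrive/classification/config.cfg

# Print the installed spaCy version and related package info.
python -m spacy info
```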
-
Dear everybody!
I want to train my classification model like this:
```
!python -m spacy train /content/gdrive/MyDrive/classification/config.cfg --output /content/gdrive/MyDrive/classification/output
```
but I got the following error:
```
training -> before_update extra fields not permitted {'dev_corpus': 'corpora.dev', 'train_corpus': 'corpora.train', 'seed': 0, 'gpu_allocator': None, 'dropout': 0.1, 'accumulate_gradient': 1, 'patience': 1600, 'max_epochs': 0, 'max_steps': 20000, 'eval_frequency': 200, 'frozen_components': [], 'annotating_components': [], 'before_to_disk': None, 'before_update': None, 'batcher': {'@batchers': 'spacy.batch_by_words.v1', 'discard_oversize': False, 'tolerance': 0.2, 'get_length': None, 'size': {'@schedules': 'compounding.v1', 'start': 100, 'stop': 1000, 'compound': 1.001, 't': 0.0}}, 'logger': {'@loggers': 'spacy.ConsoleLogger.v1', 'progress_bar': False}, 'optimizer': {'@optimizers': 'Adam.v1', 'beta1': 0.9, 'beta2': 0.999, 'L2_is_weight_decay': True, 'L2': 0.01, 'grad_clip': 1.0, 'use_averages': False, 'eps': 1e-08, 'learn_rate': 0.001}, 'score_weights': {'cats_score': 1.0, 'cats_score_desc': None, 'cats_micro_p': None, 'cats_micro_r': None, 'cats_micro_f': None, 'cats_macro_p': None, 'cats_macro_r': None, 'cats_macro_f': None, 'cats_macro_auc': None, 'cats_f_per_type': None}}
```
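The rejected field, `before_update`, was only added to the `[training]` schema in spaCy v3.5, so a config generated with a newer spaCy (e.g. the web quickstart) fails validation like this on an older install. Two possible fixes, sketched for a Colab cell (hence the leading `!`):

```
# Option 1: upgrade, so the installed spaCy knows before_update (v3.5+).
!pip install -U spacy spacy-transformers

# Option 2: keep the installed version and delete the offending line
# (before_update = null) from the [training] block, then re-run training.
!sed -i '/^before_update = null$/d' /content/gdrive/MyDrive/classification/config.cfg
```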
My entire config file is like this:
```
[paths]
train = "./train.spacy"
dev = "./valid.spacy"
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer","textcat"]
batch_size = 32
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.textcat]
factory = "textcat"
scorer = {"@scorers":"spacy.textcat_scorer.v1"}
threshold = 0.0
[components.textcat.model]
@architectures = "spacy.TextCatEnsemble.v2"
nO = null
[components.textcat.model.linear_model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
nO = null
[components.textcat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"
[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
[components.transformer.model.tokenizer_config]
use_fast = true
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001
[training.score_weights]
cats_score = 1.0
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
```
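A hedged alternative to hand-editing the file: keep only a minimal base config (the training quickstart produces one; `base_config.cfg` here is just an example name) and let the installed spaCy fill in the remaining defaults, so every generated field matches that version's schema:

```
# Fill a minimal base config with the installed version's defaults; the
# output only contains settings this spaCy version actually understands.
!python -m spacy init fill-config base_config.cfg /content/gdrive/MyDrive/classification/config.cfg
```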