Tok2Vec loss is increasing during training. Bad performance on short sentences #13505
skarokin asked this question in Help: Model Advice
I'm training tok2vec, parser, tagger, and morphologizer with `vectors = null`. I'm trying to train the parser and tagger on ungrammatical sentences, so I can't use pretrained vectors. I generated my config with `spacy init config` using `--optimize efficiency` and `--gpu`, and left all parameters at their defaults except the ones described below.

My training data is OntoNotes 5.0, with ~30% of the data augmented with some predetermined grammatical errors and then copied and appended to the dataset as new `.conllu` files containing ~120 augmented sentences each.

I've tried training many times with learning rates ranging from 0.01 to 0.0005, batch sizes ranging from 128 to 1024, and the width and depth of `[components.tok2vec.model.encode]` set to `width = 96` or `128` and `depth = 4` or `6`.
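For reference, this is roughly what that block looks like in my config. It's a minimal sketch based on the default CNN tok2vec settings that `init config` generates; the architecture name and the `window_size`/`maxout_pieces` values are whatever the generator produced for my version, so treat them as assumptions rather than exact values:

```ini
# Hypothetical reconstruction of the generation command, not copied verbatim:
# python -m spacy init config config.cfg --lang en \
#   --pipeline tok2vec,tagger,morphologizer,parser --optimize efficiency --gpu

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
# I have tried width = 96 and 128, depth = 4 and 6
width = 96
depth = 4
window_size = 1
maxout_pieces = 3
```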
Another odd thing is that my model seems to perform better on test data when the Tok2Vec loss is higher. For example, the model with a Tok2Vec loss of ~200,000 performs much better on both my test data and my domain data than the model with a loss of ~40,000. Is loss generally not an important metric for this use case? If it helps, all of the models I've trained fall within 1% of the metrics below.
Below is my most recent training run, with a learning rate of 0.001, width 96, depth 4, and batch size 1000.
One final thing to note is that my model performs very poorly on short sentences. For example, using `en_core_web_sm`, this is the result of printing `token.text, token.tag_, token.dep_, token.head.text` for "She is beautiful." When using the model I showed metrics for above, I get this for `model-best` and `model-last` respectively.
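For context, the comparison I'm running is essentially the following (the `training/model-best` path is a placeholder for my trained pipeline directory, so treat it as an assumption):

```python
import spacy

# Baseline pipeline vs. my trained pipeline.
# "training/model-best" is a placeholder path for the custom pipeline.
pipelines = {
    "en_core_web_sm": spacy.load("en_core_web_sm"),
    "model-best": spacy.load("training/model-best"),
}

text = "She is beautiful."

for name, nlp in pipelines.items():
    print(f"--- {name} ---")
    for token in nlp(text):
        # Token text, fine-grained tag, dependency label, and syntactic head.
        print(token.text, token.tag_, token.dep_, token.head.text)
```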