Can I delete Vocab entries without reloading the whole model? #12326
Replies: 1 comment
-
Your understanding is correct. There are two growing caches in the vocab, the lexeme cache in
How often are you running into OOM errors? If this is happening very often (this will obviously vary by load, but say more than once a day), it might indicate a separate issue. If RAM is always very limited or 6 seconds of downtime is a problem, it might be worthwhile to consider running multiple servers.
-
Hello!
I have read several issues and discussions about vocabulary growth, and the answer is pretty much always "reload the model every now and then", but I don't understand why.
Context
I have a Docker container managed by a Kubernetes instance which needs to be always up, with minimal response time.
This service uses nlp.pipe with an infinite generator which yields strings as they come. The flow of the service is straightforward:
API receives a string -> string is sent to a queue -> queue appends the string to the infinite generator -> doc is processed and returned by the API.
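The flow above can be sketched roughly as follows. This is only a minimal standard-library sketch: `stream_from_queue`, `run_worker`, `fake_pipe` and `_STOP` are hypothetical names, and `fake_pipe` is a stand-in for nlp.pipe that does not model its internal batching.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_STOP = object()  # hypothetical sentinel that ends the otherwise endless stream


def stream_from_queue(q: queue.Queue) -> Iterator[str]:
    """Yield strings from the queue as they arrive; block while it is empty."""
    while True:
        item = q.get()
        if item is _STOP:
            return
        yield item


def fake_pipe(texts: Iterable[str]) -> Iterator[List[str]]:
    """Stand-in for nlp.pipe: lazily turns each text into a list of tokens."""
    for text in texts:
        yield text.split()


def run_worker(q: queue.Queue, pipe: Callable, results: list) -> None:
    """Feed the queue-backed generator to the pipe and collect its output."""
    for doc in pipe(stream_from_queue(q)):
        results.append(doc)


if __name__ == "__main__":
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=run_worker, args=(q, fake_pipe, results))
    worker.start()
    for text in ["hello world", "some unknown tokenz"]:
        q.put(text)  # what the API would do for each incoming request
    q.put(_STOP)
    worker.join()
    print(results)
```

In the real service the `pipe` argument would be nlp.pipe and there would be no sentinel, since the generator never ends.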
Problem
My API can receive anything, so it often receives unknown tokens, which make the vocab grow and eventually lead to an OOM.
The "reload the model" solution takes 6 seconds (using trf or lg models). So it either doubles the RAM usage, if I load a new model while the old one is still working, or I have 6 seconds of downtime.
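To make the trade-off concrete, here is a rough sketch of the two reload strategies. `ModelHolder` and its methods are hypothetical names, and `loader` is assumed to be any zero-argument callable that returns a freshly loaded model (for example one wrapping a call to spacy.load).

```python
import threading
from typing import Any, Callable


class ModelHolder:
    """Hypothetical wrapper that keeps the current model behind a lock."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._model = loader()

    def get(self) -> Any:
        with self._lock:
            return self._model

    def reload_side_by_side(self) -> None:
        """No downtime, but the old and new models coexist while loading."""
        new_model = self._loader()  # slow step; the old model keeps serving
        with self._lock:
            self._model = new_model  # old model can be freed once unreferenced

    def reload_in_place(self) -> None:
        """Lower peak memory, but get() blocks for the whole load."""
        with self._lock:
            # Only frees memory if nothing else still references the old model.
            self._model = None
            self._model = self._loader()
```

The first method corresponds to the doubled RAM usage, the second to the 6 seconds of downtime.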
My understanding
Correct me if I have misunderstood, but the reason reloading the model is recommended is to make sure no doc is still loaded in memory, because all docs processed by the model share the same vocab. Removing some lexemes would therefore mean that some tokens in already processed docs could point to the wrong lexeme.
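A toy illustration of what I mean. This is a deliberately simplified model and not spaCy's actual data structures; `ToyVocab` and `ToyDoc` are hypothetical, and the slot-reuse behavior is an assumption made only to show the failure mode.

```python
from typing import List


class ToyVocab:
    """Toy shared table: slot index -> lexeme text. Not spaCy's real Vocab."""

    def __init__(self) -> None:
        self.table: List[str] = []

    def add(self, text: str) -> int:
        """Return the slot for `text`, appending it if it is new."""
        if text in self.table:
            return self.table.index(text)
        self.table.append(text)
        return len(self.table) - 1

    def truncate(self, size: int) -> None:
        """Drop every lexeme added after the first `size` entries."""
        del self.table[size:]


class ToyDoc:
    """Toy doc that stores only slot indices into the shared vocab."""

    def __init__(self, vocab: ToyVocab, words: List[str]) -> None:
        self.vocab = vocab
        self.keys = [vocab.add(w) for w in words]

    def text(self) -> str:
        return " ".join(self.vocab.table[k] for k in self.keys)


if __name__ == "__main__":
    vocab = ToyVocab()
    ToyDoc(vocab, ["the", "cat"])            # "known" words fill slots 0 and 1
    old = ToyDoc(vocab, ["the", "xyzzy"])    # unknown word takes slot 2
    vocab.truncate(2)                        # remove the new lexeme
    ToyDoc(vocab, ["the", "plugh"])          # another unknown word reuses slot 2
    print(old.text())                        # old doc now resolves to the wrong word
```

If no doc created before the truncation is still alive, nothing can hold a stale index, which is the situation I describe in my question.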
Questions
Is it a problem if I stop the pipeline (meaning no doc exists in memory, because every doc has either been returned or not yet been processed) and then remove the new lexemes? Is there a proper way to do this or, if it is not possible, why not?
Thank you!