Why can't picklable Doc be used in doc extension attributes when multiprocessing? #5571

You can pickle a Doc, but we strongly recommend against it: there are more secure and much more compact ways to save the annotations you need. For the core token attributes, Doc.to_array() is a good option, and for a large collection of docs, you can use DocBin.
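As a rough illustration of the two options mentioned above, here is a minimal sketch. It assumes a blank English pipeline (tokenizer only, no trained model) and made-up example texts; the attributes passed to to_array() are just examples, so pick the ones you actually need.

```python
import spacy
from spacy.attrs import LOWER, IS_ALPHA
from spacy.tokens import DocBin

# Assumption: a blank pipeline is enough here, since we only need tokenization.
nlp = spacy.blank("en")
texts = ["Hello world!", "Pickling a Doc is rarely needed."]  # example data
docs = [nlp(text) for text in texts]

# Option 1: export selected token attributes as a NumPy array
# with one row per token and one column per attribute.
arr = docs[0].to_array([LOWER, IS_ALPHA])

# Option 2: pack a collection of docs into a single compact byte string.
doc_bin = DocBin()
for doc in docs:
    doc_bin.add(doc)
data = doc_bin.to_bytes()

# Later (possibly in another process), restore the docs with a shared vocab.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
```

The bytes produced by DocBin.to_bytes() are an ordinary bytes object, so they can be passed between processes or written to disk without pickling the Doc objects themselves.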

You can't serialize a Doc with msgpack, which is what this is trying to do, since msgpack doesn't support most of the object types for Doc.

What information do you really need to save from the character-based doc overall? Can you just save the words and spaces instead? You could also consider saving the output of doc.to_array() with the features you're interested in, since that would be serializable …
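A minimal sketch of the words-and-spaces idea, assuming a blank English pipeline and an example sentence. The `payload` dict and its keys are hypothetical names chosen for illustration; srsly (installed alongside spaCy) is used here only to show that plain lists go through msgpack without trouble.

```python
import spacy
import srsly
from spacy.tokens import Doc

# Assumption: tokenizer-only pipeline and an example sentence.
nlp = spacy.blank("en")
doc = nlp("Save the words and spaces instead.")

# Keep only plain Python types: token texts and trailing-space flags.
# "payload", "words" and "spaces" are illustrative names, not a spaCy format.
payload = {
    "words": [token.text for token in doc],
    "spaces": [bool(token.whitespace_) for token in doc],
}

# Plain lists of str/bool are msgpack-serializable, unlike a Doc object.
data = srsly.msgpack_dumps(payload)
loaded = srsly.msgpack_loads(data)

# Rebuild an equivalent Doc (tokenization only) from the saved lists.
rebuilt = Doc(nlp.vocab, words=loaded["words"], spaces=loaded["spaces"])
```

The rebuilt Doc carries the same tokenization and text as the original, but no other annotations; anything else you need would have to be saved separately, for example via doc.to_array().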

Answer selected by svlandeg
Labels
feat / doc Feature: Doc, Span and Token objects
This discussion was converted from issue #5571 on December 10, 2020 16:59.