Unknown PostProcessor type: Sequence #122

jovisaib · 2024-09-05T20:37:35Z

It seems that the following post-processor block in tokenizer.json is not supported:

  "post_processor": {
    "type": "Sequence",
    "processors": [
      {
        "type": "ByteLevel",
        "add_prefix_space": true,
        "trim_offsets": false,
        "use_regex": true
      },
      {
        "type": "TemplateProcessing",
        "single": [
          {
            "SpecialToken": {
              "id": "<|begin_of_text|>",
              "type_id": 0
            }
          },
          {
            "Sequence": {
              "id": "A",
              "type_id": 0
            }
          }
        ],
        "pair": [
          {
            "SpecialToken": {
              "id": "<|begin_of_text|>",
              "type_id": 0
            }
          },

It throws the following error:

Unknown PostProcessor type: Sequence

You can find the specific pain point here, at Tokenizers/PostProcessor.swift:39

struct PostProcessorFactory {
    static func fromConfig(config: Config?) -> PostProcessor? {
        guard let config = config else { return nil }
        guard let typeName = config.type?.stringValue else { return nil }
        let type = PostProcessorType(rawValue: typeName)
        switch type {
          case .TemplateProcessing: return TemplateProcessing(config: config)
          case .ByteLevel         : return ByteLevelPostProcessor(config: config)
          case .RobertaProcessing : return RobertaProcessing(config: config)
          default                 : fatalError("Unsupported PostProcessor type: \(typeName)")
        }
    }
}

The original implementation in Rust can be found here: https://github.com/huggingface/tokenizers/blob/25aee8b88c8de3c5a52e2f9cb6281d6df00ad516/tokenizers/src/processors/sequence.rs#L18-L36
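
For reference, the chaining behavior of the Rust `Sequence` processor (each processor runs in order on the output of the previous one) could be sketched in Swift roughly as follows. This is a minimal, self-contained sketch, not the library's code: `SketchPostProcessor`, `PassThrough`, `PrependToken`, and `SequenceSketch` are hypothetical names, and the `postProcess(tokens:tokensPair:)` signature is an assumption about what a post-processor looks like here. The two stand-in processors only imitate the ByteLevel and TemplateProcessing entries from the config above.

```swift
// Hypothetical stand-in for a post-processor protocol; the real protocol in
// the library may have a different signature.
protocol SketchPostProcessor {
    func postProcess(tokens: [String], tokensPair: [String]?) -> [String]
}

// Stand-in for a ByteLevel step: merges the optional pair, changes nothing else.
struct PassThrough: SketchPostProcessor {
    func postProcess(tokens: [String], tokensPair: [String]?) -> [String] {
        tokens + (tokensPair ?? [])
    }
}

// Stand-in for a TemplateProcessing step that prepends one special token.
struct PrependToken: SketchPostProcessor {
    let token: String
    func postProcess(tokens: [String], tokensPair: [String]?) -> [String] {
        [token] + tokens + (tokensPair ?? [])
    }
}

// Applies each processor in order, feeding the output of one into the next.
struct SequenceSketch: SketchPostProcessor {
    let processors: [any SketchPostProcessor]

    func postProcess(tokens: [String], tokensPair: [String]?) -> [String] {
        var current = tokens
        var pair = tokensPair
        for processor in processors {
            current = processor.postProcess(tokens: current, tokensPair: pair)
            // After the first step the pair has been merged into `current`.
            pair = nil
        }
        // With no processors at all, fall back to plain concatenation.
        return current + (pair ?? [])
    }
}

let sequence = SequenceSketch(processors: [
    PassThrough(),
    PrependToken(token: "<|begin_of_text|>"),
])
let output = sequence.postProcess(tokens: ["Hello", "Ġworld"], tokensPair: nil)
print(output)
```

In the actual fix, the factory would presumably also need a case for the `Sequence` type that builds one processor per entry in `processors`; how that maps onto the existing `Config` type is not shown here.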

This should be simple to fix, and I will look for a solution over the weekend, but maybe you have already found one.

You can assign it to me and I will have it ready as soon as possible.


jovisaib · 2024-09-05T20:44:00Z

Checking Hugging Face transformers, I see that BertProcessing should also be added.

DePasqualeOrg · 2024-09-25T21:51:46Z

This is needed for the Llama 3.2 models that were released today. It looks like this isn't a huge thing, so I'll see if I can port the Rust implementation to Swift.

DePasqualeOrg · 2024-09-25T22:19:01Z

#129
