tokenizer.pad_token_id and tokenizer.bos_token_id are both equal to 1 (openllmplayground/openalpaca_7b_700bt_preview) #9

moruga123 · 2023-07-17T19:48:00Z

from transformers import LlamaTokenizer

# Load the tokenizer shipped with the checkpoint and print its special-token ids
model_path = "openllmplayground/openalpaca_7b_700bt_preview"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
print(f"tokenizer.bos_token_id = {tokenizer.bos_token_id}")
print(f"tokenizer.eos_token_id = {tokenizer.eos_token_id}")
print(f"tokenizer.pad_token_id = {tokenizer.pad_token_id}")

Output:
tokenizer.bos_token_id = 1
tokenizer.eos_token_id = 2
tokenizer.pad_token_id = 2

Is this a problem?
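One situation where a shared pad/EOS id can matter is when a training script builds labels by replacing every occurrence of pad_token_id in input_ids with -100 (the default ignore_index of PyTorch's cross-entropy loss). The pure-Python sketch below assumes that labeling scheme; `make_labels`, `padded`, and the token ids 100/101 are hypothetical, while 1 and 2 are the BOS/EOS ids printed above and 0 is the unk id mentioned in the snippet further down.

```python
# Hypothetical sketch: build causal-LM labels by masking padding with -100,
# and compare pad == eos against pad == unk.
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch default ignore_index)

BOS, EOS, UNK = 1, 2, 0  # BOS/EOS as printed above; UNK = 0 per the snippet's comment


def make_labels(input_ids, pad_token_id):
    """Copy input_ids, replacing every pad_token_id with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


def padded(pad_id):
    """One real sequence 'BOS a b EOS' (made-up ids 100, 101), right-padded to length 6."""
    return [BOS, 100, 101, EOS, pad_id, pad_id]


labels_same = make_labels(padded(EOS), pad_token_id=EOS)  # pad shares the EOS id
labels_diff = make_labels(padded(UNK), pad_token_id=UNK)  # pad uses the unk id

print(labels_same)  # the real EOS at index 3 is masked along with the padding
print(labels_diff)  # the real EOS at index 3 stays a training target
```

Under this assumed scheme, pad == eos means the model is never trained to emit EOS, which can show up as generations that run past the end of an answer. Training code that masks padding by position or via the attention mask, rather than by token id, would not be affected in this way.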

I've seen code that makes the following change to the tokenizer, but I don't understand why it would be recommended:

tokenizer.pad_token_id = (
    0  # unk. we want this to be different from the eos token
)
tokenizer.padding_side = "left"
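The `padding_side = "left"` line is commonly tied to batched generation with a decoder-only model: new tokens are predicted from the final position of each row, so that position should hold a real prompt token rather than padding. The pure-Python sketch below illustrates this under assumed, made-up token ids; `pad_batch` is a hypothetical helper, not a transformers API.

```python
# Hypothetical sketch: compare right vs. left padding for a batch of prompts.
PAD = 0  # pad id chosen in the snippet above (the unk token)


def pad_batch(seqs, pad_id, side):
    """Pad every sequence to the longest length, on the given side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        fill = [pad_id] * (width - len(s))
        out.append(fill + s if side == "left" else s + fill)
    return out


prompts = [[1, 100, 101, 102], [1, 200]]  # two prompts of different lengths (made-up ids)

for side in ("right", "left"):
    batch = pad_batch(prompts, PAD, side)
    # A causal LM continues each row from its final column.
    last_tokens = [row[-1] for row in batch]
    print(side, batch, "last tokens:", last_tokens)
```

With right padding the shorter prompt ends in a pad token, so generation would continue from padding; with left padding every row's final column is its last real prompt token.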

