How to solve a FastChat launch error with ChatGLM2-6B?

1. Purpose

In this post, I will show you how to solve the following error when trying to run the chatglm2-6b model with FastChat:

"/home/bswen/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
2023-11-09 11:12:53 | ERROR | stderr |     return self.tokenizer.n_words
2023-11-09 11:12:53 | ERROR | stderr | AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?



2. Solution

2.1 What is fastchat?

FastChat is an open platform for training, serving, and evaluating chatbots based on large language models. Its core features include:
  • the weights, training code, and evaluation code for state-of-the-art models (e.g. Vicuna, FastChat-T5)
  • a distributed multi-model serving system with a web interface and an OpenAI-compatible RESTful API

2.2 What is ChatGLM2-6B?

ChatGLM2-6B is an open-source bilingual (Chinese-English) conversational language model from Tsinghua University, optimized for Chinese question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization, it can be deployed locally on a consumer-grade graphics card.
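
As a quick illustration of that last point, the upstream model card shows loading the model with 4-bit quantization so it fits on a consumer-grade GPU. A minimal sketch following that model card (the exact quantization call may vary between model revisions):

from transformers import AutoModel, AutoTokenizer

# Load ChatGLM2-6B with 4-bit quantization; quantize() comes from the
# model's own remote code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)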

2.3 The startup command

I want to start a model chat agent using fastchat as follows:

(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$ python3 -m fastchat.serve.model_worker --model-path /opt/models/chatglm2-6b --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002"
  • --model-path: specify the local path (or Hugging Face repo id) of the model weights
  • --model-names: specify the aliases under which the worker registers the model, so OpenAI-compatible clients can request it by those names

By running the above command, I get an interactive model worker that I can send questions to and get answers from.
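
Once the worker is registered, clients can reach it through FastChat's OpenAI-compatible API server under any of the configured aliases. A sketch, assuming fastchat.serve.controller and fastchat.serve.openai_api_server are also running on their default ports:

import requests

# Ask the FastChat OpenAI-compatible API server (default port 8000)
# for a chat completion, addressing the worker by one of its aliases.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo",  # an alias passed via --model-names
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])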

2.4 The error

But I got the following error:

(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$ python3 -m fastchat.serve.model_worker --model-path /opt/models/chatglm2-6b --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002"
2023-11-09 11:12:53 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='/opt/models/chatglm2-6b', revision='main', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=['gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002'], conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False)
2023-11-09 11:12:53 | INFO | model_worker | Loading the model ['gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002'] on worker 4cae7b39 ...
2023-11-09 11:12:53 | ERROR | stderr | Traceback (most recent call last):
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-11-09 11:12:53 | ERROR | stderr |     return _run_code(code, main_globals, None,
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/runpy.py", line 86, in _run_code
2023-11-09 11:12:53 | ERROR | stderr |     exec(code, run_globals)
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/fastchat/FastChat/fastchat/serve/model_worker.py", line 361, in <module>
2023-11-09 11:12:53 | ERROR | stderr |     args, worker = create_model_worker()
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/fastchat/FastChat/fastchat/serve/model_worker.py", line 333, in create_model_worker
2023-11-09 11:12:53 | ERROR | stderr |     worker = ModelWorker(
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/fastchat/FastChat/fastchat/serve/model_worker.py", line 77, in __init__
2023-11-09 11:12:53 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/fastchat/FastChat/fastchat/model/model_adapter.py", line 312, in load_model
2023-11-09 11:12:53 | ERROR | stderr |     model, tokenizer = adapter.load_model(model_path, kwargs)
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/fastchat/FastChat/fastchat/model/model_adapter.py", line 757, in load_model
2023-11-09 11:12:53 | ERROR | stderr |     tokenizer = AutoTokenizer.from_pretrained(
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 755, in from_pretrained
2023-11-09 11:12:53 | ERROR | stderr |     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
2023-11-09 11:12:53 | ERROR | stderr |     return cls._from_pretrained(
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
2023-11-09 11:12:53 | ERROR | stderr |     tokenizer = cls(*init_inputs, **init_kwargs)
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/bswen/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 69, in __init__
2023-11-09 11:12:53 | ERROR | stderr |     super().__init__(padding_side=padding_side, **kwargs)
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
2023-11-09 11:12:53 | ERROR | stderr |     self._add_tokens(
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
2023-11-09 11:12:53 | ERROR | stderr |     current_vocab = self.get_vocab().copy()
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/bswen/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 108, in get_vocab
2023-11-09 11:12:53 | ERROR | stderr |     vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
2023-11-09 11:12:53 | ERROR | stderr |   File "/home/bswen/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
2023-11-09 11:12:53 | ERROR | stderr |     return self.tokenizer.n_words
2023-11-09 11:12:53 | ERROR | stderr | AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?
(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$

The core error is:

"/home/bswen/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
2023-11-09 11:12:53 | ERROR | stderr |     return self.tokenizer.n_words
2023-11-09 11:12:53 | ERROR | stderr | AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?
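
This failure is not specific to FastChat; you can reproduce it with transformers 4.35 alone (a minimal repro, assuming the same local model path):

from transformers import AutoTokenizer

# With transformers >= 4.34 this raises the same AttributeError, because
# loading the tokenizer executes the model's bundled tokenizer code.
tok = AutoTokenizer.from_pretrained("/opt/models/chatglm2-6b", trust_remote_code=True)

The traceback also shows why it happens: ChatGLMTokenizer.__init__ calls super().__init__() first, and in transformers 4.34+ the base class already reads the vocabulary inside __init__, before the subclass has created its inner self.tokenizer. A simplified sketch of this init-order problem (hypothetical class names, not the real transformers code):

class BaseTokenizer:
    def __init__(self, **kwargs):
        self.get_vocab()  # transformers >= 4.34 touches the vocab here

class ChatGLMLikeTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()         # get_vocab() runs in the base class ...
        self.tokenizer = object()  # ... but this attribute is only set afterwards

    def get_vocab(self):
        return {"n_words": self.tokenizer.n_words}

ChatGLMLikeTokenizer()  # AttributeError: ... object has no attribute 'tokenizer'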

2.5 The solution

Let’s check the transformers package:

(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$ pip show transformers
Name: transformers
Version: 4.35.0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /home/apps/miniconda3/envs/fastgpt_env/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft

According to this discussion: https://huggingface.co/THUDM/chatglm2-6b/discussions/87, the transformers library we have (4.35.0) is too new for the tokenizer code shipped with this copy of chatglm2-6b: as sketched above, transformers 4.34 changed the base tokenizer's __init__ to read the vocabulary before ChatGLMTokenizer has set up its inner self.tokenizer. The straightforward fix is to downgrade transformers to 4.33 (re-downloading the latest model files, which reorder the tokenizer's __init__, may also work):

pip install --force-reinstall -v "transformers==4.33.0"
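
After the downgrade, a quick sanity check from Python before retrying:

import transformers

# Confirm the pinned version is what the environment actually resolves.
print(transformers.__version__)   # expect 4.33.0
assert transformers.__version__.startswith("4.33")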

Then retry FastChat with chatglm2-6b:

(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$ python3 -m fastchat.serve.model_worker --model-path /opt/models/chatglm2-6b/
2023-11-09 11:27:17 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='/opt/models/chatglm2-6b/', revision='main', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False)
2023-11-09 11:27:17 | INFO | model_worker | Loading the model ['chatglm2-6b'] on worker 7a2a4712 ...
Loading checkpoint shards:   0%|                                                                                                       | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards:  14%|█████████████▌                                                                                 | 1/7 [00:01<00:06,  1.12s/it]
Loading checkpoint shards:  29%|███████████████████████████▏                                                                   | 2/7 [00:02<00:05,  1.17s/it]
Loading checkpoint shards:  43%|████████████████████████████████████████▋                                                      | 3/7 [00:03<00:04,  1.16s/it]
Loading checkpoint shards:  57%|██████████████████████████████████████████████████████▎                                        | 4/7 [00:04<00:03,  1.15s/it]
Loading checkpoint shards:  71%|███████████████████████████████████████████████████████████████████▊                           | 5/7 [00:05<00:02,  1.16s/it]
Loading checkpoint shards:  86%|█████████████████████████████████████████████████████████████████████████████████▍             | 6/7 [00:06<00:01,  1.16s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.01s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.09s/it]
2023-11-09 11:27:25 | ERROR | stderr |
2023-11-09 11:27:27 | INFO | model_worker | Register to controller
2023-11-09 11:27:27 | ERROR | stderr | INFO:     Started server process [86537]
2023-11-09 11:27:27 | ERROR | stderr | INFO:     Waiting for application startup.
2023-11-09 11:27:27 | ERROR | stderr | INFO:     Application startup complete.
2023-11-09 11:27:27 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)

Let’s test the model again:

(fastgpt_env) bswen@server:/home/apps/fastchat/FastChat$ python3 -m fastchat.serve.test_message --model-name chatglm2-6b
Models: ['chatglm2-6b']
worker_addr: http://localhost:21002
问: Tell me a story with more than 1000 words.
答: Once upon a time, in a land far, far away, there lived a young boy named Jack. Jack was a kind and gentle soul, always ready to

It works! (问 and 答 are the test tool's Chinese labels for "question" and "answer".)

3. Summary

In this post, I demonstrated how to solve an error when starting the chatglm2-6b model with FastChat. The key point is to make sure your libraries are compatible with each other, in this case by pinning transformers to 4.33.0. That's it, thanks for reading.