How to solve a rasa train error

1. Purpose

In this post, I will show you how to solve the following error when using rasa to train a chatbot:

[root@local rasa-test-0608]# ./train.sh
/opt/venv/lib/python3.10/site-packages/rasa/core/tracker_store.py:1048: MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings.  Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
  Base: DeclarativeMeta = declarative_base()
/opt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/opt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/opt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/opt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/opt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel.yaml')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/opt/venv/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py:246: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  np.bool8: (False, True),
The configuration for policies was chosen automatically. It was written into the config file at 'config.yml'.
/opt/venv/lib/python3.10/site-packages/jieba/__init__.py:44: DeprecationWarning: invalid escape sequence '\.'
  re_han_default = re.compile("([\u4E00-\u9FD5a-zA-Z0-9+#&\._%\-]+)", re.U)
/opt/venv/lib/python3.10/site-packages/jieba/__init__.py:46: DeprecationWarning: invalid escape sequence '\s'
  re_skip_default = re.compile("(\r\n|\s)", re.U)
2023-06-14 06:00:53 INFO     rasa.engine.training.hooks  - Restored component 'JiebaTokenizer' from cache.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.658 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/graph.py", line 394, in _load_component
    self._component: GraphComponent = constructor(  # type: ignore[no-redef]
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/graph.py", line 221, in load
    return cls.create(config, model_storage, resource, execution_context)
  File "/opt/venv/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 100, in create
    return cls(config, execution_context)
  File "/opt/venv/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 67, in __init__
    self._load_model_instance()
  File "/opt/venv/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 152, in _load_model_instance
    self.tokenizer = model_tokenizer_dict[self.model_name].from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1763, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/opt/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1326, in hf_hub_download
    http_get(
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 505, in http_get
    r = _request_wrapper(
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 442, in _request_wrapper
    return http_backoff(
  File "/opt/venv/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 129, in http_backoff
    response = requests.request(method=method, url=url, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venv/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/opt/venv/lib/python3.10/site-packages/rasa/__main__.py", line 127, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/opt/venv/lib/python3.10/site-packages/rasa/cli/train.py", line 56, in <lambda>
    train_parser.set_defaults(func=lambda args: run_training(args, can_exit=True))
  File "/opt/venv/lib/python3.10/site-packages/rasa/cli/train.py", line 87, in run_training
    training_result = train_all(
  File "/opt/venv/lib/python3.10/site-packages/rasa/api.py", line 105, in train
    return train(
  File "/opt/venv/lib/python3.10/site-packages/rasa/model_training.py", line 207, in train
    return _train_graph(
  File "/opt/venv/lib/python3.10/site-packages/rasa/model_training.py", line 286, in _train_graph
    trainer.train(
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/training/graph_trainer.py", line 105, in train
    graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/runner/dask.py", line 101, in run
    dask_result = dask.get(run_graph, run_targets)
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 557, in get_sync
    return get_async(
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 500, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 542, in submit
    fut.set_result(fn(*args, **kwargs))
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 238, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 238, in <listcomp>
    return [execute_task(*a) for a in it]
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 229, in execute_task
    result = pack_exception(e, dumps)
  File "/opt/venv/lib/python3.10/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/opt/venv/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/graph.py", line 474, in __call__
    self._load_component(**constructor_kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rasa/engine/graph.py", line 407, in _load_component
    raise GraphComponentException(
rasa.engine.exceptions.GraphComponentException: Error initializing graph component for node run_LanguageModelFeaturizer1.

The startup command:

 docker run --user 0 --network host -it -v $(pwd):/app rasa/rasa:3.5.10-full train

The config.yml of the rasa bot:

# https://rasa.com/docs/rasa/model-configuration/
recipe: default.v1

# The assistant project unique identifier
# This default value must be replaced with a unique assistant name within your deployment
assistant_id: test_assistant

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh

pipeline:
  - name: JiebaTokenizer
    dictionary_path: "pipline/jieba_userdict"
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-chinese"



2. Solution

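The traceback shows that the LanguageModelFeaturizer failed while downloading the bert-base-chinese files from the Hugging Face Hub, because the connection was reset by the peer. The fix is to copy the language model from a local laptop to the server’s cache directory.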
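
The exact copy command depends on your setup. As a minimal sketch (not necessarily the method used in this post), the cached model folder can be packed on the laptop and unpacked on the server with Python's standard tarfile module, which stores symlinks as symlinks. The function names and the archive name are my own, and the sketch assumes the laptop already has bert-base-chinese in its own ~/.cache/huggingface/hub:

```python
import tarfile
from pathlib import Path


def pack_model_cache(hub_dir: Path, model_dir_name: str, archive_path: Path) -> Path:
    """Pack one cached model folder into a .tar.gz (hypothetical helper)."""
    model_dir = hub_dir / model_dir_name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"No cached model at {model_dir}")
    # tarfile keeps symlinks as links by default (dereference=False),
    # so links from snapshots/ into blobs/ survive the transfer.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(model_dir, arcname=model_dir_name)
    return archive_path


def unpack_model_cache(archive_path: Path, hub_dir: Path) -> None:
    """Unpack the archive into the hub cache directory (hypothetical helper).

    Only unpack archives you created yourself.
    """
    hub_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(hub_dir)


# Example usage (paths are assumptions):
#   hub = Path.home() / ".cache" / "huggingface" / "hub"
#   pack_model_cache(hub, "models--bert-base-chinese", Path("bert-cache.tar.gz"))
#   ... transfer bert-cache.tar.gz to the server, then on the server:
#   unpack_model_cache(Path("bert-cache.tar.gz"), hub)
```

Packing the whole models--bert-base-chinese folder, instead of single files, keeps the directory layout that the Hugging Face cache expects.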

After copying, ~/.cache/huggingface/hub should contain the following:

[root@local hub]# tree models--bert-base-chinese
models--bert-base-chinese
├── blobs
│   ├── 612acd33db45677c3d6ba70615336619dc65cddf1ecf9d39a22dd1934af4aff2
│   ├── a521dc2845bdddbe822864290c6b928396fc5ee8
│   ├── ca4f9781030019ab9b253c6dcb8c7878b6dc87a5
│   └── e3c6d456fb2616f01a9a6cd01a1be1a36353ed22
├── refs
│   └── main
└── snapshots
    └── 8d2a91f91cc38c96bb8b4556ba70c392f8d5ee55
        ├── config.json -> ../../blobs/a521dc2845bdddbe822864290c6b928396fc5ee8
        ├── tf_model.h5 -> ../../blobs/612acd33db45677c3d6ba70615336619dc65cddf1ecf9d39a22dd1934af4aff2
        ├── tokenizer_config.json -> ../../blobs/e3c6d456fb2616f01a9a6cd01a1be1a36353ed22
        └── vocab.txt -> ../../blobs/ca4f9781030019ab9b253c6dcb8c7878b6dc87a5

4 directories, 9 files
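
Before re-running the training, it can help to check that the copied folder is complete. The following is a small sketch that looks for the four file names from the tree listing above in every snapshot directory. The function name and the returned report format are my own choices, not part of rasa or Hugging Face:

```python
from pathlib import Path

# File names taken from the tree listing above.
EXPECTED_FILES = ("config.json", "tf_model.h5", "tokenizer_config.json", "vocab.txt")


def missing_model_files(hub_dir: Path, model_dir_name: str = "models--bert-base-chinese") -> dict:
    """Map each snapshot revision to the expected files it lacks (hypothetical helper).

    An empty dict means that no snapshot directory was found at all.
    """
    snapshots_dir = hub_dir / model_dir_name / "snapshots"
    report = {}
    if not snapshots_dir.is_dir():
        return report
    for snapshot in sorted(snapshots_dir.iterdir()):
        if not snapshot.is_dir():
            continue
        # Path.exists() follows symlinks, so a dangling link into blobs/
        # is reported as missing.
        report[snapshot.name] = [
            name for name in EXPECTED_FILES if not (snapshot / name).exists()
        ]
    return report


# Example usage (the path is an assumption):
#   print(missing_model_files(Path.home() / ".cache" / "huggingface" / "hub"))
```

A snapshot whose list is empty has all four files in place. A non-empty list names the files that still need to be copied.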

Then run the training again:

Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 433kB/s]
Downloading tf_model.h5: 100%|█████████████████████████████████████████████████████████████████████████████| 478M/478M [02:39<00:00, 2.99MB/s]
Some layers from the model checkpoint at bert-base-chinese were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-chinese.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
2023-06-14 06:11:16 INFO     rasa.engine.training.hooks  - Starting to train component 'RegexFeaturizer'.
2023-06-14 06:11:16 INFO     rasa.engine.training.hooks  - Finished training component 'RegexFeaturizer'.
2023-06-14 06:11:16 INFO     rasa.engine.training.hooks  - Starting to train component 'DIETClassifier'.
Epochs:   0%|                                                                                                         | 0/300 [00:00<?, ?it/s]
Epochs: 100%|████████████████████████████████████████████████████████████████| 300/300 [01:38<00:00,  3.05it/s, t_loss=0.376, i_acc=1, e_f1=1]
2023-06-14 06:12:54 INFO     rasa.engine.training.hooks  - Finished training component 'DIETClassifier'.
2023-06-14 06:12:54 INFO     rasa.engine.training.hooks  - Starting to train component 'EntitySynonymMapper'.
2023-06-14 06:12:54 INFO     rasa.engine.training.hooks  - Finished training component 'EntitySynonymMapper'.
2023-06-14 06:12:54 INFO     rasa.engine.training.hooks  - Starting to train component 'ResponseSelector'.
/opt/venv/lib/python3.10/site-packages/rasa/utils/train_utils.py:528: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
2023-06-14 06:12:55 INFO     rasa.nlu.selectors.response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trained on training examples combining all retrieval intents.
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Finished training component 'ResponseSelector'.
Processed rules: 100%|██████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 1462.52it/s, # trackers=1]
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Starting to train component 'MemoizationPolicy'.
Processed trackers: 0it [00:00, ?it/s]
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Finished training component 'MemoizationPolicy'.
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Starting to train component 'RulePolicy'.
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 959.73it/s, # action=19]
Processed actions: 19it [00:00, 10969.27it/s, # examples=17]
Processed trackers: 0it [00:00, ?it/s]
Processed trackers: 100%|██████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 358.25it/s]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 1905.88it/s]
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Finished training component 'RulePolicy'.
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Starting to train component 'TEDPolicy'.
Processed trackers: 0it [00:00, ?it/s]
/opt/venv/lib/python3.10/site-packages/rasa/core/policies/ted_policy.py:723: UserWarning: Skipping training of `TEDPolicy` as no data was provided. You can exclude this policy in the configuration file to avoid this warning.
  rasa.shared.utils.io.raise_warning(
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Finished training component 'TEDPolicy'.
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Starting to train component 'UnexpecTEDIntentPolicy'.
2023-06-14 06:12:55 WARNING  rasa.shared.utils.common  - The UnexpecTED Intent Policy is currently experimental and might change or be removed in the future 🔬 Please share your feedback on it in the forum (https://forum.rasa.com) to help us make this feature ready for production.
Processed trackers: 0it [00:00, ?it/s]
/opt/venv/lib/python3.10/site-packages/rasa/core/policies/ted_policy.py:723: UserWarning: Skipping training of `UnexpecTEDIntentPolicy` as no data was provided. You can exclude this policy in the configuration file to avoid this warning.
  rasa.shared.utils.io.raise_warning(
2023-06-14 06:12:55 INFO     rasa.engine.training.hooks  - Finished training component 'UnexpecTEDIntentPolicy'.
Your Rasa model is trained and saved at 'models/20230614-060720-threaded-quadtree.tar.gz'.

You can see that the model was downloaded successfully on the server and that the training completed.

3. Summary

In this post, I demonstrated how to solve the rasa train error. The key is to have the right model downloaded on your machine. That’s it, thanks for reading.