haystack
enable reading tokenizer from pipeline yaml file
Problem

I want to load the following pipeline, defined in a YAML file, into Haystack and run a query to get some results. This is my YAML file:
version: 1.19.0rc0
components:
  - name: Prompter
    type: PromptNode
    params:
      model_name_or_path: mosaicml/mpt-7b-chat
      model_kwargs:
        task_name: text-generation
        trust_remote_code: True
        torch_dtype: torch.bfloat16
      tokenizer: mosaicml/mpt-7b-chat
pipelines:
  - name: query
    nodes:
      - name: Prompter
        inputs: [Query]
And this is my code for querying a result:
from pathlib import Path

import torch
from haystack import Pipeline
from transformers import StoppingCriteriaList, StoppingCriteria

pipe = Pipeline.load_from_yaml(Path("example.yaml"))

prompt_template = """<|im_start|>system\nA conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>\n{query}"""

# generation kwargs forwarded to the model's generate() call
generate_kwargs = {"max_new_tokens": 100}

result = pipe.run(
    query="Hello",
    params={
        "Prompter": {"prompt_template": prompt_template, "generation_kwargs": generate_kwargs},
    },
)
print(result)
Since MPT-7B-Chat is a relatively new model and is not currently supported by transformers, the internal check that decides whether transformers should load a tokenizer returns False, so the pipeline is created without one, which leads to the following error:
Traceback (most recent call last):
  File "/localdisk/fanli/project/haystack/haystack/pipelines/base.py", line 2150, in _load_or_get_component
    component_instance = BaseComponent._create_instance(
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 158, in _create_instance
    instance = subclass(**component_params)
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_node.py", line 112, in __init__
    self.prompt_model = PromptModel(
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_model.py", line 71, in __init__
    self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_model.py", line 93, in create_invocation_layer
    return invocation_layer(
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/invocation_layer/hugging_face.py", line 154, in __init__
    if self.max_length > self.pipe.tokenizer.model_max_length:
AttributeError: 'NoneType' object has no attribute 'model_max_length'
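The failure mode is easy to reproduce in isolation: when transformers leaves the pipeline's tokenizer as None, the max_length comparison in the invocation layer dereferences None. A minimal standalone sketch (the FakePipe class below is hypothetical, standing in for the real transformers pipeline object):

```python
# Minimal reproduction of the failing check, using a hypothetical stand-in
# for a transformers pipeline whose tokenizer was never loaded.
class FakePipe:
    tokenizer = None  # transformers skipped tokenizer loading


pipe = FakePipe()
max_length = 100

try:
    # mirrors: if self.max_length > self.pipe.tokenizer.model_max_length:
    if max_length > pipe.tokenizer.model_max_length:
        pass
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'model_max_length'
```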
Solution
As a solution, we can load the tokenizer in the `_prepare_pipeline_kwargs` method and then pass it on to the transformers `pipeline` function, as shown below:
def _prepare_pipeline_kwargs(self, **kwargs) -> Dict[str, Any]:
    """
    Sanitizes and prepares the kwargs passed to the transformers pipeline function.
    For more details about pipeline kwargs in general, see Hugging Face
    [documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline).
    """
    # as device and device_map are mutually exclusive, we set device to None if device_map is provided
    device_map = kwargs.get("device_map", None)
    device = kwargs.get("device") if device_map is None else None
    # prepare torch_dtype for pipeline invocation
    torch_dtype = self._extract_torch_dtype(**kwargs)
    # and the model (prefer model instance over model_name_or_path str identifier)
    model = kwargs.get("model") or kwargs.get("model_name_or_path")
    # +++++++++++++++++++++++++++++++++++++++++++
    trust_remote_code = kwargs.get("trust_remote_code", False)
    tokenizer = kwargs.get("tokenizer", None)
    if isinstance(tokenizer, str):
        model_config = AutoConfig.from_pretrained(model, trust_remote_code=trust_remote_code)
        load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None
        if not load_tokenizer:
            # transformers will not resolve a tokenizer for this model on its own,
            # so load the tokenizer named in the YAML file explicitly
            tokenizer = AutoTokenizer.from_pretrained(tokenizer, trust_remote_code=trust_remote_code)
    # +++++++++++++++++++++++++++++++++++++++++++
    pipeline_kwargs = {
        "task": kwargs.get("task", None),
        "model": model,
        "config": kwargs.get("config", None),
        ###################################################
        "tokenizer": tokenizer,
        ###################################################
        "feature_extractor": kwargs.get("feature_extractor", None),
        "revision": kwargs.get("revision", None),
        "use_auth_token": kwargs.get("use_auth_token", None),
        "device_map": device_map,
        "device": device,
        "torch_dtype": torch_dtype,
        ###################################################
        "trust_remote_code": trust_remote_code,
        ###################################################
        "model_kwargs": kwargs.get("model_kwargs", {}),
        "pipeline_class": kwargs.get("pipeline_class", None),
    }
    return pipeline_kwargs
Alternatively, we could maintain a list of currently unsupported models and decide, based on the model name, whether to load the tokenizer ourselves. However, that list would require ongoing maintenance.
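A minimal sketch of that alternative, assuming a hand-maintained set of model identifiers (the set contents and function name below are illustrative, not part of Haystack):

```python
# Hypothetical sketch: decide from a hand-maintained list whether we must
# load the tokenizer ourselves instead of letting transformers resolve it.
MODELS_NEEDING_EXPLICIT_TOKENIZER = {
    "mosaicml/mpt-7b-chat",  # remote-code model; transformers skips its tokenizer
}


def needs_explicit_tokenizer(model_name_or_path: str) -> bool:
    """Return True if the tokenizer must be loaded manually for this model."""
    return model_name_or_path in MODELS_NEEDING_EXPLICIT_TOKENIZER


print(needs_explicit_tokenizer("mosaicml/mpt-7b-chat"))  # True
print(needs_explicit_tokenizer("google/flan-t5-base"))   # False
```

The drawback the text mentions is visible here: every newly released remote-code model would need another entry in the set, whereas the config-based check above adapts automatically.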