haystack
enable reading tokenizer from pipeline yaml file
Problem

I want to load the following pipeline, defined in a YAML file, into Haystack and run a query to get some results. This is my YAML file:
version: 1.19.0rc0
components:
  - name: Prompter
    type: PromptNode
    params:
      model_name_or_path: mosaicml/mpt-7b-chat
      model_kwargs:
        task_name: text-generation
        trust_remote_code: True
        torch_dtype: torch.bfloat16
      tokenizer: mosaicml/mpt-7b-chat
pipelines:
  - name: query
    nodes:
      - name: Prompter
        inputs: [Query]
And this is my code for querying a result:
from pathlib import Path

import torch
from haystack import Pipeline
from transformers import StoppingCriteriaList, StoppingCriteria

pipe = Pipeline.load_from_yaml(Path("example.yaml"))

prompt_template = """<|im_start|>system\nA conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>\n{query}"""

# generation kwargs forwarded to the model's generate() call
generate_kwargs = {"max_new_tokens": 100}

result = pipe.run(
    query="Hello",
    params={
        "Prompter": {"prompt_template": prompt_template, "generation_kwargs": generate_kwargs},
    },
)
print(result)
Since MPT-7B-Chat is a relatively new model and is not currently supported by transformers, the internal check that decides whether transformers should load a tokenizer returns False, so the pipeline is created without one, which leads to the following error:
Traceback (most recent call last):
  File "/localdisk/fanli/project/haystack/haystack/pipelines/base.py", line 2150, in _load_or_get_component
    component_instance = BaseComponent._create_instance(
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 158, in _create_instance
    instance = subclass(**component_params)
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_node.py", line 112, in __init__
    self.prompt_model = PromptModel(
  File "/localdisk/fanli/project/haystack/haystack/nodes/base.py", line 46, in wrapper_exportable_to_yaml
    init_func(self, *args, **kwargs)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_model.py", line 71, in __init__
    self.model_invocation_layer = self.create_invocation_layer(invocation_layer_class=invocation_layer_class)
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/prompt_model.py", line 93, in create_invocation_layer
    return invocation_layer(
  File "/localdisk/fanli/project/haystack/haystack/nodes/prompt/invocation_layer/hugging_face.py", line 154, in __init__
    if self.max_length > self.pipe.tokenizer.model_max_length:
AttributeError: 'NoneType' object has no attribute 'model_max_length'
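The failure mode is easy to reproduce in isolation: when transformers leaves the pipeline's tokenizer as None, the max_length comparison in the invocation layer dereferences None. A minimal standalone sketch (the FakePipe class below is hypothetical, standing in for the real transformers pipeline object):

```python
# Minimal reproduction of the failing check, using a hypothetical stand-in
# for a transformers pipeline whose tokenizer was never loaded.
class FakePipe:
    tokenizer = None  # transformers skipped tokenizer loading


pipe = FakePipe()
max_length = 100

try:
    # mirrors: if self.max_length > self.pipe.tokenizer.model_max_length:
    if max_length > pipe.tokenizer.model_max_length:
        pass
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'model_max_length'
```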
Solution
As a solution, we can load the tokenizer in the `_prepare_pipeline_kwargs` method and then pass it on to the transformers `pipeline` function, as shown below:
def _prepare_pipeline_kwargs(self, **kwargs) -> Dict[str, Any]:
    """
    Sanitizes and prepares the kwargs passed to the transformers pipeline function.
    For more details about pipeline kwargs in general, see Hugging Face
    [documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline).
    """
    # as device and device_map are mutually exclusive, we set device to None if device_map is provided
    device_map = kwargs.get("device_map", None)
    device = kwargs.get("device") if device_map is None else None
    # prepare torch_dtype for pipeline invocation
    torch_dtype = self._extract_torch_dtype(**kwargs)
    # and the model (prefer model instance over model_name_or_path str identifier)
    model = kwargs.get("model") or kwargs.get("model_name_or_path")
    # +++++++++++++++++++++++++++++++++++++++++++
    trust_remote_code = kwargs.get("trust_remote_code", False)
    tokenizer = kwargs.get("tokenizer", None)
    if isinstance(tokenizer, str):
        model_config = AutoConfig.from_pretrained(model, trust_remote_code=trust_remote_code)
        load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None
        if not load_tokenizer:
            # transformers will not resolve a tokenizer for this model on its own,
            # so load the tokenizer named in the YAML file explicitly
            tokenizer = AutoTokenizer.from_pretrained(tokenizer, trust_remote_code=trust_remote_code)
    # +++++++++++++++++++++++++++++++++++++++++++
    pipeline_kwargs = {
        "task": kwargs.get("task", None),
        "model": model,
        "config": kwargs.get("config", None),
        ###################################################
        "tokenizer": tokenizer,
        ###################################################
        "feature_extractor": kwargs.get("feature_extractor", None),
        "revision": kwargs.get("revision", None),
        "use_auth_token": kwargs.get("use_auth_token", None),
        "device_map": device_map,
        "device": device,
        "torch_dtype": torch_dtype,
        ###################################################
        "trust_remote_code": trust_remote_code,
        ###################################################
        "model_kwargs": kwargs.get("model_kwargs", {}),
        "pipeline_class": kwargs.get("pipeline_class", None),
    }
    return pipeline_kwargs
Alternatively, we could maintain a list of currently unsupported models and decide, based on the model name, whether to load the tokenizer ourselves. However, that list would require ongoing maintenance.
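A minimal sketch of that alternative, assuming a hand-maintained set of model identifiers (the set contents and function name below are illustrative, not part of Haystack):

```python
# Hypothetical sketch: decide from a hand-maintained list whether we must
# load the tokenizer ourselves instead of letting transformers resolve it.
MODELS_NEEDING_EXPLICIT_TOKENIZER = {
    "mosaicml/mpt-7b-chat",  # remote-code model; transformers skips its tokenizer
}


def needs_explicit_tokenizer(model_name_or_path: str) -> bool:
    """Return True if the tokenizer must be loaded manually for this model."""
    return model_name_or_path in MODELS_NEEDING_EXPLICIT_TOKENIZER


print(needs_explicit_tokenizer("mosaicml/mpt-7b-chat"))  # True
print(needs_explicit_tokenizer("google/flan-t5-base"))   # False
```

The drawback the text mentions is visible here: every newly released remote-code model would need another entry in the set, whereas the config-based check above adapts automatically.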