'temperature' parameter in LangchainLLMWrapper.generate_text causing issues
Describe the bug
LangchainLLMWrapper has a .generate_text() method which in turn calls .generate_prompt() on the underlying LLM. The LangchainLLMWrapper passes a 'temperature' parameter to .generate_prompt(), which causes the following issues:
- The temperature parameter has no effect on the response when using a HuggingFace LLM.
- Some LangChain extensions, such as IBM Generative AI, do not accept a temperature parameter in .generate_prompt().
Since the temperature can already be passed when initialising a LangChain LLM, it does not need to be supplied again by LangchainLLMWrapper.
For example, with HuggingFacePipeline you can specify the temperature at initialization:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, temperature=1)
Or, when using the IBM LLM, you can specify the temperature with:
llm = LangChainInterface(
    model_id='google/flan-t5-xl',
    client=Client(credentials=Credentials.from_env()),
    parameters=TextGenerationParameters(
        decoding_method=DecodingMethod.SAMPLE,
        max_new_tokens=1000,
        min_new_tokens=1,
        temperature=0.2,
        top_k=20,
        top_p=1,
        random_seed=42,
        repetition_penalty=1.1,
    ),
)
Ragas version: 0.1.1
Python version: 3.10.6
Code to Reproduce
The following code shows why the 'temperature' parameter does not affect the response when using a HuggingFace LLM:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from ragas.llms.base import BaseRagasLLM, LangchainLLMWrapper
from ragas.run_config import RunConfig
from ragas.llms.prompt import PromptValue
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100)
hf_llm = HuggingFacePipeline(pipeline=pipe)
pv = PromptValue(prompt_str='hi, how are you')
run_config = RunConfig()
ragas_hf_llm = LangchainLLMWrapper(hf_llm, run_config=run_config)
ragas_hf_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=0
)
# Output
LLMResult(generations=[[Generation(text=' feeling today?!\n\nThe sun was shining, and there was only a small light, but no more than a single drop from the sky. The clouds were white with white fringing, which they looked like a cloud filled with smoke. It covered the place with the smoke, and the man standing before them had a sword, as a symbol of protection. He was only slightly more than a hundred lightyears away from the Sun and the Moon.\n\n"Fuu...what?"\n\n')]], llm_output=None, run=[RunInfo(run_id=UUID('cbd105d1-ab2c-4069-a6f3-20fc9159443e'))])
In the above code I initialised a HuggingFacePipeline with the gpt2 model, wrapped it in the ragas LangchainLLMWrapper, and passed temperature=0 when calling .generate_text(). Ideally this should raise an error, because a temperature of 0 is not accepted by HuggingFace.
You can also check this by passing temperature=99 to .generate_text(); no exception is raised for this absurdly high value either (see the sketch below). It is therefore evident that the temperature passed to .generate_text() does not affect the HuggingFace LLM. Moreover, the user can set the temperature in the pipeline() call, so there is no need for an additional temperature argument in .generate_text().
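For reference, this is the check described above, as a minimal sketch reusing the objects defined in the snippet (99 is just an arbitrary out-of-range value):
# An out-of-range temperature passed through the wrapper raises no error,
# suggesting the value never reaches the HuggingFace pipeline.
ragas_hf_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=99
)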
The following code shows why passing 'temperature' raises an error with the IBM LLM:
from genai import Client, Credentials
from genai.extensions.llama_index import IBMGenAILlamaIndex
from genai.extensions.langchain import LangChainInterface
from genai.extensions.langchain.chat_llm import LangChainChatInterface
from genai.extensions.langchain import LangChainEmbeddingsInterface
from genai.schema import (
    DecodingMethod,
    TextGenerationParameters,
    TextEmbeddingParameters,
)
llm = LangChainInterface(
    model_id='google/flan-t5-xl',
    client=Client(credentials=Credentials.from_env()),
    parameters=TextGenerationParameters(
        decoding_method=DecodingMethod.SAMPLE,
        max_new_tokens=1000,
        min_new_tokens=1,
        temperature=0.2,
        top_k=20,
        top_p=1,
        random_seed=42,
        repetition_penalty=1.1,
    ),
)
from ragas.llms.base import BaseRagasLLM, LangchainLLMWrapper
from ragas.run_config import RunConfig
from ragas.llms.prompt import PromptValue
pv = PromptValue(prompt_str='hi, how are you')
run_config = RunConfig()
ragas_ibm_llm = LangchainLLMWrapper(llm, run_config=run_config)
ragas_ibm_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=99
)
# Error Trace
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 ragas_ibm_llm.generate_text(
2 prompt=pv,
3 stop=None,
4 temperature=99
5 )
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/ragas/llms/base.py:147, in LangchainLLMWrapper.generate_text(self, prompt, n, temperature, stop, callbacks)
139 return self.langchain_llm.generate_prompt(
140 prompts=[prompt],
141 n=n,
(...)
144 callbacks=callbacks,
145 )
146 else:
--> 147 result = self.langchain_llm.generate_prompt(
148 prompts=[prompt] * n,
149 temperature=temperature,
150 stop=stop,
151 callbacks=callbacks,
152 )
153 # make LLMResult.generation appear as if it was n_completions
154 # note that LLMResult.runs is still a list that represents each run
155 generations = [[g[0] for g in result.generations]]
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:530, in BaseLLM.generate_prompt(self, prompts, stop, callbacks, **kwargs)
522 def generate_prompt(
523 self,
524 prompts: List[PromptValue],
(...)
527 **kwargs: Any,
528 ) -> LLMResult:
529 prompt_strings = [p.to_string() for p in prompts]
--> 530 return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:703, in BaseLLM.generate(self, prompts, stop, callbacks, tags, metadata, run_name, **kwargs)
687 raise ValueError(
688 "Asked to cache, but no cache found at `langchain.cache`."
689 )
690 run_managers = [
691 callback_manager.on_llm_start(
692 dumpd(self),
(...)
701 )
702 ]
--> 703 output = self._generate_helper(
704 prompts, stop, run_managers, bool(new_arg_supported), **kwargs
705 )
706 return output
707 if len(missing_prompts) > 0:
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:567, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
565 for run_manager in run_managers:
566 run_manager.on_llm_error(e, response=LLMResult(generations=[]))
--> 567 raise e
568 flattened_outputs = output.flatten()
569 for manager, flattened_output in zip(run_managers, flattened_outputs):
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:554, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
544 def _generate_helper(
545 self,
546 prompts: List[str],
(...)
550 **kwargs: Any,
551 ) -> LLMResult:
552 try:
553 output = (
--> 554 self._generate(
555 prompts,
556 stop=stop,
557 # TODO: support multiple run managers
558 run_manager=run_managers[0] if run_managers else None,
559 **kwargs,
560 )
561 if new_arg_supported
562 else self._generate(prompts, stop=stop)
563 )
564 except BaseException as e:
565 for run_manager in run_managers:
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/genai/extensions/langchain/llm.py:190, in LangChainInterface._generate(self, prompts, stop, run_manager, **kwargs)
187 return final_result
188 else:
189 responses = list(
--> 190 self.client.text.generation.create(**self._prepare_request(inputs=prompts, stop=stop, **kwargs))
191 )
192 for response in responses:
193 for result in response.results:
TypeError: GenerationService.create() got an unexpected keyword argument 'temperature'
As the error trace shows, the LangChain-wrapped IBM LLM does not accept 'temperature' as an additional keyword argument in .generate_prompt(). The error goes away when temperature is not passed (see the sketch below). The same error occurs when calling ragas' evaluate() function with this IBM LLM.
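As a sanity check, here is a minimal sketch (assuming the standard LangChain BaseLLM.invoke() interface, which accepts a plain string): calling the same IBM LLM directly, without the wrapper injecting an extra temperature keyword, completes without the TypeError.
# Direct call through LangChain: no extra 'temperature' kwarg is injected,
# and the temperature set in TextGenerationParameters at construction still applies.
llm.invoke("hi, how are you")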
Expected behavior
A straightforward solution is to remove the temperature parameter from LangchainLLMWrapper.generate_text():
class LangchainLLMWrapper(BaseRagasLLM):
    ...
    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        if is_multiple_completion_supported(self.langchain_llm):
            return self.langchain_llm.generate_prompt(
                prompts=[prompt],
                n=n,
                stop=stop,
                callbacks=callbacks,
            )
        else:
            result = self.langchain_llm.generate_prompt(
                prompts=[prompt] * n,
                stop=stop,
                callbacks=callbacks,
            )
            # make LLMResult.generation appear as if it was n_completions
            # note that LLMResult.runs is still a list that represents each run
            generations = [[g[0] for g in result.generations]]
            result.generations = generations
            return result
    ...
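With this change, callers control the temperature where LangChain expects it: on the LLM itself. A minimal usage sketch based on the HuggingFace example above (the parameter values are illustrative, and do_sample=True is assumed so that temperature actually takes effect in transformers):
# Temperature is configured once, on the underlying pipeline.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
)
hf_llm = HuggingFacePipeline(pipeline=pipe)
ragas_hf_llm = LangchainLLMWrapper(hf_llm, run_config=RunConfig())
# No temperature argument here; the wrapper no longer forwards one to generate_prompt().
ragas_hf_llm.generate_text(prompt=pv, stop=None)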
Additional context
I have also raised a PR to address this issue: #657.
+1, getting the same error when trying out Google Gemini models through langchain-google-genai.
@Kirushikesh but removing the temperature arg impacts OpenAI behavior, right?
@joy13975, when initialising the OpenAI LLM we already provide the temperature, e.g. llm = ChatOpenAI(temperature=0), and temperature is an optional parameter of .generate_prompt() anyway.
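For example, a minimal sketch (assuming langchain_openai is installed and OPENAI_API_KEY is set in the environment):
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper
from ragas.run_config import RunConfig

# Temperature is fixed at construction time, so the wrapper does not need to pass it again.
llm = ChatOpenAI(temperature=0)
ragas_openai_llm = LangchainLLMWrapper(llm, run_config=RunConfig())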
Does anyone have an update on this bug?
Hey @RazHadas, there are two PRs raised for this issue. You can check them out or wait until we merge them.
This issue should probably not be closed without merging the fixes. I am facing the same issue using langchain-google-genai.
Thanks for bringing it to our attention @LostInCode404, reopening this.
Was this issue fixed? I am also getting the same error when I use langchain-google-genai, but it works fine with langchain_openai.
Please help!