'temperature' parameter in LangchainLLMWrapper.generate_text causing issues
Describe the bug
LangchainLLMWrapper has a .generate_text() method which in turn calls .generate_prompt() on the underlying LLM. The LangchainLLMWrapper passes a 'temperature' parameter to .generate_prompt(), which causes the following issues:
- The temperature parameter has no effect on the response when using a HuggingFace LLM.
- Some LangChain extensions, such as IBM Generative AI, do not accept a temperature parameter in .generate_prompt().
Since the temperature can already be passed when initialising a LangChain LLM, it does not need to be supplied again by LangchainLLMWrapper.
For example, with HuggingFacePipeline you can specify the temperature at initialization:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, temperature=1)
Or, when using the IBM LLM, you can specify the temperature with:
llm = LangChainInterface(
    model_id='google/flan-t5-xl',
    client=Client(credentials=Credentials.from_env()),
    parameters=TextGenerationParameters(
        decoding_method=DecodingMethod.SAMPLE,
        max_new_tokens=1000,
        min_new_tokens=1,
        temperature=0.2,
        top_k=20,
        top_p=1,
        random_seed=42,
        repetition_penalty=1.1,
    ),
)
Ragas version: 0.1.1
Python version: 3.10.6
Code to Reproduce
The following code shows why the 'temperature' parameter does not affect the response when using a HuggingFace LLM:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from ragas.llms.base import BaseRagasLLM, LangchainLLMWrapper
from ragas.run_config import RunConfig
from ragas.llms.prompt import PromptValue
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100)
hf_llm = HuggingFacePipeline(pipeline=pipe)
pv = PromptValue(prompt_str='hi, how are you')
run_config = RunConfig()
ragas_hf_llm = LangchainLLMWrapper(hf_llm, run_config=run_config)
ragas_hf_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=0
)
# Output
LLMResult(generations=[[Generation(text=' feeling today?!\n\nThe sun was shining, and there was only a small light, but no more than a single drop from the sky. The clouds were white with white fringing, which they looked like a cloud filled with smoke. It covered the place with the smoke, and the man standing before them had a sword, as a symbol of protection. He was only slightly more than a hundred lightyears away from the Sun and the Moon.\n\n"Fuu...what?"\n\n')]], llm_output=None, run=[RunInfo(run_id=UUID('cbd105d1-ab2c-4069-a6f3-20fc9159443e'))])
In the above code I initialised a HuggingFacePipeline with the gpt2 model, wrapped it in the ragas LangchainLLMWrapper, and passed temperature=0 when calling .generate_text(). Ideally this should raise an error, because a temperature of 0 is not accepted by HuggingFace.
You can also check this by passing temperature=99 to .generate_text(); no exception is raised for this absurdly high value either (see the sketch below). It is therefore evident that the temperature passed to .generate_text() does not affect the HuggingFace LLM. Moreover, the user can set the temperature in the pipeline() call, so there is no need for an additional temperature argument in .generate_text().
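For reference, this is the check described above, as a minimal sketch reusing the objects defined in the snippet (99 is just an arbitrary out-of-range value):
# An out-of-range temperature passed through the wrapper raises no error,
# suggesting the value never reaches the HuggingFace pipeline.
ragas_hf_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=99
)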
The following code shows why passing 'temperature' raises an error with the IBM LLM:
from genai import Client, Credentials
from genai.extensions.llama_index import IBMGenAILlamaIndex
from genai.extensions.langchain import LangChainInterface
from genai.extensions.langchain.chat_llm import LangChainChatInterface
from genai.extensions.langchain import LangChainEmbeddingsInterface
from genai.schema import (
    DecodingMethod,
    TextGenerationParameters,
    TextEmbeddingParameters,
)
llm = LangChainInterface(
    model_id='google/flan-t5-xl',
    client=Client(credentials=Credentials.from_env()),
    parameters=TextGenerationParameters(
        decoding_method=DecodingMethod.SAMPLE,
        max_new_tokens=1000,
        min_new_tokens=1,
        temperature=0.2,
        top_k=20,
        top_p=1,
        random_seed=42,
        repetition_penalty=1.1,
    ),
)
from ragas.llms.base import BaseRagasLLM, LangchainLLMWrapper
from ragas.run_config import RunConfig
from ragas.llms.prompt import PromptValue
pv = PromptValue(prompt_str='hi, how are you')
run_config = RunConfig()
ragas_ibm_llm = LangchainLLMWrapper(llm, run_config=run_config)
ragas_ibm_llm.generate_text(
    prompt=pv,
    stop=None,
    temperature=99
)
# Error Trace
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 ragas_ibm_llm.generate_text(
2 prompt=pv,
3 stop=None,
4 temperature=99
5 )
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/ragas/llms/base.py:147, in LangchainLLMWrapper.generate_text(self, prompt, n, temperature, stop, callbacks)
139 return self.langchain_llm.generate_prompt(
140 prompts=[prompt],
141 n=n,
(...)
144 callbacks=callbacks,
145 )
146 else:
--> 147 result = self.langchain_llm.generate_prompt(
148 prompts=[prompt] * n,
149 temperature=temperature,
150 stop=stop,
151 callbacks=callbacks,
152 )
153 # make LLMResult.generation appear as if it was n_completions
154 # note that LLMResult.runs is still a list that represents each run
155 generations = [[g[0] for g in result.generations]]
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:530, in BaseLLM.generate_prompt(self, prompts, stop, callbacks, **kwargs)
522 def generate_prompt(
523 self,
524 prompts: List[PromptValue],
(...)
527 **kwargs: Any,
528 ) -> LLMResult:
529 prompt_strings = [p.to_string() for p in prompts]
--> 530 return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:703, in BaseLLM.generate(self, prompts, stop, callbacks, tags, metadata, run_name, **kwargs)
687 raise ValueError(
688 "Asked to cache, but no cache found at `langchain.cache`."
689 )
690 run_managers = [
691 callback_manager.on_llm_start(
692 dumpd(self),
(...)
701 )
702 ]
--> 703 output = self._generate_helper(
704 prompts, stop, run_managers, bool(new_arg_supported), **kwargs
705 )
706 return output
707 if len(missing_prompts) > 0:
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:567, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
565 for run_manager in run_managers:
566 run_manager.on_llm_error(e, response=LLMResult(generations=[]))
--> 567 raise e
568 flattened_outputs = output.flatten()
569 for manager, flattened_output in zip(run_managers, flattened_outputs):
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/langchain_core/language_models/llms.py:554, in BaseLLM._generate_helper(self, prompts, stop, run_managers, new_arg_supported, **kwargs)
544 def _generate_helper(
545 self,
546 prompts: List[str],
(...)
550 **kwargs: Any,
551 ) -> LLMResult:
552 try:
553 output = (
--> 554 self._generate(
555 prompts,
556 stop=stop,
557 # TODO: support multiple run managers
558 run_manager=run_managers[0] if run_managers else None,
559 **kwargs,
560 )
561 if new_arg_supported
562 else self._generate(prompts, stop=stop)
563 )
564 except BaseException as e:
565 for run_manager in run_managers:
File /dccstor/kirushikesh/.conda/guardrails/lib/python3.10/site-packages/genai/extensions/langchain/llm.py:190, in LangChainInterface._generate(self, prompts, stop, run_manager, **kwargs)
187 return final_result
188 else:
189 responses = list(
--> 190 self.client.text.generation.create(**self._prepare_request(inputs=prompts, stop=stop, **kwargs))
191 )
192 for response in responses:
193 for result in response.results:
TypeError: GenerationService.create() got an unexpected keyword argument 'temperature'
As the error trace shows, the LangChain-wrapped IBM LLM does not accept 'temperature' as an additional keyword argument in .generate_prompt(). The error goes away when temperature is not passed (see the sketch below). The same error occurs when calling ragas' evaluate() function with this IBM LLM.
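As a sanity check, here is a minimal sketch (assuming the standard LangChain BaseLLM.invoke() interface, which accepts a plain string): calling the same IBM LLM directly, without the wrapper injecting an extra temperature keyword, completes without the TypeError.
# Direct call through LangChain: no extra 'temperature' kwarg is injected,
# and the temperature set in TextGenerationParameters at construction still applies.
llm.invoke("hi, how are you")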
Expected behavior
A straightforward solution is to remove the temperature parameter from LangchainLLMWrapper.generate_text():
class LangchainLLMWrapper(BaseRagasLLM):
    ...
    def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        if is_multiple_completion_supported(self.langchain_llm):
            return self.langchain_llm.generate_prompt(
                prompts=[prompt],
                n=n,
                stop=stop,
                callbacks=callbacks,
            )
        else:
            result = self.langchain_llm.generate_prompt(
                prompts=[prompt] * n,
                stop=stop,
                callbacks=callbacks,
            )
            # make LLMResult.generation appear as if it was n_completions
            # note that LLMResult.runs is still a list that represents each run
            generations = [[g[0] for g in result.generations]]
            result.generations = generations
            return result
    ...
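With this change, callers control the temperature where LangChain expects it: on the LLM itself. A minimal usage sketch based on the HuggingFace example above (the parameter values are illustrative, and do_sample=True is assumed so that temperature actually takes effect in transformers):
# Temperature is configured once, on the underlying pipeline.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
)
hf_llm = HuggingFacePipeline(pipeline=pipe)
ragas_hf_llm = LangchainLLMWrapper(hf_llm, run_config=RunConfig())
# No temperature argument here; the wrapper no longer forwards one to generate_prompt().
ragas_hf_llm.generate_text(prompt=pv, stop=None)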
Additional context
I have also raised a PR to address this issue: #657.
+1, getting the same error when trying out Google Gemini models through langchain-google-genai.
@Kirushikesh but removing the temperature arg impacts OpenAI behavior, right?
@joy13975, when initialising the OpenAI LLM we already provide the temperature, e.g. llm = ChatOpenAI(temperature=0), and temperature is an optional parameter of .generate_prompt() anyway.
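For example, a minimal sketch (assuming langchain_openai is installed and OPENAI_API_KEY is set in the environment):
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper
from ragas.run_config import RunConfig

# Temperature is fixed at construction time, so the wrapper does not need to pass it again.
llm = ChatOpenAI(temperature=0)
ragas_openai_llm = LangchainLLMWrapper(llm, run_config=RunConfig())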
Does anyone have an update on this bug?
Hey @RazHadas, there are two PRs raised for this issue. You can check them out or wait until we merge them.
This issue should probably not be closed without merging the fixes. I am facing the same issue using langchain-google-genai.
Thanks for bringing it to our attention @LostInCode404, reopening this.
Was this issue fixed? I am also getting the same error when I use langchain-google-genai, but it works fine with langchain_openai.
Please help!