
Chat Arena version of fastchat-t5-3b-v1.0 provides more refined answers than the standard Hugging Face model.

Open lix2k3 opened this issue 2 years ago • 26 comments

Hello - I notice that the Chat Arena version of fastchat-t5-3b-v1.0 gives quite different answers from the same model downloaded manually and run with fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0. What is the reason for this difference? Is there any special prompt template or prompt tuning that gives the Chat Arena version its better responses?

lix2k3 avatar May 15 '23 14:05 lix2k3

Hi, I have the same observation. The fastchat-t5-3b model in the Arena gives much better responses than the downloaded fastchat-t5-3b model when I query it. Please let us know if there is any tuning happening in the Arena tool that results in better responses.

suraj-gade avatar May 17 '23 13:05 suraj-gade

They use the same prompt and weights. You can verify this by setting the temperature to 0.

If they are not the same, could you share your prompts and results?
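
For example, with greedy decoding the outputs should be deterministic and directly comparable (this assumes your FastChat version exposes a --temperature flag on the CLI):

python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0 --temperature 0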

merrymercy avatar May 20 '23 14:05 merrymercy

@lix2k3 @suraj-gade Can you share some results? I am also curious what is the most common failure mode for FastChat-T5?

DachengLi1 avatar May 21 '23 19:05 DachengLi1

I am having similar results. I have been unable to find any combination of code that matches the output of chat.lmsys.com. For example:


from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = 'lmsys/fastchat-t5-3b-v1.0'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

def summarize_text(text):
    # The raw prompt is tokenized directly, without FastChat's conversation template
    input_ids = tokenizer.encode(text, return_tensors='pt', max_length=4096, truncation=True)
    # generate() is called with its default settings
    summary_ids = model.generate(input_ids)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

input_text = '''Summarize the following text very concisely:
AIs like GPT-4 go through several different types of training. First, they train on giant text corpuses in order to work at all. Later, they go through a process called “reinforcement learning through human feedback” (RLHF) which trains them to be “nice”. RLHF is why they (usually) won’t make up fake answers to your questions, tell you how to make a bomb, or rank all human races from best to worst.

RLHF is hard. The usual method is to make human crowdworkers rate thousands of AI responses as good or bad, then train the AI towards the good answers and away from the bad answers. But having thousands of crowdworkers rate thousands of answers is expensive and time-consuming. And it puts the AI’s ethics in the hands of random crowdworkers. Companies train these crowdworkers in what responses they want, but they’re limited by the crowdworkers’ ability to follow their rules.

In their new preprint Constitutional AI: Harmlessness From AI Feedback, a team at Anthropic (a big AI company) announces a surprising update to this process: what if the AI gives feedback to itself?

Their process goes like this:
The AI answers many questions, some of which are potentially harmful, and generates first draft answers.
The system shows the AI its first draft answer, along with a prompt saying “rewrite this to be more ethical”.
The AI rewrites it to be more ethical.
The system repeats this process until it collects a large dataset of first draft answers, and rewritten more-ethical second-draft answers.
The system trains the AI to write answers that are less like the first drafts, and more like the second drafts.'''

summary = summarize_text(input_text)
print(summary)
'''<pad>  The  text  discusses  the  training  process  of  AIs '''

The output from chat.lmsys.com is: AIs like GPT-4 go through several types of training, including a process called reinforcement learning through human feedback (RLHF) which trains them to be "nice" and refrain from making up fake answers or ranking human races. However, the usual method of RLHF involves crowdworkers rating AI responses, which is expensive, time-consuming, and puts the AI's ethics in the hands of random crowdworkers. A team at Anthropic has announced a new process that trains AIs to give feedback to themselves, resulting in more ethical answers.
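
Aside: part of why the offline summary above is cut off after only a few words (independent of any prompt-template difference) may simply be that model.generate() is called with its defaults; unless the model's generation config says otherwise, transformers caps output at a short max_length (around 20 tokens). A hedged one-line fix would be:

summary_ids = model.generate(input_ids, max_new_tokens=256)  # allow a longer summary than the short default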

cvarrichio avatar May 22 '23 02:05 cvarrichio

This is exactly my experience. Significantly better responses on http://chat.lmsys.com/ compared to the downloaded Hugging Face model with the exact same prompt. It would be great to understand what the difference between the models is so we can emulate it. Is one fine-tuned differently, on a more comprehensive dataset? That was my first impression. Without question, there is a difference.


lix2k3 avatar May 22 '23 03:05 lix2k3

@cvarrichio your prompt setup is different from what we use. See this script for how to use the Hugging Face API: https://github.com/lm-sys/FastChat/blob/621bc89d7ac55c33d1b04d0bb5367902ef0881c5/fastchat/serve/huggingface_api.py#LL31C1

Note that the original poster uses fastchat.serve.cli, which sets the prompt correctly.

merrymercy avatar May 22 '23 05:05 merrymercy

@merrymercy , @DachengLi1 Here are some Prompts and their responses generated through Chat Arena and offline inference.

Prompt 1:

"""
Given the context information below:
---
prescription / certificate)
Associate Handbook
Page 31 of 71
Confidentiality: Private
o Death in the immediate family
o Major natural calamity leading to loss of property

All confirmed associates can set off available PL balance against the notice period (calendar days) not served, after having necessary approval from the authorities. Also confirmed associates serving complete notice period will be eligible for PL balance encashment during the final settlement. All PL adjustments will be calculated on the monthly basic salary.

Advanced privilege leave can be availed maximum upto 7 (Seven) days, only if all other types of leave have been adjusted.

• Casual Leave (CL) accumulates on an accrual mode in two (2) equal semi-annual installments in a calendar year with effect from January 01, 2021, for tenured associates and as on date for new joiners on prorate basis; total entitlement for this category of leave for a year is ten (10). One cannot apply for more than 3 (three) days of CL at a time. CL is not encashable and cannot be carried forward to the next year. Any unavailed CL within a calendar year will automatically lapse at the end of the year. Advance CL cannot be availed. If an associate avails excess of CL from the accumulated CL balance during his/her departure from the organization, the equivalent amount will be adjusted from the full and final settlement of the associate.

• Covid Medical Leave is a special category of leave introduced to fight COVID cases and is only valid till the pandemic lasts. Fifteen (15) calendar days of leave is granted to associates in case they or their immediate family members are affected by COVID and is subject to production of relevant medical proofs.

• Half Day (HD) can be availed to attend to exigencies. If an associate is spending less
---
Please answer the question: How many casual leaves am I entitled to?
"""

Response from Chat Arena Fastchat: "Based on the information provided, you are entitled to ten (10) days of Casual Leave (CL) for a calendar year. You cannot apply for more than three (three) days of CL at a time, and any unavailed CL within a calendar year will automatically lapse at the end of the year. If you avail excess of CL from the accumulated CL balance during your departure from the organization, the equivalent amount will be adjusted from the full and final settlement of the associate."

Response from offline Fastchat inference "You are entitled to ten (10) days of casual leaves for one year."

Prompt 2:

"""
Given the context information below:
---

Conclusion: Work from home approvals from the Mentors / Project Managers or Practice Lead (in absence of Project Managers) are mandatory prior to availing the same.

22. Leave Policy

Objective: The Company believes in maintaining a healthy work life balance for its associates.

Applicability: This policy is applicable to all associates who are on the rolls of the company. The leave year of the Company will be the same as the calendar year, i.e., 1st January - 31st December will be considered as for the purpose of leave calculation.

Purpose: The Company recognizes and respects the associate's need for leaves to enjoy a substantial break from work or to manage personal exigencies.

Approval of Leave: All leaves have to be applied and approved through the online leave management system. Leave will be approved at the discretion of the sanctioning authority based on the delivery exigencies of the organization. Associates will be required to plan their long-term leave at least a month (30 days) in advance, in consultation with their Immediate Managers both administratively / functionally. At a time, maximum of fifteen (15) days of continuous leave is allowed. Such leaves have to be applied and approved in the system before the associate avails the leave.

In case an associate avails urgent/unplanned leave to meet an emergency, it must be informed to his/her Mentor / Project Manager through mail or phone. Any leaves availed but not applied in the system will be treated as leave without pay.

Associates availing leave should leave their contact address / phone numbers with their Immediate Managers in HRMS.

Leave during the notice period are not encouraged and can be availed only in case of emergencies with the consent and approval of the Mentor and Project Manager. Leaves taken otherwise will increase the notice period days by the number of leave days availed by the associate
---
Please answer the question: Do I need an approval for my leaves?
"""

Response from Chat Arena Fastchat: "Yes, according to the context information, you need to get approval for your leaves from your supervisor or mentor. The policy states that work from home approvals from the Mentors/Project Managers or Practice Lead (in absence of Project Managers) are mandatory prior to availing the leave."

Response from offline FastChat inference: "Yes, I need an approval for my leaves."

There is a clear difference between the responses from Chat Arena and offline inference, even though I have used the same prompt in both cases and the same settings, such as temperature=0.7 and top_p=1.

The answers are correct in both cases, but the Chat Arena version is much more descriptive and refined. Please let me know your thoughts on this.

suraj-gade avatar May 22 '23 13:05 suraj-gade

@suraj-gade Thanks! Can you provide a test script in the offline setting like @cvarrichio did? Did you manually provide prompts through tokenizers?

DachengLi1 avatar May 22 '23 14:05 DachengLi1

I looked at the GitHub repo and got no insight into how to format the prompts so that the output from the offline model and the FastChat website are similar in quality. They are significantly different for anything above the complexity of a very simple "how are you doing" prompt. I tried the following:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")
model = AutoModelForSeq2SeqLM.from_pretrained("lmsys/fastchat-t5-3b-v1.0")

def generate_response(input_text):
    encoded_input = tokenizer.encode(input_text, return_tensors='pt')
    # Note: without do_sample=True, generate() uses greedy decoding, so
    # temperature and top_p have no effect here
    output = model.generate(encoded_input, max_length=512, temperature=0.7, top_p=1)
    decoded_output = tokenizer.decode(output[0, :], skip_special_tokens=True)
    return decoded_output
#I tried USER ASSISTANT, Human Assistant, various separators (e.g., ###) and the appending the system message before the #prompt
print(generate_response("""
Human: Teachers from alternative teacher certification programs typically receive accelerated training on a variety of core topics in a short amount of time. They pursue internships to fill in the gaps in their knowledge and learn while on the job. Teachers from traditional programs receive standardized instruction and serve as an assistant to a tenured teacher for several years before applying their learnings to the classroom. Describe 3 potential impacts to students and the curriculum due to the differences in how alternative certification and traditional certification prepare future teachers.
Assistant:"""))

Response from the offline model (notice that #2 and #3 are repeated verbatim). These kinds of low-quality answers happen with the offline model but not on the FastChat website.

<pad> 1.  Accelerated  training  can  lead  to  a  lack  of  depth  in  core  subject  areas,  which  can  impact  the  curriculum  and  student  learning.  Teachers  from  alternative  certification  programs  may  not  have  the  time  to  develop  a  deep  understanding  of  the  subject  matter  and  may  not  have  the  opportunity  to  apply  their  learning  in  a  structured  way.
 2.  Traditional  certification  programs  typically  provide  more  time  for  teachers  to  develop  a  deep  understanding  of  the  subject  matter  and  to  gain  experience  in  the  classroom.  This  can  lead  to  a  more  structured  and  organized  curriculum,  which  can  be  beneficial  for  students  who  may  have  difficulty  keeping  up  with  traditional  curriculum  standards.
 3.  Traditional  certification  programs  typically  provide  more  time  for  teachers  to  develop  a  deep  understanding  of  the  subject  matter  and  to  gain  experience  in  the  classroom.  This  can  lead  to  a  more  structured  and  organized  curriculum,  which  can  be  beneficial  for  students  who  may  have  difficulty  keeping  up  with  traditional  curriculum  standards.

Response from https://chat.lmsys.org/:

1. Accelerated training can lead to a more dynamic and engaging learning environment for students because teachers from alternative certification programs are required to apply their knowledge and skills in a fast-paced and dynamic learning environment. This can lead to a more dynamic and engaging learning experience for students and may result in more student learning and progress over time.
2. Alternative certification programs provide teachers with more hands-on experience, which can lead to more personalized and adaptable instruction for students. This can result in a more personalized and engaging learning experience for students and may lead to more student success and progress.
3. Traditional certification programs provide teachers with more structured and predictable training, which can lead to more predictable and consistent instruction. This can result in more predictable and consistent instruction for students and may lead to more student success and progress.

lix2k3 avatar May 22 '23 18:05 lix2k3

@lix2k3 Thanks for sharing. By system prompt, do you mean something like "A chat between a friendly ..." in the above-mentioned file? For simplicity, can you try the CLI instead? That will include all the required system prompts and formatting. The Arena version simply calls the same functions as the CLI.

DachengLi1 avatar May 22 '23 18:05 DachengLi1

The same problem is observed with the CLI. I notice it is somewhat better than what I got from the offline model in Python, but there is still some repetition that the FastChat website doesn't have. CLI response:

1. Student learning: Teachers from alternative programs often have a more rapid learning curve and may not have as much time to develop their instructional strategies. This can lead to less individualized instruction and less personalized attention for students.
2. Curriculum development: Teachers from traditional programs have more time to develop their curriculum and have more time to reflect on their teaching practices. This can lead to more innovative and engaging lessons that are more aligned with the needs of students.
3. Teacher effectiveness: Teachers from traditional programs are typically trained to be effective teachers, but they may not have the same level of experience or individualized instruction as teachers from alternative programs. This can lead to less effective classroom management and less control over student learning.

Yes, I tried the "A chat between a friendly..." text at the beginning with the same result. By the way, how should the prompt be structured? Can you please share the full details here so we can try it? I could not figure it out from the GitHub code provided earlier, and that could partially solve this for us.

lix2k3 avatar May 22 '23 18:05 lix2k3

I also have not been able to find sufficient insight in the code about how to best format my prompt. For some reason there is no T5 prompt base in https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py, but it seems unlikely that any of the prompts there would help. Isn't it strange that a little bit of pre-prompting can wildly affect the quality of the output?

cvarrichio avatar May 22 '23 19:05 cvarrichio

As mentioned by @merrymercy above and in the README, the inference code to use huggingface framework is https://github.com/lm-sys/FastChat/blob/621bc89d7ac55c33d1b04d0bb5367902ef0881c5/fastchat/serve/huggingface_api.py#LL31C1

In particular, line 31 is where you get the default prompt for the T5 model. It is like a one-shot prompt to guide the model.

You can also provide your own one-shot or multi-shot pre-prompts depending on your tasks. In my experiments, it works even better than the generic preset pre-prompt.

It is also not surprising that pre-prompting can affect quality. Prompting is literally a subject of research now and has been shown to be crucial at times.
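
To make that concrete, here is a minimal sketch of doing the same thing outside the provided script. It assumes the conversation helpers that huggingface_api.py relies on at that commit (a get_default_conv_template function in fastchat.conversation; newer releases expose get_conversation_template in fastchat.model instead), so treat the exact import as illustrative rather than authoritative:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from fastchat.conversation import get_default_conv_template  # name used around this commit; may differ in newer versions

model_path = "lmsys/fastchat-t5-3b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# Build the same prompt the CLI/Arena sends: system message + one-shot example + the new turn
conv = get_default_conv_template(model_path).copy()
conv.append_message(conv.roles[0], "Do I need an approval for my leaves?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer([prompt], return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))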

ghost avatar May 22 '23 19:05 ghost

@cvarrichio @lix2k3 The answer provided by @jasontian6666 is correct. The CLI uses the one-shot prompt for FastChat-T5. It is important to provide a good system prompt to guide chatbots to behave reasonably.

DachengLi1 avatar May 22 '23 19:05 DachengLi1

Thank you Jason. I think that is exactly the code we were provided before that I had trouble translating into practice. Translating it, does it mean the following should be the prompt: """ User: "Some user text, question, or directive" Assistant: """

I appreciate the help, but is there anyone from this amazing project that might be able to provide the prompt in text format rather than referencing back to the code?


lix2k3 avatar May 22 '23 19:05 lix2k3

@lix2k3 I don't quite follow what you mean by translation.

You can easily print out the prompt by adding print(prompt) or print(conv) to the main function. Is that what you're asking?

ghost avatar May 22 '23 20:05 ghost

Thanks for your response, Jason. I believe everyone who has posted examples thus far and is having trouble is not using the GitHub code provided. Rather, we are manually trying to reconstruct the prompt. So there is nothing to print(prompt) or print(conv) from; we are not actually running that code.

The question is: can anyone from the project, or anyone actually running that GitHub code, print out the default prompt here in plain text? For example, should it look like the following: """ User: "Some user text, question, or directive" Assistant: """

Otherwise, what should it look like?


lix2k3 avatar May 22 '23 20:05 lix2k3

@lix2k3 If you type in "hi", the whole prompt t5 sees is:

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. ### Human: What are the key differences between renewable and non-renewable energy sources? ### Assistant: Renewable energy sources are those that can be replenished naturally in a relatively short amount of time, such as solar, wind, hydro, geothermal, and biomass. Non-renewable energy sources, on the other hand, are finite and will eventually be depleted, such as coal, oil, and natural gas. Here are some key differences between renewable and non-renewable energy sources:

  1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable energy sources are finite and will eventually run out.
  2. Environmental impact: Renewable energy sources have a much lower environmental impact than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, and other negative effects.
  3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically have lower operational costs than non-renewable sources.
  4. Reliability: Renewable energy sources are often more reliable and can be used in more remote locations than non-renewable sources.
  5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different situations and needs, while non-renewable sources are more rigid and inflexible.
  6. Sustainability: Renewable energy sources are more sustainable over the long term, while non-renewable sources are not, and their depletion can lead to economic and social instability. ### Human: hi ### Assistant:

Hope this helps.
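
For anyone who would rather not import FastChat at all, here is a rough self-contained sketch that hard-codes the template above (the one-shot answer is abbreviated with "..." purely for readability; use the full text as posted, and double-check the ### separators against conversation.py, since the exact spacing here is an assumption):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "lmsys/fastchat-t5-3b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

system = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the human's questions.")
one_shot = ("### Human: What are the key differences between renewable and non-renewable energy sources? "
            "### Assistant: Renewable energy sources are those that can be replenished naturally ...")  # abbreviated

def build_prompt(user_message):
    # New turns follow the same "### Human: ... ### Assistant:" pattern shown above
    return f"{system} {one_shot} ### Human: {user_message} ### Assistant:"

input_ids = tokenizer([build_prompt("hi")], return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))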

DachengLi1 avatar May 22 '23 20:05 DachengLi1

@lix2k3

Your question is not the same as the others in this thread.

You raised the question about the difference between the Arena and fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0. You should get exactly the same results by setting temperature=0, where no sampling is involved, because they use the exact same pre-prompt, whatever that specific prompt is.

If you get different results, please share and we can see if there's a bug.

For @cvarrichio, it's because he/she is using a different prompt than the Arena in his/her code example, and @merrymercy pointed out where the discrepancy comes from.

ghost avatar May 22 '23 20:05 ghost

@lix2k3 If you type in "hi", the whole prompt t5 sees is: [full prompt quoted above]

Thanks a lot for this! This is very helpful. I will try this now.

lix2k3 avatar May 22 '23 20:05 lix2k3

Thank you all for your help, but I can confirm that with identical prompts now, given what Dacheng Li posted, the responses are still very different. Perhaps they are even more different now between the Hugging Face offline model that I'm using in Colab and what the FastChat website version produces.

lix2k3 avatar May 22 '23 21:05 lix2k3

@suraj-gade Thanks! Can you provide a test script in the offline setting like @cvarrichio did? Did you manually provide prompts through tokenizers?

Hi @DachengLi1, Thanks for the response. Here is the code that I used to query the offline Fastchat model.

from llama_index import GPTListIndex, SimpleDirectoryReader, GPTVectorStoreIndex, PromptHelper, LLMPredictor
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = 'lmsys/fastchat-t5-3b-v1.0'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

from transformers import pipeline

pipe = pipeline(
    "text2text-generation", model=model, tokenizer=tokenizer,
    max_length=1024, temperature=0.7, top_p = 1,no_repeat_ngram_size=0, early_stopping=True
)

from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipe)

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

# set maximum input size
max_input_size = 2048
# set number of output tokens
num_outputs = 512
# set maximum chunk overlap
max_chunk_overlap = 20
# set chunk size limit
chunk_size_limit = 512
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap)

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm_predictor=LLMPredictor(llm), prompt_helper=prompt_helper, chunk_size_limit=chunk_size_limit)

# build index
documents = SimpleDirectoryReader('data').load_data()

new_index = GPTListIndex.from_documents(documents, service_context=service_context)

# query with embed_model specified
query_engine = new_index.as_query_engine(
    retriever_mode="embedding", 
    verbose=True,
    #streaming=True,
    similarity_top_k=1
    #service_context=service_context
)

response = query_engine.query("Do I need an approval for my leaves?")

print(response)

>>>"Yes,   I   need   an   approval   for   my   leaves."

Here the "data" folder has my full input text in pdf format, and am using the llama_index and langchain pipeline to build the index on that and fetch the relevant chunk to generate the prompt with context and query the FastChat model as shown in the code. I have tried to keep all the settings similar to Chat Arena.

Please have a look and let me know if any gap is there.

suraj-gade avatar May 23 '23 05:05 suraj-gade

@suraj-gade Using the CLI is the easiest way. Given your use case, you can try the prompt I posted above: in the query, add those system prompts, i.e. "A chat between... ### Human: Do I need an approval for my leaves? ### Assistant: ". A minor suggestion: when you decode, please use T5Tokenizer with skip_special_tokens=True, spaces_between_special_tokens=False. This won't affect the content but will remove the extra whitespace in the output.
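
As a small illustration of that decode suggestion (a sketch only; in the langchain setup above the decoding happens inside the pipeline, so you may need to post-process the returned string instead):

from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "lmsys/fastchat-t5-3b-v1.0"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# In practice, prepend the full system prompt and one-shot example before the Human turn
input_ids = tokenizer.encode("### Human: Do I need an approval for my leaves? ### Assistant:", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=256)
text = tokenizer.decode(
    output_ids[0],
    skip_special_tokens=True,              # drop <pad>, </s>, etc.
    spaces_between_special_tokens=False,   # avoid the doubled whitespace in FastChat-T5 output
)
print(text)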

DachengLi1 avatar May 23 '23 14:05 DachengLi1

Passing the exact one-shot prompt from the FastChat code makes it possible to replicate the behavior, thanks.

NeonBohdan avatar May 28 '23 15:05 NeonBohdan

Hi @DachengLi1 Thanks for helping out! Where would you pass

skip_special_tokens=True, spaces_between_special_tokens=False

in @suraj-gade's langchain example?

chirico85 avatar Jun 27 '23 08:06 chirico85

@suraj-gade While trying to create embeddings as you did, with embed_model = LangchainEmbedding(HuggingFaceEmbeddings()), I'm getting this error: RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback): cannot import name 'is_mlu_available' from 'accelerate.utils'. I'm not able to solve it. Can you suggest anything?

heyomy avatar Sep 08 '24 11:09 heyomy