LMOps [UPRISE] After rereading the paper "UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation", I have some questions.

1. How are the scores obtained with GPT-Neo-2.7B?
2. At which step is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?

Sep 10 '24 08:09 zhouchang123

Q1: How are the scores obtained with GPT-Neo-2.7B?
A1: By computing the task metric score for each concatenation of a prompt with the testing input; see Section 3.2.

Q2: At which step is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?
A2: After getting the scores. Among all the scored prompts for a training example, we label the prompt with the highest score as the positive. For negative samples, we randomly sample B training demonstrations from the prompt pool; in addition, we label the B demonstrations with the lowest scores among the sampled prompts as hard negatives. Details are in Section 3.2.

Sep 10 '24 09:09 cdxeve
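
The labeling step described in the reply above can be sketched roughly as follows. This is only an illustration, not the repository's code: `label_prompts`, its arguments, and the toy scores are hypothetical, and the scores are assumed to have already been computed by the frozen LLM as described in Section 3.2. Excluding the positive from the random negatives is also an assumption of this sketch.

```python
import random


def label_prompts(scored, pool, num_neg, rng):
    """Label prompts for ONE training example.

    scored  : list of (prompt, score) pairs for the prompts that were
              sampled and scored for this example (hypothetical format).
    pool    : list of all prompts in the prompt pool.
    num_neg : B, the number of random and of hard negatives.
    rng     : a random.Random instance, for reproducibility.
    """
    if num_neg < 1:
        raise ValueError("num_neg must be at least 1")
    # Highest score first.
    ranked = sorted(scored, key=lambda ps: ps[1], reverse=True)
    positive = ranked[0][0]
    # Hard negatives: the B lowest-scoring prompts among the scored ones.
    hard_negatives = [p for p, _ in ranked[-num_neg:] if p != positive]
    # Random negatives: B prompts drawn from the whole pool
    # (assumption: the positive itself is excluded).
    candidates = [p for p in pool if p != positive]
    random_negatives = rng.sample(candidates, min(num_neg, len(candidates)))
    return positive, random_negatives, hard_negatives


# Toy example with made-up scores.
scored = [("p1", 0.9), ("p2", 0.1), ("p3", 0.4), ("p4", 0.0)]
pool = ["p1", "p2", "p3", "p4", "p5", "p6"]
positive, random_negatives, hard_negatives = label_prompts(
    scored, pool, num_neg=2, rng=random.Random(0)
)
print(positive, hard_negatives, random_negatives)
```

In this toy run, "p1" is the positive and "p2" and "p4" are the hard negatives; the random negatives depend on the seed.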

What about the score from the prompt retriever? Is it the similarity between the two vectors produced by the encoder? Thanks very much.

Sep 10 '24 09:09 zhouchang123

You may refer to Section 3.4 to see how we get the score after tuning the prompt retriever.

Sep 10 '24 09:09 cdxeve

Doesn't Section 3.4 describe the inference part? Is it the same in the training pipeline?

Sep 11 '24 00:09 zhouchang123

Training is covered in Section 3.3; you may refer to the provided code as well.

Sep 11 '24 02:09 cdxeve

Section 3.3 only introduces sim(x, p). Do you mean that sim(x, p) is the score?

Sep 11 '24 06:09 zhouchang123

Yes, sim(x, p) is the score.

Sep 11 '24 08:09 cdxeve
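
To make "sim(x, p) is the score" concrete, here is a minimal sketch. It assumes that sim(x, p) is the inner product of an input embedding and a prompt embedding, as is common for bi-encoder retrievers; the thread does not spell this out, so check Section 3.3 and the provided code. The function names and the toy vectors are hypothetical stand-ins for real encoder outputs.

```python
def sim(x_emb, p_emb):
    """Inner product of two embedding vectors (assumed form of sim(x, p))."""
    return sum(a * b for a, b in zip(x_emb, p_emb))


def retrieve_top_k(x_emb, prompt_embs, k):
    """Return the indices of the k prompts with the highest sim(x, p)."""
    scores = [sim(x_emb, p) for p in prompt_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy embeddings standing in for encoder outputs.
x_emb = [1.0, 0.0]
prompt_embs = [[0.5, 0.5], [2.0, 0.0], [-1.0, 1.0]]
top = retrieve_top_k(x_emb, prompt_embs, k=2)
print(top)
```

With these toy vectors the scores are 0.5, 2.0 and -1.0, so the prompts at indices 1 and 0 are retrieved, in that order.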

In the paper, the number of positive prompts is 1 and the number of negative prompts is 20, but the total number of prompts in one training epoch is not stated. What happens to prompts that are neither positive nor negative? InfoNCE does not seem to include these prompts.

Sep 11 '24 09:09 zhouchang123

Yes, InfoNCE would not consider the prompts that are neither positive nor negative.

Sep 11 '24 10:09 cdxeve
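
The point that only the labeled prompts enter the loss can be seen from the standard form of InfoNCE for one training example, sketched below. This is a generic formulation without a temperature term, not necessarily the exact loss in the repository; `info_nce` and the toy scores are hypothetical.

```python
import math


def info_nce(pos_score, neg_scores):
    """-log( exp(s+) / (exp(s+) + sum_j exp(s-_j)) ) for one example.

    Only the positive and the listed negatives appear in the loss;
    any other prompt in the pool simply never enters the computation.
    """
    logits = [pos_score] + list(neg_scores)
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denominator - pos_score


# Toy sim(x, p) scores: one positive, two negatives.
loss = info_nce(2.0, [0.0, 0.0])
print(loss)
```

The loss shrinks as the positive score rises relative to the negatives, and it is exactly 0 when there are no negatives at all.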

I am confused about the training and inference pipelines. In the training pipeline, the input includes the task name and the query, and the metric takes the task into account. During inference, however, the input is only the query, without the task name. Could you add a module that infers the task name from the query, so that the prompts are first filtered by task name and then retrieved? @cdxeve

Oct 23 '24 06:10 zhouchang123

We do not input the task name during training; the task name in the image is only for ease of understanding. You may refer to the formula in Section 3.2 for details.

Oct 23 '24 13:10 cdxeve

I looked at the file prompt_pool.json, and each dict is annotated with a task name. So is the task name only used to determine which metric score applies? The natural approach when retrieving would be to retrieve from the prompts of similar tasks rather than from all the prompts.

Oct 24 '24 04:10 zhouchang123

Q1: Is the task name only used to determine which metric score applies?
A1: We keep the task name in the metadata to support many potential uses, but we don’t include it as input during training.

Q2: The natural approach when retrieving would be to retrieve from the prompts of similar tasks rather than from all the prompts.
A2: You could try this for a quick test, but I think the diversity will be too constrained since the number of tasks is much smaller than the number of demonstrations in the prompt pool.

Oct 24 '24 10:10 cdxeve
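
For anyone who wants to run the quick test mentioned above, a task-filtered retrieval could look roughly like this. It is a sketch under stated assumptions: the keys `prompt`, `task_name` and `embedding` are hypothetical and may not match the actual layout of prompt_pool.json, and the inner-product scoring is assumed rather than taken from the repository.

```python
def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(x_emb, pool, k, task_name=None):
    """Return the top-k prompts by inner product with the query embedding.

    If task_name is given, only pool entries with that task name are
    considered; otherwise the whole pool is searched.
    """
    candidates = [
        e for e in pool if task_name is None or e["task_name"] == task_name
    ]
    ranked = sorted(
        candidates, key=lambda e: dot(x_emb, e["embedding"]), reverse=True
    )
    return [e["prompt"] for e in ranked[:k]]


# Toy pool with hypothetical field names and made-up embeddings.
pool = [
    {"prompt": "demo A", "task_name": "nli", "embedding": [1.0, 0.0]},
    {"prompt": "demo B", "task_name": "qa", "embedding": [0.9, 0.1]},
    {"prompt": "demo C", "task_name": "nli", "embedding": [0.2, 0.8]},
]
query = [1.0, 0.0]
all_tasks = retrieve(query, pool, k=2)
nli_only = retrieve(query, pool, k=2, task_name="nli")
print(all_tasks, nli_only)
```

In this toy pool, filtering by task replaces the closely matching "demo B" with the weaker "demo C", which shows how restricting retrieval to one task narrows the set of candidates.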