Confused about the usage of BLOCK_FLAG
I'm trying to read the code, and 'BLOCK_FLAG' really confuses me. I think it is used for calculating the length of the prompt, but I'm not sure the values in 'BLOCK_FLAG' are right. Take BoolQPVP as an example: apart from 'passage', 'question', and 'self.mask', which are not part of the template, I think the remaining elements of PATTERN should be counted. So I think BLOCK_FLAG should be
BLOCK_FLAG = [0, 1, 1, 1, 0, 1, 0, 1]
instead of
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]
Is there something I misunderstood?
class BoolQPVP(PVP):
    VERBALIZER = {
        "False": ["No"],
        "True": ["Yes"]
    }
    """
    VERBALIZER_B = {
        "False": ["false"],
        "True": ["true"]
    }
    """
    PATTERN = ['passage', '.', 'the', ' Question: ',
               'question', '? Answer: ', 'self.mask', '.']
    BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]

    def get_parts(self, example: InputExample) -> FilledPattern:
        passage = self.shortenable(example.text_a)
        question = self.shortenable(example.text_b)
        # searched patterns in fully-supervised learning
        # string_list_a = [passage, '.', 'the', 'Question:', question, '?', 'the', 'Answer:', self.mask]
        # string_list_a = [passage, '.', 'the', question, '?', 'the', self.mask]
        # string_list_a = [passage, 'the', question, '?', 'the', self.mask]
        # few-shot
        if self.pattern_id == 1:
            string_list_a = [passage, '.', 'the', ' Question: ',
                             question, '? Answer: ', self.mask, '.']
            string_list_b = []
            block_flag_a = self.BLOCK_FLAG
            block_flag_b = []
            assert len(string_list_a) == len(block_flag_a)
            assert len(string_list_b) == len(block_flag_b)
            return string_list_a, string_list_b, block_flag_a, block_flag_b
        else:
            raise ValueError("unknown pattern_id.")
Thanks for your attention! Here, block_flag == 1 marks replaceable tokens and 0 marks tokens that are left unchanged. In the method from our paper, as well as in the P-Tuning method integrated in our code, we replace the tokens flagged '1' with special token representations and keep the other parts of the input unchanged. For BoolQPVP, this means only 'the' (index 2 of PATTERN) is replaced.
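To make the flag semantics concrete, here is a minimal sketch that pairs each PATTERN element with its flag. The helper `split_by_block_flag` is hypothetical and is not part of the repository. It also works on plain strings, whereas the actual replacement described above happens on token representations inside the model.

```python
from typing import List, Tuple

# Copied from BoolQPVP in the question above.
PATTERN = ['passage', '.', 'the', ' Question: ',
           'question', '? Answer: ', 'self.mask', '.']
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]


def split_by_block_flag(parts: List[str],
                        block_flag: List[int]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate flag-1 (replaceable) parts from flag-0 (unchanged) parts."""
    # Same length check that get_parts performs with its asserts.
    assert len(parts) == len(block_flag)
    replaceable = [p for p, f in zip(parts, block_flag) if f == 1]
    unchanged = [p for p, f in zip(parts, block_flag) if f == 0]
    return replaceable, unchanged


# Flags as defined in the class: only 'the' is marked replaceable.
replaceable, unchanged = split_by_block_flag(PATTERN, BLOCK_FLAG)

# Flags proposed in the question: every literal template string would be marked.
proposed_flag = [0, 1, 1, 1, 0, 1, 0, 1]
proposed_replaceable, _ = split_by_block_flag(PATTERN, proposed_flag)

print(replaceable)
print(proposed_replaceable)
```

Read this way, BLOCK_FLAG does not measure prompt length. It selects which pattern positions are swapped out, so the proposed flags would also replace the literal ' Question: ' and '? Answer: ' strings rather than keeping them as fixed text.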