usr
About Table 1
❓ Questions & Help
I am trying to reproduce Table 1, but when I calculate the correlation coefficient, the output is sometimes nan. How do you deal with that?
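One likely cause, stated here as an assumption rather than something the authors confirmed: Pearson correlation is undefined when one annotator gives the same score to every response, because the variance in the denominator is zero, and `scipy.stats.pearsonr` then returns nan. A minimal sketch of a guard against this, where `safe_pearsonr` and its `fallback` parameter are hypothetical names introduced only for illustration:

```python
import numpy as np
import scipy.stats


def safe_pearsonr(x, y, fallback=0.0):
    """Pearson r, or `fallback` when either input is constant.

    `safe_pearsonr` and `fallback` are illustrative names, not part of
    SciPy or of the paper's code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A constant vector has zero variance, so r is undefined.
    if x.min() == x.max() or y.min() == y.max():
        return fallback
    return float(scipy.stats.pearsonr(x, y)[0])


print(safe_pearsonr([3, 3, 3], [1, 2, 3]))  # constant first input, falls back
print(safe_pearsonr([1, 2, 3], [2, 4, 6]))  # perfectly correlated inputs
```

Whether to replace an undefined correlation with 0 or to skip that annotator pair is a design choice, and it changes the final average.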
Question
In your paper, you say that the inter-annotator agreement was computed for each of the questions. For example, there are 60 questions (contexts) in the Topical-Chat dataset. You compute the correlation for every question (context) and then average these correlations. But I found that the p-value cannot get below 0.01 for the individual correlations. Maybe there is some difference between our implementations, so I hope you can publish your code. Thanks!
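To make the procedure I am describing concrete, here is a minimal sketch of the per-context version as I understand it. The array layout `(num_contexts, num_responses, num_annotators)`, the function name `per_context_agreement`, and the decision to skip undefined pairs are all my assumptions, not the authors' implementation.

```python
import itertools

import numpy as np
import scipy.stats


def per_context_agreement(scores):
    """Mean over contexts of the mean pairwise Pearson r between annotators.

    scores: array-like of shape (num_contexts, num_responses, num_annotators).
    This layout is an assumption made for the sketch.
    """
    scores = np.asarray(scores, dtype=float)
    context_means = []
    for context_scores in scores:
        num_annotators = context_scores.shape[1]
        pair_rs = []
        for i, j in itertools.combinations(range(num_annotators), 2):
            a, b = context_scores[:, i], context_scores[:, j]
            # Skip pairs where one annotator's scores are constant:
            # the correlation is undefined there.
            if a.min() == a.max() or b.min() == b.max():
                continue
            pair_rs.append(scipy.stats.pearsonr(a, b)[0])
        if pair_rs:
            context_means.append(np.mean(pair_rs))
    return float(np.mean(context_means)) if context_means else float("nan")
```

With only a handful of responses per context, each individual correlation rests on very few points, so its p-value stays large unless the correlation is close to 1. That may be why p < 0.01 is out of reach per context.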
I have solved my question. Here is my code:
import json

import numpy as np
import scipy.stats


def compute_human_correlation(all_human_scores):
    """Average pairwise Pearson correlation between annotators.

    all_human_scores has shape (num_responses, num_workers).
    """
    num_workers = all_human_scores.shape[-1]
    correlation = 0
    counter = 0
    p_value_list = []  # collected for inspection; not returned
    for i in range(num_workers - 1):
        for j in range(i + 1, num_workers):
            counter += 1
            result, p_value = scipy.stats.pearsonr(all_human_scores[:, i],
                                                   all_human_scores[:, j])
            p_value_list.append(p_value)
            # pearsonr yields nan when one annotator's scores are constant;
            # count such a pair as zero correlation.
            if np.isnan(result):
                result = 0
            correlation += result
            # print('i:{}, j:{}, result:{}'.format(i, j, result))
    correlation = correlation / counter
    # print('human correlation is {}'.format(correlation))
    return correlation


if __name__ == '__main__':
    path = './tmp/tc_usr_data.json'
    with open(path, 'r') as f:
        data = json.load(f)
    for metric in ['Understandable', 'Natural',
                   'Maintains Context', 'Engaging',
                   'Uses Knowledge', 'Overall']:
        scores_list = []
        # compute correlation over all data, not per context
        for item in data:
            for response in item['responses']:
                scores_list.append(response[metric])
        scores_array = np.array(scores_list)
        correlation = compute_human_correlation(scores_array)
        print(metric, correlation)
        print()
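For anyone reading along, the script above pools every response from every context into one array of shape `(num_responses, num_workers)` before correlating. The sketch below illustrates that step on hand-made records that only mimic the fields the script reads. The values are invented for illustration and `collect_scores` is a hypothetical helper name.

```python
import numpy as np

# Invented records that mirror only the fields used above:
# item['responses'] is a list, and each response maps a metric name
# to one score per annotator.
data = [
    {"responses": [{"Overall": [4, 5, 4]}, {"Overall": [2, 1, 2]}]},
    {"responses": [{"Overall": [3, 3, 4]}, {"Overall": [5, 4, 5]}]},
]


def collect_scores(data, metric):
    """Stack the per-annotator scores of every response into one array."""
    return np.array([response[metric]
                     for item in data
                     for response in item["responses"]])


scores_array = collect_scores(data, "Overall")
print(scores_array.shape)  # (4, 3): four responses, three annotators
```

Because the correlation is then computed over all responses at once, each annotator pair gets many more data points than in the per-context setting.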