FastChat Why is logistic regression equivalent to the Bradley-Terry model?

Dear maintainers,

Thank you for your valuable arena. I am currently researching methods for LLM evaluation and got stuck on a question about the Bradley-Terry (BT) model. According to multiple sources (including your paper), BT ratings are obtained by maximizing the BT likelihood. In the code, however, a logistic regression is fitted on a kind of "one-hot" matrix, where model_a gets +1 and model_b gets -1, and the target is 1 if model_a wins and 0 if model_b wins. Let's ignore length control for simplicity; I still cannot understand why this is equivalent to the BT model.

Could you please explain this or point me to some sources where I could find the derivation?

import math

import numpy as np
import pandas as pd


def compute_elo_mle_with_tie(
    df, SCALE=400, BASE=10, INIT_RATING=1000, sample_weight=None
):
    from sklearn.linear_model import LogisticRegression

    ptbl_a_win = pd.pivot_table(
        df[df["winner"] == "model_a"],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    ptbl_tie = pd.pivot_table(
        df[df["winner"].isin(["tie", "tie (bothbad)"])],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    ptbl_tie = ptbl_tie + ptbl_tie.T
    ptbl_b_win = pd.pivot_table(
        df[df["winner"] == "model_b"],
        index="model_a",
        columns="model_b",
        aggfunc="size",
        fill_value=0,
    )
    ptbl_win = ptbl_a_win * 2 + ptbl_b_win.T * 2 + ptbl_tie

    models = pd.Series(np.arange(len(ptbl_win.index)), index=ptbl_win.index)

    p = len(models)
    X = np.zeros([p * (p - 1) * 2, p])
    Y = np.zeros(p * (p - 1) * 2)

    cur_row = 0
    sample_weights = []
    for m_a in ptbl_win.index:
        for m_b in ptbl_win.columns:
            if m_a == m_b:
                continue
            # if nan skip
            if math.isnan(ptbl_win.loc[m_a, m_b]) or math.isnan(ptbl_win.loc[m_b, m_a]):
                continue
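            # Both rows share the same features: +log(BASE) for m_a, -log(BASE) for m_b.
            # Row 1 has label 1 and is weighted by m_a's wins over m_b (x2, plus ties).
            X[cur_row, models[m_a]] = +math.log(BASE)
            X[cur_row, models[m_b]] = -math.log(BASE)
            Y[cur_row] = 1.0
            sample_weights.append(ptbl_win.loc[m_a, m_b])

            # Row 2 has label 0 and is weighted by m_b's wins over m_a (x2, plus ties).
            X[cur_row + 1, models[m_a]] = math.log(BASE)
            X[cur_row + 1, models[m_b]] = -math.log(BASE)
            Y[cur_row + 1] = 0.0
            sample_weights.append(ptbl_win.loc[m_b, m_a])
            cur_row += 2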
    X = X[:cur_row]
    Y = Y[:cur_row]

    lr = LogisticRegression(fit_intercept=False, penalty=None)
    lr.fit(X, Y, sample_weight=sample_weights)
    elo_scores = SCALE * lr.coef_[0] + INIT_RATING
    if "mixtral-8x7b-instruct-v0.1" in models.index:
        elo_scores += 1114 - elo_scores[models["mixtral-8x7b-instruct-v0.1"]]
    return pd.Series(elo_scores, index=models.index).sort_values(ascending=False)

Aug 30 '24 07:08 VityaVitalich

I asked a related question earlier about the anchor model. Still waiting for a response: https://github.com/lm-sys/FastChat/issues/3377

Sep 02 '24 03:09 acylam

@VityaVitalich I wrote a blog post that includes an explanation of this. The idea is that under an exponential reparameterization of the Bradley-Terry strength parameters, the win probability can be expressed as the sigmoid of the difference in ratings. If you then construct the X matrix so that each row has only two non-zero entries, a 1 and a -1 at the two competitors' indices, the dot product of that row with the parameter vector (the ratings) simply produces the difference between the two selected ratings.
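
The reparameterization can be written out explicitly. What follows is a sketch of the standard derivation, not a quote from the blog post; the symbols p_i (BT strength), \theta_i (rating), w_{ij} (number of wins of i over j) and x^{(ij)} (design-matrix row) are notation introduced here for illustration.

```latex
% Bradley-Terry with positive strengths p_i, reparameterized as p_i = e^{\theta_i}
P(i \text{ beats } j) = \frac{p_i}{p_i + p_j}
  = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}
  = \frac{1}{1 + e^{-(\theta_i - \theta_j)}}
  = \sigma(\theta_i - \theta_j)

% Row x^{(ij)} has +1 at index i, -1 at index j, and 0 elsewhere
\theta_i - \theta_j = x^{(ij)\top} \theta

% BT log-likelihood, with w_{ij} the number of wins of i over j
\ell(\theta) = \sum_{i \neq j} w_{ij} \, \log \sigma\!\left( x^{(ij)\top} \theta \right)
```

The last line is the weighted log-likelihood of a logistic regression with no intercept. Because \sigma(-z) = 1 - \sigma(z), a loss by i recorded on row x^{(ij)} with label 0 contributes \log(1 - \sigma(x^{(ij)\top}\theta)), which is the same as the win term for j. In the function quoted in the question the entries are ±log(BASE) rather than ±1, which corresponds to p_i = BASE^{\theta_i}; the final SCALE * coef + INIT_RATING step then maps \theta onto the Elo scale.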

https://www.claytonthorrez.com/blog/posts/fast_llm_ratings/
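
The equivalence can also be checked numerically. The snippet below is a minimal sketch using only the standard library; the ratings and win counts are made-up illustrative values, and the helper names (bt_prob, logistic_prob, and so on) are hypothetical rather than part of FastChat. It compares the BT probability computed from strengths BASE ** rating with the logistic-regression probability computed from a row holding +log(BASE) and -log(BASE), and then compares the two log-likelihoods.

```python
import math

BASE = 10.0  # same role as the BASE argument in compute_elo_mle_with_tie


def bt_prob(strength_a, strength_b):
    """Bradley-Terry probability that a beats b, from positive strengths."""
    return strength_a / (strength_a + strength_b)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_prob(ratings, a, b):
    """Logistic-regression probability for a row with +log(BASE) at a, -log(BASE) at b."""
    x = [0.0] * len(ratings)
    x[a] = math.log(BASE)
    x[b] = -math.log(BASE)
    logit = sum(xi * ri for xi, ri in zip(x, ratings))  # log(BASE) * (r_a - r_b)
    return sigmoid(logit)


# Hypothetical ratings (the regression coefficients) and win counts wins[a][b].
ratings = [0.30, -0.10, 0.05]
wins = [[0, 7, 4], [3, 0, 5], [6, 2, 0]]
n = len(ratings)
strengths = [BASE ** r for r in ratings]  # exponential reparameterization

pairs = [(a, b) for a in range(n) for b in range(n) if a != b]

# 1) The per-pair probabilities agree.
max_gap = max(
    abs(bt_prob(strengths[a], strengths[b]) - logistic_prob(ratings, a, b))
    for a, b in pairs
)

# 2) The log-likelihoods agree.
bt_loglik = sum(
    wins[a][b] * math.log(bt_prob(strengths[a], strengths[b])) for a, b in pairs
)
# Logistic regression: one feature row per unordered pair, used twice,
# with label 1 weighted by wins[a][b] and label 0 weighted by wins[b][a].
lr_loglik = 0.0
for a in range(n):
    for b in range(a + 1, n):
        p = logistic_prob(ratings, a, b)
        lr_loglik += wins[a][b] * math.log(p) + wins[b][a] * math.log(1.0 - p)

print(max_gap, bt_loglik, lr_loglik)
```

Since both objectives coincide for any choice of ratings, maximizing one maximizes the other, so an unpenalized, intercept-free logistic regression on this design matrix should recover the BT maximum-likelihood ratings.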

Oct 20 '24 23:10 cthorrez