tpcc: consistency checks are very slow and provide no progress indication
checkConsistency() is currently prohibitively slow for large numbers of warehouses and provides no progress indication. The most egregious check is for W_YTD = sum(H_AMOUNT) for each warehouse:
    var sumHAmount float64
    for i := 0; i < *warehouses; i++ {
        if err := db.QueryRow("SELECT w_ytd FROM warehouse WHERE w_id=$1", i).Scan(&wYTD); err != nil {
            return err
        }
        if err := db.QueryRow("SELECT SUM(h_amount) FROM history WHERE h_w_id=$1", i).Scan(&sumHAmount); err != nil {
            return err
        }
        if wYTD != sumHAmount {
            fmt.Printf("check failed: w_ytd=%f != sum(h_amount)=%f for warehouse %d\n", wYTD, sumHAmount, i)
        }
    }
The main problem here is that SELECT SUM(h_amount) FROM history WHERE h_w_id=$1 has to do a lookup join when scanning history_h_w_id_h_d_id_idx in order to retrieve h_amount, because the index does not store that column, and this query runs once per warehouse. We could add a STORING clause to that index. Alternatively, we could run a single GROUP BY query: SELECT h_w_id, SUM(h_amount) FROM history GROUP BY h_w_id. I'm not sure which would be faster. Adding the STORING clause would certainly be easier.
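For reference, here is a rough sketch of what the single GROUP BY variant might look like. It is not the current checkConsistency() code: the helper names loadSums, mismatchedWarehouses and checkWarehouseYTD are made up for illustration, and it assumes that a warehouse with no history rows should be treated as a sum of 0 (the per-warehouse version would instead fail to scan the NULL). It keeps the exact float comparison from the loop above.

```go
package main

import (
	"database/sql"
	"fmt"
	"sort"
)

// loadSums (hypothetical helper) runs a two-column query of the form
// (warehouse ID, amount) once and returns the amounts keyed by warehouse ID.
func loadSums(db *sql.DB, query string) (map[int]float64, error) {
	rows, err := db.Query(query)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	sums := make(map[int]float64)
	for rows.Next() {
		var id int
		var amount float64
		if err := rows.Scan(&id, &amount); err != nil {
			return nil, err
		}
		sums[id] = amount
	}
	return sums, rows.Err()
}

// mismatchedWarehouses (hypothetical helper) returns, in ascending order,
// the warehouse IDs whose w_ytd differs from the summed h_amount.
// Assumption: a warehouse missing from sumHAmount counts as a sum of 0.
func mismatchedWarehouses(wYTD, sumHAmount map[int]float64) []int {
	var bad []int
	for id, ytd := range wYTD {
		if ytd != sumHAmount[id] {
			bad = append(bad, id)
		}
	}
	sort.Ints(bad)
	return bad
}

// checkWarehouseYTD (hypothetical) performs the W_YTD = sum(H_AMOUNT) check
// with two full-table queries instead of two queries per warehouse.
func checkWarehouseYTD(db *sql.DB) error {
	wYTD, err := loadSums(db, "SELECT w_id, w_ytd FROM warehouse")
	if err != nil {
		return err
	}
	sumHAmount, err := loadSums(db, "SELECT h_w_id, SUM(h_amount) FROM history GROUP BY h_w_id")
	if err != nil {
		return err
	}
	for _, id := range mismatchedWarehouses(wYTD, sumHAmount) {
		fmt.Printf("check failed: w_ytd=%f != sum(h_amount)=%f for warehouse %d\n",
			wYTD[id], sumHAmount[id], id)
	}
	return nil
}
```

This avoids the per-warehouse round trips and does not touch the schema, so it would not affect the data size. Whether the full scan of history is actually faster than an index with STORING would still need to be measured.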
It's not particularly important to make the checks fast, so I would recommend against adding a STORING clause, which will increase the data size and could change the actual benchmark characteristics.
From offline discussion, our checks are also a (random?) subset of the full checks in section 3.3.2 of the specification. I'm going to take a stab at adding the rest of the checks as well, and generally spruce up that side of the codebase.