
tpcc: consistency checks are very slow and provide no progress indication

Open • petermattis opened this issue 7 years ago • 2 comments

checkConsistency() is currently prohibitively slow for large numbers of warehouses, and it provides no progress indication. The most egregious case is the check that W_YTD = sum(H_AMOUNT) for each warehouse:

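	// For each warehouse, check that w_ytd equals the sum of h_amount over
	// that warehouse's history rows. This costs two round trips per warehouse.
	var wYTD, sumHAmount float64
	for i := 0; i < *warehouses; i++ {
		if err := db.QueryRow("SELECT w_ytd FROM warehouse WHERE w_id=$1", i).Scan(&wYTD); err != nil {
			return err
		}
		// This scans history_h_w_id_h_d_id_idx, which does not contain h_amount,
		// so every matching row needs a lookup join to fetch it.
		if err := db.QueryRow("SELECT SUM(h_amount) FROM history WHERE h_w_id=$1", i).Scan(&sumHAmount); err != nil {
			return err
		}
		if wYTD != sumHAmount {
			fmt.Printf("check failed: w_ytd=%f != sum(h_amount)=%f for warehouse %d\n", wYTD, sumHAmount, i)
		}
	}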

The main problem is that SELECT SUM(h_amount) FROM history WHERE h_w_id=$1 has to do a lookup join while scanning history_h_w_id_h_d_id_idx in order to retrieve h_amount, which is not stored in that index. We could add a STORING clause to that index. Alternatively, we could perform a single GROUP BY query: SELECT h_w_id, SUM(h_amount) FROM history GROUP BY h_w_id. I'm not sure which would be faster, but adding the STORING clause would certainly be easier.
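A minimal sketch of the GROUP BY alternative, assuming the same database/sql handle as the loop above and that scanning w_ytd and SUM(h_amount) into float64 works as it does there. queryIntFloatMap, compareWYTD and checkWYTDGroupBy are hypothetical names. It replaces 2 × warehouses point queries with two statements; whether that actually beats the STORING approach is exactly the open question, so treat it as something to benchmark rather than a known win.

```go
package main

import (
	"database/sql"
	"fmt"
)

// queryIntFloatMap runs a two-column query (int key, float value) and
// collects the rows into a map keyed by the first column.
func queryIntFloatMap(db *sql.DB, query string) (map[int]float64, error) {
	rows, err := db.Query(query)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	m := make(map[int]float64)
	for rows.Next() {
		var k int
		var v float64
		if err := rows.Scan(&k, &v); err != nil {
			return nil, err
		}
		m[k] = v
	}
	return m, rows.Err()
}

// compareWYTD returns one failure message per warehouse whose w_ytd does not
// equal its summed h_amount. A warehouse missing from either map is treated
// as having a value of 0.
func compareWYTD(warehouses int, wYTD, sumHAmount map[int]float64) []string {
	var failures []string
	for i := 0; i < warehouses; i++ {
		if wYTD[i] != sumHAmount[i] {
			failures = append(failures, fmt.Sprintf(
				"check failed: w_ytd=%f != sum(h_amount)=%f for warehouse %d",
				wYTD[i], sumHAmount[i], i))
		}
	}
	return failures
}

// checkWYTDGroupBy fetches all w_ytd values and all per-warehouse history
// sums in two statements, then compares them client-side.
func checkWYTDGroupBy(db *sql.DB, warehouses int) ([]string, error) {
	wYTD, err := queryIntFloatMap(db, "SELECT w_id, w_ytd FROM warehouse")
	if err != nil {
		return nil, err
	}
	sums, err := queryIntFloatMap(db, "SELECT h_w_id, SUM(h_amount) FROM history GROUP BY h_w_id")
	if err != nil {
		return nil, err
	}
	return compareWYTD(warehouses, wYTD, sums), nil
}
```

Keeping the comparison in a pure function means it can be exercised without a running cluster. Since it does not touch the schema, this variant also sidesteps the concern that a STORING clause would change the data size.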

petermattis, Mar 02 '18 19:03

It's not particularly important to make the checks fast, so I would recommend against adding a STORING clause: it would increase the data size and could change the benchmark's performance characteristics.

jordanlewis, Mar 02 '18 19:03

From an offline discussion: our checks are also only a (random?) subset of the full set of consistency checks in section 3.3.2 of the TPC-C specification. I'm going to take a stab at adding the rest of the checks as well, and at generally sprucing up that part of the codebase.

rjnn, Mar 05 '18 16:03