osm2pgsql

Suggestion: middle-pgsql stats for hit/miss

Open tiedotguy opened this issue 5 years ago • 1 comment

If any node in a way is missing from the cache, the cost of local_nodes_get_list becomes the cost of a database access.

The difference between a db access for 1 node vs 10 nodes is small, but the difference between 0 nodes and 1 node is large. Effectively, a 90% per-node hit rate can amount to a 0% hit rate if every way still has one missing node, which makes the current stats less meaningful.
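To put a rough number on this, here is a small sketch. It assumes each node is cached independently with the same probability, which a real cache will not satisfy exactly, and `full_hit_probability` is a hypothetical helper, not anything in osm2pgsql. Under that assumption the chance that a whole way is served from the cache is the per-node hit rate raised to the number of nodes in the way.

```cpp
#include <cmath>

// Hypothetical helper, for illustration only.
// Assumes every node is in the cache independently with probability
// `node_hit_rate` (0.0 to 1.0). Returns the probability that all
// `nodes_in_way` nodes are cached, i.e. that no database access is needed.
double full_hit_probability(double node_hit_rate, int nodes_in_way)
{
    return std::pow(node_hit_rate, nodes_in_way);
}
```

With a 90% per-node hit rate and 10 nodes per way this gives roughly 0.35, so about two thirds of lookups would still go to the database. The worst case, one miss in every way, gives 0.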

As an alternative way of looking at this, I suggest having middle_pgsql_t keep track of "entire lookup satisfied by cache" vs "entire lookup not satisfied by cache", as I believe that's more meaningful. The current stats would stay as they are.
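A minimal sketch of what such counters could look like. Everything here is hypothetical: `lookup_stats_t`, its members and `record` are made-up names, not existing osm2pgsql code, and the assumption is that the caller knows how many nodes it requested and how many the cache returned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical whole-lookup counters; not part of the real middle_pgsql_t.
struct lookup_stats_t
{
    std::uint64_t full_hits = 0;  // every requested node came from the cache
    std::uint64_t db_lookups = 0; // at least one node was missing

    // Record one node-list lookup: how many nodes were requested and how
    // many of them the cache returned.
    void record(std::size_t requested, std::size_t found_in_cache)
    {
        if (found_in_cache >= requested) {
            ++full_hits;
        } else {
            ++db_lookups;
        }
    }

    // Percentage of lookups that avoided the database entirely.
    double avoided_percent() const
    {
        std::uint64_t const total = full_hits + db_lookups;
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * static_cast<double>(full_hits) /
               static_cast<double>(total);
    }
};
```

The idea would be to call `record` once per node-list lookup, next to wherever the per-node stats are updated today, and print `avoided_percent()` alongside the existing hit rate.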

Thoughts?

tiedotguy, Nov 15 '20 01:11

As a practical comparison, with some hacked-up code, I got the following when importing New South Wales with various cache sizes:

  • 512MB: 105 seconds, 99.97% cache, 238 db hits, 1492504 avoids, 99.98%
  • 256MB: 131 seconds, 95.24% cache, 96358 db hits, 1396384 avoids, 93.54%
  • 128MB: 327 seconds, 52.91% cache, 896566 db hits, 596176 avoids, 60.06%
  • no cache: 399 seconds, 0% cache, 1492742 db hits, 0 avoids, 0%

(99.97% is as high as it can go, because the extract references some nodes that it doesn't contain)

tiedotguy, Nov 15 '20 01:11