Add options to skip geometry indexes and ordering
Currently osm2pgsql always creates geometry indexes and orders the tables by geometry, essentially clustering them. There are cases where this is not desirable:
1. Benchmarking of other parts.
2. Analysis and tasks with non-standard queries. You might want geometry indices but prefer to cluster by something else. Or you might want to query something like the average area of a building polygon in France, which involves a sequential scan and would not benefit from geometry indices.
3. Consuming a substantial set of diffs. I believe diff processing is faster without geometry indices, and the ordering is more effective if done after updating.
4. Low-space situations where you don't have room to order the tables.
5. ~~Cases where you don't plan to consume diffs (e.g. imports with --drop). You can be better off with a non-default FILLFACTOR.~~
I don't know how to create a pull request (maybe someone has a good quick guide on how to work with Git/GitHub?), but this patch:
https://12oder3.quake.gfz-potsdam.de/Xoo0aik7-Thij2qua/osm2pgsql-skip-table-optimizing.diff
addresses this issue. It works-for-me(tm).
Another variant is the index-skipping branch at https://github.com/alex85k/osm2pgsql/tree/skip-index. It saves all indexing SQL to the file named in the INDEX_SQL_FILE environment variable, if that variable is set.
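To illustrate the "run now or save for later" idea behind that branch, here is a minimal Python sketch. It is not the branch's actual code (osm2pgsql itself is C++); the function name `run_index_sql`, the `execute` callback, and the choice to append rather than overwrite the file are all assumptions made for the example.

```python
import os


def run_index_sql(statements, execute):
    """Run index/cluster SQL immediately, or defer it to a file.

    Hypothetical sketch: if INDEX_SQL_FILE is set (and non-empty), the
    statements are appended to that file instead of being executed, so
    they can be run by hand after the import or diff processing.
    Returns True if the statements were executed, False if deferred.
    """
    path = os.environ.get("INDEX_SQL_FILE")
    if path:
        # Assumption: append, so several tables can share one file.
        with open(path, "a", encoding="utf-8") as f:
            for sql in statements:
                # Normalise each statement to end with exactly one ';'.
                f.write(sql.rstrip().rstrip(";") + ";\n")
        return False
    for sql in statements:
        execute(sql)  # e.g. a database cursor's execute method
    return True
```

With INDEX_SQL_FILE unset, `run_index_sql(["CREATE INDEX ... ON planet_osm_point USING GIST (way)"], cur.execute)` behaves like the normal import; with it set, the same call only writes the statement out, which fits the use cases above where indexing or clustering is better done later or not at all.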
May sound a bit corny, but +1. I do like the geo indices, though, so for me the best solution would be to skip only the clustering steps.
Was going to create a new issue but found this, so I'll +1 it as well. My use case is that I do a lot of post-processing on the data, so I drop the geometry indexes and recreate them anyway. This would save a few hours of processing time.
Any progress on this?
In the flex output, clustering by geometry can now be disabled, but there is no way yet to disable building the geometry index. This will not be backported to the pgsql output.
I see that if clustering is disabled, the tables are created as unlogged. Having to set them to logged afterwards (so data isn't lost in case of a crash) completely removes the speedup gained by not clustering.
@doskabouter Uhh. That's a bug. Thanks for reporting.
It is now possible to not create specific indexes (or create special ones) with the flex output. See https://osm2pgsql.org/doc/manual.html#defining-indexes for details.