osm2pgsql icon indicating copy to clipboard operation
osm2pgsql copied to clipboard

Add options to skip geometry indexes and ordering

Open pnorman opened this issue 12 years ago • 8 comments

Currently osm2pgsql always creates geometry indexes and orders the tables by geometry, essentially clustering them. There are cases where this is not desirable.

  1. Benchmarking of other parts.
  2. Analysis and tasks with non-standard queries. You might want geometry indicies but might prefer to cluster by something else, or you might want to query something like the average area of a building poly in France, which would involve a sequential scan and not benefit from geom indices.
  3. Consuming a substantial set of diffs. I believe diff processing is faster without geom indices, and the ordering is more effective if done after updating.
  4. Low space situations where you don't have room to order the tables.

~~5. Cases where you don't plan to consume diffs (e.g. imports with --drop). You can be better off with a non-default FILLFACTOR~~

pnorman avatar May 14 '13 22:05 pnorman

I don't know how to create a pull request(?, maybe someone has a good quick guide how to work w/ git[hub]), but this patch:

https://12oder3.quake.gfz-potsdam.de/Xoo0aik7-Thij2qua/osm2pgsql-skip-table-optimizing.diff

addresses this issue. It works-for-me(tm).

thbeutin avatar Jan 19 '15 16:01 thbeutin

Another variant of index-skipping branch: https://github.com/alex85k/osm2pgsql/tree/skip-index It saves all indexing SQL to the file specified in INDEX_SQL_FILE environment variable if that variable exists.

alex85k avatar Nov 05 '15 15:11 alex85k

May sound a bit corny, but +1 I do like the geo indices though, so for me the best solution would be to only skip the clustering steps

doskabouter avatar Apr 08 '17 14:04 doskabouter

Was going to create a new issue but found this, so I'll +1 it as well. My use case is that I do a bunch of post processing on the data anyway, so I drop the geometry indexes and recreate anyway. This would just save a few hours off my processing time.

jnewmoyer avatar Jul 25 '18 14:07 jnewmoyer

Any progress on this?

andreynovikov avatar Mar 25 '21 12:03 andreynovikov

In the flex output clustering by geometry can now be disabled, but there is no way yet to disable building of the index. This will not be backported to the pgsql output.

joto avatar May 27 '21 09:05 joto

I see that if clustering is disabled, the tables are created as unlogged. Having to set them to logged afterwards (so data isn't lost in case of crash) completely removes the speedup gained by not clustering

doskabouter avatar Jun 08 '21 12:06 doskabouter

@doskabouter Uhh. That's a bug. Thanks for reporting.

joto avatar Jun 08 '21 13:06 joto

It is now possible to not create specific indexes (or create special ones) with the flex output. See https://osm2pgsql.org/doc/manual.html#defining-indexes for details.

joto avatar Dec 07 '22 19:12 joto