Ability to have a pre-processing script for osm2pgsql-replication
What version of osm2pgsql are you using?
osm2pgsql version 1.6.0
Build: None
Compiled using the following library versions:
Libosmium 2.18.0
Proj [API 6] 9.0.1
Lua 5.3.6
What operating system and PostgreSQL/PostGIS version are you using?
Debian testing
What did you do exactly?
Scenario: I want to be able to trim the .osc file inside osm2pgsql-replication after it has been fetched from the replication server, but before it is sent to osm2pgsql for processing. One use case is calling trim_osc; another is diffing the .osc against PostGIS, since this is the only point at which a proper diff against the current data is possible (before the .osc data enters the database).
As far as I know, there is no way to intervene between these two steps (fetching the .osc from the replication server and sending it to osm2pgsql).
I was thinking it could follow the same pattern as the post-processing logic: another switch, --pre-processing, with even the same arguments (seq and timestamp). As discussed in #1719, if one needs the path to the .osc file, one can pass the --diff-file argument to the osm2pgsql-replication script.
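To make the proposal concrete, here is a rough sketch of how such a hook might be invoked. It is not code from osm2pgsql-replication: the function name `run_pre_processing`, the timestamp format, and the check that the diff file still exists are all assumptions made for illustration.

```python
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def run_pre_processing(script, diff_file, seq, timestamp):
    """Run a hypothetical pre-processing command before the import step.

    `script` is the command as a list of arguments. It receives the
    sequence number and the timestamp as two extra arguments and is
    expected to edit `diff_file` in place (it knows the path because the
    user chose it via --diff-file).
    """
    # Assumed argument format: "<seq> <ISO 8601 UTC timestamp>".
    cmd = [*script, str(seq), timestamp.strftime("%Y-%m-%dT%H:%M:%SZ")]
    # check=True raises CalledProcessError if the script exits non-zero,
    # so a failing hook stops the update before anything is imported.
    subprocess.run(cmd, check=True)
    if not Path(diff_file).is_file():
        raise RuntimeError(f"pre-processing removed {diff_file}")
    return Path(diff_file)
```

The caller would then hand the returned path to osm2pgsql exactly as it does today, so the hook stays invisible to the import step.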
The only question (besides whether you want to support this at all) is whether the .osc file should be edited in place, or whether we want something more sophisticated (for example, the pre-processing script does not edit the file in place but saves it elsewhere and returns the output path on stdout, or simply streams the new .osc to stdout, where we capture it...). IMHO, in-place editing would be easiest, but I am open to suggestions.
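For illustration, an in-place pre-processing script could look roughly like the sketch below. It assumes the diff is an uncompressed osmChange XML file, and it only drops nodes that fall outside a bounding box. This is much cruder than what trim_osc does (ways and relations are not touched, and nodes that moved out of the area are not handled), so treat the function `trim_nodes_in_place` as a hypothetical stand-in, not a replacement.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def trim_nodes_in_place(osc_path, bbox):
    """Drop nodes outside bbox from an uncompressed osmChange file, in place.

    bbox is (min_lon, min_lat, max_lon, max_lat). Ways and relations are
    left untouched, so this is only a naive illustration of trimming.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    tree = ET.parse(osc_path)
    for action in tree.getroot():  # <create>, <modify>, <delete> blocks
        for node in list(action.findall("node")):
            lon, lat = node.get("lon"), node.get("lat")
            if lon is None or lat is None:
                continue  # e.g. deleted nodes without coordinates: keep
            inside = (min_lon <= float(lon) <= max_lon
                      and min_lat <= float(lat) <= max_lat)
            if not inside:
                action.remove(node)

    # Write to a temporary file in the same directory and swap it in, so
    # an interrupted run never leaves a half-written diff behind.
    directory = os.path.dirname(os.path.abspath(osc_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            tree.write(out, encoding="utf-8", xml_declaration=True)
        os.replace(tmp_path, osc_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Replacing the file via a temporary copy keeps the "edit in place" contract simple for the caller: the path never changes, and the file is either the old diff or the fully trimmed one.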
I am also volunteering to implement this logic (if we agree it can be useful).
Does https://switch2osm.org/serving-tiles/updating-as-people-edit-pyosmium/ help? That was what I ended up doing following a suggestion by lonvia when I asked pretty much the same question. With osm2pgsql-replication, you'll often not need to call "trim" as you can likely get a feed of the same area you loaded.
Yes, it can do the job, but my issue is specifically about osm2pgsql-replication. I think it is a great piece of software and I would like to see a one-stop solution for these use cases. And switch2osm could be simplified if this were implemented, I think :) As written at your link:
A simpler, but less flexible, method to update a database is to use “osm2pgsql-replication”,
(emphasis mine)
I think it can be both simple and flexible, and this issue is about that.
This is outside the scope of the osm2pgsql-replication script. You should look into the underlying replication library of pyosmium and build your own custom python scripts.