Densifier runs expensive buffer operation
The GeoTools rendering code uses the densifier to add extra points in long lines, to prepare them for reprojection, which could generate curves out of the original straight lines. We noticed that when dealing with large geometries (e.g., Natural Earth 10m land, containing geometries with tens to hundred of thousands of points) it takes a lot of time, and uses much memory.
The culprit seems to be the buffer(0) call here: https://github.com/locationtech/jts/blob/master/modules/core/src/main/java/org/locationtech/jts/densify/Densifier.java#L156
We don't really need to make invalid inputs valid, we'd be content to have an invalid output when the input is invalid.
Also, the algorithm is generating lots of Coordinate objects, while the input was an optimized coordinate sequence, would be nice if the code was creating another of the same type:
- No un-packing of the original coordinate sequence into Coordinate[]
- Creation of an output coordinate sequence of the same type (this might be difficult, one has to foresee exactly how many points the output will have, it would require two passes over the input I guess...)
As usual, timing will likely mean we'll create our own rendering-optimized variant of the densifier, but wanted to at least let you know about the issue.
Thanks for pointing this out. Yes, buffer is an expensive operation (and is also a bit of a hack, but is safe enough here.
Sounds like it would be nice to add an option to turn validation off, if not needed. That is simple enough to do with a flag.
This is one of the (many) JTS classes which has not be modernized to use the CoordinateSequence API properly. Although as you point out in this case it's a bit awkward to do so. Two passes would work, I think. Although I wonder if it would be better to use an intermediate "extendable" CoordinateSequence, and then convert to the required one for output? I think there might be a PR around for such a thing.
Anyway, seems perfectly reasonable for GeoTools to roll its own code, since this is a fairly simple algorithm, and it then can introduce whatever optimizations or refinements that are needed.
Although I wonder if it would be better to use an intermediate "extendable" CoordinateSequence
See PR #271
Just to be clear, the use of buffer(0) is not to fix invalid input. It is used to ensure that densified geometry is valid. A densified polygonal might become invalid, since adding vertices may move line segment slightly due to numerical precision limitations.
This should be a very rare situation, however. And if the densified geometry is only used for rendering purposes, then a mild invalidity seems like it should not cause any problems? So the addition of a flag to disable polygon topology checking/correction still seems like a good solution.
Partially fixed by #595