carto icon indicating copy to clipboard operation
carto copied to clipboard

Nested regex can cause extremely slow generation times and longer xml

Open smsm1 opened this issue 13 years ago • 13 comments

As a particularly noticeable example is the TileMill example open-streets-dc where converting two sections with the widths to regex, or even just changing a few lines from = to =~ gives significantly different results. https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L119 is an example line to change to get the difference.

I'm using Carto 0.9.4.

smsm1 avatar Dec 13 '12 16:12 smsm1

This is probably unavoidable.

We can make judgments about zoom level filters and other relative filters, like =, < and so on, so we can say that if you have rules that overrule each other we can eliminate lots of possible combinations.

We can't do that with regex filters unless we have some kind of zany meta-regex-evaluator which compares their DFAs and derives new ones that are combinations and exclusions of each other.

tmcw avatar Dec 13 '12 17:12 tmcw

@smsm1 - would you mind pasting a few before/after of the Mapnik XML based on a given change?

springmeyer avatar Dec 13 '12 17:12 springmeyer

Before: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal.xml After: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-1.xml

Changed about line 119 from: [type='motorway'], [type='trunk'] {

To: [type=~'motorway|trunk'] {

smsm1 avatar Dec 17 '12 19:12 smsm1

thank you @smsm1

springmeyer avatar Dec 17 '12 20:12 springmeyer

ugh, the diff is larger than either xml. And many of the differences seem not directly related to the change: http://cl.ly/code/0l1V2S0O0W2n. @tmcw - are you positive this bug is "unavoidable" ?

springmeyer avatar Dec 17 '12 20:12 springmeyer

@tmcw - are you positive this bug is "unavoidable" ?

I'm positive that the regex-makes-more-rules bug is unavoidable unless carto or mapnik changes its approach, but it sounds like you're seeing a different bug than that.

tmcw avatar Dec 17 '12 20:12 tmcw

http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-2.xml is an even simpler change to (note the extra tilde)

[type=~'motorway'], [type='trunk'] { I wonder if having another syntax for the most common reasons for needing regexes currently may be an alternate solution?

smsm1 avatar Dec 17 '12 21:12 smsm1

It seems like the next step is to try to create a very small testcase (small stylesheet, only one layer...) and to use it to try to tease out if there are multiple bugs.

springmeyer avatar Dec 17 '12 21:12 springmeyer

I've got a case where 4 levels of indentation (levels of filters) with no regex can cause generation of the mapnik xml to jump, even so it makes the document simpler to read with less duplication.

smsm1 avatar Feb 25 '13 11:02 smsm1

@smsm1 can you create & post a testcase for this? the effect of the filter nesting is entirely dependent on the filters themselves.

tmcw avatar Feb 25 '13 23:02 tmcw

Hmm, I've just started using regexes because they're so much more concise:

  [shop=~"bicycle|supermarket|convenience|general|bakery"],
  [amenity=~"cafe|toilets|restaurant|fast_food|shelter"],
  [tourism=~"viewpoint|attraction"]

This bug is a bit alarming. I'd agree with @smsm1's comment that maybe just syntax for the most important use case would be enough? For me, it would definitely be this one - I could very easily live without even wildcard matches, but the ability to concisely express a set of possible things to match against is very valuable.

Maybe even: [shop in 'bakery,convenience,general'] { } ? (Commas are more readable than pipes, but maybe there's some other reason to avoid them...)

stevage avatar May 08 '14 23:05 stevage

So far I'm not seeing problems with a minimal example like

#world {
    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

Also

#motorways::case[zoom>=6][zoom<=12],
#mainroads::case[zoom>=10][zoom<=12],
#roads::case[zoom>=13][tunnel!=1][bridge!=1],
#tunnels::case[zoom>=13][tunnel=1],
#bridges::case[zoom>=13][bridge=1] {
    line-cap:round;
    line-join:round;
    line-width:0;

    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

is ok.

But with the open-streets-dc style I see an increase in 3200 lines. Challenge seems to be to find a "minimal" example that is not too minimal.

nebulon42 avatar Jan 20 '17 20:01 nebulon42

Right now I could not confirm that there is an interaction of bugs. Starting with https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L160 and condensing it further seems to bring the number of lines closer together for the regex and non-regex case. Until it reaches an equal number of lines when there is only one sub selector.

Writing #world or https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L86 made no difference. Seems to me like a combinatorial explosion problem.

@tmcw is probably right with what he wrote in https://github.com/mapbox/carto/issues/213#issuecomment-11342862.

nebulon42 avatar Jan 20 '17 22:01 nebulon42