
Performance issue with Golang target

Open jlj5aj opened this issue 5 years ago • 25 comments

Hi,

We are running into a performance issue with the Golang target in which parsing takes exponentially longer the more terms are present in the query (the Java target returns immediately). A somewhat minimal example is available here. It includes the grammar files, the JAR that generated the Golang code, as well as CPU/memory benchmarks for the query in v4parser/minimal_test.go.

Does this appear to be an issue in ANTLR's Golang code generation? Is there anything we can do within the grammar itself to work around the issue?

Thanks! Jason

jlj5aj avatar Aug 24 '20 19:08 jlj5aj
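A genuine benchmark here would measure how parse time scales with the number of query terms. The sketch below shows only the scaffolding: `buildQuery` and the injected `parse` callback are placeholders, since the real grammar and generated entry point live in the linked repo.

```go
package main

import (
	"strings"
	"time"
)

// buildQuery builds a boolean query with n terms, e.g. "term AND term AND term".
// The term and operator are placeholders for whatever the real grammar accepts.
func buildQuery(n int) string {
	return strings.TrimSuffix(strings.Repeat("term AND ", n), " AND ")
}

// timeParse measures a single parse of input. The parse callback stands in
// for invoking the generated parser (e.g. constructing it over the input and
// calling the start rule), so this sketch stays independent of generated code.
func timeParse(parse func(string), input string) time.Duration {
	start := time.Now()
	parse(input)
	return time.Since(start)
}
```

Calling `timeParse` for growing values of `n` and comparing the durations would show whether the growth is roughly linear or exponential.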

Hi. If a Java or C# parser generated from the same grammar runs 'instantly', then it could be a bug specific to the Go target. Can you put together a genuine benchmark?


ericvergnaud avatar Aug 24 '20 19:08 ericvergnaud

There is a repo with code/benchmarks linked above. Do you need something more than that?

jlj5aj avatar Aug 24 '20 19:08 jlj5aj

We investigated the issue a bit and found that the underlying data structures are significantly slow.

For example, this line https://github.com/antlr/antlr4/blob/e73f72be7355aded374fa6b18fccb4569b449856/runtime/Go/antlr/parser_atn_simulator.go#L1040 accounts for 14 of the 49 seconds of total parsing time.

As I understand it, the problem is that the Set concept is not implemented carefully enough and degrades performance, for example during array merges.

dmitrys99 avatar Sep 16 '20 12:09 dmitrys99
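A minimal illustration of the suspected degradation, assuming the membership check is a linear scan (the actual runtime code may differ): a slice-backed set makes n insertions cost O(n²) overall, while a map-backed set keeps each insertion amortized O(1).

```go
package main

// linearSet mimics a set whose membership check scans a slice: each add is
// O(n), so inserting n distinct items costs O(n^2) in total. This is the kind
// of degradation suspected in the runtime's Set during config-set merges.
type linearSet struct{ items []int }

func (s *linearSet) add(v int) {
	for _, it := range s.items {
		if it == v {
			return // already present
		}
	}
	s.items = append(s.items, v)
}

// hashSet does the same job with a map, making each add amortized O(1).
type hashSet struct{ items map[int]struct{} }

func newHashSet() *hashSet { return &hashSet{items: map[int]struct{}{}} }

func (s *hashSet) add(v int) { s.items[v] = struct{}{} }
```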

Simple fixes made parsing about twice as fast: https://github.com/antlr/antlr4/pull/2915

dmitrys99 avatar Sep 18 '20 08:09 dmitrys99

Added one more commit to #2915, eliminating excessive calculations; parsing is now 3 times faster.

dmitrys99 avatar Sep 18 '20 19:09 dmitrys99

Can confirm this reduces the time for the example input from 49 seconds to around 7 seconds. Nice work!

jlj5aj avatar Sep 18 '20 22:09 jlj5aj

For technical reasons I removed the previous PR and created a new one with the same content: #2916

dmitrys99 avatar Sep 19 '20 12:09 dmitrys99

Added another improvement: a BitSet without pointers roughly halved parsing time again.

@jlj5aj Please try again.

If it helps, then the next step is possibly to rewrite Set on top of simpler underlying data structures, probably a single array.

dmitrys99 avatar Sep 23 '20 13:09 dmitrys99
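A pointer-free bit set of the kind described above can be sketched as bits packed into uint64 words held directly in a slice; the layout here is an assumption, and the real runtime's representation may differ.

```go
package main

// bitSet stores bits in a slice of words by value, with no per-entry
// pointers for the GC to trace. This mirrors the "BitSet without pointers"
// change described above in spirit, not in exact code.
type bitSet struct{ words []uint64 }

// set turns on bit i, growing the word slice as needed.
func (b *bitSet) set(i int) {
	w := i / 64
	for len(b.words) <= w {
		b.words = append(b.words, 0)
	}
	b.words[w] |= 1 << uint(i%64)
}

// has reports whether bit i is set.
func (b *bitSet) has(i int) bool {
	w := i / 64
	return w < len(b.words) && b.words[w]&(1<<uint(i%64)) != 0
}
```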

I'm not seeing any noticeable improvement with the latest commit. Parsing time is about the same over repeated runs (~6.6-7.0 seconds), as is memory usage (as far as I can tell):

commit 02e2afae2b9a:

% go tool pprof mem-02e2afae2b9a.prof
Type: alloc_space
Time: Sep 23, 2020 at 2:42pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2.25GB, 99.08% of 2.27GB total
Dropped 6 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 30
      flat  flat%   sum%        cum   cum%
    1.91GB 83.95% 83.95%     1.91GB 83.95%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig
    0.11GB  4.68% 88.63%     0.14GB  6.06%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
    0.10GB  4.55% 93.18%     0.10GB  4.55%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig5 (inline)
    0.07GB  3.23% 96.41%     0.07GB  3.23%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*Set).add
    0.03GB  1.37% 97.78%     0.03GB  1.37%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBasePredictionContext (inline)
    0.02GB  0.71% 98.50%     0.09GB  3.94%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
    0.01GB  0.58% 99.08%     0.03GB  1.38%  github.com/dmitrys99/antlr4/runtime/Go/antlr.PredictionModegetConflictingAltSubsets
         0     0% 99.08%     0.18GB  7.78%  command-line-arguments_test.BenchmarkSlowQuery
         0     0% 99.08%     0.32GB 14.22%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).AdaptivePredict
         0     0% 99.08%     0.17GB  7.34%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).actionTransition
(pprof)

commit dff98b67fb18:

% go tool pprof mem-dff98b67fb18.prof
Type: alloc_space
Time: Sep 23, 2020 at 2:44pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2362.30MB, 100% of 2362.30MB total
Dropped 14 nodes (cum <= 11.81MB)
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
 1980.65MB 83.84% 83.84%  1980.65MB 83.84%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig
     122MB  5.16% 89.01%   147.50MB  6.24%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
  115.01MB  4.87% 93.88%   115.01MB  4.87%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig5 (inline)
   80.58MB  3.41% 97.29%    80.58MB  3.41%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*Set).add
   25.50MB  1.08% 98.37%    25.50MB  1.08%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBasePredictionContext (inline)
      19MB   0.8% 99.17%       19MB   0.8%  github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBitSet (inline)
   12.04MB  0.51% 99.68%    92.62MB  3.92%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
    7.52MB  0.32%   100%    26.52MB  1.12%  github.com/dmitrys99/antlr4/runtime/Go/antlr.PredictionModegetConflictingAltSubsets
         0     0%   100%   190.60MB  8.07%  command-line-arguments_test.BenchmarkSlowQuery
         0     0%   100%   347.62MB 14.72%  github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).AdaptivePredict
(pprof)

jlj5aj avatar Sep 23 '20 18:09 jlj5aj

Thanks for the feedback. It looks like we see different memory usage because I'm testing with a different grammar.

dmitrys99 avatar Sep 23 '20 19:09 dmitrys99

Well, I spent a lot of time on investigation. During parsing, the runtime sometimes calls the AdaptivePredict() routine, which in turn works with the ATNConfig structure.

It produces a lot of (literally millions of) copies of this structure. The structure contains three pointers: state ATNState, context PredictionContext, and semanticContext SemanticContext. This means that GC pressure is very high; the Go GC spends a lot of effort dealing with pointers.

I tried to track where objects are released, but it is very hard due to the recursive nature of the processing and the inner references between objects, so techniques like sync.Pool do not help here. C++ uses shared_ptr and compiler support for releasing memory.

The only way I see is to rewrite the parts of the runtime that work with ATNConfig to use value types instead of pointers. But that is a lot of work, and I do not have enough architectural understanding, at least yet.

dmitrys99 avatar Oct 03 '20 14:10 dmitrys99
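To make the pointer-vs-value idea concrete, here is a hypothetical sketch: the pointer-laden config mirrors the three fields named above, and the value variant replaces each reference with an index into an interning table, so a slice of value configs contains nothing for the GC to trace. All names are illustrative stand-ins, not the runtime's actual API.

```go
package main

// Simplified stand-ins for the three referenced structures.
type atnState struct{ id int }
type predictionContext struct{ parentID int }
type semanticContext struct{ pred int }

// pointerConfig mirrors the current shape: three pointer fields that the GC
// must trace on every cycle, for every one of the millions of copies.
type pointerConfig struct {
	state   *atnState
	context *predictionContext
	semCtx  *semanticContext
}

// valueConfig is one shape a pointer-free rewrite could take: each reference
// becomes an index into a shared table, so []valueConfig is pointer-free.
type valueConfig struct {
	stateID, contextID, semCtxID int32
}

// stateTable interns states and hands back indices, the companion structure
// that makes the index scheme work.
type stateTable struct{ states []atnState }

func (t *stateTable) intern(s atnState) int32 {
	t.states = append(t.states, s)
	return int32(len(t.states) - 1)
}
```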

I stumbled upon a fix for our issues (see https://github.com/antlr/antlr4/issues/374, linked from the "Why is my expression parser slow?" section here):

parser.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)

All queries now parse quickly and correctly. Maybe this is a solution for others as well.

The improvements @dmitrys99 made are still significant for the non-SLL case so I don't necessarily want to close this ticket.

jlj5aj avatar Mar 10 '21 19:03 jlj5aj
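The usual companion to PredictionModeSLL is the two-stage strategy described in the ANTLR documentation: parse in SLL mode first and retry in full LL mode only on failure. The sketch below models only the control flow, with plain callbacks standing in for runs of the generated parser under each mode and a placeholder error standing in for the bail-out that a bail-style error strategy would raise.

```go
package main

import "errors"

// errSLLBail stands in for the parse-cancellation error raised when SLL
// prediction gives up (or a genuine syntax error occurs).
var errSLLBail = errors.New("SLL prediction failed")

// parseTwoStage tries the fast SLL mode first and falls back to full LL only
// when SLL reports an error. sllParse and llParse are stand-ins for running
// the generated parser with PredictionModeSLL and PredictionModeLL.
func parseTwoStage(input string, sllParse, llParse func(string) error) error {
	if err := sllParse(input); err == nil {
		return nil // SLL succeeded; its parse tree matches what LL would build.
	}
	// An SLL failure may be a real syntax error or merely an SLL weakness;
	// only a full LL parse can tell the two apart.
	return llParse(input)
}
```

Since most inputs succeed in the cheap SLL stage, the expensive LL stage runs only on the rare inputs that actually need it.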

For our current project we rewrote the parser by hand. I still want to rewrite the Go backend of ANTLR, but for now it is out of scope for my project.

dmitrys99 avatar Mar 18 '21 18:03 dmitrys99

Occasionally I got a chance to dive into the problem once again. I tried several more approaches to deal with the performance of the Go runtime. Among these are:

  • deferred cleanup where appropriate
  • bulk allocation
  • changing pointers to uintptr

None helped, unfortunately.

  • Deferred cleanup (cleaning up allocated objects at the end of the surrounding function) degraded execution by about 20%.
  • Bulk allocation provided roughly a 10% speedup, but it strictly depends on the buffer size. The best bet on the available machines was 100000 BaseATNConfig objects; a lower or higher number gives no speedup. I guess this number is hardware dependent.
  • The most promising approach is to change the SemanticContext and PredictionContext pointers of BaseATNConfig to uintptr, since uintptr is not tracked by the GC. This really increases speed, but execution fails because the GC frees objects while they are still in use by other parts of the parser.

So, I see these options for further development:

  1. Rewrite the runtime completely without pointers.
  2. Implement a generational GC for Go and swap it in during parsing (probably in a separate parsing goroutine).
  3. Investigate the uintptr idea further and provide additional memory anchors for objects the GC would otherwise collect.

Opinions welcome!

dmitrys99 avatar May 17 '21 07:05 dmitrys99
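For reference, the bulk-allocation experiment described above can be sketched as a simple slab allocator: grab a large chunk of configs at once and hand out pointers into it, cutting the number of individual heap allocations. The types and chunk handling here are illustrative, not the code from the experiment, and the config struct is a simplified stand-in for BaseATNConfig.

```go
package main

// config is a simplified stand-in for BaseATNConfig.
type config struct{ state, alt int }

// configSlab allocates configs in bulk: one large chunk at a time, with
// alloc handing out pointers into the current chunk. size mirrors the
// ~100000-object buffer mentioned above.
type configSlab struct {
	chunk []config
	next  int
	size  int
}

func newConfigSlab(size int) *configSlab {
	return &configSlab{chunk: make([]config, size), size: size}
}

func (s *configSlab) alloc() *config {
	if s.next == len(s.chunk) {
		// Start a fresh chunk; the old one stays alive as long as any
		// handed-out pointer into it is still reachable.
		s.chunk = make([]config, s.size)
		s.next = 0
	}
	c := &s.chunk[s.next]
	s.next++
	return c
}
```

The downside, which matches the observation above, is that a whole chunk stays live until every config in it is unreachable, so the speedup is sensitive to the chunk size.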

Hi. @jcking improved the Go target recently; https://github.com/antlr/antlr4/pull/3421 was just merged. Maybe we can re-evaluate based on this improvement?

parrt avatar Dec 23 '21 02:12 parrt

I don't think the BitSet implementation resolves all of the Go performance problems. But this issue is also too broad and should perhaps be split into several smaller issues.

KvanTTT avatar Dec 24 '21 10:12 KvanTTT

Yep, makes sense. Maybe @jlj5aj can split it up. I think we also need a performance test for a grammar on big input. We have TestPerformance, but...

parrt avatar Dec 24 '21 18:12 parrt

We have one for a big lexer, but one for large input would also be useful.


ericvergnaud avatar Dec 24 '21 18:12 ericvergnaud


Added a new PR: https://github.com/antlr/antlr4/pull/3432

It takes ~5 secs each for Go, Java, and Python, and 13 secs for Swift.

We should augment the test rig to run multiple times to warm up VMs like Java's, and we need a way to report the average time to testers. Using the test rig is awkward, but it was the easiest path.

parrt avatar Dec 24 '21 23:12 parrt
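The warm-up-then-average harness suggested here could look like the following sketch (names illustrative): run the workload a few times untimed so VM-style targets settle, then report the mean over the timed runs.

```go
package main

import "time"

// timeAveraged runs fn warmup times untimed (so targets with JIT-style
// warm-up, like Java, reach steady state), then times runs executions and
// returns the mean duration per run.
func timeAveraged(fn func(), warmup, runs int) time.Duration {
	for i := 0; i < warmup; i++ {
		fn()
	}
	start := time.Now()
	for i := 0; i < runs; i++ {
		fn()
	}
	return time.Since(start) / time.Duration(runs)
}
```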

 I stumbled upon a fix for our issues:

 parser.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)

@jlj5aj reading the documentation on PredictionModeSLL, it says:

 When using this prediction mode, the parser will either return a correct
  parse tree (i.e. the same parse tree that would be returned with the
  LL prediction mode), or it will Report a syntax error. If a
  syntax error is encountered when using the SLL prediction mode,
  it may be due to either an actual syntax error in the input or indicate
  that the particular combination of grammar and input requires the more
  powerful LL prediction abilities to complete successfully.

Any idea in what scenarios this would fail with a syntax error?

gnan-mw avatar Jul 29 '22 17:07 gnan-mw

There is an example of that in the paper Adaptive LL(*) Parsing: The Power of Dynamic Analysis (section 3.1), but I was not able to reproduce it.

dmitrys99 avatar Jul 30 '22 17:07 dmitrys99

Thank you Dmitry!

gnan-mw avatar Jul 31 '22 18:07 gnan-mw

I think 4.11.1 (or perhaps @dev) should now be on par with the other targets. However, I still see a number of areas where I can improve the Go runtime.

jimidle avatar Sep 13 '22 07:09 jimidle

Can someone confirm that 4.11.1 resolves much of the performance problem? Can I close?

parrt avatar Sep 16 '22 00:09 parrt

I have nothing outstanding. There is one report where the reporter says Go slows down with multiple goroutines, but that isn't performance per se; I think it is the way the sync mutexes were implemented, and that should be a separate issue.

jimidle avatar Sep 19 '22 04:09 jimidle