Performance issue with Golang target
Hi,
We are running into a performance issue with the Golang target in which parsing takes exponentially longer the more terms are present in the query (the Java target returns immediately). A somewhat minimal example is available here. It includes the grammar files, the JAR that generated the Golang code, as well as CPU/memory benchmarks for the query in v4parser/minimal_test.go.
Does this appear to be an issue in ANTLR's Golang code generation? Is there anything we can do within the grammar itself to work around the issue?
Thanks! Jason
Hi. If a Java or C# parser generated from the same grammar runs 'instantly', then it could be a bug specific to the Go target. Can you put together a genuine benchmark?
There is a repo with code/benchmarks linked above. Do you need something more than that?
We investigated the issue a bit and figured out that the underlying data structures are significantly slow.
For example, this line https://github.com/antlr/antlr4/blob/e73f72be7355aded374fa6b18fccb4569b449856/runtime/Go/antlr/parser_atn_simulator.go#L1040 accounts for 14 of the 49 seconds of total parsing time.
As I understand it, the problem is that the Set concept is not implemented carefully enough and degrades performance, for example during array merges.
Simple fixes made parsing about twice as fast: https://github.com/antlr/antlr4/pull/2915
Added one more commit to #2915 that eliminates excessive calculations; parsing is now 3 times faster.
Can confirm this reduces the time for the example input from 49 seconds to around 7 seconds. Nice work!
For technical reasons I removed the previous PR and created a new one with the same content: #2916
Added another improvement: a BitSet without pointers reduced parsing time by about half again.
@jlj5aj Please try again.
If it helps, then the next step is possibly to rewrite Set on top of simpler underlying data structures, probably a single array.
I'm not seeing any noticeable improvement with the latest commit. Parsing time is about the same over repeated runs (~6.6-7.0 seconds), as is memory usage (as far as I can tell):
commit 02e2afae2b9a:
% go tool pprof mem-02e2afae2b9a.prof
Type: alloc_space
Time: Sep 23, 2020 at 2:42pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2.25GB, 99.08% of 2.27GB total
Dropped 6 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 30
flat flat% sum% cum cum%
1.91GB 83.95% 83.95% 1.91GB 83.95% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig
0.11GB 4.68% 88.63% 0.14GB 6.06% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
0.10GB 4.55% 93.18% 0.10GB 4.55% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig5 (inline)
0.07GB 3.23% 96.41% 0.07GB 3.23% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*Set).add
0.03GB 1.37% 97.78% 0.03GB 1.37% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBasePredictionContext (inline)
0.02GB 0.71% 98.50% 0.09GB 3.94% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
0.01GB 0.58% 99.08% 0.03GB 1.38% github.com/dmitrys99/antlr4/runtime/Go/antlr.PredictionModegetConflictingAltSubsets
0 0% 99.08% 0.18GB 7.78% command-line-arguments_test.BenchmarkSlowQuery
0 0% 99.08% 0.32GB 14.22% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).AdaptivePredict
0 0% 99.08% 0.17GB 7.34% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).actionTransition
(pprof)
commit dff98b67fb18:
% go tool pprof mem-dff98b67fb18.prof
Type: alloc_space
Time: Sep 23, 2020 at 2:44pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2362.30MB, 100% of 2362.30MB total
Dropped 14 nodes (cum <= 11.81MB)
Showing top 10 nodes out of 31
flat flat% sum% cum cum%
1980.65MB 83.84% 83.84% 1980.65MB 83.84% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig
122MB 5.16% 89.01% 147.50MB 6.24% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
115.01MB 4.87% 93.88% 115.01MB 4.87% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBaseATNConfig5 (inline)
80.58MB 3.41% 97.29% 80.58MB 3.41% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*Set).add
25.50MB 1.08% 98.37% 25.50MB 1.08% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBasePredictionContext (inline)
19MB 0.8% 99.17% 19MB 0.8% github.com/dmitrys99/antlr4/runtime/Go/antlr.NewBitSet (inline)
12.04MB 0.51% 99.68% 92.62MB 3.92% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
7.52MB 0.32% 100% 26.52MB 1.12% github.com/dmitrys99/antlr4/runtime/Go/antlr.PredictionModegetConflictingAltSubsets
0 0% 100% 190.60MB 8.07% command-line-arguments_test.BenchmarkSlowQuery
0 0% 100% 347.62MB 14.72% github.com/dmitrys99/antlr4/runtime/Go/antlr.(*ParserATNSimulator).AdaptivePredict
(pprof)
Thanks for the feedback. It looks like we get different memory usage because I'm testing with a different grammar.
Well, I spent a lot of time on investigation.
During parsing the runtime sometimes calls the AdaptivePredict() routine, which in turn works with the ATNConfig structure.
It produces a lot of copies of this structure (literally millions). The structure includes three pointers: state ATNState, context PredictionContext, and semanticContext SemanticContext. This means GC pressure is very high; the Golang GC takes a lot of effort to deal with pointers.
I tried to track where objects are released, but it is very hard due to the recursive nature of the processing and the inner references between objects, so techniques like sync.Pool do not help here. C++ uses shared_ptr and compiler support for releasing memory.
The only way I see is to rewrite the parts of the runtime working with ATNConfig to use value types instead of pointers. But that is a lot of work, and I don't have enough architectural understanding, at least not yet.
I stumbled upon a fix for our issues (see https://github.com/antlr/antlr4/issues/374, linked from the "Why is my expression parser slow?" section here):
parser.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)
All queries now parse quickly and correctly. Maybe this is a solution for others as well.
The improvements @dmitrys99 made are still significant for the non-SLL case so I don't necessarily want to close this ticket.
For our current project we rewrote the parser by hand. I still want to rewrite the Go backend of ANTLR, but for now it is out of scope of my project.
Occasionally I got a chance to dive into the problem once again. I tried several more approaches to deal with the performance of the Go runtime. Among these are:
- deferred cleanup where appropriate
- bulk allocation
- changing pointers to uintptr
None helped, unfortunately.
- Deferred cleanup (cleaning up allocated objects at the end of the surrounding function) degrades execution by about 20%.
- Bulk allocation provided roughly a 10% speedup, but it strictly depends on the buffer size. The best bet on the available computers was 100000 BaseATNConfig objects; a lower or higher number gives no speedup. I guess this number is hardware dependent.
- The most promising approach is to change the SemanticContext and PredictionContext pointers of BaseATNConfig to uintptr, since uintptr is not tracked by the GC. This really increases speed, but execution fails because the GC frees objects while they are still in use in other parts of the parser.
So, I see options for further development:
- Rewrite the runtime completely without pointers.
- Implement a generational GC for Golang and switch to it during parsing (probably in a separate parsing goroutine).
- Investigate the uintptr idea further and provide additional memory anchors for things cleared by the GC.
Opinions welcome!
Hi. @jcking improved the Go target recently; https://github.com/antlr/antlr4/pull/3421 was just merged. Maybe we can re-evaluate based on this improvement?
I don't think the BitSet implementation resolves all Go performance problems. But this issue is also too broad and maybe should be split into several smaller issues.
Yep, makes sense. Maybe @jlj5aj can split it up. I think we also need a performance test for a grammar on big input. We have TestPerformance, but...
We have one for a big Lexer, but large input would also be useful.
Added a new PR: https://github.com/antlr/antlr4/pull/3432
It takes ~5 secs each for Go, Java, and Python, and 13 secs for Swift.
We should augment the test rig to run multiple times to warm up VMs like Java's, and we need a way to report average time to testers. Using the test rig is awkward, but it was the easiest path.
@jlj5aj reading the documentation on PredictionModeSLL, it says:
When using this prediction mode, the parser will either return a correct
parse tree (i.e. the same parse tree that would be returned with the
LL prediction mode), or it will report a syntax error. If a
syntax error is encountered when using the SLL prediction mode,
it may be due to either an actual syntax error in the input or indicate
that the particular combination of grammar and input requires the more
powerful LL prediction abilities to complete successfully.
Any idea on what scenarios this would fail with a syntax error?
There is an example of that in the Adaptive LL(*) Parsing: The Power of Dynamic Analysis paper (section 3.1), but I was not able to reproduce it.
Thank you Dmitry!
I think 4.11.1 (or perhaps @dev) should be equal to the other targets now. However, I still see a number of areas where I can improve the Go runtime.
Can someone confirm that 4.11.1 resolves much of the performance problem? Can I close?
I have nothing outstanding. There is one report where the reporter says Go slows down with multiple goroutines, but that isn't performance per se; I think it is the way sync/mutexes were implemented, and that should be a separate issue.